About caching and clustering

  Written by The Jahia Team

High performance on high-traffic web sites is often difficult to achieve. This section presents the technologies available in Digital Experience Manager (DX) that help you handle large loads as well as scale out.

Caches

Caches enable high-performing web systems such as Digital Experience Manager to avoid recreating dynamic content under large system loads. Digital Experience Manager uses a multi-layered caching subsystem.

Cache types

All cache types use the same cache service, which is responsible for providing cache implementations. Digital Experience Manager now standardizes on the EHCache implementation, which can range from very simple setups all the way to distributed Terracotta or BigMemory cache instances.

Digital Experience Manager uses multiple cache layers to optimize the performance of page delivery, including:

  • The browser cache
  • Front-end HTML caches
  • Object caches
  • Database caches

Each of these cache layers plays a different role in making sure values are only computed once.
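The idea of stacked layers, each sparing the next one from recomputing a value, can be illustrated with a minimal sketch. The `LayeredCache` class below is hypothetical and is not part of the DX API; it only assumes that each layer can be modeled as a simple map and that a loader function stands in for the expensive computation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: LayeredCache is a hypothetical class, not a DX API.
class LayeredCache<K, V> {
    private final List<Map<K, V>> layers = new ArrayList<>();
    private final Function<K, V> loader;
    int loaderCalls = 0; // counts how often the expensive computation runs

    LayeredCache(int layerCount, Function<K, V> loader) {
        for (int i = 0; i < layerCount; i++) {
            layers.add(new HashMap<>());
        }
        this.loader = loader;
    }

    V get(K key) {
        // Search from the outermost layer (index 0) inward.
        for (int i = 0; i < layers.size(); i++) {
            V hit = layers.get(i).get(key);
            if (hit != null) {
                fill(i, key, hit); // repopulate the layers in front of the hit
                return hit;
            }
        }
        // Every layer missed: compute once and store in all layers.
        loaderCalls++;
        V value = loader.apply(key);
        fill(layers.size(), key, value);
        return value;
    }

    // Copies a value into every layer in front of the given index.
    private void fill(int upTo, K key, V value) {
        for (int i = 0; i < upTo; i++) {
            layers.get(i).put(key, value);
        }
    }

    void evictFromLayer(int index, K key) {
        layers.get(index).remove(key);
    }
}
```

In this sketch, evicting an entry from the outer layer does not trigger a recomputation, because the inner layer still holds the value.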

The browser cache layer

Although it is part of the browser rather than DX, the browser cache plays a critical role in guaranteeing good performance for the end user. For example, DX's use of the GWT framework makes it possible for AJAX source code to be aggressively cached in the browser, so that unchanged script code is not reloaded. DX also manages the browser cache to make sure it doesn't keep page content that has changed, and controls expiration times for cached content so that the browser doesn't re-request content that rarely changes.
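As a rough illustration of this kind of policy, the sketch below picks a `Cache-Control` header value from a resource path. It relies on the GWT compiler's naming convention, in which files containing `.cache.` have content-hashed names and can be cached for a long time, while `.nocache.` bootstrap files should always be fetched again. The `CachePolicy` class, the example paths, and the exact header values are assumptions made for illustration; the headers DX actually sends are not described here.

```java
// Illustrative only: CachePolicy is a hypothetical helper, not a DX API.
class CachePolicy {
    // One year in seconds, a common choice for content-hashed resources (assumption).
    private static final int LONG_LIVED_SECONDS = 31536000;

    static String cacheControlFor(String path) {
        if (path.contains(".nocache.")) {
            // GWT bootstrap script: must always be fetched again.
            return "no-cache, no-store, must-revalidate";
        }
        if (path.contains(".cache.")) {
            // GWT compiled output: the file name changes whenever the content changes.
            return "public, max-age=" + LONG_LIVED_SECONDS;
        }
        // Everything else (for example rendered pages): revalidate before reuse.
        return "no-cache";
    }
}
```

For example, `CachePolicy.cacheControlFor("/gwt/app/ABC123.cache.js")` yields a long-lived policy, while a page path such as `/sites/demo/home.html` falls through to `no-cache`.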

The front-end HTML cache layer

Historically, DX has had many front-end HTML cache layer implementations. The first was the full-page HTML cache. While very efficient when a page was already available in the cache, it degraded poorly for pages containing an HTML fragment that changed from page to page or from user to user (for example, a fragment displaying the user name).

Digital Experience Manager 5 introduced the ESI cache server, which added the ability to cache fragments of HTML. This technology required a separate cache server running in its own virtual machine. While much better than the full-page cache for dynamic page rendering, the ESI caching system suffered from problems with inter-server communication, which was difficult to make efficient. Also, integrating the ESI cache required good knowledge of the fragment-caching model when developing templates, which was an additional burden on integrators.

Digital Experience Manager 6 takes the best of both worlds by combining the sheer efficiency of the embedded full-page cache with the fragment handling of the ESI cache server. This new cache implementation is called the "module cache" and integrates fragment caching at the module level, making the interaction with templates very natural. Template developers usually don't have to add any markup to have their fragments correctly cached, and when they do need to control fragment generation, this is much easier than in previous versions of Digital Experience Manager. The "Skeleton Cache" is also a front-end HTML cache; it caches everything "around" the fragments. Together, the two cache sub-systems deliver performance equivalent to the full-page HTML cache of previous versions while retaining the flexibility of a fragment cache.
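The split between a cached skeleton and independently cached fragments can be sketched as follows. This is a conceptual illustration only, not the DX implementation: the `FragmentPageCache` class, the `{{fragment:key}}` placeholder syntax, and the renderer function are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical skeleton-plus-fragment cache, not the DX module cache.
class FragmentPageCache {
    // Hypothetical placeholder syntax marking where a fragment belongs in the skeleton.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{fragment:([^}]+)\\}\\}");

    private final Map<String, String> skeletons = new HashMap<>();
    private final Map<String, String> fragments = new HashMap<>();
    private final Function<String, String> fragmentRenderer;
    int renderCount = 0; // counts actual fragment renderings (cache misses)

    FragmentPageCache(Function<String, String> fragmentRenderer) {
        this.fragmentRenderer = fragmentRenderer;
    }

    void putSkeleton(String pageId, String html) {
        skeletons.put(pageId, html);
    }

    // Assembles a page from its cached skeleton and cached fragments.
    String render(String pageId) {
        String skeleton = skeletons.get(pageId);
        if (skeleton == null) {
            throw new IllegalArgumentException("No skeleton cached for " + pageId);
        }
        Matcher m = PLACEHOLDER.matcher(skeleton);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(skeleton, last, m.start()); // static markup around the fragment
            out.append(fragment(m.group(1)));      // cached or freshly rendered fragment
            last = m.end();
        }
        out.append(skeleton, last, skeleton.length());
        return out.toString();
    }

    private String fragment(String key) {
        String cached = fragments.get(key);
        if (cached == null) {
            renderCount++;
            cached = fragmentRenderer.apply(key);
            fragments.put(key, cached);
        }
        return cached;
    }

    // Only the changed fragment is dropped; the skeleton and other fragments stay cached.
    void invalidateFragment(String key) {
        fragments.remove(key);
    }
}
```

The point of the design is that invalidating one fragment forces only that fragment to be rendered again, while the rest of the page is still served from the cache.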

Object cache layer

The object cache layer sits directly below the front-end HTML cache sub-systems. It handles Java objects that cannot be optimally cached by the underlying layers. In previous versions of DX, this layer had a lot of different caches, but in the most recent versions it has been reduced to the strict minimum based on performance testing. It serves as a layer on top of the database caches in order to avoid reconstructing objects for each model request. This is all handled internally by Digital Experience Manager; integrators only need to interact with these caches when they directly call back-end APIs that don't automatically update them. A good example of this is the LDAP user and group caches.
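The following sketch shows why a direct back-end change calls for an explicit cache flush. It is a generic illustration under stated assumptions: the `UserObjectCache` class is hypothetical, and a plain map stands in for an external directory such as LDAP. It does not use or reproduce any DX API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a hypothetical object cache in front of a slow back end.
class UserObjectCache {
    private final Map<String, String> backend; // stands in for an LDAP directory
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    UserObjectCache(Map<String, String> backend) {
        this.backend = backend;
    }

    // Returns the cached display name, loading it from the back end on a miss.
    String displayName(String userId) {
        return cache.computeIfAbsent(userId, backend::get);
    }

    // Must be called by code that changed the back end without going through the cache.
    void flush(String userId) {
        cache.remove(userId);
    }
}
```

If the back end is modified directly, `displayName` keeps returning the old value until `flush` is called for that user, which is the situation integrators need to watch for.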

Database caches

The database cache layer keeps interaction with the database to a minimum. This cache is important because database communication requires object (de-)serialization as well as network communication, so the overhead of database query execution may be quite substantial. The Hibernate ORM and Jackrabbit frameworks handle this layer transparently, so developers and integrators normally do not need to deal with it.

Clustering

Deploying Digital Experience Manager in a cluster is an effective way of distributing CPU and memory load to handle larger traffic sites. The image below illustrates a typical DX cluster installation. DX nodes communicate with each other through cache and database layers, but also access shared resources: a shared file system and the database. Binary content is stored in the database in the default configuration, or on the shared file system if the server is configured to store it there. The database stores everything else. It is therefore very important to have a high-performance database installation, as Digital Experience Manager depends on it to scale. Digital Experience Manager can also differentiate nodes in a cluster setup in order to offer more specialized processing. The following sections briefly review the different node types.

Visitors nodes

Visitors nodes are specialized DX nodes that only serve as content publishing nodes. They also interact with portlets or application modules to render pages and accept user-generated content. This node specialization separates the visitor load from the authoring and background processing loads.

Authoring nodes

Authoring nodes are cluster nodes used to either browse or edit DX content. This is the most common type of node and multiple instances of authoring nodes distribute the load.

Processing node

The processing node executes long-running tasks, such as workflow validation operations, copying and pasting, content import, and indexing, as background tasks. This frees the other nodes to handle content browsing and editing requests. This node is designed to be fault-tolerant: if it fails during processing, it can simply be restarted and it will resume operations where it left off. Note that only one processing node is permitted.

More resources on performance

As Digital Experience Manager constantly strives to improve performance, make sure to check our website for additional resources on performance, as well as our "Configuration and Fine Tuning Guide", which contains best practices for deployment and configuration to properly set up Digital Experience Manager for high loads.