About caching and clustering

  Written by The Jahia Team

High performance on high-traffic web sites is often tricky to achieve. This section presents the technologies available in Digital Experience Manager that help you handle large loads and scale out.

Caches

Caches are essential to high-performing web systems such as Digital Experience Manager, because they avoid recreating dynamic content under large system loads. Digital Experience Manager uses a multi-layered caching subsystem.

Cache types

All cache types use the same cache service, which is responsible for providing cache implementations. Digital Experience Manager now standardizes on the Ehcache (http://ehcache.org/) implementation, which can range from very simple setups all the way to distributed Terracotta (http://www.terracotta.org/) or BigMemory (http://www.terracotta.org/bigmemory) cache instances.

Digital Experience Manager uses multiple cache layers to optimize the performance of page delivery:

  • the browser cache
  • front-end HTML caches
  • object caches
  • database caches

Each of these cache layers plays a different role in making sure values are only computed once.

The browser cache layer

Although it lives in the browser rather than in Digital Experience Manager, the browser cache plays a critical role in guaranteeing good performance for the end user. For example, Digital Experience Manager's use of the GWT framework makes it possible for AJAX source code to be aggressively cached in the browser, so that script code that hasn't changed is not reloaded. Digital Experience Manager also manages the browser cache so that it doesn't keep page content that has changed, and it controls expiration times for cached content so that the browser doesn't repeatedly request content that rarely changes.
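The distinction between aggressively cached scripts and revalidated pages can be sketched with standard HTTP Cache-Control values. This is only an illustration, not Digital Experience Manager's actual code: the class name, the resource kinds and the durations are all assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Digital Experience Manager.
// Resource kinds and durations are assumptions made for illustration.
final class BrowserCacheHeaders {
    enum Kind { VERSIONED_SCRIPT, STATIC_ASSET, DYNAMIC_PAGE }

    static Map<String, String> headersFor(Kind kind) {
        Map<String, String> headers = new LinkedHashMap<>();
        switch (kind) {
            case VERSIONED_SCRIPT:
                // The file name changes whenever the content changes,
                // so the browser may keep it for a year without asking again.
                headers.put("Cache-Control", "public, max-age=31536000, immutable");
                break;
            case STATIC_ASSET:
                // Rarely changing content: reuse for an hour before re-requesting.
                headers.put("Cache-Control", "public, max-age=3600");
                break;
            case DYNAMIC_PAGE:
                // The browser may store the page but must revalidate before reuse.
                headers.put("Cache-Control", "no-cache");
                break;
        }
        return headers;
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> " + headersFor(kind));
        }
    }
}
```

In a servlet-based application these values would typically be written to the response with setHeader before the body is sent.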

The front-end HTML cache layer

Historically, Digital Experience Manager has had several front-end HTML cache implementations. The first was the full-page HTML cache. While very efficient when a page was already in the cache, it did not degrade well for pages containing an HTML fragment that changed from page to page, or from user to user (for example, a page displaying the user name).

In Digital Experience Manager 5 we introduced the ESI cache server, which added the ability to cache fragments of HTML. This technology required a separate cache server running in its own virtual machine. While much better than the full-page cache for dynamic page rendering, the ESI caching system suffered from problems with inter-server communication, which was very tricky to make efficient. Integrating the ESI cache also required good knowledge of the fragment-caching model when developing templates, which was an additional burden on integrators.

Digital Experience Manager 6 takes the best of both worlds by combining the sheer efficiency of the embedded full-page cache with the fragment handling of the ESI cache server. This new cache implementation is called the "module cache". It integrates fragment caching at the module level, which makes the interaction with templates very natural. Template developers usually don't have to add any markup to have their fragments cached correctly, and when they do need to control fragment generation, it is much easier than in previous versions of Digital Experience Manager.

The "Skeleton Cache" is also a front-end HTML cache; it caches everything "around" the fragments. Combining the two cache sub-systems gives performance equivalent to the full-page HTML cache of previous versions of Digital Experience Manager while retaining the flexibility of a fragment cache.
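The interplay between a skeleton cache and a fragment cache can be illustrated with a minimal sketch. It is not the module cache implementation: the class, the "{{name}}" placeholder syntax and the cache keys are hypothetical, and a single in-memory map stands in for both caches. The point is only that the page skeleton and the shared fragment are rendered once, while the per-user fragment is keyed by user.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of skeleton + fragment caching; names and
// placeholder syntax are assumptions, not the module cache API.
final class FragmentCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    int renderCount = 0; // counts how often something is actually rendered

    private String cached(String key, Supplier<String> renderer) {
        return cache.computeIfAbsent(key, k -> {
            renderCount++;
            return renderer.get();
        });
    }

    String renderPage(String path, String user) {
        // Skeleton: everything "around" the fragments, cached per page.
        String skeleton = cached("skeleton:" + path,
                () -> "<html>{{greeting}}{{news}}</html>");
        // Per-user fragment: the key includes the user name.
        String greeting = cached("greeting:" + user,
                () -> "<p>Hello " + user + "</p>");
        // Shared fragment: one entry serves every user.
        String news = cached("news",
                () -> "<ul><li>Latest news</li></ul>");
        return skeleton.replace("{{greeting}}", greeting)
                       .replace("{{news}}", news);
    }

    public static void main(String[] args) {
        FragmentCacheSketch site = new FragmentCacheSketch();
        System.out.println(site.renderPage("/home", "alice"));
        System.out.println(site.renderPage("/home", "bob"));
        System.out.println("renders: " + site.renderCount);
    }
}
```

With this layout, a second visitor to the same page triggers only one new render (the greeting), which is the behavior a full-page cache could not offer.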

Object cache layer

The next layer below the front-end HTML cache sub-systems is the object cache layer. This layer handles Java objects that cannot be cached optimally by the underlying layers. Previous versions of Digital Experience Manager had many different caches in this layer, but in the most recent versions it has been reduced to the strict minimum, based on performance testing. It sits on top of the database caches to avoid reconstructing objects for each model request. All of this is handled internally by Digital Experience Manager; integrators only need to interact with these caches when they call back-end APIs directly that don't automatically update the caches (the LDAP user and group caches are good examples).
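The situation where an integrator has to touch an object cache can be sketched as follows. This is a generic read-through cache with explicit invalidation, not the Digital Experience Manager cache service: the class and method names are hypothetical, and a plain map stands in for an external directory such as LDAP.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through object cache; not the product's cache API.
final class ObjectCacheSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // rebuilds the object from the back end

    ObjectCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached object, loading it from the back end on a miss.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Must be called after a back-end change that bypassed the cache.
    void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> directory = new HashMap<>(); // stands in for LDAP
        directory.put("jdoe", "John Doe");
        ObjectCacheSketch<String, String> users = new ObjectCacheSketch<>(directory::get);

        System.out.println(users.get("jdoe"));  // loaded from the directory
        directory.put("jdoe", "Jane Doe");      // change made behind the cache
        System.out.println(users.get("jdoe"));  // still the cached value
        users.invalidate("jdoe");
        System.out.println(users.get("jdoe"));  // reloaded after invalidation
    }
}
```

The middle read shows why the explicit invalidation matters: without it, the cache keeps serving the old object.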

Database caches

The last layer of caches is the database cache layer, which keeps interaction with the database to a minimum. This cache is important because database communication requires object (de)serialization as well as network communication, so the overhead of executing database queries can be substantial. The Hibernate ORM and Jackrabbit frameworks handle this layer transparently, so developers and integrators normally do not need to deal with it.

Clustering

Deploying Digital Experience Manager in a cluster is a very powerful way of distributing CPU and memory load to handle larger traffic sites. A typical Digital Experience Manager cluster installation is illustrated in the above graph. Digital Experience Manager nodes communicate with each other through the cache and database layers, and also access shared resources: a shared file system and the database. Binary content is stored on the file system if the server is configured to store it there, or in the database if the default configuration is used. The database stores everything else. It is therefore very important to have a high-performance database installation, as Digital Experience Manager depends on it to scale. Digital Experience Manager can also differentiate nodes in a cluster setup in order to offer more specialized processing. The different node types are briefly reviewed below.

Visitors nodes

Digital Experience Manager "visitors" nodes are specialized Digital Experience Manager nodes that serve only as content publishing nodes. They also interact with portlets or application modules to render pages and accept user-generated content. This node specialization separates visitor load from authoring and background processing loads.

Authoring nodes

Digital Experience Manager "authoring" nodes are cluster nodes that can be used to either browse or edit Digital Experience Manager content. This is the most common usage of Digital Experience Manager nodes, so it is worthwhile to run multiple instances of these nodes in order to distribute the load.

Processing node

In Digital Experience Manager, long-running tasks such as workflow validation operations, copy and paste, content import and indexing are executed as background tasks, and only on the processing node. This way, while these long operations are running, other nodes can still process content browsing and editing requests. Note that currently only one processing node is allowed. This node is designed to be fault-tolerant, so if it fails during processing, it can simply be restarted and it will resume operations where it left off.
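The division of labor described above can be sketched as a simple role check over a shared job store. This is an illustration under stated assumptions rather than the actual scheduler: the class names are hypothetical, an in-memory map stands in for job state that is assumed to be persisted in the shared database, and the maxJobs parameter exists only to simulate a node stopping partway through.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Digital Experience Manager's scheduler.
final class ProcessingNodeSketch {
    enum Status { PENDING, DONE }

    // Stands in for job state assumed to live in the shared database.
    static final class JobStore {
        final Map<String, Status> jobs = new LinkedHashMap<>();

        void submit(String name) {
            jobs.putIfAbsent(name, Status.PENDING);
        }
    }

    private final JobStore store;
    private final boolean processingNode;

    ProcessingNodeSketch(JobStore store, boolean processingNode) {
        this.store = store;
        this.processingNode = processingNode;
    }

    // Runs up to maxJobs pending jobs and returns their names.
    // Nodes that are not the processing node never execute anything.
    List<String> runPending(int maxJobs) {
        List<String> executed = new ArrayList<>();
        if (!processingNode) {
            return executed;
        }
        for (Map.Entry<String, Status> job : store.jobs.entrySet()) {
            if (executed.size() == maxJobs) {
                break; // simulates the node stopping partway through
            }
            if (job.getValue() == Status.PENDING) {
                job.setValue(Status.DONE);
                executed.add(job.getKey());
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        JobStore shared = new JobStore();
        shared.submit("import-site");
        shared.submit("reindex");
        shared.submit("publish-workflow");

        System.out.println(new ProcessingNodeSketch(shared, false).runPending(10));
        System.out.println(new ProcessingNodeSketch(shared, true).runPending(2));
        // A restarted processing node picks up only what is still pending.
        System.out.println(new ProcessingNodeSketch(shared, true).runPending(10));
    }
}
```

Because the job state lives outside the node, a restarted processing node sees only the jobs that are still pending, which is one simple way to get the resume behavior described above.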

More resources on performance

As Digital Experience Manager constantly strives to improve performance, make sure to check our website for additional resources on the topic, as well as our "Configuration and Fine Tuning Guide", which contains deployment and configuration best practices for setting up Digital Experience Manager properly for high loads.