About caching and clustering

October 8, 2024

High performance on high-traffic web sites is often difficult to achieve. This section presents the technologies available in Jahia that help you handle large loads as well as scale out.

Caches

Caches enable high-performing web systems such as Jahia to avoid recreating dynamic content under large system loads. Jahia uses a multi-layered caching subsystem.

Cache types

All cache types rely on the same cache service, which is responsible for providing the cache implementations. Jahia now standardizes on the Ehcache implementation, which scales from very simple setups all the way to distributed Terracotta or BigMemory cache instances.

Jahia uses multiple cache layers to optimize the performance of page delivery, including:

  • The browser cache
  • Front-end HTML caches
  • Object caches
  • Database caches

Each of these cache layers plays a different role in making sure values are only computed once.
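The idea of layers cooperating so that a value is computed only once can be sketched as a read-through lookup. This is a minimal illustration, not Jahia's implementation: the LayeredCache class, its methods, and the loader function are hypothetical names introduced here, and each layer is simply an in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration of a layered read-through lookup; not Jahia's API.
final class LayeredCache<K, V> {
    private final List<Map<K, V>> layers = new ArrayList<>();
    private final Function<K, V> loader; // expensive computation; must not return null
    private final AtomicInteger loads = new AtomicInteger();

    LayeredCache(int layerCount, Function<K, V> loader) {
        for (int i = 0; i < layerCount; i++) {
            layers.add(new ConcurrentHashMap<>());
        }
        this.loader = loader;
    }

    V get(K key) {
        // Probe the layers in order, starting with the one closest to the client.
        for (int i = 0; i < layers.size(); i++) {
            V hit = layers.get(i).get(key);
            if (hit != null) {
                fill(i, key, hit); // copy the value into the layers above the hit
                return hit;
            }
        }
        // Every layer missed: compute the value, then populate all layers.
        V value = loader.apply(key);
        loads.incrementAndGet();
        fill(layers.size(), key, value);
        return value;
    }

    // Stores the value in layers [0, upTo).
    private void fill(int upTo, K key, V value) {
        for (int i = 0; i < upTo; i++) {
            layers.get(i).put(key, value);
        }
    }

    // Drops the key from the first `count` layers, for example to simulate eviction.
    void evictFromTop(int count, K key) {
        for (int i = 0; i < count && i < layers.size(); i++) {
            layers.get(i).remove(key);
        }
    }

    // Number of times the expensive loader actually ran.
    int loadCount() {
        return loads.get();
    }
}
```

In this sketch, a value evicted from the upper layers is restored from a lower layer without running the loader again, which is the benefit the layering is meant to provide.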

The browser cache layer

Although it is part of the browser rather than Jahia, the browser cache plays a critical role in guaranteeing good performance for the end user. For example, Jahia's use of the GWT framework allows AJAX source code to be aggressively cached in the browser, so that unchanged script code is not reloaded. Jahia also manages the browser cache to make sure it doesn't serve page content that has changed, and it controls expiration times for cached content so that the browser doesn't repeatedly request content that rarely changes.
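As a rough illustration of how a server can steer the browser cache, the sketch below maps a request path to a Cache-Control header value. It assumes GWT's file naming convention, where compiled files containing ".cache." never change and ".nocache." bootstrap files must always be refetched. The BrowserCachePolicy class, the list of static extensions, and the chosen header values are assumptions for this example, not the values Jahia actually sends.

```java
// Hypothetical policy helper; the header values are illustrative assumptions.
final class BrowserCachePolicy {
    static final long ONE_YEAR_SECONDS = 365L * 24 * 60 * 60;

    private BrowserCachePolicy() {
    }

    // Returns a Cache-Control value for the given request path.
    static String cacheControlFor(String path, long rarelyChangedMaxAgeSeconds) {
        if (path.contains(".nocache.")) {
            // GWT bootstrap script: always fetch a fresh copy.
            return "no-cache, no-store, must-revalidate";
        }
        if (path.contains(".cache.")) {
            // GWT compiled output: the name changes whenever the content does,
            // so the browser may keep it for as long as it likes.
            return "public, max-age=" + ONE_YEAR_SECONDS + ", immutable";
        }
        if (path.endsWith(".css") || path.endsWith(".png") || path.endsWith(".jpg")) {
            // Rarely changed static resources: cache with an expiration time.
            return "public, max-age=" + rarelyChangedMaxAgeSeconds;
        }
        // Page content may change at any time: revalidate before reuse.
        return "no-cache";
    }
}
```

For instance, calling cacheControlFor("/css/site.css", 3600) in this sketch yields a one-hour expiration, while an HTML page path falls through to revalidation on every request.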

The front-end HTML cache layer

Jahia combines the efficiency of embedded full-page caching with the fragment handling of an ESI (Edge Side Includes) cache server. This cache implementation, the Module Cache, integrates fragment caching at the module level, which makes the interaction with templates very natural. Template developers usually don't have to add markup to have their fragments correctly cached. Even when they need to control fragment generation, this is much easier to do than in previous versions of Jahia. The Skeleton Cache is also an HTML front-end cache; it caches everything "around" the fragments. By combining both cache subsystems, Jahia obtains performance equivalent to the full-page HTML cache that existed in previous versions while retaining the flexibility of a fragment cache.
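The interplay between a skeleton cache and a fragment cache can be sketched as follows. This is a simplified model under stated assumptions: the FragmentAggregator class, the "[[fragment:key]]" placeholder syntax, and the two renderer functions are all hypothetical and do not reflect the markers or classes Jahia uses internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of skeleton plus fragment caching; not Jahia's API.
final class FragmentAggregator {
    // Assumed placeholder syntax for this sketch only.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\[\\[fragment:([\\w/-]+)\\]\\]");

    private final Map<String, String> skeletonCache = new ConcurrentHashMap<>();
    private final Map<String, String> fragmentCache = new ConcurrentHashMap<>();
    private final Function<String, String> skeletonRenderer;
    private final Function<String, String> fragmentRenderer;

    FragmentAggregator(Function<String, String> skeletonRenderer,
                       Function<String, String> fragmentRenderer) {
        this.skeletonRenderer = skeletonRenderer;
        this.fragmentRenderer = fragmentRenderer;
    }

    // Builds the full page from the cached skeleton and cached fragments.
    String render(String pagePath) {
        String skeleton = skeletonCache.computeIfAbsent(pagePath, skeletonRenderer);
        Matcher matcher = PLACEHOLDER.matcher(skeleton);
        StringBuffer page = new StringBuffer(); // StringBuffer also works on Java 8
        while (matcher.find()) {
            String html = fragmentCache.computeIfAbsent(matcher.group(1), fragmentRenderer);
            matcher.appendReplacement(page, Matcher.quoteReplacement(html));
        }
        matcher.appendTail(page);
        return page.toString();
    }

    // Only the changed fragment is regenerated on the next request.
    void invalidateFragment(String key) {
        fragmentCache.remove(key);
    }
}
```

The design point this sketch tries to show is that invalidating one fragment leaves the skeleton and all other fragments cached, so a page with one changed module is nearly as cheap to serve as a fully cached page.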

Object cache layer

The object cache layer is the next layer below the front-end HTML cache subsystems. This layer handles some Java objects that cannot be optimally cached by the underlying layers. In previous versions of Jahia, this layer had a lot of different caches, but in the most recent versions it has been reduced to the strict minimum based on performance testing. It serves as a layer on top of the database caches in order to avoid reconstructing objects for each model request. This is all handled internally by Jahia. Integrators only need to interact with these caches when they directly call back-end APIs that don't automatically update them. A good example of this is the LDAP user and group caches.
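The situation described above, where a back-end change bypasses the cache, can be illustrated with a small sketch. The CachedUserLookup class and its methods are hypothetical, and a plain map stands in for an external directory such as LDAP; the actual Jahia cache APIs are not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical object cache in front of an external user directory.
final class CachedUserLookup {
    private final Map<String, String> directory; // stands in for an LDAP back end
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachedUserLookup(Map<String, String> directory) {
        this.directory = directory;
    }

    // Returns the cached display name, loading it from the directory on a miss.
    String displayName(String userId) {
        return cache.computeIfAbsent(userId, directory::get);
    }

    // Must be called by code that changes the directory behind the cache's back.
    void invalidate(String userId) {
        cache.remove(userId);
    }

    // Coarser alternative: drop every cached entry.
    void flushAll() {
        cache.clear();
    }
}
```

In this sketch, a change written directly to the directory stays invisible to callers of displayName until invalidate or flushAll is called, which is why integrators calling back-end APIs directly need to update the caches themselves.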

Database caches

The database cache layer makes sure that only minimal interaction with the database happens. This cache is important because database communication requires object (de-)serialization as well as network communication, so the overhead of database query execution may be quite substantial. The Hibernate ORM and Jackrabbit frameworks handle this layer transparently, so normally developers and integrators will not need to deal with it.

Clustering

Deploying Jahia in a cluster is an effective way of distributing CPU and memory load to handle larger traffic sites. The image below illustrates a typical Jahia cluster installation. Jahia nodes communicate with each other through cache and database layers, but also access shared resources: a shared file system and the database. Binary content is stored in the database in the default configuration, or on the file system if the server is configured to store it there. The database stores everything else. It is therefore very important to have a high-performance database installation, as Jahia depends on it to scale. Jahia can also differentiate nodes in a cluster setup in order to offer more specialized processing. The following sections briefly review the different node types.

Visitors nodes

Visitors nodes are specialized Jahia nodes that only serve as content publishing nodes. They also interact with portlets or application modules to render pages and input user-generated content. This node specialization separates visitor load from authoring and background processing loads.

Authoring nodes

Authoring nodes are cluster nodes used to either browse or edit Jahia content. This is the most common type of node and multiple instances of authoring nodes distribute the load.

Processing node

The processing node executes long-running tasks, such as workflow validation operations, copying and pasting, content import, and indexing, as background tasks. This frees other nodes to process content browsing and editing requests. This node is designed to be fault-tolerant. If the node fails during processing, it can simply be restarted and it will resume operations where it left off. Note that only one processing node is permitted.
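One common way to obtain this kind of resume-after-restart behavior is to record a checkpoint after each completed unit of work. The sketch below illustrates that general technique only; it is not a description of how Jahia implements its processing node. The ResumableJob class is hypothetical, and an AtomicInteger stands in for durable storage such as a database row.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical checkpointed background job; not Jahia's implementation.
final class ResumableJob {
    private final List<String> items;
    private final AtomicInteger checkpoint; // stands in for durable storage

    ResumableJob(List<String> items, AtomicInteger checkpoint) {
        this.items = items;
        this.checkpoint = checkpoint;
    }

    // Processes items starting at the last recorded checkpoint.
    void run(Consumer<String> worker) {
        for (int i = checkpoint.get(); i < items.size(); i++) {
            worker.accept(items.get(i)); // may throw, simulating a node failure
            checkpoint.set(i + 1);       // record progress only after success
        }
    }
}
```

Because the checkpoint is advanced only after an item succeeds, the item that was in flight during a failure is retried after the restart. In a real system with durable storage, that means each unit of work should be safe to repeat.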

More resources on performance

As Jahia constantly strives to improve performance, make sure to check our website for additional resources on performance, as well as our "Configuration and Fine Tuning Guide", which contains best practices for deployment and configuration to properly set up Jahia for high loads.