Managing caching in Jahia

October 8, 2024

Introduction

In Jahia, the HTML cache is implemented with rendering filters that cache each module separately and then aggregate all the modules at rendering time to deliver the full page to the client.
As an example, imagine a "last news" module that displays the latest 5 news items of a site. During rendering, the aggregation asks the last news module to render each individual news item. Those news items are cached separately, and the cache entry of the last news module only contains its own HTML and references to the displayed news items. On subsequent requests, the last news module looks up the expected news items in the cache. If they are found, it aggregates their content into the output; otherwise, it asks for rendering of only the missing parts. This means that your news items can be cached for hours while the last news query is cached for only 5 to 10 minutes, depending on how often your site is updated (disabling the cache entirely is never a good idea). Older news items are delivered from the cache, and only the new ones are fully rendered by the engine.
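The behaviour described above can be illustrated with a minimal sketch. It is a simplified model, not Jahia's filter code: the class name, the `engine` function, and the use of a plain `HashMap` as the cache are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, simplified model of per-fragment caching (not Jahia's actual classes). */
final class FragmentAggregationSketch {
    private final Map<String, String> cache = new HashMap<>();
    // Records every fragment key that had to go through the rendering "engine".
    private final List<String> rendered = new ArrayList<>();
    private final Function<String, String> engine;

    FragmentAggregationSketch(Function<String, String> engine) {
        this.engine = engine;
    }

    /** Returns the cached HTML of a fragment, rendering it only on a cache miss. */
    String fragment(String key) {
        return cache.computeIfAbsent(key, k -> {
            rendered.add(k);
            return engine.apply(k);
        });
    }

    /** Aggregates a list module from the fragments of its children. */
    String aggregate(List<String> newsKeys) {
        StringBuilder out = new StringBuilder("<ul>");
        for (String key : newsKeys) {
            out.append(fragment(key));
        }
        return out.append("</ul>").toString();
    }

    List<String> renderedKeys() {
        return rendered;
    }
}
```

In this model, aggregating the list a second time after one news item has been added only sends the new item to the engine; the others come from the map.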

 

Cache process in Jahia

Figure 1 Example of page structure

Rendering scripts execution

The legacy cache implementation rendered all the fragments of the page before aggregating them and storing them in the cache. This approach was convenient because fragment aggregation was done at the end of the rendering pipeline, which provided a single page context for the whole process. It made it easier to save parameters in the page context and access them from other scripts.

Figure 2 Implementation flow

What cache framework is Jahia using?

Jahia uses Ehcache version 2.8.1.

You can configure it in the file WEB-INF/classes/ehcache-jahia.xml or WEB-INF/classes/ehcache-jahia-cluster.xml.

Invalidation or expiration?

Jahia uses both modes for its caches. By default, a fragment is invalidated from the cache on the fly if the element is updated or deleted, or if a child node is added or removed. Expiration occurs after the idle time (timeToIdleSeconds) if the fragment has not been accessed, or after the lifetime (timeToLiveSeconds).
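As a rough illustration of how the two expiration settings interact, here is a minimal sketch of a single cache entry. It is not Ehcache code: the class is hypothetical, time is passed in explicitly as seconds, and the exact boundary handling (`>=`) as well as the treatment of `0` as "no limit" are assumptions chosen to mirror the usual Ehcache 2.x convention.

```java
/** Hypothetical model of one cache entry's expiration rules (not Ehcache's actual classes). */
final class ExpirationSketch {
    private final long createdAt;
    private final long timeToLiveSeconds;
    private final long timeToIdleSeconds;
    private long lastAccessedAt;

    ExpirationSketch(long nowSeconds, long timeToLiveSeconds, long timeToIdleSeconds) {
        this.createdAt = nowSeconds;
        this.lastAccessedAt = nowSeconds;
        this.timeToLiveSeconds = timeToLiveSeconds;
        this.timeToIdleSeconds = timeToIdleSeconds;
    }

    /** An entry expires when it is too old OR has not been accessed for too long. */
    boolean isExpired(long nowSeconds) {
        // Assumption: a value of 0 disables the corresponding rule.
        boolean tooOld = timeToLiveSeconds > 0
                && nowSeconds - createdAt >= timeToLiveSeconds;
        boolean idleTooLong = timeToIdleSeconds > 0
                && nowSeconds - lastAccessedAt >= timeToIdleSeconds;
        return tooOld || idleTooLong;
    }

    /** Simulates a cache hit: refreshes the idle timer unless the entry has expired. */
    boolean access(long nowSeconds) {
        if (isExpired(nowSeconds)) {
            return false;
        }
        lastAccessedAt = nowSeconds;
        return true;
    }
}
```

With a lifetime of 100 seconds and an idle time of 30 seconds, an entry that is accessed regularly still expires once it is 100 seconds old, while an entry that is left untouched expires 30 seconds after its last access.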

You can configure it in the file WEB-INF/classes/ehcache-jahia-html.xml or WEB-INF/classes/ehcache-jahia-html-cluster.xml.

For more details, take a look at the official Ehcache documentation.

Overriding the default expiration?

You can override the default expiration in two ways.

The easiest and most user-friendly method is to let users specify the expiration time directly from the user interface.

Users need specific permissions to access this parameter.

To enable manual setup of the expiration delay in the engines, you must apply the mixin type jmix:cache to the targeted object definition.

[jnt:lastNews] > jnt:content, jmix:list, mix:title, jmix:queryContent, jmix:cache
 - maxNews (long) = 10 indexed=no
 - filter (reference, category[autoSelectParent=false])

You can also hardcode an expiration on a per-template basis in a template properties file.

For example, you can create the jnt_banner/html/banner.properties file in the default module to cache the banner for only 30 seconds.

# Cache the banner for 30 seconds only
cache.expiration = 30

Expiration delays are expressed in seconds. Note that if you have an alternative view on your content, you can specify a properties file for this view that overrides the default one if present. For instance, the jnt_user/html/user.welcome.properties file in the default module overrides the cache settings for the "welcome" view of the user module.

Automatic and manual management of dependency for an element

The dependencies of an HTML element define which node updates should flush this element. The system tries to handle most of the dependencies by itself. It automatically detects implicit dependencies such as parent/child relations.

If an existing child is updated, only its HTML fragment is flushed from the cache. If a child is created or deleted, the system also flushes the parent HTML fragment. The system therefore handles all standard parent/child relations automatically.
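The parent/child rule can be pictured with a small model in which each cached fragment registers the node paths it depends on. This is only a sketch under stated assumptions: the class, its method names, and the in-memory maps are hypothetical and do not reflect how Jahia stores dependencies internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of dependency-based flushing (not Jahia's actual classes). */
final class DependencyFlushSketch {
    // node path -> keys of the fragments that depend on that node
    private final Map<String, Set<String>> fragmentsByNode = new HashMap<>();
    private final Set<String> cached = new HashSet<>();

    /** Caches a fragment and records the node paths it depends on. */
    void cache(String fragmentKey, String... dependsOnPaths) {
        cached.add(fragmentKey);
        for (String path : dependsOnPaths) {
            fragmentsByNode.computeIfAbsent(path, p -> new HashSet<>()).add(fragmentKey);
        }
    }

    /** Updating an existing node only flushes the fragments depending on that node. */
    void nodeUpdated(String path) {
        flushDependentsOf(path);
    }

    /** Creating or deleting a child also flushes the fragments depending on its parent. */
    void childAddedOrRemoved(String childPath) {
        flushDependentsOf(childPath);
        flushDependentsOf(parentOf(childPath));
    }

    boolean isCached(String fragmentKey) {
        return cached.contains(fragmentKey);
    }

    private void flushDependentsOf(String path) {
        cached.removeAll(fragmentsByNode.getOrDefault(path, Collections.emptySet()));
    }

    private static String parentOf(String path) {
        int lastSlash = path.lastIndexOf('/');
        return lastSlash <= 0 ? "/" : path.substring(0, lastSlash);
    }
}
```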

If you have bound components in your page, the system handles them automatically by making those elements dependent on the bound component when computing the key.

The system also parses the HTML of your module to find all links to other nodes (useful for rich text, where editors may have entered links to pages or content you could not know in advance) and defines the corresponding dependencies.

This parsing is executed by the CacheUrlDependenciesParserFilter.

So if your templates define a template for a news object that adds a rateable module bound to this news item, the cache reflects that by caching the rendering of your rateable module per main resource and adding a dependency on the news item.

This way, we avoid displaying the same rateable module for all news items and instead have one per news item.

You can have some of those dependencies set using the template properties file.

You can also define directly in your script files (JSPs, etc.) the dependencies you want to add to your fragment. As an example, consider the comments component that you can bind to any object in Jahia. This component works in two parts.

The first part displays the form to add a comment; the second part displays the list of comments. On first submission, the form creates a child node called comments under the main resource.

So, when the first comment is created, the comments list is correctly flushed because the system has automatically created a dependency between the fragment and the main resource.

Adding a child under the main resource (the comments node) flushes all HTML fragments that have a dependency on the main resource.

Subsequent submissions of new comments, however, no longer update the main resource but only its comments subnode. In our script, we therefore have to tell the system to flush this HTML fragment when the comments subnode of the main resource is updated.

<jcr:node var="comments" path="${boundComponent.path}/comments"/>
<c:if test="${not empty comments}">
    <template:addCacheDependency node="${comments}"/>
    <template:module node="${comments}"/>
</c:if>

What you have to keep in mind here is that if your script loads a node other than the current node or the bound one, you have to add a dependency manually.

You can also define dependencies based on a regular expression, which is really useful for HTML fragments that use search queries to display content. As a rule, if you are using a query that has a constraint on descendant or child nodes, you should have a regexp dependency on that path. Here is an example from the blogs application.

<query:definition var="result"
                  statement="select * from [jnt:blogPost] as blogPost where isdescendantnode(blogPost, ['${renderContext.mainResource.node.path}']) order by blogPost.[jcr:lastModified] desc"
                  limit="20"/>
<template:addCacheDependency
        flushOnPathMatchingRegexp="\\\\Q${renderContext.mainResource.node.path}\\\\E/.*/comments/.*"/>

This fragment is flushed whenever a node changes under a comments node located below a descendant of the main resource (for example, the comments of any blog post). The \\\\Q and \\\\E define an escape sequence so that the path is interpreted literally, whatever its value (this should be used in all your regexps that encapsulate an unknown path).
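To see why quoting the path matters, the following sketch builds the same kind of pattern in plain Java. `Pattern.quote` is the standard way to wrap a string in a `\Q...\E` literal section; the class and method names are hypothetical, and matching the whole changed path with `matches()` is an assumption about how such a pattern would be applied.

```java
import java.util.regex.Pattern;

/** Illustrative helper: checks whether a changed path matches a quoted flush pattern. */
final class FlushRegexpSketch {

    /** Quotes the unknown path, then appends the fixed part of the expression. */
    static Pattern commentsUnder(String mainResourcePath) {
        return Pattern.compile(Pattern.quote(mainResourcePath) + "/.*/comments/.*");
    }

    /** Assumption: the whole changed path has to match the expression. */
    static boolean shouldFlush(String mainResourcePath, String changedPath) {
        return commentsUnder(mainResourcePath).matcher(changedPath).matches();
    }

    public static void main(String[] args) {
        String blog = "/sites/acme/blog(1)";
        // A comment below a post of this blog matches.
        System.out.println(shouldFlush(blog, blog + "/post-a/comments/c1"));
        // Without quoting, "(1)" would be read as a group and this other blog would match too.
        System.out.println(shouldFlush(blog, "/sites/acme/blog1/post-a/comments/c1"));
    }
}
```

The second call shows the point of the escape sequence: a path containing regexp metacharacters such as parentheses is compared literally instead of being interpreted as part of the expression.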

Fragment key generation

Originally called cache keys, these are now better described as fragment keys, because the Aggregate filter uses them to identify fragments.

Type of keys

  • Fragment key: This is the identity of the fragment without the context.
  • Fragment final key: The complete identity of a fragment. The final key is the result of the fragment key combined with the current rendering context. For example, the same fragment key can result in several different final keys depending on the context of the render request (whether the user is logged in, request parameters, etc.).

Cache key part generator

These classes are used to add new entries to fragment keys and fragment final keys. To do that, you can implement CacheKeyPartGenerator.

Besides getKey(), which names the key part, it contains two methods:

The first method is getValue(), used to generate the key only when the parent fragment is not in cache. The HTML of the parent fragment contains the sub-fragment key, so Jahia first tries to find the fragment using this key before calling getValue() to rebuild it.

To summarize:

  • build the fragment key
  • only when parent fragment is not in cache
  • support heavy operations like reading nodes from JCR

The second method is replacePlaceholders(), used to build the final key. Once the fragment key has been retrieved from the parent fragment or constructed, this method is called to identify the fragment based on the current execution context.

This method replaces the previous value in the key with something that can differentiate the fragment based on the context, even if the initial fragment key is the same.

To summarize:

  • build the fragment final key
  • always called, in every case
  • does not support heavy operations; avoid JCR reads here

Example of best practice for CacheKeyPartGenerator implementation:

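import java.util.Properties;
import javax.jcr.RepositoryException;
import org.jahia.services.render.RenderContext;
import org.jahia.services.render.Resource;
import org.jahia.services.render.filter.cache.CacheKeyPartGenerator;

public class ContextCacheKeyPartGenerator implements CacheKeyPartGenerator {
    @Override
    public String getKey() {
        return "context";
    }

    @Override
    public String getValue(Resource resource, RenderContext renderContext, Properties properties) {
        // Read the node to detect whether the resource needs to be contextual.
        // This is only called when the fragment key has to be built.
        try {
            if (resource.getNode().isNodeType("jnt:contextualizedNode")) {
                return "contextual";
            }
        } catch (RepositoryException e) {
            // isNodeType() declares a checked exception; falling back to
            // "notContextual" here is an assumption made for this example.
        }
        return "notContextual";
    }

    @Override
    public String replacePlaceholders(RenderContext renderContext, String keyPart) {
        // Apply the contextual changes in the final key to differentiate the fragments.
        // This is called on every request, so it must stay cheap (no JCR reads).
        if ("contextual".equals(keyPart)) {
            return renderContext.isLoggedIn() ? "logged" : "notLogged";
        }
        return keyPart;
    }
}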

In the previous example, a node of type "jnt:contextualizedNode" has two possible final keys depending on the context (user logged in or not).

So the best practice for the CacheKeyPartGenerator implementation is:

  • avoid JCR reads in replacePlaceholders() (because it is called every time)
  • use these two methods to decouple the logic of your cache key part generators (what is contextual? what is not? do you need to read nodes?)
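The split between the two methods can be simulated without any Jahia dependency. The sketch below is a stand-in, not the real API: the `PartGenerator` interface is reduced to plain strings, the map plays the role of the sub-fragment keys stored in the parent fragment's HTML, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical simulation of the two-phase key generation (not Jahia's actual classes). */
final class TwoPhaseKeySketch {

    /** Stand-in for CacheKeyPartGenerator, reduced to plain values. */
    interface PartGenerator {
        String getValue(String nodeType);                              // potentially heavy
        String replacePlaceholders(boolean loggedIn, String keyPart);  // must stay cheap
    }

    static final PartGenerator CONTEXT = new PartGenerator() {
        @Override
        public String getValue(String nodeType) {
            return "jnt:contextualizedNode".equals(nodeType) ? "contextual" : "notContextual";
        }

        @Override
        public String replacePlaceholders(boolean loggedIn, String keyPart) {
            if ("contextual".equals(keyPart)) {
                return loggedIn ? "logged" : "notLogged";
            }
            return keyPart;
        }
    };

    // Plays the role of the fragment keys already known from the parent fragment.
    private final Map<String, String> fragmentKeyParts = new HashMap<>();
    int getValueCalls = 0;

    /** Builds the final key: getValue() runs once per path, replacePlaceholders() every time. */
    String finalKey(String path, String nodeType, boolean loggedIn) {
        String part = fragmentKeyParts.computeIfAbsent(path, p -> {
            getValueCalls++;
            return CONTEXT.getValue(nodeType);
        });
        return path + "@@" + CONTEXT.replacePlaceholders(loggedIn, part);
    }
}
```

Requesting the same path twice with different login states yields two final keys while the heavy step runs only once, which is the reason for keeping node reads out of replacePlaceholders().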

Example of implementations:

The core defines different key parts like "workspace", "language", "node path", "template", "templateType", "acls", "queryString", which are usually enough to create a unique key for each fragment. The default generators are defined in the file applicationcontext-cache.xml. 

<bean id="cacheKeyGenerator" class="org.jahia.services.render.filter.cache.DefaultCacheKeyGenerator">
      <property name="partGenerators">
          <list>
              <bean class="org.jahia.services.render.filter.cache.LanguageCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.PathCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.TemplateCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.TemplateTypeCacheKeyPartGenerator"/>
              <ref bean="${org.jahia.aclCacheKeyPartGenerator.implementation:aclCacheKeyPartGenerator}"/>
              <bean class="org.jahia.services.render.filter.cache.ContextCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.WrappedCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.CustomCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.QueryStringCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.TemplateNodesCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.ResourceIDCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.InAreaCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.SiteCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.ModuleParamsCacheKeyPartGenerator"/>
              <bean class="org.jahia.services.render.filter.cache.AjaxCacheKeyPartGenerator"/>
              <ref bean="areaResourceCacheKeyPartGenerator"/>
          </list>
      </property>
</bean>


However, if you need to customize the key by adding specific values, a Spring bean implementing CacheKeyPartGenerator can be added to any module, and will impact all keys generated for the cache.

Each generated key looks like this:

en@@/sites/ACMESPACE/contents/projects-news/news_36-1@@medium@@html@@privileged%2Csiteadministrator:%2Fsites%2FACMESPACE|@@module@@false@@@@{}@@@@998a823d-f275-4a10-aef4-0ae9d1ea5677@@@@ACMESPACE:null@@{}@@

Each part of the key is separated by @@, so be sure to avoid this sequence inside your values. Empty parts are possible.
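The following sketch shows why the separator must not appear inside a value and how empty parts survive a round trip. It only uses `String.split` and `String.join` from the JDK; the helper class is hypothetical and is not how Jahia parses keys internally.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper for "@@"-separated keys (hypothetical, not Jahia's parser). */
final class KeyPartsSketch {
    static final String SEPARATOR = "@@";

    /** Splits a key into its parts; the -1 limit keeps empty and trailing parts. */
    static List<String> split(String key) {
        return Arrays.asList(key.split(SEPARATOR, -1));
    }

    /** Joins parts into a key, rejecting values that contain the separator. */
    static String join(List<String> parts) {
        for (String part : parts) {
            if (part.contains(SEPARATOR)) {
                throw new IllegalArgumentException("Key part contains separator: " + part);
            }
        }
        return String.join(SEPARATOR, parts);
    }
}
```

A value containing @@ would be split into two parts on the way back, shifting every following part by one position, which is why `join` rejects it in this sketch.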

Keys are generated during the prepare and execute phases of the AggregateFilter and should give the same result on each call, otherwise Jahia will log a warning like Key generation does not give the same result after execution… with the involved keys. This issue can lead to an overhead in cache generation as fragments might not be found in the cache.

Here is an example of a very simple part, the LanguageCacheKeyPartGenerator:

public class LanguageCacheKeyPartGenerator implements CacheKeyPartGenerator {
    @Override
    public String getKey() {
        return "language";
    }
    @Override
    public String getValue(Resource resource, RenderContext renderContext, Properties properties) {
        return resource.getLocale().toString();
    }
    @Override
    public String replacePlaceholders(RenderContext renderContext, String keyPart) {
        return keyPart;
    }
}

As you can see, this generator returns the locale of the current resource as a String when the key is generated, and replacePlaceholders() simply returns the same value because there is nothing else to do. We could resolve the language in replacePlaceholders(), but getValue() is preferred here because the language is the only information to resolve and, mainly, because if a parent resource is in language "EN", the child resource is also in "EN". Since replacePlaceholders() is called in every case while getValue() is only called when the key is built for the given resource, this reduces the code executed when fragments are in cache.

Another example is the path part of the key, which can take into account the main resource requested by the user:

public class PathCacheKeyPartGenerator implements CacheKeyPartGenerator {
    public static final String MAIN_RESOURCE_KEY = "_mr_";

    @Override
    public String getKey() {
        return "path";
    }

    @Override
    public String getValue(Resource resource, RenderContext renderContext, Properties properties) {
        StringBuilder s = new StringBuilder(resource.getNode().getPath());
        if ("true".equals(properties.getProperty("cache.mainResource"))) {
            s.append(MAIN_RESOURCE_KEY);
        }
        return s.toString();
    }

    public String getPath(String key) {
        return StringUtils.replace(key, MAIN_RESOURCE_KEY, "");
    }

    @Override
    public String replacePlaceholders(RenderContext renderContext, String keyPart) {
        return StringUtils.replace(keyPart, MAIN_RESOURCE_KEY,
                renderContext.getMainResource().getNode().getCanonicalPath() + renderContext.getMainResource().getResolvedTemplate());
    }
}

In this one, getValue() returns the token "_mr_", which replacePlaceholders() then replaces. The same node can be displayed in different contexts, with different main resources, which is why this operation is done in replacePlaceholders(). getValue() also reads the properties to know whether the current fragment has the cache.mainResource property.

ACLs in cache keys

Upon a request, if the content is not cached, we search for all the ACLs that apply to the user directly. Then we look up all the ACLs of the groups in the user's membership list. We use all of this to build a map of ACLs per path, as ACLs are applied per path.

Group and user ACLs are stored in local caches that are flushed when an ACL is updated on the platform.

Custom elements in keys

The custom cache key part lets you add elements to the key through a request attribute named module.cache.additional.key. This attribute has to be present in the request during the prepare phase of the AggregateFilter, which means it has to be set beforehand by a higher-priority filter (the ChannelFilter, for example, uses this mechanism to switch caches between different channels). The values of the custom parts are returned as is in the replacePlaceholders phase.

Implementation specificities

JCR Node read before Cache filter

Render filters with a lower priority than the Cache filter are not executed if the fragment is in cache, but this may not be the case for other implementations. Be careful with JCR read operations in render filters that have a priority lower than 16.

Known limitations

The current implementation has one limitation: a parent fragment is put in cache only once its sub-fragments have been rendered and cached.

As an example, imagine that a user (user 1) requests a page of the site that uses a template named "2 columns". Two seconds later, another user (user 2) requests another page of the site that uses the same template. For some reason, the page requested by user 1 takes 2 minutes to display because it contains a fragment that calls an external API, and there is a connection issue.

As a result, user 2 is blocked for 2 minutes because user 1 is generating the template fragment, and the template fragment is put in cache only when all the sub-fragments of the page have been rendered and cached.
