The Elasticsearch search provider modules let you use Elasticsearch to index and search content in your Digital Experience Manager websites. They act as a connector to an existing Elasticsearch environment: they send index data and search queries, and retrieve search results.
The Elasticsearch search provider module improves the relevance of the search results (compared to the default JCR search) as it includes a full-page search as opposed to a content-based search. Elasticsearch 5.6.3 is also based on a more recent version of Lucene (6.6.1) than the one embedded in the core of Digital Experience Manager.
Delegating search to Elasticsearch has another advantage: Digital Experience Manager then consumes fewer resources, which improves the overall stability of the platform. You can also make the most of Elasticsearch's scalable design. This was validated by our performance tests, which showed a significant improvement in response times for edit operations. Smaller improvements in live browsing were also observed.
In all our performance test scenarios, the Elasticsearch search provider easily met our acceptance criteria: 90% of the requests completed in a third of the time limit we set, without any special tuning or configuration of the Elasticsearch environment. Elasticsearch as the search provider also performs better than JCR search in scenarios that combine many searches with content contribution. However, in some other scenarios, for instance with no contribution at all, the JCR search scored slightly better than our Elasticsearch search provider. For this reason, we strongly advise testing your Elasticsearch environment carefully before using this module in production, to make sure it meets your expected performance levels.
The Elasticsearch search provider module currently supports Elasticsearch 5.6.3, available here.
If your Elasticsearch cluster is using X-Pack, please consult our dedicated page related to X-Pack configuration.
The required modules can be deployed on your Digital Experience Manager environment by using the Elasticsearch search provider package available on our store, or by installing its modules individually:
In the DX administration UI, go to Configuration > Database Connector
Create a new connection, by clicking on the “New connection” button:
Select the “elastic” database type:
Create a new Elasticsearch connection by filling in the following settings:
Then the connection is created:
In the administration, go to Configuration > Elasticsearch management
Verify that the Elasticsearch database connection ID corresponds to the one previously created (in our example "esConnection"), and click on save: this will trigger a re-indexing of the platform.
You can also start a re-indexing from the same screen: a job performs the re-indexing in the background.
In order to enable the Elasticsearch search provider, go to Configuration > Search settings, and select “Elasticsearch search provider”, then save:
As with any search provider, default (or custom) views for search results can be used with the Elasticsearch search provider.
Upon installation of the search-provider-elasticsearch module, a configuration file named org.jahia.services.search.provider.elasticsearch.cfg is created under digital-factory-data/karaf/etc/. This configuration file specifies the types of content that need to be indexed in Elasticsearch. In a DX cluster environment, changes to this file are not synchronized automatically, so you have to copy the modified file to each cluster node.
Elasticsearch indexes content and returns results differently from the default JCR search provider. The JCR search provider indexes each content item individually, then aggregates them when collecting the search results. In contrast, the Elasticsearch search provider aggregates content in the index itself, under the items that can be displayed in full page: pages (jnt:page) and contents which come with a content template (e.g. jnt:news).
By default, only pages are indexed.
org.jahia.services.search.provider.elasticsearch.content.indexedMainResourceTypes defines the list of content types that can be indexed as full page contents. These content types need to have corresponding content templates in order to be displayed individually, like "pages". Only content templates without restriction (mode / user / permissions), which are set in the studio, can be used to index content.
Only fulltext and metadata are indexed by default. If you want to do a search on a specific property you have to list it in:
org.jahia.services.search.provider.elasticsearch.content.mappedNodeTypes
By default, only the contents which have jmix:editorialContent as a supertype are indexed as subnodes of the main resource (pages or contents with a content template).
Other content types can be indexed by listing them in:
org.jahia.services.search.provider.elasticsearch.content.indexedSubNodeTypes
The implementation of the Elasticsearch search provider differs from the implementation of the JCR search provider. Therefore, the search results may differ from one search provider to another. This section lists the differences.
The JCR search provider accesses the Lucene index created by Jackrabbit, where each DX cluster node has its own index. The index is therefore always up-to-date: it synchronously includes the content changes processed on the current cluster node, and it receives in near real time the changes from content write operations processed on other DX nodes in the cluster.
With the ES search provider, index operations are always sent to Elasticsearch exclusively through the DX processing node. If the DX processing node is down while content is being modified, the index does not reflect these latest content changes. When the DX processing node starts up again, it catches up and sends Elasticsearch all the indexing requests for the changes made during its absence. Note also that even when the DX processing node is up and running, Elasticsearch indexing is always asynchronous and therefore only near real time.
Using hit.rawHit in a JSP or hit.getRawHit() in Java code may not be compatible with the Elasticsearch provider, as the object returned is not the JCRNode but a SearchHit object from Elasticsearch. We tried to avoid loading JCR nodes for each result for performance reasons. If you really need that object, you have to load it by path, which can be obtained from the SearchHit object by calling getField(ESConstants.NODE_PATH_KEY).
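The pattern described above can be sketched as follows. This is only an illustration under stated assumptions: EsSearchHit and NodeLoader are hypothetical stand-ins defined here so that the snippet compiles on its own (they are not the real Elasticsearch SearchHit or Jahia session classes), the NODE_PATH_KEY value is a placeholder for ESConstants.NODE_PATH_KEY, and the stand-in getField() returns the value directly, which may not match the real return type.

```java
import java.util.Optional;

// Hypothetical stand-in for the Elasticsearch SearchHit class (assumption, not the real API).
interface EsSearchHit {
    Object getField(String name);
}

// Hypothetical stand-in for whatever loads a JCR node by its path (assumption).
interface NodeLoader {
    Object loadByPath(String path);
}

final class RawHitSketch {

    // Placeholder for ESConstants.NODE_PATH_KEY; the real value is defined by the module.
    static final String NODE_PATH_KEY = "hypothetical-node-path-key";

    // Returns the node behind a raw hit, whichever search provider produced it.
    static Optional<Object> resolveNode(Object rawHit, NodeLoader loader) {
        if (rawHit instanceof EsSearchHit) {
            // Elasticsearch provider: the raw hit only carries the path, so load the node by path.
            Object path = ((EsSearchHit) rawHit).getField(NODE_PATH_KEY);
            if (path == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(loader.loadByPath(path.toString()));
        }
        // Other providers (e.g. JCR): the raw hit is assumed to be the node itself.
        return Optional.ofNullable(rawHit);
    }
}
```

Checking the type of the raw hit before using it keeps custom views working with both providers, at the cost of one extra node load per Elasticsearch result.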
The configuration in indexing-configuration.xml is Jackrabbit specific and thus not considered by the Elasticsearch provider. If you use index rules to boost fields under certain conditions, to exclude certain nodes/sections from being indexed, or to override analyzers for specific fields (without the need to modify the CND file), this configuration will not be used by Elasticsearch, but we will come up with alternatives in future releases of the module.
A displayable content (which has a content template) created inside a page generates two results with the ES search provider: a link to the page where the content was created, and a link to the full page view of the content. The JCR search provider only displays the link to the full page view of the content.
The ES search provider uses a different analyzer for file names than the JCR search provider. The different elements of a file name are indexed individually in ES, and an exact search is performed on these elements. The following example illustrates this difference.
3 files are available on a site:
Searched word(s) | JCR Result | Elasticsearch result |
car | allBlueCars.zip | blueCar.png |
blue car | allBlueCars.zip, blueCar.png | blueCar.png |
cars | allBlueCars.zip, carsListWikipedia.txt | allBlueCars.zip, carsListWikipedia.txt |
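The Elasticsearch column of this table can be reproduced with a small sketch. It assumes that file names are split on camel-case boundaries and punctuation, that elements are lowercased, and that every searched word must match one element exactly. These rules are an illustration of the behaviour described above, not the analyzer actually configured by the module, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class FileNameSearchSketch {

    // Splits a file name into lowercase elements, e.g. "blueCar.png" -> {blue, car, png}.
    static Set<String> elements(String fileName) {
        Set<String> result = new HashSet<>();
        // Split between a lowercase letter or digit and an uppercase letter, or on any non-alphanumeric run.
        for (String part : fileName.split("(?<=[a-z0-9])(?=[A-Z])|[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                result.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Returns the file names whose elements contain every searched word exactly.
    static List<String> search(String query, List<String> fileNames) {
        List<String> terms = Arrays.asList(query.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        List<String> hits = new ArrayList<>();
        for (String name : fileNames) {
            if (elements(name).containsAll(terms)) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

With the three files from the example, searching for "car" in this sketch matches only blueCar.png, because "cars" is a different element from "car".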
Files and contents can be found as references when searching for contents only. The content or file found is displayed as a search result, and the links to the pages where the content appears are displayed in the "Appears in" field.
In the current version (2.1.x), content references are part of the page or displayable content that contains them, meaning that the referenced content is found as if it were part of the page.
File usages in search results are currently not supported.
The ES search provider evaluates each date as follows:
Date: 10.11.2017 -> 10.11.2017 00:00
This means that in order to search for all the contents created on 10.11.2017, you need to use the following date range: from 10.11.2017 to 11.11.2017
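The rule above can be expressed as a short helper. This sketch assumes the dd.MM.yyyy format used in the example; the class and method names are hypothetical and are not part of the module.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DayRangeSketch {

    // Date format used in the example above (day.month.year).
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    // A bare date is evaluated as midnight at the start of that day.
    static LocalDateTime evaluatedAs(String day) {
        return LocalDate.parse(day, DAY).atStartOfDay();
    }

    // Returns {from, to}: the range to enter in order to cover the whole given day.
    static String[] rangeCovering(String day) {
        LocalDate from = LocalDate.parse(day, DAY);
        LocalDate to = from.plusDays(1); // the next day at 00:00 closes the range
        return new String[] { from.format(DAY), to.format(DAY) };
    }
}
```

For example, rangeCovering("10.11.2017") yields the pair 10.11.2017 and 11.11.2017, which matches the range given above.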
The ES and JCR search providers may not display the same last modification date in the search results (if this information is part of the search result view): ES uses the last modification date of the page (which is shown as the actual search result, therefore the information is relevant), whereas the JCR search shows the date corresponding to the content found.
For instance, a page is returned as it contains a richtext matching the search criteria: