Installing and configuring augmented search

  Written by The Jahia Team
 

The Augmented Search module lets you use the power of Elasticsearch to index and search the contents of your Jahia websites. The module acts as a connector to an existing Elasticsearch environment: it sends index data and search queries, and retrieves search results.

This module improves the relevance of search results compared to the default JCR search, because it performs a full-page search rather than a content-based search. It connects to an Elasticsearch cluster running version 7.x through the elasticsearch-connector-7 module.

Delegating the search capabilities to Elasticsearch has another advantage: Jahia consumes fewer resources, which improves the overall stability of the platform. You can also make the most of Elasticsearch's scalable design. This was validated by our performance tests, which showed a significant improvement in response times for edit operations. Smaller improvements in live browsing were also observed.

In all our performance test scenarios, the Elasticsearch search provider easily met our acceptance criteria: 90% of the requests completed in less than a third of the limit we set, without any special tuning or configuration on the Elasticsearch side. Elasticsearch as the search provider performs better than JCR search in scenarios that combine many searches with content contribution. However, in some other scenarios, for instance with no contribution at all, the JCR search scored slightly better than our Elasticsearch search provider. For this reason, before using this module in production, we strongly advise you to test your Elasticsearch environment carefully to ensure that it meets your expected performance levels.

Capabilities

Augmented search offers:

  • Buffered operations. If the connection to Elasticsearch is lost, the operations to perform (contents to be indexed) are queued until the Elasticsearch connection is reestablished. In the current version of the module, the Jahia processing node manages the content indexing queue. This queue is stored in the RAM of the processing node. Therefore it is strongly advised to stop the processing node only when there are no operations left to perform.
  • Search while reindexing. It is possible to contribute content and search in the existing index while reindexing.
  • ACLs are supported at the content level. The full-page search is only available when the searched terms appear in contents which share the same ACL as the page.
  • Visibility conditions on content are supported.
Note that any readable content can be found and listed as a result, even if one of its parent nodes is currently non-readable (e.g. due to visibility conditions, publication/unpublication, or broken inheritance for roles): if a view exists to display the child node on its own, then the user should be able to access the searched content. If no view exists for the node, then the content will not be displayed on the page.
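
The buffered operations behaviour described above can be illustrated with a minimal Python sketch. This is not the module's actual implementation: the BufferedIndexer class and the send callable are hypothetical names chosen for illustration. The sketch only shows why an in-memory queue loses its pending operations if the process holding it is stopped before the queue is empty.

```python
from collections import deque


class BufferedIndexer:
    """Hypothetical illustration: queue operations in memory while the backend is down."""

    def __init__(self, send):
        # `send` is an assumed callable that pushes one operation to the search
        # backend and raises ConnectionError when the backend is unreachable.
        self._send = send
        self._pending = deque()

    def submit(self, operation):
        """Queue the operation, then try to send everything that is pending."""
        self._pending.append(operation)
        self.flush()

    def flush(self):
        """Send queued operations oldest first; stop at the first connection failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # the operation stays queued for the next attempt
            self._pending.popleft()
        return True

    def pending_count(self):
        """Number of operations still held in memory only."""
        return len(self._pending)
```

In this sketch, stopping the process while pending_count() is greater than zero discards the queued operations, which mirrors the advice to stop the processing node only when no operations are left.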

Installation requirements

Elasticsearch 

The Elasticsearch search provider module currently supports Elasticsearch 7.4. If your Elasticsearch cluster uses X-Pack, consult our dedicated page on X-Pack configuration.

Jahia modules

The required modules can be deployed on your Jahia environment by using the Augmented Search - Distribution package available on our store, or by installing its modules individually:

  • Database connector
  • Elasticsearch connector 7
  • Augmented Search
  • Augmented Search UI

Setup

Elasticsearch connection

In the Jahia administration UI, go to Configuration>Database Connector.
Create a new connection by clicking the New connection button.

ES-config-1.PNG

Select the elastic database type.

ES-config-2.PNG

 

Create a new Elasticsearch connection by filling in the following settings:

  • Host
    The IP/hostname of your Elasticsearch server
  • Port
    The port used by your Elasticsearch server
  • Id
    The name of the Elasticsearch connection you are creating
  • Cluster Name
    Your Elasticsearch cluster.name property

If using X-Pack, open the Advanced tab, select Use XPack Security and enter a username and password (the default values are elastic/changeme).

ES-config-3.PNG

The connection is then created.

ES-config-4.PNG
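
Before entering the host, port and cluster name in the form, it can help to confirm them against the Elasticsearch server itself. The following Python sketch queries the root endpoint of the Elasticsearch REST API, which returns the cluster_name and version.number fields. It assumes that the port is the HTTP port of the server and that security is disabled; a cluster secured with X-Pack would also require credentials. The function names are illustrative only.

```python
import json
import urllib.request


def fetch_cluster_info(host, port, timeout=5.0):
    """Return the JSON document served by Elasticsearch at its root endpoint."""
    # Assumes plain HTTP without authentication.
    url = f"http://{host}:{port}/"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def check_cluster_info(info, expected_cluster_name):
    """Return a list of problems found in the root endpoint response (empty if none)."""
    problems = []
    cluster_name = info.get("cluster_name")
    if cluster_name != expected_cluster_name:
        problems.append(
            f"cluster_name is {cluster_name!r}, expected {expected_cluster_name!r}"
        )
    version = info.get("version", {}).get("number", "")
    if not version.startswith("7."):
        problems.append(f"version is {version!r}, expected a 7.x release")
    return problems
```

For example, check_cluster_info(fetch_cluster_info("localhost", 9200), "my-cluster") returns an empty list when the server reports the expected cluster name and a 7.x version. The host, port and cluster name in this call are placeholders.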

Elasticsearch setup

In Administration, go to Configuration>Elasticsearch management.

Verify that the Elasticsearch database connection ID corresponds to the one previously created (in our example "esConnection"), and click Save. This will trigger a reindexing of the platform.

ES-config-5.PNG

You can also start reindexing from the same screen. A job performs the reindexing in the background.

Configuration

Configuration file

Upon installation of the search-provider-elasticsearch module, a configuration file named org.jahia.modules.augmentedsearch.cfg is created under digital-factory-data/karaf/etc/ . This configuration file specifies the content types that need to be indexed in Elasticsearch. In a Jahia cluster environment, changes to this file are not synchronized automatically, so you have to copy the modified file to each cluster node.
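
Because the file is copied by hand, one way to confirm that every node ends up with the same version is to compare checksums. The Python sketch below assumes you have already retrieved the content of the file from each node by your own means; the function and node names are hypothetical.

```python
import hashlib


def digest(content):
    """Return the SHA-256 hex digest of a configuration file's bytes."""
    return hashlib.sha256(content).hexdigest()


def nodes_out_of_sync(reference, copies):
    """Return the names of nodes whose copy differs from the reference content.

    `reference` holds the bytes of the modified file; `copies` maps a node
    name (hypothetical labels such as "node1") to the bytes found on that node.
    """
    expected = digest(reference)
    return sorted(
        node for node, content in copies.items() if digest(content) != expected
    )
```

An empty result means that all the copies you supplied match the modified file.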

Main Resource types

The Elasticsearch search provider indexes content and returns results differently from the default JCR search provider. The JCR search provider indexes each content item individually, then performs an aggregation when collecting the search results. In contrast, the Elasticsearch search provider aggregates contents in the index itself, for everything that can be displayed in full page: pages (jnt:page) and contents which come with a content template (e.g. jnt:news).

By default, only pages are indexed.

org.jahia.services.search.provider.elasticsearch.content.indexedMainResourceTypes defines the list of content types that can be indexed as full-page contents. These content types need corresponding content templates so that they can be displayed individually, as pages are. Only content templates set in the studio without restrictions (mode / user / permissions) can be used to index content.

Mapped nodetypes

Only full text and metadata are indexed by default. If you want to search on a specific property, you have to list it in:

org.jahia.services.search.provider.elasticsearch.content.mappedNodeTypes

Indexing of contents

By default, only contents which have jmix:editorialContent as a supertype are indexed as subnodes of a main resource (a page or a content with a content template).

Other content types can be indexed by listing them in:

org.jahia.services.search.provider.elasticsearch.content.indexedSubNodeTypes
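
To review which content types a given configuration file declares, a short script can read the properties named above. The Python sketch below makes two assumptions that should be checked against your own file: that the file uses simple key=value lines, and that each value is a comma-separated list of node types. The sample values are hypothetical and only reuse node types mentioned in this page.

```python
PREFIX = "org.jahia.services.search.provider.elasticsearch.content."

# Hypothetical file content, for illustration only.
SAMPLE = """\
# Content types indexed as full-page resources
org.jahia.services.search.provider.elasticsearch.content.indexedMainResourceTypes = jnt:page,jnt:news
# Content types indexed as subnodes of a main resource
org.jahia.services.search.provider.elasticsearch.content.indexedSubNodeTypes = jmix:editorialContent
"""


def parse_cfg(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def listed_types(settings, short_name):
    """Return the node types listed under one content.* property (assumed comma-separated)."""
    value = settings.get(PREFIX + short_name, "")
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    parsed = parse_cfg(SAMPLE)
    for name in ("indexedMainResourceTypes", "indexedSubNodeTypes"):
        print(name, listed_types(parsed, name))
```

To inspect a real file, replace SAMPLE with the text read from org.jahia.modules.augmentedsearch.cfg.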

Other configurations

  • Language analyzer: defines the analyzer used for a specific language
  • Buffering configuration: defines the strategy and timing Jahia uses to check the Elasticsearch connection after Elasticsearch has become unreachable
  • Reindexing requests batch size: you can set the number of requests sent at a time while reindexing
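
As an illustration of what a batch size controls, the Python sketch below splits a list of pending requests into groups that are sent one group at a time. It is a generic example and does not reflect how the module is implemented; the function name is hypothetical.

```python
def batches(requests, batch_size):
    """Yield consecutive slices of `requests`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(requests), batch_size):
        yield requests[start:start + batch_size]
```

A smaller batch size means more round trips with less data in each one, while a larger batch size means fewer and heavier requests.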
