Written by The Jahia Team

Augmented Search provides a modern search experience built into the Jahia platform. At its core, Augmented Search is an extension to Jahia's GraphQL API that provides a search experience backed by Elasticsearch, allowing users to search content across your Jahia websites.

The Augmented Search module lets you use the power of Elasticsearch to index and search the contents of your Jahia websites. The module acts as a connector to an existing Elasticsearch environment: it sends index data and search queries and retrieves search results.

This module improves the relevance of search results compared to the default JCR search because it performs a full-page search rather than a content-based search. It connects to an Elasticsearch 7.x cluster using the elasticsearch-connector-7 module.

Delegating search to Elasticsearch has another advantage: Jahia consumes fewer resources, which improves the overall stability of the platform. You can also make the most of Elasticsearch's scalable design. Our performance tests confirmed this, showing a significant improvement in response times for edit operations. Smaller improvements in live browsing were also observed.

Capabilities

Augmented Search offers:

  • Buffered operations. If the connection to Elasticsearch is lost, the operations to perform (contents to be indexed) are queued until the Elasticsearch connection is reestablished. In the current version of the module, the Jahia processing node manages the content indexing queue. This queue is stored in the RAM of the processing node. Therefore it is strongly advised to stop the processing node only when there are no operations left to perform.
  • Search while reindexing. It is possible to contribute content and search in the existing index while reindexing.
  • ACLs are supported at content level. The full-page search is only available when the searched terms appear in content that shares the same ACL as the page.
  • Visibility conditions on content are supported.
Note that any readable content can be found and listed as a result, even if one of its parent nodes is currently non-readable (e.g. due to visibility conditions, publication/unpublication, or broken inheritance for roles): if a view exists to display the child node on its own, the user should be able to access the searched content. If no view exists for the node, the content will not be displayed on the page.
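
To illustrate the buffered operations behavior listed above, here is a minimal Python sketch of an in-memory queue that holds indexing operations while the search backend is unreachable and replays them in order once it is reachable again. It is not the module's implementation: `BufferedIndexer`, `submit`, `flush`, `pending_count`, and the `send` callable are hypothetical names chosen for this example.

```python
from collections import deque


class BufferedIndexer:
    """Hypothetical in-memory buffer, for illustration only."""

    def __init__(self, send):
        # `send` is an assumed callable that pushes one operation to the
        # search backend and raises ConnectionError when it is unreachable.
        self._send = send
        # The queue lives in RAM only, like the queue described above.
        self._pending = deque()

    def submit(self, operation):
        """Queue the operation, then try to drain the queue in order."""
        self._pending.append(operation)
        return self.flush()

    def flush(self):
        """Send queued operations oldest first; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # backend unreachable: keep everything queued
            self._pending.popleft()
        return True

    def pending_count(self):
        """Number of operations still waiting to be sent."""
        return len(self._pending)
```

Because such a queue only exists in memory, anything still pending when the process stops is lost, which is why the processing node should only be stopped once no operations are left to perform.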

Search lifecycle

To best understand Augmented Search, it helps to identify the four main stages of its lifecycle:

  1. Jahia editors add content as they usually would through Jahia. Publishing or updating content triggers an indexing job in Jahia.
  2. The indexing job takes the newly modified content and pushes it to an Elasticsearch instance, according to the Augmented Search module configuration.
  3. Once the documents are indexed, a GraphQL API provides an interface to search through a potentially large number of documents hosted in your Elasticsearch instance.
  4. A UI is then built to provide the interface between user interaction and the Augmented Search GraphQL API.

Modules

The following modules are part of the Augmented Search ecosystem and can be found on the Jahia Store:

  • Database Connector
    Enables Jahia's connection to third-party databases.
  • Elasticsearch Connector 7
    Based on the Database Connector, implements support for connecting to an Elasticsearch cluster.
  • Augmented Search
    Adds a GraphQL API endpoint and the backend logic for search.
  • Augmented Search UI
    Intended to be forked, this module is a sample search implementation that can be integrated into any Jahia page to display a search UI. The module also includes the search UI Jahia connector that bridges this UI with the Jahia GraphQL API.

All of these modules are also available as a Jahia package: Augmented Search Enterprise Distribution.

Logical architecture

The following diagram shows the architecture of Augmented Search. Augmented Search UI captures HTTPS search requests from visitor browsers and sends them to the Jahia GraphQL API. The API is responsible for the search logic and for sending requests to Elasticsearch. Connections between Elasticsearch and your databases are managed by Elasticsearch Connector 7 and Database Connector. Elasticsearch indexes and searches your data.

In the diagram, blue is used to refer to Jahia modules.

[Diagram: augmented-search-schema.png]

Custom search experience

Search experiences can differ widely between implementations, so we provide three different paths to implement yours:

  • Use the Augmented Search GraphQL API natively. This option provides the most flexibility and allows any front-end logic to be implemented on top of our API. It is our recommended approach, but it also requires the most effort.
  • Build your own experience based on Elastic's existing Search UI React library and our Jahia Search UI Connector.
  • Use Augmented Search UI. At the other end of the spectrum, this option requires minimal effort to get started. We provide a sample component that can be dropped into a page or forked and modified to your needs.

Although we recommend using the Augmented Search GraphQL API natively, it is best to begin with Augmented Search UI and progressively move to a more complex implementation as you discover the product and its capabilities.
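
As a rough idea of what calling the GraphQL API natively involves, the Python sketch below builds (but does not send) an HTTP POST request carrying a GraphQL search query. Everything specific in it is an assumption to verify against your own instance: the `/modules/graphql` endpoint path, the `search` field and its `q`, `siteKeys`, and `language` arguments, the result fields, and the helper name `build_search_request`. Check the schema exposed by your Augmented Search version before relying on any of these names.

```python
import json
import urllib.request

# Assumption: the field and argument names below are illustrative only.
# Verify them against the GraphQL schema exposed by your installation.
SEARCH_QUERY = """
query ($q: String!, $siteKey: String!, $language: String!) {
  search(q: $q, siteKeys: [$siteKey], language: $language) {
    results {
      totalHits
      hits { displayableName link }
    }
  }
}
"""


def build_search_request(base_url, text, site_key, language):
    """Build a standard GraphQL-over-HTTP POST request (hypothetical helper)."""
    payload = {
        "query": SEARCH_QUERY,
        "variables": {"q": text, "siteKey": site_key, "language": language},
    }
    return urllib.request.Request(
        # Assumption: the GraphQL endpoint is served under /modules/graphql.
        url=base_url.rstrip("/") + "/modules/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The resulting request can then be sent with `urllib.request.urlopen(request)`. Authentication and any additional headers your instance requires are not covered by this sketch.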

Performance

In all our performance test scenarios, the Elasticsearch search provider easily met our acceptance criteria: 90% of the requests completed in less than a third of the limit we set, without any special tuning or configuration of the Elasticsearch environment.

Elasticsearch as the search provider outperforms JCR search in scenarios that combine many searches with content contribution. However, in some other scenarios, for instance with no contribution at all, JCR search scored slightly better than our Elasticsearch search provider.

For this reason, before using this module in production, we strongly advise testing your Elasticsearch environment carefully to ensure that it meets your expected performance levels.
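
When running such tests, an acceptance criterion of the form "90% of requests complete within a given limit" can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the latencies are whatever your own load test measured, and the nearest-rank method used here is one of several common percentile definitions.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer pct, 1..100) using nearest rank."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100, computed with integers to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_criterion(latencies_ms, limit_ms, pct=90):
    """True if at least pct% of the measured requests took limit_ms or less."""
    return percentile_nearest_rank(latencies_ms, pct) <= limit_ms
```

For example, `meets_criterion(measured_latencies_ms, 500)` checks the measurements from your own test run against a 500 ms limit; both the variable and the limit are placeholders for your own data and targets.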
