Augmented Search overview and architecture

November 14, 2023
We recommend using Augmented Search on Jahia Cloud. However, it can be used on-premises under the conditions described in the Augmented Search On-Premises page.

Augmented Search provides a modern search experience built into the Jahia platform. At its core, Augmented Search is an extension to Jahia's GraphQL API, backed by Elasticsearch, that allows users to search the content of your Jahia websites.

The Augmented Search module lets you use the power of Elasticsearch to index and search the content of your Jahia websites. The module acts as a connector to an existing Elasticsearch environment: it sends index data and search queries, and retrieves search results.

This module improves the relevance of search results compared to the default JCR search, because it performs full-page search rather than content-based search. It connects to an Elasticsearch server using the elasticsearch-connector module.

Delegating search capabilities to Elasticsearch has another advantage: Jahia consumes fewer resources, which improves the overall stability of the platform. You can also make the most of Elasticsearch's scalable design. Our performance tests confirmed this, showing significant improvements in response times for edit operations. Smaller improvements were also observed for live browsing.

Capabilities

Augmented Search offers:

  • Buffered operations. If the connection to Elasticsearch is lost, pending operations (content to be indexed) are queued until the connection is reestablished. In the current version of the module, the Jahia processing node manages this indexing queue and stores it in its RAM. Because the queue is held only in memory, it is strongly advised to stop the processing node only when no operations remain in the queue.
  • Search while reindexing. You can contribute content and search the existing index while reindexing is in progress.
  • ACL support at the content level. Full-page search applies only when the searched terms appear in content that shares the same ACL as the page.
  • Support for visibility conditions on content.
Note: Any readable content can be found and listed as a result, even if one of its parent nodes is currently not readable (for example, because of visibility conditions, publication or unpublication, or broken role inheritance). If a view exists that displays the child node on its own, users can access the content found by the search. If no such view exists, the content is not displayed on the page.
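The buffered-operations behavior can be illustrated with a small Python sketch. This is not Jahia's implementation. `BufferedIndexer`, its `send` callable, and `on_reconnect` are hypothetical names.

The sketch shows why an in-RAM queue matters. Operations wait in memory while the connection is down, so anything still queued when the process stops is lost. In this sketch, the indexer starts disconnected.

```python
from collections import deque
from typing import Callable, Deque, Dict


class BufferedIndexer:
    """Illustrative sketch (hypothetical, not Jahia's code) of buffered indexing:
    operations are held in an in-memory FIFO queue and flushed only while the
    Elasticsearch connection is available."""

    def __init__(self, send: Callable[[Dict], None]):
        self._send = send  # hypothetical callable that pushes one operation to Elasticsearch
        self._queue: Deque[Dict] = deque()  # lives in RAM only: lost if the process stops
        self.connected = False

    def submit(self, operation: Dict) -> None:
        """Queue an operation, then try to flush immediately."""
        self._queue.append(operation)
        self.flush()

    def flush(self) -> None:
        # Drain in FIFO order while connected; an operation leaves the queue
        # only after it has been sent successfully.
        while self.connected and self._queue:
            op = self._queue[0]
            try:
                self._send(op)
            except ConnectionError:
                self.connected = False  # connection lost: keep op queued and stop
                return
            self._queue.popleft()

    def on_reconnect(self) -> None:
        """Called when the connection is reestablished: resume flushing."""
        self.connected = True
        self.flush()

    def pending(self) -> int:
        return len(self._queue)
```

In this sketch, a `pending()` count greater than zero corresponds to the situation where the documentation advises against stopping the processing node.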

Search lifecycle

To best understand Augmented Search, you should be familiar with the main parts of its lifecycle:

  1. Jahia editors add content. Publishing or updating content triggers an indexing job in Jahia.
  2. The indexing job takes newly modified content and pushes it to an Elasticsearch instance, according to the Augmented Search module configuration.
  3. Once the documents are indexed, a GraphQL API provides an interface to search through a potentially large number of documents hosted in your Elasticsearch instance. 
  4. A UI is then built to provide the interface between users and the Augmented Search GraphQL API.
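The first three lifecycle steps can be modeled as a toy, in-memory Python sketch. It is not how Jahia or Elasticsearch work internally. `ContentNode`, the document field names, and the naive substring `search` function are all hypothetical. Real relevance scoring is left to Elasticsearch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ContentNode:
    # Hypothetical, simplified stand-in for a published Jahia node
    path: str
    title: str
    text: str
    last_published: datetime


def select_modified(nodes: List[ContentNode], since: datetime) -> List[ContentNode]:
    """Step 2: keep only content published after the last indexing run."""
    return [n for n in nodes if n.last_published > since]


def to_documents(nodes: List[ContentNode]) -> List[Dict[str, str]]:
    """Turn nodes into documents to push to the index (field names are illustrative)."""
    return [{"id": n.path, "displayableName": n.title, "content": n.text} for n in nodes]


def search(index: Dict[str, Dict[str, str]], term: str) -> List[str]:
    """Step 3, naively: case-insensitive substring match on title or content.
    No tokenization or relevance scoring, unlike Elasticsearch."""
    term = term.lower()
    return [
        d["displayableName"]
        for d in index.values()
        if term in d["content"].lower() or term in d["displayableName"].lower()
    ]
```

The sketch indexes incrementally: only content changed since the last run is converted and pushed, which mirrors step 2 above.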

Modules

The following modules are part of the Augmented Search ecosystem and can be found on the Jahia Store:

  • Database Connector
    Enables Jahia's connection to 3rd party databases.
  • Elasticsearch Connector
    Based on the database connector, implements support for connection to an Elasticsearch cluster. 
  • Augmented Search
    Adds a GraphQL API endpoint and the backend logic for search.
  • Augmented Search UI
    Intended to be forked, this module is a sample search implementation that can be integrated into any Jahia page to display a search UI. The module also includes the search UI Jahia connector that bridges this UI with the Jahia GraphQL API.

All these modules are also available in the Augmented Search Enterprise Distribution Jahia package.

Logical architecture

The following diagram shows the architecture of Augmented Search. The Augmented Search UI captures HTTPS search requests from visitors' browsers and sends them to the Jahia GraphQL API. The API handles the search logic and sends requests to Elasticsearch. The connection between Jahia and Elasticsearch is managed by Elasticsearch Connector 7 and Database Connector. Elasticsearch indexes and searches your data.

In the diagram, Jahia modules are shown in blue.

[Figure: Augmented Search architecture overview]

Custom search experience

The search experience can differ widely between implementations. Jahia provides three paths to implement your search experience:

  • Use the Augmented Search GraphQL API natively. This option provides the most flexibility and allows any front-end logic to be implemented on top of our API. It is our recommended approach, but it also requires the most effort.
  • Build your own experience based on Elastic's existing Search UI React library and our Jahia Search UI Connector.
  • Use Augmented Search UI. This option requires minimal effort to get started. We provide a sample component that can be dropped into a page, or forked and modified to suit your needs.

Although we recommend using the Augmented Search GraphQL API natively, it is best to begin with Augmented Search UI and progressively move to a more complex implementation as you discover the product and its capabilities.
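With the native GraphQL approach, any front end or script talks to the API through a standard GraphQL-over-HTTP POST request. The request carries a JSON body containing `query` and, optionally, `variables`.

The following is a minimal Python sketch using only the standard library. Keep these assumptions in mind:

- The endpoint URL `http://localhost:8080/modules/graphql` is an assumption about a default local Jahia setup.
- The helper names `build_request` and `execute` are hypothetical.
- Authentication is not handled here.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: a local Jahia instance exposing its GraphQL API at /modules/graphql.
DEFAULT_ENDPOINT = "http://localhost:8080/modules/graphql"


def build_request(query: str, variables: Optional[Dict[str, Any]] = None,
                  endpoint: str = DEFAULT_ENDPOINT) -> urllib.request.Request:
    """Build a GraphQL POST request whose JSON body holds the query and any variables."""
    body: Dict[str, Any] = {"query": query}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def execute(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and return the response's "data" object.
    Raises RuntimeError if the response contains GraphQL "errors"."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Separating request construction from execution keeps the query-building logic testable without a running Jahia instance.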

Performance

In all our performance test scenarios, the Elasticsearch search provider easily met our acceptance criteria: 90% of requests completed in less than a third of the time limit we set, without any special tuning or configuration of the Elasticsearch environment.

As a search provider, Elasticsearch performs better than JCR search in scenarios that combine many searches with content contribution. However, in some other scenarios (for instance, with no contribution at all), JCR search scored slightly better than our Elasticsearch search provider. For this reason, before using this module in production, we strongly advise that you carefully test your Elasticsearch environment to ensure that it meets your expected performance levels.
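You can check the acceptance criterion above (90% of requests under one third of the limit) against your own load-test measurements. The following is a minimal sketch using the nearest-rank percentile. The function names and any limit or sample values you pass in are hypothetical. Other percentile definitions, such as interpolated percentiles, can give slightly different results.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def meets_criterion(samples_ms: Sequence[float], limit_ms: float,
                    pct: float = 90.0, factor: float = 3.0) -> bool:
    """True if the pct-th percentile response time is below limit_ms / factor,
    that is, pct% of requests took less than a 'factor'-th of the limit."""
    return percentile(samples_ms, pct) < limit_ms / factor
```

For example, with ten response times of 10, 20, ..., 100 ms, the 90th percentile is 90 ms. That value meets a 300 ms limit (threshold 100 ms) but not a 270 ms limit (threshold 90 ms).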

API First

Starting with version 3.0, Augmented Search was implemented with an API-first approach. This means that any of its actions (whether for searching or for administering Augmented Search) can be performed through a well-documented GraphQL API.

Since GraphQL supports introspection, this documentation provides high-level pointers and highlights elements needing particular attention. It does not detail every parameter, since those are documented directly in our GraphQL schema. You can access the schema online through Jahia Administration by clicking Tools > GraphQL > Docs in the GraphQL playground.
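Because introspection is part of the GraphQL specification, you can also explore the schema programmatically rather than through the playground. The queries below use only standard introspection fields (`__schema`, `__type`). The Python helper names are hypothetical. You would send these queries to the same GraphQL endpoint as any other query.

```python
from typing import Any, Dict, List

# Standard introspection: list the fields available at the root of the query type.
ROOT_FIELDS_QUERY = """
query {
  __schema {
    queryType {
      fields { name description }
    }
  }
}
"""

# Standard introspection: list the fields of one named type.
# Pass {"name": "<TypeName>"} as the request variables.
TYPE_FIELDS_QUERY = """
query ($name: String!) {
  __type(name: $name) {
    name
    fields { name description }
  }
}
"""


def root_field_names(data: Dict[str, Any]) -> List[str]:
    """Extract root field names from the "data" object of a ROOT_FIELDS_QUERY response."""
    return [f["name"] for f in data["__schema"]["queryType"]["fields"]]


def type_field_names(data: Dict[str, Any]) -> List[str]:
    """Extract field names from a TYPE_FIELDS_QUERY response. Returns an empty list
    if the type does not exist or has no fields (for example, a scalar)."""
    t = data.get("__type")
    return [f["name"] for f in (t or {}).get("fields") or []]
```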

Augmented Search was built with two entry points, separating its two main use cases: searching and administration.

Search

A search entry point, available directly at the API's root, provides access to all of Augmented Search's user-facing features, such as results and facets. For example, this query fetches the displayableName value of every result matching the "jahia" search string, using the default pagination values.


query {
  search(q: "jahia") {
    results {
      hits {
        displayableName
      }
    }
  }
}
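In application code, the search term usually comes from user input rather than being hard-coded. The following is a minimal Python sketch, assuming the query shape shown above. It embeds the term as an escaped GraphQL string literal using `json.dumps`, since JSON string escapes are also valid GraphQL string escapes. It then reads `displayableName` from each hit in the response's "data" object. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def build_search_query(term: str) -> str:
    """Build the search query for an arbitrary term.

    json.dumps yields a double-quoted, escaped string that is also a valid
    GraphQL string literal, so quotes or backslashes in user input cannot
    break out of the q argument.
    """
    return (
        "query {\n"
        f"  search(q: {json.dumps(term)}) {{\n"
        "    results {\n"
        "      hits {\n"
        "        displayableName\n"
        "      }\n"
        "    }\n"
        "  }\n"
        "}"
    )


def displayable_names(data: Dict[str, Any]) -> List[str]:
    """Collect displayableName from each hit in the "data" object of the response."""
    return [hit["displayableName"] for hit in data["search"]["results"]["hits"]]
```

GraphQL variables are another common way to pass user input. The sketch keeps the literal form used in the example query, so it makes no assumptions about the declared type of the `q` argument.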

 

Administration

A search administration entry point, available under the admin node, provides access to all of Augmented Search's administration features. For example, this mutation adds a site to Augmented Search. Most of these features are also available directly through Jahia Administration.


mutation {
  admin {
    search {
      addSite(siteKey: "digitall")
    }
  }
}
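To register several sites in one request, you can use GraphQL aliases. Aliases let one mutation call `addSite` repeatedly while keeping each result distinct in the response. The following is a minimal Python sketch. The helper name and the `add_0`, `add_1`, ... alias scheme are hypothetical.

```python
import json
from typing import List


def build_add_sites_mutation(site_keys: List[str]) -> str:
    """Build one mutation that calls addSite once per site key.
    Each call gets an alias (add_0, add_1, ...) so the results can be told
    apart in the response."""
    if not site_keys:
        raise ValueError("at least one site key is required")
    calls = "\n".join(
        f"      add_{i}: addSite(siteKey: {json.dumps(key)})"  # escaped string literal
        for i, key in enumerate(site_keys)
    )
    return "mutation {\n  admin {\n    search {\n" + calls + "\n    }\n  }\n}"
```

Keep two caveats in mind:

- These calls are nested under `admin.search`. The GraphQL specification only requires serial execution for root-level mutation fields, so send separate requests if the order of additions matters.
- Administration mutations will typically need to be sent by a user with administration rights.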

 

Related links