Written by The Jahia Team
 

Augmented Search submits documents to be indexed to an Elasticsearch cluster and allows users to search on those indexed documents. During the indexing process, Augmented Search aggregates a page and its subnodes, then extracts their read permissions and stores those alongside the indexed documents. When receiving a search query, Augmented Search filters out results to only return those matching the read permissions of the main page (or main resource) for the user making the request.

About search excerpts and ACLs

Augmented Search returns results by searching through an aggregate of main pages and their subnodes. With the Elasticsearch highlighting feature, users can view an excerpt, or snippet, of the aggregated document with the requested search terms highlighted. This provides a better search experience by letting users see the highlighted search terms in the context of their surrounding text.

However, excerpts may reveal some of a subnode’s content that the user could not otherwise access. Augmented Search does not control which parts of an aggregated document are returned in the excerpt; that depends on the search terms the user enters. Pages and their subnodes are aggregated, and ACLs are handled at the page level. If a page contains subnodes with a more restrictive authorization level than the page itself, portions of a subnode might be exposed through the excerpt. The document itself is not accessible to the user, and Augmented Search doesn’t provide access to the indexed content itself, but some portions of the content could become visible through search terms.
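The exposure described above can be modeled with a small, self-contained sketch. This is not Augmented Search's implementation: the `Page`, `Subnode`, `aggregate`, `excerpt`, and `search` names, the role names, and the page path are all hypothetical and only illustrate what happens when subnodes are aggregated into one document while permissions are checked at the page level.

```python
from dataclasses import dataclass, field


@dataclass
class Subnode:
    text: str
    allowed_roles: set[str]  # roles allowed to read this subnode (hypothetical)


@dataclass
class Page:
    path: str
    allowed_roles: set[str]  # page-level read permissions (hypothetical)
    subnodes: list[Subnode] = field(default_factory=list)


def aggregate(page: Page) -> dict:
    """Build one indexed document per page; only the page-level ACL is stored."""
    return {
        "path": page.path,
        "acl": set(page.allowed_roles),
        "content": " ".join(s.text for s in page.subnodes),
    }


def excerpt(content: str, term: str, radius: int = 20) -> str:
    """Return the text surrounding the first match of `term`."""
    pos = content.lower().find(term.lower())
    if pos < 0:
        return ""
    return content[max(0, pos - radius): pos + len(term) + radius]


def search(index: list[dict], term: str, user_roles: set[str]) -> list[dict]:
    """Filter on the page-level ACL only, then match against aggregated text."""
    hits = []
    for doc in index:
        if not (doc["acl"] & user_roles):
            continue  # user cannot read the page: no hit at all
        if term.lower() in doc["content"].lower():
            hits.append({"path": doc["path"], "excerpt": excerpt(doc["content"], term)})
    return hits


# Hypothetical page: readable by "reader", with one subnode limited to "editor".
page = Page(
    path="/sites/digitall/home/report",
    allowed_roles={"reader"},
    subnodes=[
        Subnode("Public summary of the annual report.", {"reader"}),
        Subnode("Confidential forecast: revenue drops.", {"editor"}),
    ],
)
index = [aggregate(page)]
hits = search(index, "forecast", {"reader"})
```

In this model, a user holding only the `reader` role gets a hit whose excerpt contains text from the `editor`-only subnode, because the subnode's own permissions were never stored with the aggregated document. A user who cannot read the page gets no hit.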

Restricting access to excerpts in search results

If you need to restrict access to pages whose content could be exposed through an excerpt (a portion of the content in a page), we recommend that you modify permissions at the page and main resource levels.

Alternatively, instead of modifying ACLs at the page and main resource levels, you can exclude pages from indexing in Augmented Search. For example, if you have a page listing your company’s products, but don’t want some of the products listed on this page to be searchable by particular users, you could index the individual products and exclude the listing page. The listing page would still be accessible to authorized users through navigation, while a user searching for a particular product would be redirected directly to the corresponding product page (and be subject to the required ACLs).

Accessing node data

Augmented Search allows users, based on their authorization level, to access documents stored in the JCR directly from search results. This gives access to properties that are not indexed in Augmented Search.

Note: Since those properties are not indexed in Augmented Search, accessing node data will impact search performance. The system first performs a search on the Elasticsearch cluster and then fetches the requested properties from the JCR for each of the search results.

The example query and response below show how to fetch displayableName through a direct search on the Elasticsearch cluster and then fetch the node name from the JCR.

Query:

{
  jcr {
    searches(siteKey: "digitall", language: "en", workspace: LIVE) {
      search(q: "financial", limit: 2) {
        hits {
          displayableName
          node {
            name
          }
        }
      }
    }
  }
}

Response:

{
  "data": {
    "jcr": {
      "searches": {
        "search": {
          "hits": [
            {
              "displayableName": "Digitall Financial Report.pdf (Digitall Financial Report)",
              "node": {
                "name": "Digitall Financial Report.pdf"
              }
            },
            {
              "displayableName": "Leadership",
              "node": {
                "name": "leadership"
              }
            }
          ]
        }
      }
    }
  }
}
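
As a rough illustration of how a client might send the query above and read both fields from the response, here is a minimal Python sketch. The endpoint URL is an assumption (a local instance exposing GraphQL at `/modules/graphql`); authentication and any headers your installation requires are omitted; and the helper names `build_request`, `hit_names`, and `run_search` are hypothetical.

```python
import json
import urllib.request

# Assumption: adjust host, port, and path to match your installation.
GRAPHQL_URL = "http://localhost:8080/modules/graphql"

QUERY = """
{
  jcr {
    searches(siteKey: "digitall", language: "en", workspace: LIVE) {
      search(q: "financial", limit: 2) {
        hits {
          displayableName
          node {
            name
          }
        }
      }
    }
  }
}
"""


def build_request(query: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Wrap the GraphQL query in a JSON POST request (no authentication)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def hit_names(response: dict) -> list[tuple[str, str]]:
    """Pair each hit's indexed displayableName with the node name from the JCR."""
    hits = response["data"]["jcr"]["searches"]["search"]["hits"]
    return [(h["displayableName"], h["node"]["name"]) for h in hits]


def run_search(query: str = QUERY) -> list[tuple[str, str]]:
    """Send the query and return (displayableName, node name) pairs."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return hit_names(json.load(resp))
```

Because every field requested under `node` is read from the JCR for each hit, keeping `limit` small and requesting only the properties you need helps limit the performance cost noted above.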