Search results and content filtering

November 14, 2023

Augmented Search submits documents to an Elasticsearch cluster for indexing and lets users search those indexed documents. During indexing, Augmented Search aggregates a page and its subnodes, then extracts the page's read permissions and stores them alongside the indexed document. When it receives a search query, Augmented Search filters the results so that it returns only those whose main page (or main resource) the requesting user is permitted to read.

About search excerpts and ACLs

Augmented Search returns results by searching through an aggregate of main pages and their subnodes. With the Elasticsearch highlighting feature, users can view an excerpt, or snippet, of the aggregated document with the requested search terms highlighted. This provides a better search experience by letting users see the highlighted search terms in context with their surrounding text.

However, excerpts may reveal some of a page subnode’s content that would not otherwise be accessible to the user. Augmented Search does not control which parts of an aggregated document are returned in the excerpt; this depends on the search terms entered by the user. Pages and their subnodes are aggregated, and ACLs are handled at the page level. If a page contains subnodes with a more restrictive authorization level than the page itself, portions of a subnode might be exposed through the excerpt. The subnode itself remains inaccessible to the user, and Augmented Search doesn’t provide access to the indexed content itself, but some portions of the content could become visible through search terms.
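The situation described above can be illustrated with a toy model. This is only a sketch written for illustration, not Augmented Search's or Elasticsearch's implementation: the `Node` class, the `index_page` and `search_excerpt` functions, and the user names are all hypothetical. It shows a page-level permission check followed by an excerpt cut from the aggregated text.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical content node: some text plus the users allowed to read it."""
    text: str
    readers: set


def index_page(page, subnodes):
    """Aggregate a page with its subnodes; only the page's readers are stored."""
    aggregated = " ".join([page.text] + [node.text for node in subnodes])
    return {"text": aggregated, "readers": set(page.readers)}


def search_excerpt(doc, user, term, context=20):
    """Return an excerpt around the first match, or None if filtered out."""
    # The permission check happens once, at the page level.
    if user not in doc["readers"]:
        return None
    match = re.search(re.escape(term), doc["text"], re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(doc["text"]), match.end() + context)
    return doc["text"][start:end]


page = Node("Our product catalog.", {"alice", "bob"})
restricted = Node("Internal pricing: 42 EUR.", {"alice"})  # bob may not read this
doc = index_page(page, [restricted])

# bob can read the page, so the hit is returned, and the excerpt
# includes text that originates from the restricted subnode.
bob_excerpt = search_excerpt(doc, "bob", "pricing")
# carol cannot read the page at all, so she gets no result.
carol_excerpt = search_excerpt(doc, "carol", "pricing")
```

In this model, the excerpt is cut from the aggregated text after the page-level check has passed, so the subnode's own, stricter reader list is never consulted.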

Restricting access to excerpts in search results

If you need to restrict access to pages with content that could be exposed through an excerpt (an excerpt is a portion of content in a page), we recommend that you modify permissions at the page and main resource levels to restrict access to those pages.

Alternatively, instead of modifying ACLs at the page and main resource levels, you can exclude pages from indexing in Augmented Search. For example, if you have a page listing your company’s products, but don’t want some of the products listed on this page to be searchable by particular users, you could index the individual products and exclude the listing page. The listing page would still be accessible to authorized users through navigation, while a user searching for a particular product would be taken directly to the corresponding product page (and be subject to the required ACLs).

Accessing node data

Previous versions of Augmented Search allowed fetching additional node data directly from the search results. This approach can have strong performance implications since those elements are not directly indexed in Elasticsearch and Augmented Search would generate individual queries to the JCR to fetch data.

Instead, if the needed data cannot be indexed in Elasticsearch, we recommend performing a second GraphQL query to fetch the additional data. From a performance standpoint, this incurs the same overhead (caused by multiple accesses to the JCR) but doesn't "block" the Augmented Search query until all results are fetched.

Simply fetch the document id from the first query, as demonstrated below:


query {
  search(
    q: "jahia"
  ) {
    results {
      hits {
        id
        displayableName
      }
      totalHits
    }
  }
}


Then perform a follow-up GraphQL query to the JCR using the IDs fetched earlier:


query {
  jcr(workspace: LIVE) {
    nodesById(uuids: ["60439110-e5eb-4afc-bf0b-6b12e9c616bb", "b2ecaa6e-ebe3-4e71-a123-477a36143989"]) {
      path
    }
  }
}
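
The two-step flow can be chained in client code. The following Python sketch is one possible way to do it, under stated assumptions: the search response is assumed to use the standard GraphQL `data` envelope and to mirror the shape of the search query above, and the helper names `extract_hit_ids` and `build_nodes_query` are hypothetical. Sending the queries to the server is left out.

```python
import json


def extract_hit_ids(search_response):
    """Collect the document IDs from a search response.

    Assumes the response mirrors the search query:
    data -> search -> results -> hits -> id.
    """
    hits = search_response["data"]["search"]["results"]["hits"]
    return [hit["id"] for hit in hits]


def build_nodes_query(uuids):
    """Build the follow-up JCR query for the given node IDs."""
    # json.dumps renders the list as ["id1", "id2"], which is also a
    # valid GraphQL list-of-strings literal for plain UUID values.
    uuid_list = json.dumps(uuids)
    return (
        "query {\n"
        "  jcr(workspace: LIVE) {\n"
        f"    nodesById(uuids: {uuid_list}) {{\n"
        "      path\n"
        "    }\n"
        "  }\n"
        "}"
    )


# Example response, shaped like the result of the first query (sample data).
sample_response = {
    "data": {
        "search": {
            "results": {
                "hits": [
                    {"id": "60439110-e5eb-4afc-bf0b-6b12e9c616bb", "displayableName": "Home"},
                    {"id": "b2ecaa6e-ebe3-4e71-a123-477a36143989", "displayableName": "About"},
                ],
                "totalHits": 2,
            }
        }
    }
}

ids = extract_hit_ids(sample_response)
follow_up_query = build_nodes_query(ids)
```

Because the follow-up query is built and sent separately, the search results can be displayed as soon as they arrive, and the additional node data can be filled in once the second query returns.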