For documents to be searchable, they must first be indexed in an Elasticsearch cluster. Augmented Search does this by identifying main resource nodes and their subnodes, as defined in the Augmented Search configuration file.
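To make the idea of "main resource nodes and their subnodes" more concrete, here is a minimal sketch, assuming a simplified model in which each node whose type is listed as a main resource type starts its own search document, and every other subnode contributes its text to the closest main resource above it. The `Node` class, `build_documents` function and the sample paths are hypothetical illustrations, not the Augmented Search implementation; `jnt:page` and `jnt:text` are used only as example type names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a Jahia content node."""
    path: str
    node_type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def build_documents(root: Node, main_types: Set[str]) -> Dict[str, str]:
    """Group each main resource with the text of its non-main-resource subnodes."""
    documents: Dict[str, str] = {}

    def collect(node: Node, owner: Optional[str]) -> None:
        if node.node_type in main_types:
            # A main resource starts its own search document.
            owner = node.path
            documents[owner] = node.text
        elif owner is not None:
            # Other subnodes are folded into the closest main resource above them.
            documents[owner] = f"{documents[owner]} {node.text}".strip()
        for child in node.children:
            collect(child, owner)

    collect(root, None)
    return documents


if __name__ == "__main__":
    site = Node("/sites/acme", "jnt:virtualsite", children=[
        Node("/sites/acme/home", "jnt:page", "Home", children=[
            Node("/sites/acme/home/intro", "jnt:text", "Welcome"),
            Node("/sites/acme/home/news", "jnt:page", "News"),
        ]),
    ])
    print(build_documents(site, {"jnt:page"}))
```

In this model, the text node is merged into the "Home" document, while the nested page becomes a separate document because its own type is a main resource type.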
As part of the indexing process, the ACLs and roles of the source nodes are attached to the corresponding Elasticsearch documents. This allows Augmented Search to return only search results that match the visitor's permissions.
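The effect of attaching ACLs at indexing time can be sketched as follows, assuming each indexed document simply stores the set of principals (users, groups or roles) allowed to read its source node, and a hit is visible when it shares at least one principal with the visitor. This is a hypothetical in-memory model (`IndexedDocument` and `visible_results` are illustrative names); it does not describe how Augmented Search builds its Elasticsearch queries internally.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IndexedDocument:
    path: str
    title: str
    # Hypothetical: principals allowed to read the source node, copied at indexing time.
    allowed_principals: FrozenSet[str]


def visible_results(hits: Iterable[IndexedDocument],
                    visitor_principals: Iterable[str]) -> List[IndexedDocument]:
    """Keep only the hits that share at least one principal with the visitor."""
    visitor = set(visitor_principals)
    return [doc for doc in hits if doc.allowed_principals & visitor]


if __name__ == "__main__":
    hits = [
        IndexedDocument("/home", "Home", frozenset({"guest"})),
        IndexedDocument("/intranet", "Intranet", frozenset({"employees"})),
    ]
    print([d.path for d in visible_results(hits, {"guest"})])
```

Because the permissions travel with each document, the filtering happens on the search side without reading the source nodes again.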
Indexing is currently triggered by various types of events, when:
Indexing time depends on the number of documents to index. After a page update, the updated documents are available almost immediately, while indexing a large site in full can take a couple of minutes.
In the Augmented Search UI (as well as in its GraphQL API), you can trigger indexing either for an individual site or for all sites. These two actions handle indexing slightly differently, and it is critical to understand these differences.
Augmented Search communicates with its data indices through Elasticsearch aliases, and these indices are shared across multiple sites (with separate indices per language and content type). When indexing is triggered for all configured sites, Augmented Search creates a set of new indices and starts populating them with data. Once indexing is complete, the aliases are updated to point to the new indices, and the old indices are deleted. Because the indices are recreated, new mappings or new settings can be applied to them.
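The create, populate, switch alias, delete sequence can be sketched with an in-memory stand-in for a cluster. The `FakeCluster` class, its `full_reindex` method and the alias name `content-en` are hypothetical; the point is only to show why searches keep working during a full indexing (the alias still targets the old index) and why a new mapping can be applied (it belongs to a brand-new index). On a real Elasticsearch cluster, repointing an alias is typically done with the `_aliases` API using `remove` and `add` actions in a single request; this sketch does not claim that Augmented Search issues exactly that request.

```python
import itertools
from typing import Dict


class FakeCluster:
    """In-memory stand-in for an Elasticsearch cluster: indices plus aliases."""

    def __init__(self) -> None:
        self.indices: Dict[str, dict] = {}   # index name -> {"mapping": ..., "docs": {...}}
        self.aliases: Dict[str, str] = {}    # alias name -> index name
        self._counter = itertools.count(1)

    def full_reindex(self, alias: str, mapping: dict, docs: Dict[str, dict]) -> str:
        # 1. Create a brand-new index, so a new mapping can be applied to it.
        new_index = f"{alias}-{next(self._counter)}"
        self.indices[new_index] = {"mapping": mapping, "docs": {}}
        # 2. Populate it; searches still go through the alias to the old index meanwhile.
        for doc_id, doc in docs.items():
            self.indices[new_index]["docs"][doc_id] = doc
        # 3. Once indexing is complete, repoint the alias to the new index.
        old_index = self.aliases.get(alias)
        self.aliases[alias] = new_index
        # 4. Delete the old index.
        if old_index is not None:
            del self.indices[old_index]
        return new_index

    def search_all(self, alias: str) -> Dict[str, dict]:
        """Return every document reachable through the alias."""
        return self.indices[self.aliases[alias]]["docs"]
```

Searching always goes through the alias, so callers never need to know which physical index is current.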
When indexing is triggered for a single site, Augmented Search goes through all the Jahia nodes of that site and pushes them to Elasticsearch. This either updates existing documents or creates new ones (for new sites), but the indices themselves are not recreated: only their content for the site being indexed changes, so their mappings and settings remain unchanged.
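By contrast, a single-site indexing can be modeled as an upsert into an index that already exists. In this sketch the index is a plain dictionary with a `mapping` and `docs`, and `reindex_site` is a hypothetical helper: it overwrites or adds the site's documents, leaves other sites' documents alone, and never touches the mapping. As in the description above, it only updates or creates documents.

```python
from typing import Dict


def reindex_site(index: dict, site: str, site_docs: Dict[str, dict]) -> None:
    """Upsert one site's documents into an existing shared index (hypothetical model)."""
    for doc_id, doc in site_docs.items():
        # Existing documents are overwritten and new ones are created;
        # documents of other sites and the index mapping are left untouched.
        index["docs"][doc_id] = {**doc, "site": site}


if __name__ == "__main__":
    index = {"mapping": {"title": "text"},
             "docs": {"b1": {"title": "Other site", "site": "siteB"}}}
    reindex_site(index, "siteA", {"a1": {"title": "New page"}})
    print(sorted(index["docs"]), index["mapping"])
```

Since the mapping is never replaced on this path, a mapping change cannot take effect through a single-site indexing.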
In some situations, you may need to exclude a document from Augmented Search, for example to make sure that specific content cannot show up in the excerpt (see Search results and content filtering).
Content can be excluded using the Remove From Augmented Search Results mixin. If the content was previously indexed, it is removed from Augmented Search when it is saved.
Two options are available:
- … indexedMainResourcesTypes in the Augmented Search configuration file, the subcontent will still be indexed. This is useful when a site contains a tree of documents, for example news organized by year: you might not want the page listing the news for 2018 to appear in search results, but you would still want each individual news item to be indexed.
- … indexedMainResourcesTypes. Excluding a parent also excludes all of its subpages.
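The difference between the two exclusion behaviors can be sketched on a small content tree, assuming one behavior excludes only the node itself (its subcontent of a main resource type stays indexed, as in the news-by-year example) and the other excludes the node together with all of its subpages. The `exclude` field and its values `"node"` and `"subtree"` are hypothetical labels for these two behaviors, not the actual mixin property names, and `jnt:news` is used only as an example type.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    path: str
    node_type: str
    # Hypothetical: None (indexed), "node" (exclude this node only),
    # or "subtree" (exclude this node and all of its subpages).
    exclude: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def indexed_paths(root: Node, main_types: Set[str]) -> List[str]:
    """Return the main resources that would end up in the index."""
    paths: List[str] = []

    def walk(node: Node) -> None:
        if node.exclude == "subtree":
            return  # the node and everything below it stay out of the index
        if node.node_type in main_types and node.exclude != "node":
            paths.append(node.path)
        for child in node.children:
            walk(child)  # children are still visited when only the node is excluded

    walk(root)
    return paths


if __name__ == "__main__":
    news = Node("/news", "jnt:page", children=[
        Node("/news/2018", "jnt:page", exclude="node", children=[
            Node("/news/2018/item1", "jnt:news"),
        ]),
    ])
    print(indexed_paths(news, {"jnt:page", "jnt:news"}))
```

Here the 2018 listing page is left out while the news item below it remains searchable; switching the listing page to `"subtree"` would remove the item as well.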
Note that any modification to the Elasticsearch mapping requires reindexing the content of all sites, since only indexing all sites recreates the indices and applies the new mapping.