Augmented Search supports highlighting of searched terms by automatically adding HTML tags (<em>) to the content returned in the excerpt. Highlighting is a complex topic and is often considered a tradeoff between search convenience for the end user and performance.
Note that highlighting is determined separately from the matching that defines search results, so some results may be returned without highlights in the excerpt. In that case, consider adjusting your Augmented Search configuration to refine the way that highlighting occurs.
Internally, Augmented Search processes search results for highlighting using a combination of three fields:
The content as stored in Jahia, which is useful for highlighting exact matches on individual words
The content after it has been processed by the ngram tokenizer (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html)
The content after it has been analyzed by the shingle filter (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html)
The default configuration uses the ngram content for searches of up to 12 characters. For searches of more than 12 characters but fewer than 3 words, Augmented Search uses the content itself. Finally, for searches of 3 or more words, the results of the phrase analyzer are used.
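The default selection rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholds, not Augmented Search code: the function name `select_highlight_field` and the labels `"ngram"`, `"content"`, and `"phrase"` are hypothetical, and the sketch assumes that characters are counted over the whole search string (spaces included) and that words are separated by whitespace.

```python
def select_highlight_field(query: str) -> str:
    """Pick the field family used for highlighting under the default rules.

    Assumptions (not confirmed by the product documentation):
    - the character limit is checked first and counts spaces
    - words are whitespace-separated
    """
    text = query.strip()
    if len(text) <= 12:
        # Short searches: highlight from the ngram-analyzed content.
        return "ngram"
    if len(text.split()) < 3:
        # Longer searches with one or two words: use the content itself.
        return "content"
    # Three or more words: use the phrase (shingle) analysis.
    return "phrase"


if __name__ == "__main__":
    for q in ("custom", "custom content", "customize modules to extend"):
        print(f"{q!r} -> {select_highlight_field(q)}")
```

Under these assumptions, "custom" maps to the ngram content, "custom content" to the content itself, and "customize modules to extend" to the phrase analysis.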
Highlighting can be configured individually for the different languages using the highligtingfields configuration setting with the corresponding language suffix:
org.jahia.modules.augmentedsearch.language.highligtingfields.ja = \
    jgql:content, \
    jgql:content.phrase
This setting supports one or two field values.
Take the following text as an example:
Our documentation is here to help you deliver a great customer experience to the people who visit your site. Learn more depending on whether you create or manage content on your site, deploy and administer Jahia products, or develop and customize modules to extend functionality
The table shows how search terms are interpreted based on the sample text.
| Configuration | Search term | Result |
|---|---|---|
| Default | custom | The ngram content applies because the search term has fewer than 12 characters. Every occurrence of "custom" is highlighted (customer, customize). |
| Default | custom content | The content field applies because more than 12 characters and fewer than 3 words are searched. Only "content" is highlighted: because ngram is not used, "custom" does not match "customer" or "customize". |
| Default | customer content | The content field applies because more than 12 characters and fewer than 3 words are searched. Only "customer" and "content" are highlighted. |
| Default | customize modules to extend | The phrase field applies because more than 12 characters and at least 3 words are searched. The "customize modules to extend" phrase is highlighted. |
| Default | customize modules extend | The phrase field applies because more than 12 characters and 3 words are searched. The phrase field doesn't match due to the absence of "to" in the search term. Nothing is highlighted. |
| jgql:content | customer content help | A static configuration applies because more than 12 characters and 3 words are searched. Every occurrence of "customer", "content", and "help" is highlighted. |
| jgql:content, jgql:content.phrase | customize modules to extend | The phrase field applies because more than 3 words are searched. The "customize modules to extend" phrase is highlighted. |
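To make the differences between the three fields concrete, the following Python sketch imitates them on the sample text with regular expressions. It is a rough approximation for illustration only and does not call Elasticsearch or Augmented Search: the `highlight` function and its `mode` names are hypothetical, ngram matching is approximated by substring matching, content matching by whole-word matching, and phrase matching by an exact sequence of words.

```python
import re

SAMPLE = (
    "Our documentation is here to help you deliver a great customer "
    "experience to the people who visit your site. Learn more depending on "
    "whether you create or manage content on your site, deploy and "
    "administer Jahia products, or develop and customize modules to extend "
    "functionality"
)


def highlight(text: str, query: str, mode: str) -> str:
    """Wrap matches in <em> tags, approximating one highlighting field."""
    words = [re.escape(w) for w in query.split()]
    if not words:
        return text
    if mode == "ngram":
        # Approximation: any searched word may match inside a longer word.
        pattern = "|".join(words)
    elif mode == "content":
        # Each searched word must match a whole word.
        pattern = r"\b(?:" + "|".join(words) + r")\b"
    elif mode == "phrase":
        # All searched words must appear together, in order.
        pattern = r"\b" + r"\s+".join(words) + r"\b"
    else:
        raise ValueError(f"unknown mode: {mode}")
    return re.sub(
        pattern, lambda m: f"<em>{m.group(0)}</em>", text, flags=re.IGNORECASE
    )


if __name__ == "__main__":
    print(highlight(SAMPLE, "custom", "ngram"))
    print(highlight(SAMPLE, "customize modules extend", "phrase"))
```

In this approximation, searching "custom" in ngram mode marks the start of both "customer" and "customize", while "customize modules extend" in phrase mode leaves the text unchanged because "to" is missing from the search term.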