Highlighting

November 14, 2023

Augmented Search supports highlighting of searched terms by automatically adding HTML tags (<em>) to the content returned in the excerpt. Highlighting is a complex topic and often involves a tradeoff between search convenience for the end user and performance.

Note that highlighting is independent of how search results are determined, so some results may be returned without any highlights in the excerpt. In that case, consider adjusting your Augmented Search configuration to refine the way that highlighting occurs.

Internally, Augmented Search processes search results for highlighting using a combination of three fields: the ngram field, the content field, and the phrase field.

By default, Augmented Search uses the ngram content for search terms of up to 12 characters. For search terms longer than 12 characters but containing fewer than 3 words, it uses the content itself. Finally, for search terms of 3 or more words, it uses the results of the phrase analyzer.
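The default selection rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the actual Augmented Search implementation. The function name and the returned labels are hypothetical, and the handling of the exact boundaries (a term of exactly 12 characters, or exactly 3 words) is an assumption based on the examples in this page.

```python
def select_highlight_field(search_term: str) -> str:
    """Return which field the default configuration would use (illustrative only).

    The labels "ngram", "content" and "phrase" are hypothetical names for the
    three fields; they are not Augmented Search identifiers.
    """
    words = search_term.split()
    # Assumption: a term of exactly 12 characters still uses the ngram field.
    if len(search_term) <= 12:
        return "ngram"
    # Assumption: fewer than 3 words uses the content field.
    if len(words) < 3:
        return "content"
    # Otherwise, 3 or more words uses the phrase field.
    return "phrase"


for term in ("custom", "custom content", "customize modules to extend"):
    print(term, "->", select_highlight_field(term))
```

Character length is checked first, so a short multi-word term still falls under the ngram rule in this sketch.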

Highlighting can be configured individually for each language using the highligtingfields configuration setting with the corresponding language suffix (for example, .ja for Japanese):

org.jahia.modules.augmentedsearch.language.highligtingfields.ja = \
   jgql:content, \
   jgql:content.phrase

This setting supports one or two field values.
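To make the shape of the value concrete, here is a minimal Python sketch that turns the raw property value into a list of field names and enforces the one-or-two-values rule. The function name is hypothetical and the parsing is deliberately simplistic. It is not how Augmented Search reads its configuration.

```python
def parse_highlighting_fields(raw_value: str) -> list[str]:
    """Split a highlighting fields value into field names (illustrative only)."""
    # Drop the line-continuation backslashes, then split on commas.
    flattened = raw_value.replace("\\", " ")
    fields = [part.strip() for part in flattened.split(",") if part.strip()]
    # The setting is documented as supporting one or two field values.
    if not 1 <= len(fields) <= 2:
        raise ValueError(f"expected one or two fields, got {len(fields)}")
    return fields


raw = "\\\n   jgql:content, \\\n   jgql:content.phrase"
print(parse_highlighting_fields(raw))
```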

Take the following text as an example:

Our documentation is here to help you deliver a great customer experience to the people who visit your site.

Learn more depending on whether you create or manage content on your site, 
deploy and administer Jahia products, or develop and customize modules to extend functionality

The following table shows how different search terms are interpreted against the sample text.

| Configuration | Search terms | Highlighting |
| --- | --- | --- |
| Default | custom | The ngram content applies as the search term is less than 12 characters. Every occurrence of "custom" is highlighted (customer, customize). |
| Default | custom content | The content field applies as more than 12 characters and fewer than 3 words are searched. Only "content" is highlighted: because ngram is not used, "custom" does not match "customer" or "customize". |
| Default | customer content | The content field applies as more than 12 characters and fewer than 3 words are searched. Only "customer" and "content" are highlighted. |
| Default | customize modules to extend | The phrase field applies as more than 12 characters and 3 or more words are searched. The "customize modules to extend" phrase is highlighted. |
| Default | customize modules extend | The phrase field applies as more than 12 characters and 3 words are searched. The phrase field doesn't match due to the absence of "to" in the search term. Nothing is highlighted. |
| jgql:content | customer content help | A static configuration applies. More than 12 characters and 3 words are searched, and every occurrence of "customer", "content", and "help" is highlighted. |
| jgql:content, jgql:content.phrase | customize modules to extend | The phrase field applies as more than 3 words are searched. The "customize modules to extend" phrase is highlighted. |
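
The behaviors in the table can be approximated with a small Python sketch that uses regular expressions on the sample text. This is a toy model for building intuition only: the real highlighting is performed by the search engine's analyzers, and the `highlight` function, its `field` labels, and the matching rules below are simplifying assumptions.

```python
import re

SAMPLE = (
    "Our documentation is here to help you deliver a great customer "
    "experience to the people who visit your site. Learn more depending on "
    "whether you create or manage content on your site, deploy and administer "
    "Jahia products, or develop and customize modules to extend functionality"
)


def highlight(text: str, query: str, field: str) -> str:
    """Wrap matches in <em> tags, imitating one of the three fields (toy model)."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    if field == "ngram":
        # Partial matches: a term may match inside a longer word.
        pattern = "|".join(terms)
    elif field == "content":
        # Whole-word matches only.
        pattern = r"\b(?:" + "|".join(terms) + r")\b"
    elif field == "phrase":
        # All terms must appear consecutively, in order.
        pattern = r"\b" + r"\s+".join(terms) + r"\b"
    else:
        raise ValueError(f"unknown field: {field}")
    return re.sub(
        pattern,
        lambda match: "<em>" + match.group(0) + "</em>",
        text,
        flags=re.IGNORECASE,
    )


print(highlight(SAMPLE, "custom", "ngram"))
print(highlight(SAMPLE, "customize modules extend", "phrase"))
```

In this sketch, the second call returns the text unchanged because the phrase does not occur without "to", which mirrors the "Nothing is highlighted" row.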