Augmented search FAQs

  Written by The Jahia Team

Augmented Search UI

Styling

How can I customize the styling of the Augmented Search UI?

Instant Search

How can I change the number of letters that trigger instant search?

How can I deactivate instant search?

Facets

Can I use hierarchical facets with categories?

Are facets conjunctive or disjunctive?

Can I index custom properties of type categories?

Search box

Can I use quotes to search for exact matches (for example, “my exact search”)?

Can I use boolean operators in the search box, for example AND, OR, or NOT?

Is autocomplete supported on terms that are most searched?

Highlighting

How does the highlighting work? Is there a match between the language used for lemmatization and the highlighting?

Elasticsearch and indexing

Infrastructure

Can I use the same Elasticsearch cluster for different Jahia platforms?

Indexation

Should I use content-based search or page-based search? Can I combine page-based search with content-based search?

Can I index content that is not displayed in full page?

Can I boost a custom property of a content type in the definitions.cnd file?

Can I exclude some content from being indexed, like the Google no-index option?

Can I add a negative boost on some content?

Can I set a negative boost for some content types?

Can I define boosts by website?

Can I configure synonyms?

What analyzer does Augmented Search use?

Can I use different analyzers?

Which language is used for lemmatization?

How can I display the facet for a custom field? 

How can I display categories facets?

Can I add facets on custom properties? Or are they added automatically?

Can I customize Elasticsearch settings and mappings?

How can I customize Ngrams?

What are the default boosts?

How is the fuzzy match configured?

How is content indexed in augmented search?

Definitions.cnd and Elasticsearch configurations

What parameters can I set in the definitions.cnd file to modify the augmented search indexation (for example, nofulltext, indexed=no, boost, analyzer=keyword)?

Legacy components

What are the legacy Jahia search-related components that continue to work in an Augmented Search setup (for example, glossary or pager)? If some are not working anymore what are the alternative ones?

Does the default behavior display all matching results or can I set a default limit?

Upgrading from JCR search

Can I have one site running with JCR search and another one with augmented search?

External Data Provider (EDP)

How can I index content from the External Data Provider? 

Is it possible to have search results including contents coming from the External Data Provider and from the JCR?

Answers

How can I customize the styling of the Augmented Search UI? 

You can customize the Augmented Search UI module by applying CSS from another template that defines the styling that you want to use. You can also fork the Augmented Search UI module.

How can I change the number of letters that trigger instant search?

In the SearchView.jsx file of the Augmented Search UI module, modify the value of the debounceLength parameter on the SearchBox component.

<SearchBox
   searchAsYouType
   debounceLength={100}
/>

See search-ui SearchBox documentation for more information on the SearchBox component and parameters.

How can I deactivate instant search?

Remove the searchAsYouType attribute from the SearchBox component in the SearchView.jsx file.

<SearchBox/>

See search-ui SearchBox documentation for more information on the SearchBox component and parameters.

Can I use hierarchical facets with categories?

In Augmented Search 1.0, hierarchical facets are not supported.

Are facets conjunctive or disjunctive?

Facets are disjunctive by default. Disjunctive facets let users select one or more facet values to filter search results, whereas conjunctive facets let users select only one.

Can I index custom properties of type categories?

No, only j:defaultCategory from jmix:categorized is indexed as such.

Can I use quotes to search for exact matches (for example, “my exact search”)?

No, using quotes to get an exact match is not supported by augmented search. However, if a search result matches the exact order of the searched words, it will be boosted automatically. 

Can I use boolean operators in the search box, for example AND, OR, or NOT?

No, boolean operators are not supported by augmented search.

Is autocomplete supported on terms that are most searched?

No, autocomplete is usually performed on terms that are the most relevant, rather than terms that are most used. 

How does the highlighting work? Is there a match between the language used for lemmatization and the highlighting?

Yes.

Can I use the same Elasticsearch cluster for different Jahia platforms?

Yes, you can add a prefix for each index, so one prefix per platform.

Should I use content-based search or page-based search? Can I combine page-based search with content-based search?

Yes, you can combine them. Depending on your site and business requirements, you can configure one part of your website with page-based search by using a filter on the path, and then index the rest of the website using content-based search.

Can I index content that is not displayed in full page?

Yes. Content is indexed by node type and sub type.

Can I boost a custom property of a content type in the definitions.cnd file?

No, it’s not possible to boost custom fields.

Can I exclude some content from being indexed, like the Google no-index option?

This is not possible with Augmented Search 1.0.

Can I add a negative boost on some content?

This is not possible with Augmented Search 1.0.

Can I set a negative boost for some content types?

This is not possible with Augmented Search 1.0.

Can I define boosts by website?

No, indexation is done at the platform level and all sites are affected.

Can I configure synonyms?

Yes. You can configure synonyms using standard Elasticsearch configuration.
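
As a rough illustration of what a standard Elasticsearch synonym configuration looks like, the following JavaScript sketch builds a settings fragment with a synonym token filter and prints it as JSON. The filter name my_synonyms, the analyzer name my_synonym_analyzer, and the synonym list are hypothetical placeholders. The names actually used in the Augmented Search settings.json file may differ, so check that file before merging anything into it.

```javascript
// Hypothetical names throughout: "my_synonyms" and "my_synonym_analyzer" are
// placeholders, not names taken from the Augmented Search settings.json file.
const synonymSettings = {
  analysis: {
    filter: {
      my_synonyms: {
        type: "synonym", // standard Elasticsearch synonym token filter
        synonyms: [
          "laptop, notebook", // equivalent terms
          "tv => television"  // explicit one-way mapping
        ]
      }
    },
    analyzer: {
      my_synonym_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "my_synonyms"]
      }
    }
  }
};

// Print the fragment as JSON so it can be compared with a settings.json file.
console.log(JSON.stringify(synonymSettings, null, 2));
```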

What analyzer does Augmented Search use?

Each language uses its own index and dedicated analyzer.

Can I use different analyzers?

Yes. You can configure analyzers and stemmers by modifying the OSGi properties of the Augmented Search module. You can do this in the configuration file, in the Karaf console, or in Jahia Tools.

Which language is used for lemmatization?

The indexation process does not use lemmatization by default, as Elasticsearch and Lucene only provide stemming out-of-the-box.

How can I display the facet for a custom field? 

In jgql:nodes.audiences.keywords, add your field to the Elasticsearch mapping. If your field has a namespace, surround the namespace and field in quotes.

How can I display categories facets?

Add the jgql:categories.keyword field to your query as a term facet. This creates a facet based on the category's title in the index language.


{
  jcr(workspace: LIVE) {
    searches(siteKey: "digitall", language: "en", workspace: LIVE) {
      search(
        q: "alternative",
        # Term facet on the category titles
        facets: {
          term: {field: "jgql:categories.keyword", minDocCount: 1}
        }
      ) {
        facets {
          data {
            ... on TermValue {
              count
              value
            }
          }
        }
        # Fields returned for each matching document
        hits {
          displayableName
          path
        }
      }
    }
  }
}

The search will return something like this.


"search": {
          "facets": [
            {
              "data": [
                {
                  "count": 2,
                  "value": "Categories"
                },
                {
                  "count": 1,
                  "value": "Annual Filings"
                },
                {
                  "count": 1,
                  "value": "Companies"
                },
                {
                  "count": 1,
                  "value": "Goods"
                }
              ]
            }
          ],
          "hits": [
            {
              "displayableName": "Home",
              "path": "/sites/digitall/home"
            },
            {
              "displayableName": "Press Releases Entry",
              "path": "/sites/digitall/home/investors/press-releases-entry"
            }
          ]
        }
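
To show how such a response might be consumed, here is a minimal JavaScript sketch that turns the data array of a facet into display labels. The sample object copies the values from the response above. The facetOptions helper and its label format are illustrative assumptions and are not part of the Augmented Search UI module.

```javascript
// Sample shaped like the "search" object in the response above.
const search = {
  facets: [
    {
      data: [
        { count: 2, value: "Categories" },
        { count: 1, value: "Annual Filings" },
        { count: 1, value: "Companies" },
        { count: 1, value: "Goods" }
      ]
    }
  ],
  hits: [
    { displayableName: "Home", path: "/sites/digitall/home" },
    {
      displayableName: "Press Releases Entry",
      path: "/sites/digitall/home/investors/press-releases-entry"
    }
  ]
};

// Hypothetical helper: map facet values to labels, highest count first.
function facetOptions(facet) {
  return facet.data
    .map(({ value, count }) => ({ value, count, label: `${value} (${count})` }))
    .sort((a, b) => b.count - a.count);
}

const options = facetOptions(search.facets[0]);
console.log(options.map((option) => option.label));
```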

Can I add facets on custom properties? Or are they added automatically?

Yes, indexed custom properties are automatically indexed as text or keyword, enabling you to use them for facets.
To use a property as a facet:

  1. Modify the augmented search configuration by adding the definition types you want to map and index.
    1. In Jahia Tools, navigate to Administration and Guidance>OSGi console.
    2. Select OSGI>Configuration and edit values for the org.jahia.modules.augmentedsearch module.
    3. Edit the org.jahia.modules.augmentedsearch.content.indexedMainResourceTypes and org.jahia.modules.augmentedsearch.content.mappedNodeTypes properties. The following example shows the jacademix:document definition type being added.

      jahia-tools-as.png

  2. In Jahia, reindex your data.
    1. In Administration, select Configuration>Augmented search management.
    2. Click Index the content in the main window. Then click Save.
      augmented-search-reindexed.png

Now your data can be used in your queries. The following example shows how to make a jacademix:document mixin a main resource to be searched. The example also shows how to map it so that the author property can be used for facets.



{
  jcr {
    searches(siteKey: "academy", language: "en", workspace: LIVE) {
      search(q: "cluster", limit: 20, offset: 0, 
        filter: {nodeType: {type: "jacademix:document"}}, 
        facets: {
          term: [
          {field:"author.keyword", minDocCount:1}]}) {
        totalHits
        took
        facets {
          field
          type
          data {
            ... on TermValue {
              count
              value
            }            
          }
        }
        hits {
          id
          link
          displayableName
          excerpt
          score
          lastModified
          lastModifiedBy
          createdBy
          created
        }
      }
    }
  }
}

Can I customize Elasticsearch settings and mappings?

Yes you can. To customize Elasticsearch settings and mappings:

  1. Copy the mapping.json and settings.json files embedded in the augmented-search module, under META-INF/configurations, to a location where your Jahia instance can reference them.
  2. Update the configuration file to reflect the new paths to the files.
  3. Set the corresponding properties. There is one property for the settings and one for the mapping. Each can be specified for both content and files, which gives the following four properties.
    
    org.jahia.modules.augmentedsearch.content.settingsFileLocation
    org.jahia.modules.augmentedsearch.file.settingsFileLocation
    org.jahia.modules.augmentedsearch.content.mappingFileLocation
    org.jahia.modules.augmentedsearch.file.mappingFileLocation
    # Example:
    # org.jahia.modules.augmentedsearch.content.settingsFileLocation = /opt/jahia/elasticsearch/settings.json

How can I customize Ngrams?

First, copy the embedded configuration files. Once you have copied the JSON files, edit the settings.json file. Locate the tokenizer definition at the end of the file.


"tokenizer": {
 ...
 "main_tokenizer": {
   "type": "edge_ngram",
   "min_gram": 1,
   "max_gram": 12,
   "token_chars": [
     "letter",
     "digit"
   ]
 },
 "metadata_tokenizer": {
   "type": "edge_ngram",
   "min_gram": 1,
   "max_gram": 12,
   "token_chars": [
     "letter",
     "digit"
   ]
 }
}

Here you can tune the min_gram and max_gram properties. 

  • min_gram
    Specifies when “instant search” applies to searches that your users perform. A value of 1 means that users get results on the first keyboard stroke. A value of 3 means results display when they type at least 3 characters.
  • max_gram
    Determines the maximum length of the generated groups of letters, 12 letters by default. The right value depends on your dataset, the complexity of your vocabulary, and the languages that you index. For example, some languages, such as German, tend to compound words together.
Note: The max_gram property has a significant impact on the size of your index. With the default settings, each word generates up to 12 tokens, ranging from 1 to 12 characters in length.
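
To make the effect of min_gram and max_gram concrete, the following JavaScript sketch emits the edge n-grams of a single word. It only approximates what an edge_ngram tokenizer does: the real tokenizer also splits text on characters that are not listed in token_chars. The edgeNgrams function name is an assumption made for this illustration.

```javascript
// Emit the edge n-grams of one word, from minGram up to maxGram characters.
function edgeNgrams(word, minGram = 1, maxGram = 12) {
  const tokens = [];
  const longest = Math.min(word.length, maxGram);
  for (let length = minGram; length <= longest; length++) {
    tokens.push(word.slice(0, length));
  }
  return tokens;
}

console.log(edgeNgrams("wolf"));    // [ 'w', 'wo', 'wol', 'wolf' ]
console.log(edgeNgrams("wolf", 3)); // [ 'wol', 'wolf' ]
console.log(edgeNgrams("internationalization").length); // 12
```

With min_gram set to 3, a word or prefix shorter than 3 characters produces no token at all, which is why no results appear before the third keystroke.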

What are the default boosts?

Boost settings are applied by default to the jgql:main, jgql:metadata, and jgql:content fields.


#
# Boost settings for fields: jgql:main, jgql:metadata and jgql:content
#
org.jahia.modules.augmentedsearch.field.main.boost = 2.0
org.jahia.modules.augmentedsearch.field.metadata.boost = 1.5
org.jahia.modules.augmentedsearch.field.content.boost = 1.5


How is the fuzzy match configured?

By default, the fuzzy matching starts at the 4th character. Also, it can permute one letter, starting at the 3rd character. The first 2 letters need to be exact.
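
The following JavaScript sketch shows one possible reading of this rule. It is not the module's implementation. It assumes that terms shorter than 4 characters must match exactly, that longer terms tolerate a single edit (an insertion, deletion, substitution, or swap of two adjacent letters), and that the first 2 letters must always be identical. The function names are hypothetical.

```javascript
// Optimal string alignment distance: insertions, deletions, substitutions,
// and swaps of two adjacent letters each cost 1.
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Assumed rule: exact match below 4 characters, otherwise an exact
// 2-letter prefix plus at most one edit.
function fuzzyMatches(typed, indexed) {
  if (typed === indexed) return true;
  if (typed.length < 4) return false;
  if (typed.slice(0, 2) !== indexed.slice(0, 2)) return false;
  return editDistance(typed, indexed) <= 1;
}

console.log(fuzzyMatches("jahai", "jahia")); // true: two adjacent letters swapped
console.log(fuzzyMatches("ajhia", "jahia")); // false: the first 2 letters differ
console.log(fuzzyMatches("jah", "jai"));     // false: too short for fuzzy matching
```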

How is content indexed in augmented search?

All content is split in 3 fields:

  • Main
    Indexes the displayable name of the content, usually the title or, for rich text, the first 128 characters. By default, the weight = 2.
  • Metadata
    Indexes the categories, tags and keywords that are set on each content. By default, the weight = 1.5.
  • Content
    Aggregates all full-text properties into one field to provide an efficient full-text search. By default, the weight = 1.5.

Each of these fields is analyzed and stored in the following subfields to provide the best search relevance out of the box: 

  • Stemming
    Takes the searched term and tries to match it against the stem (for example developer > develop). This subfield applies to all words in your searched term.
  • Ngram
    Edge Ngram analyzes each word and emits a token for each group of letters within the defined limits (1-12 by default), for example wolf -> [w, wo, wol, wolf]. This subfield is mainly used when the visitor starts typing words.
  • Phrase
    Matches the searched terms against the indexed content. If the searched terms have a match with the indexed content, then the order of the words has an impact.
  • Exact match
    Checks the exact match between the searched term and the indexed content. Exact match has a lot of weight. 

What parameters can I set in the definitions.cnd file to modify the augmented search indexation (for example, nofulltext, indexed=no, boost, analyzer=keyword)?

The query uses the main, content, and metadata fields, which do not take into account boost or analyzer. The properties that are not indexable are not indexed (indexed=no). The properties that are not full text are not copied in the field content and are not part of the query for search, but they can be used for filtering or faceting.

What are the legacy Jahia search-related components that continue to work in an Augmented Search setup (for example, glossary or pager)? If some are not working anymore what are the alternative ones?

No legacy Jahia search components continue to work. The Augmented Search UI component instead uses the Search UI library from Elasticsearch. See the Elasticsearch documentation for the components that you can use in your search application:

  • SearchBox
  • Results
  • Result
  • ResultsPerPage
  • Facet
  • Sorting
  • Paging
  • PagingInfo
  • ErrorBoundary
  • Search results

Does the default behavior display all matching results or can I set a default limit?

The default limit is 10 results if nothing is specified in the GraphQL query.
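
As a sketch of how to request a different number of results, the following JavaScript helper builds a query string with an explicit limit argument, the same argument used in the jacademix:document example in this FAQ. The buildSearchQuery function is hypothetical, and the site key and language are sample values.

```javascript
// Build a search query with an explicit result limit.
// The default of 10 mirrors the default limit described above.
function buildSearchQuery({ q, siteKey, language, limit = 10 }) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  // JSON.stringify quotes and escapes the string arguments.
  return `{
  jcr {
    searches(siteKey: ${JSON.stringify(siteKey)}, language: ${JSON.stringify(language)}, workspace: LIVE) {
      search(q: ${JSON.stringify(q)}, limit: ${limit}) {
        totalHits
        hits {
          displayableName
          link
        }
      }
    }
  }
}`;
}

const query = buildSearchQuery({ q: "cluster", siteKey: "academy", language: "en", limit: 20 });
console.log(query);
```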

Can I have one site running with JCR search and another one with augmented search?

Yes. Augmented search is not based on a search provider, so JCR search is still available. You can add the Augmented Search UI on one site and not on another.

How can I index content from the External Data Provider? 

You can use the event API to index content from the External Data Provider. For more information, see Sending events to Jahia.

Is it possible to have search results including contents coming from the External Data Provider and from the JCR?

Yes. The search results are mixed, as if they came from the same content source. By contrast, JCR search currently displays the JCR results before the EDP results.
