Fine tuning your index

October 8, 2024

While the index is flexible out of the box, you can add features to make it even more flexible. This section describes the settings and mappings configured for Commerce IO.

Settings and mappings are located in the META-INF folder of the Commerce IO Store for SAP Hybris module. The category index and the product index each have two files: one for settings and one for mappings.

Settings

Settings files contain custom analyzers, tokenizers and filters. If you are not familiar with how this functionality works in Elasticsearch, refer to the Elasticsearch Getting Started documentation.  

Note that while you can also specify shard information and more in the Settings files, node configuration is beyond the scope of this document. Settings related to node configuration contain default values.

For the category index, Jahia defines a reverse_path_hierarchy tokenizer, which the custom reverse_path_analyzer uses. This enables Commerce IO to analyze paths in reverse order, making it easier to search for paths in certain cases.

For example, when the reverse analyzer is used, the path /a/b/c matches /a/b/c but not /a/b. Commerce IO also defines a standard path_analyzer. Both analyzers use a lowercase filter.
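The matching behavior described above can be sketched in plain Python. This is an approximation written for illustration, not the Lucene implementation: the function name analyze_path is hypothetical, and it assumes that reverse_path_hierarchy is the built-in path_hierarchy tokenizer with reverse mode enabled and the default "/" delimiter.

```python
def analyze_path(path, delimiter="/", reverse=False):
    """Approximate path_hierarchy tokenization followed by a lowercase filter."""
    parts = path.split(delimiter)
    if reverse:
        # Reverse mode: emit every suffix of the path, longest first.
        tokens = [delimiter.join(parts[i:]) for i in range(len(parts))]
    else:
        # Forward mode: emit every prefix of the path, shortest first.
        tokens = [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]
    # Drop the empty token caused by a leading or trailing delimiter, then lowercase.
    return [token.lower() for token in tokens if token]


forward_tokens = analyze_path("/a/b/c")                # ['/a', '/a/b', '/a/b/c']
reverse_tokens = analyze_path("/a/b/c", reverse=True)  # ['/a/b/c', 'a/b/c', 'b/c', 'c']

print("/a/b" in forward_tokens)    # True: the forward analyzer indexes every ancestor path
print("/a/b" in reverse_tokens)    # False: the reverse analyzer does not
print("/a/b/c" in reverse_tokens)  # True: the full path still matches
```

Under these assumptions, the reverse analyzer indexes the full path and its trailing segments, which is why /a/b finds nothing while /a/b/c does.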

Product index settings are more complex; review them directly in the product index settings file.

Mappings

Analyzers

Category

The Commerce IO Elasticsearch category index is configured with two custom analyzers to simplify path searches. Both analyzers use a Path Hierarchy Tokenizer (see the Tokenizer section); the Reverse Path Analyzer uses it in reverse mode.

  • Path Analyzer
    "path_analyzer": {
      "tokenizer": "path_hierarchy",
      "filter": [ "lowercase" ]
    }
  • Reverse Path Analyzer
    "reverse_path_analyzer": {
      "tokenizer": "reverse_path_hierarchy",
      "filter": [ "lowercase" ]
    }
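To show where these fragments sit in an index settings body, here is a minimal sketch that assembles the two category analyzers as a Python dictionary and prints the resulting JSON. The definition of the reverse_path_hierarchy tokenizer is an assumption (the built-in path_hierarchy type with reverse enabled); check the actual settings file in META-INF for the real definition.

```python
import json

category_analysis = {
    "analysis": {
        "analyzer": {
            "path_analyzer": {
                "tokenizer": "path_hierarchy",
                "filter": ["lowercase"],
            },
            "reverse_path_analyzer": {
                "tokenizer": "reverse_path_hierarchy",
                "filter": ["lowercase"],
            },
        },
        "tokenizer": {
            # Assumption: a custom tokenizer built on the path_hierarchy type
            # with reverse mode switched on.
            "reverse_path_hierarchy": {
                "type": "path_hierarchy",
                "reverse": True,
            },
        },
    }
}

# Serialize the dictionary to the JSON form used in settings files.
print(json.dumps(category_analysis, indent=2))
```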

Product

The Commerce IO Elasticsearch product index is configured with six custom analyzers to make searching as efficient as possible. Only the Path Analyzer and the Reverse Path Analyzer use a Path Hierarchy Tokenizer (see the Tokenizer section below), with the Reverse Path Analyzer configured for reverse ordering. The other analyzers use the tokenizers named in their definitions.

  • English Exact Analyzer
    "english_exact": {
      "tokenizer": "standard",
      "filter": [ "lowercase" ]
    }
  • Lowercase No Change Analyzer
    "lowercase_no_change": {
      "tokenizer": "keyword",
      "filter": [ "lowercase" ]
    }
  • Path Analyzer
    "path_analyzer": {
      "tokenizer": "path_hierarchy",
      "filter": [ "lowercase" ]
    }
  • Reverse Path Analyzer
    "reverse_path_analyzer": {
      "tokenizer": "reverse_path_hierarchy",
      "filter": [ "lowercase" ]
    }
  • Product NGram Analyzer
    "product_ngram_analyzer": {
      "tokenizer": "product_edge_ngram_tokenizer",
      "filter": [ "lowercase", "product_name_word_delimiter" ]
    }
  • Product Name Analyzer
    "product_name_analyzer": {
      "tokenizer": "lowercase"
    }
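The difference between the Lowercase No Change Analyzer and the Product Name Analyzer is easiest to see side by side. The sketch below approximates both in plain Python: the keyword tokenizer keeps the whole input as a single token, while the lowercase tokenizer splits on any non-letter character and lowercases each term. The function names and the sample product name are illustrative only, and the regular expression is an approximation of the tokenizer's letter test.

```python
import re


def lowercase_no_change(text):
    # keyword tokenizer: the whole input is one token;
    # the lowercase filter then lowercases it.
    return [text.lower()]


def product_name_analyzer(text):
    # lowercase tokenizer: break on any non-letter character
    # and lowercase each resulting term.
    return [term.lower() for term in re.findall(r"[^\W\d_]+", text)]


sample = "Trail-Runner 2 GTX"
print(lowercase_no_change(sample))    # ['trail-runner 2 gtx']
print(product_name_analyzer(sample))  # ['trail', 'runner', 'gtx']
```

Note that the digit is dropped by the second analyzer because the lowercase tokenizer only keeps letters.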

Tokenizer

Category

The Commerce IO category index uses a Path Hierarchy Tokenizer to split paths into terms.

In reverse mode, the tokenizer emits every suffix of the path, for example: /collections/shoes/sandals → [/collections/shoes/sandals, collections/shoes/sandals, shoes/sandals, sandals].

Product

Commerce IO uses the following two custom tokenizers for the product index:

  • Path Hierarchy Tokenizer: Splits paths into terms.
  • Product Edge NGram Tokenizer: Breaks text into words whenever it encounters a character that is not in the specified character classes, then emits the leading n-grams of each word. The configuration is token characters [letter, digit], minimum length 1, maximum length 10.

In reverse mode, the Path Hierarchy Tokenizer emits every suffix of the path, for example: /collections/shoes/sandals → [/collections/shoes/sandals, collections/shoes/sandals, shoes/sandals, sandals].
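The Product Edge NGram Tokenizer listed above can be approximated with a few lines of Python. This is a simplified sketch, not the Elasticsearch implementation: the function name is hypothetical, the regular expression stands in for the letter and digit character classes, and the lowercase filter applied later by the analyzer is left out.

```python
import re


def product_edge_ngrams(text, min_gram=1, max_gram=10):
    tokens = []
    # Split wherever a character is neither a letter nor a digit.
    for word in re.findall(r"[^\W_]+", text):
        # Emit prefixes of each word, from min_gram up to max_gram characters.
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:size])
    return tokens


print(product_edge_ngrams("Air Max 90"))
# ['A', 'Ai', 'Air', 'M', 'Ma', 'Max', '9', '90']
```

Because every prefix is indexed, a partial query such as "Ma" can match "Max" without a wildcard search. Words longer than ten characters are only indexed up to their first ten characters.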

Filter

Category

The category index uses a lowercase filter, which lowercases each token produced by the tokenizer.

Product

The product index uses a word delimiter token filter, which splits each token into subwords and also emits the concatenated word and number runs (catenate_words, catenate_numbers). Because the default delimiters are used, the split is performed on any non-alphanumeric character. The original token is preserved as well (preserve_original).

"filter": {
  "product_name_word_delimiter": {
    "type": "word_delimiter",
    "preserve_original": true,
    "catenate_words": true,
    "catenate_numbers": true
  }
}
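To see how the tokenizer, filter and analyzer definitions fit together, the following sketch assembles the Product NGram Analyzer and its dependencies as a Python dictionary. The analyzer and filter bodies are taken from this page; the tokenizer body is reconstructed from the description in the Tokenizer section, and the edge_ngram type name and the min_gram, max_gram and token_chars parameter names are assumptions to verify against the actual settings file.

```python
import json

product_ngram_analysis = {
    "analysis": {
        "analyzer": {
            "product_ngram_analyzer": {
                "tokenizer": "product_edge_ngram_tokenizer",
                "filter": ["lowercase", "product_name_word_delimiter"],
            },
        },
        "tokenizer": {
            # Assumption: reconstructed from "[letter, digit] Min: 1 Max: 10".
            "product_edge_ngram_tokenizer": {
                "type": "edge_ngram",
                "min_gram": 1,
                "max_gram": 10,
                "token_chars": ["letter", "digit"],
            },
        },
        "filter": {
            "product_name_word_delimiter": {
                "type": "word_delimiter",
                "preserve_original": True,
                "catenate_words": True,
                "catenate_numbers": True,
            },
        },
    }
}

# Check that every name the analyzer references is either defined here or built in.
analyzer = product_ngram_analysis["analysis"]["analyzer"]["product_ngram_analyzer"]
custom_filters = product_ngram_analysis["analysis"]["filter"]
missing = [name for name in analyzer["filter"]
           if name not in custom_filters and name != "lowercase"]

print(json.dumps(product_ngram_analysis, indent=2))
```

The final check is a small safeguard: an analyzer that references an undefined filter or tokenizer name is rejected when the index is created, so an empty missing list is worth confirming before deploying a modified settings file.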