Fine tuning your index

While the index is flexible out of the box, you can extend it with custom settings and mappings. This section describes the settings and mappings that were configured for Commerce IO.

Settings and mappings are located in the META-INF folder of the Commerce IO Store for SAP Hybris module. There are two files (settings and mappings) for each of the category and product indexes.

Settings

Settings files contain custom analyzers, tokenizers and filters. If you are not familiar with how this functionality works in Elasticsearch, refer to the Elasticsearch Getting Started documentation.

Note that while you can also specify shard information and more in the Settings files, node configuration is beyond the scope of this document. Settings related to node configuration contain default values.

For the category index, Jahia defines a custom reverse_path_analyzer that is built on a reverse_path_hierarchy tokenizer. This enables Commerce IO to analyze paths in reverse order, which makes some path searches easier.

For example, when the reverse analyzer is used, the path /a/b/c is matched only by /a/b/c and not by /a/b. Commerce IO also defines a standard path_analyzer. Both analyzers use a lowercase filter.
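The difference between the two path analyzers can be illustrated with a small pure-Python emulation. This is only a sketch: the functions path_hierarchy and reverse_path_hierarchy are hypothetical helpers that imitate the tokenizers plus the lowercase filter, they are not part of Commerce IO or Elasticsearch, and they assume "/" as the delimiter.

```python
def path_hierarchy(path, delimiter="/"):
    """Emulate a forward path hierarchy tokenizer followed by a lowercase filter."""
    parts = path.lower().split(delimiter)
    tokens = []
    for i in range(1, len(parts) + 1):
        token = delimiter.join(parts[:i])
        if token:  # skip the empty prefix produced by a leading delimiter
            tokens.append(token)
    return tokens


def reverse_path_hierarchy(path, delimiter="/"):
    """Emulate a reverse path hierarchy tokenizer followed by a lowercase filter."""
    parts = path.lower().split(delimiter)
    tokens = []
    for i in range(len(parts)):
        token = delimiter.join(parts[i:])
        if token:  # skip the empty suffix produced by a trailing delimiter
            tokens.append(token)
    return tokens


forward_terms = path_hierarchy("/a/b/c")          # every ancestor path is a term
reverse_terms = reverse_path_hierarchy("/a/b/c")  # every trailing sub-path is a term

print("/a/b" in forward_terms)    # the ancestor /a/b is indexed by the forward analyzer
print("/a/b" in reverse_terms)    # but not by the reverse analyzer
print("/a/b/c" in reverse_terms)  # the full path is indexed in both cases
```

In this sketch an exact term lookup for /a/b finds the document only when the forward analyzer was used, which is the behavior described above.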

Product index settings are more complex; review them directly in the settings file.

Mappings

Analyzers

Product

The Commerce IO Elasticsearch product index is configured with six custom analyzers to make searching as efficient as possible. The Path Analyzer and the Reverse Path Analyzer each use a Path Hierarchy Tokenizer (see the Tokenizer section below); the Reverse Path Analyzer is configured for reverse ordering.

English Exact Analyzer

    "english_exact": {
        "tokenizer": "standard",
        "filter": [ "lowercase" ]
    }

Lowercase No Change Analyzer

    "lowercase_no_change": {
        "tokenizer": "keyword",
        "filter": [ "lowercase" ]
    }

Path Analyzer

    "path_analyzer": {
        "tokenizer": "path_hierarchy",
        "filter": [ "lowercase" ]
    }

Reverse Path Analyzer

    "reverse_path_analyzer": {
        "tokenizer": "reverse_path_hierarchy",
        "filter": [ "lowercase" ]
    }

Product NGram Analyzer

    "product_ngram_analyzer": {
        "tokenizer": "product_edge_ngram_tokenizer",
        "filter": [ "lowercase", "product_name_word_delimiter" ]
    }

Product Name Analyzer

    "product_name_analyzer": {
        "tokenizer": "lowercase"
    }
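To show how the choice of tokenizer changes the indexed terms, the sketch below imitates two of the analyzers above in plain Python. It is an approximation under stated assumptions, not the Elasticsearch implementation: the function names simply mirror the analyzer names, the keyword tokenizer is modeled as "keep the whole input as one term", and the lowercase tokenizer is modeled as "split on anything that is not a letter, then lowercase".

```python
import re


def lowercase_no_change(text):
    """keyword tokenizer + lowercase filter: the whole input becomes one lowercased term."""
    return [text.lower()]


def product_name_analyzer(text):
    """lowercase tokenizer: split on non-letter characters and lowercase each term."""
    # [^\W\d_]+ matches runs of letters only (word characters minus digits and underscore)
    return [word.lower() for word in re.findall(r"[^\W\d_]+", text)]


sample = "Red Sandals-42"
print(lowercase_no_change(sample))    # a single term, spaces and punctuation kept
print(product_name_analyzer(sample))  # one term per run of letters, digits dropped
```

The first analyzer suits exact, case-insensitive matching of a whole value, while the second suits matching individual words in a product name.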

Tokenizer

Product

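Commerce IO uses the following two custom tokenizers:

Path Hierarchy Tokenizer: Splits a path into terms at each path delimiter, producing one term per level of the hierarchy.
Product Edge NGram Tokenizer: Breaks text into words at any character that does not belong to the configured character classes, then emits n-grams of each word, anchored at the start of the word. The configuration is token_chars [letter, digit], min_gram 1, max_gram 10.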

When configured for reverse ordering, the Path Hierarchy Tokenizer outputs the terms in reverse order, for example: /collections/shoes/sandals → [/collections/shoes/sandals, collections/shoes/sandals, shoes/sandals, sandals].
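The behavior of the Product Edge NGram Tokenizer with the configuration above can be sketched in a few lines of Python. The function product_edge_ngram is a hypothetical stand-in for illustration only; it assumes that any character other than a letter or a digit separates words, and it does not lowercase, because in the product_ngram_analyzer that is done by the lowercase filter afterwards.

```python
import re


def product_edge_ngram(text, min_gram=1, max_gram=10):
    """Emit prefixes of each word, from min_gram up to max_gram characters long."""
    tokens = []
    # [^\W_]+ matches runs of letters and digits; everything else splits words
    for word in re.findall(r"[^\W_]+", text):
        longest = min(max_gram, len(word))
        for length in range(min_gram, longest + 1):
            tokens.append(word[:length])
    return tokens


print(product_edge_ngram("Air Max 90"))
# Words longer than max_gram are cut off at 10 characters
print(product_edge_ngram("collections")[-1])
```

Because every prefix of a word is indexed, a partial query such as "Ma" can match "Max", which is what makes this tokenizer useful for search-as-you-type on product names.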

Filter

Product

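Commerce IO defines one custom filter, product_name_word_delimiter. It is a word delimiter token filter that splits tokens into subwords and also emits concatenations of adjacent word parts and of adjacent number parts. As the default delimiter is used, the split is performed on any non-alphanumeric character. The original text is preserved as well.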

    "filter": {
        "product_name_word_delimiter": {
            "type": "word_delimiter",
            "preserve_original": true,
            "catenate_words": true,
            "catenate_numbers": true
        }
    }
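The effect of these three options can be approximated in Python as follows. This is a simplified sketch, not the actual filter: the function name is borrowed from the filter for readability, the input is assumed to be a single token that has already been lowercased (so case-change splitting never applies), possessive stemming is ignored, and the result is returned as a set because the sketch makes no claim about token order or positions.

```python
import re
from itertools import groupby


def product_name_word_delimiter(token):
    """Approximate word_delimiter with preserve_original, catenate_words, catenate_numbers."""
    # Split into letter runs and digit runs; any other character acts as a delimiter
    parts = re.findall(r"[^\W\d_]+|\d+", token)
    result = {token}      # preserve_original: keep the unsplit token
    result.update(parts)  # the individual word parts and number parts
    # catenate_words / catenate_numbers: join each run of adjacent parts of the same kind
    for _is_number, run in groupby(parts, key=str.isdigit):
        result.add("".join(run))
    return result


print(sorted(product_name_word_delimiter("wi-fi-4000")))
print(sorted(product_name_word_delimiter("max90")))
```

In this approximation, wi-fi-4000 yields the original token, the parts wi, fi and 4000, and the concatenated word wifi, so a query for either "wifi" or "wi-fi" can find the same product.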
