Migrate to Augmented Search 4.0.0
Migration is done using the _reindex operation (more details here), which allows you to migrate directly from Elasticsearch 7 to 9. This means that all index-related pipelines, templates, settings, and mappings have to be copied over manually.
Elasticsearch 7 Data pre-migration checks
Make sure that the mappings and settings currently in use are compatible with Elasticsearch 9 by making the supported changes in Elasticsearch 7, either prior to migration or during the migration process.
For quick validation, check the results of GET /_migration/deprecations on Elasticsearch 7 for any potential issues (most, if not all, should already be covered in this document).
- Mapping field types
Types are now required for all fields declared in mappings and must be specified explicitly. Make a note of any fields with a missing type; they will need to be added if you use a custom mappings.json for Augmented Search indexing.
Inferred types from existing Elasticsearch 7 indices can be checked with this query:
GET /*jahia*/_field_caps?fields=jgql*
During migration, the exported index mappings will include the inferred types for fields that were previously missing them, and can be imported into Elasticsearch 9 without issues.
- Slowlog levels in settings are now deprecated
- Language plugins could require additional settings
For language-specific settings, some analyzer and stemmer settings might require additional properties.
For example, the Japanese language settings (which require the kuromoji plugin) need a mode property on the analyzer and a minimum_length property on the stemmer. Augmented Search 4 now supports specifying these settings as JSON objects, which can be written as follows:
{
  ...
  "analyzer": {
    "type": "kuromoji",
    "mode": "extended"
  },
  "stemmer": {
    "type": "kuromoji_stemmer",
    "minimum_length": 2
  }
}
Please refer to the analysis plugins documentation for details: https://www.elastic.co/docs/reference/elasticsearch/plugins/analysis-plugins
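Coming back to the field types check above: the _field_caps response can be summarized with jq to list each field and its inferred type. A minimal sketch, using an illustrative, truncated payload (the index name and field names are hypothetical):

```shell
# Illustrative, truncated _field_caps response; the real one comes from
# GET /*jahia*/_field_caps?fields=jgql* on the Elasticsearch 7 cluster
cat > field_caps.json <<'EOF'
{
  "indices": ["jahia_example_index"],
  "fields": {
    "jgql_title": { "text": { "type": "text", "searchable": true } },
    "jgql_created": { "date": { "type": "date", "searchable": true } }
  }
}
EOF
# List each field together with its inferred type
# prints one line per field, e.g. "jgql_title: text"
jq -r '.fields | to_entries[] | "\(.key): \(.value | keys[0])"' field_caps.json
```

Any field whose type is missing or unexpected here is a candidate for a fix in your custom mappings.json.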
ES7 to ES9 data migration
Prerequisites: an Elasticsearch environment >= 9 with the analysis-icu plugin installed.
The examples below use the *jahia* pattern to filter out only the settings relevant to Augmented Search. If there are other custom settings or templates that also need to be migrated, they can be included by using a pattern list, e.g. /*jahia*,*company1*
1) Copy index pipelines
Execute a GET request on _ingest/pipeline/jahia_as_language_detection_pipeline on the source and copy the resulting jahia_as_language_detection_pipeline value object as the body of a PUT request to the destination, e.g.:
# GET all jahia ingest pipelines (e.g. jahia_as_language_detection_pipeline)
curl -s "http://source-cluster:9200/_ingest/pipeline/*jahia*?pretty" > pipelines.json
# Copy each pipeline to the target, using its name as the endpoint
for pipeline in $(jq -r 'keys[]' pipelines.json); do
  body=$(jq --arg p "$pipeline" '.[$p]' pipelines.json)
  curl -s -X PUT \
    "http://target-cluster:9200/_ingest/pipeline/$pipeline" \
    -H "Content-Type: application/json" \
    -d "$body"
done
2) Copy component settings
Copy shard and replica settings with a GET request on _component_template/*jahia* and, for each component template in the response, do a PUT request with its component_template value object as the body, e.g.:
# GET all jahia component templates
curl -s -X GET "http://source-cluster:9200/_component_template/*jahia*?pretty" > component_templates.json
# Copy each component template to the target, using its name as the endpoint
# Body should look like: {"template": {"settings": {...}}}
for template_name in $(jq -r '.component_templates[].name' component_templates.json); do
  body=$(jq --arg n "$template_name" '.component_templates[] | select(.name==$n) | .component_template' component_templates.json)
  curl -X PUT "http://target-cluster:9200/_component_template/$template_name" \
    -H "Content-Type: application/json" \
    -d "$body"
done
3) Copy index template settings
Copy index template settings with a GET request on _index_template/*jahia* and do a PUT request for each resulting index_template value object, e.g.:
# GET all jahia index templates
curl -s -X GET "http://source-cluster:9200/_index_template/*jahia*?pretty" > index_templates.json
# Copy to target for each index template using name as endpoint
# Body should look like: {"index_patterns": [...], "composed_of": [...], ...}
for template_name in $(jq -r '.index_templates[].name' index_templates.json); do
  body=$(jq --arg n "$template_name" '.index_templates[] | select(.name==$n) | .index_template' index_templates.json)
  curl -X PUT "http://target-cluster:9200/_index_template/$template_name" \
    -H "Content-Type: application/json" \
    -d "$body"
done
4) Get list of indices from source, and get mappings and settings for each index
# Get list of all indices as bash array (this var is used from step 4 to 11)
indices=($(curl -s "http://source-cluster:9200/_cat/indices/*jahia*?h=index"))
# Loop through each index and fetch its mappings and settings
for index in "${indices[@]}"; do
  # Mappings
  curl -s -X GET "http://source-cluster:9200/$index/_mapping?pretty" > "${index}_mapping.json"
  # Settings
  curl -s -X GET "http://source-cluster:9200/$index/_settings?pretty" > "${index}_settings.json"
done
5) (Optional) Run the mapping-related and settings-related pre-migration checks (see the pre-migration section above) and apply the mapping changes if needed
6) Remove search.slowlog and indexing.slowlog from all index settings
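A minimal sketch of that cleanup, assuming the per-index settings files exported in step 4 (the sample file and index name below are illustrative): the deprecated slowlog blocks can be stripped with jq before the settings are reapplied in step 7:

```shell
# Illustrative settings export, as produced by step 4 for a single index
cat > myindex_settings.json <<'EOF'
{
  "myindex": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "search": { "slowlog": { "threshold": { "query": { "warn": "2s" } } } },
        "indexing": { "slowlog": { "threshold": { "index": { "warn": "10s" } } } }
      }
    }
  }
}
EOF
# Strip the deprecated slowlog settings, keeping everything else intact
jq '.myindex.settings.index |= del(.search.slowlog, .indexing.slowlog)' \
  myindex_settings.json > myindex_settings.clean.json
```

The same jq filter can be run in a loop over the files produced in step 4, writing the cleaned output back before step 7 recreates the indices.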
7) Recreate indices on target and reapply mappings and settings
for index in "${indices[@]}"; do
  # Extract settings, removing metadata that cannot be set at creation time
  settings=$(jq ".\"$index\".settings.index | del(.creation_date, .uuid, .version, .provided_name)" "${index}_settings.json")
  mappings=$(jq ".\"$index\".mappings" "${index}_mapping.json")
  # Build the combined payload
  payload=$(jq -n \
    --argjson settings "$settings" \
    --argjson mappings "$mappings" \
    '{settings: $settings, mappings: $mappings}'
  )
  curl -X PUT "http://target-cluster:9200/$index" \
    -H "Content-Type: application/json" \
    -d "$payload"
done
8) Apply new slowlog properties
curl -X PUT "http://target-cluster:9200/jahia*/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "2000ms",
  "index.search.slowlog.threshold.query.trace": "500ms",
  "index.search.slowlog.threshold.query.debug": "1000ms",
  "index.search.slowlog.threshold.query.info": "1500ms",
  "index.search.slowlog.threshold.fetch.warn": "1000ms",
  "index.search.slowlog.threshold.fetch.trace": "200ms",
  "index.search.slowlog.threshold.fetch.debug": "300ms",
  "index.search.slowlog.threshold.fetch.info": "800ms",
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.trace": "500ms",
  "index.indexing.slowlog.threshold.index.debug": "2s",
  "index.indexing.slowlog.threshold.index.info": "5s"
}
'
9) Get aliases from the source and reapply them to the target
# GET aliases (note: the _cat output is a flat list, not a valid _aliases body)
curl -s -X GET "http://source-cluster:9200/_cat/aliases/*jahia*?format=json" > aliases.json
# Transform the _cat output into an _aliases actions body and send it to the target
body=$(jq '{actions: [.[] | {add: {index: .index, alias: .alias}}]}' aliases.json)
curl -X POST "http://target-cluster:9200/_aliases" \
  -H "Content-Type: application/json" \
  -d "$body"
10) Explicitly allow the target cluster to connect to the source using the reindex.remote.whitelist setting (and remove it again after the migration). Note that this is a static node setting: it cannot be changed through the _cluster/settings API and has to be added to elasticsearch.yml on every node of the target cluster, followed by a restart:
reindex.remote.whitelist: "source-cluster-host:9200"
Note: to go back to the default settings after the migration, remove the setting and restart the nodes again.
11) And finally, reindex the documents for each index from source to target, e.g. POST _reindex with the payload below (the dest and source objects can include additional parameters, see here):
for index in "${indices[@]}"; do
  curl -X POST "http://target-cluster:9200/_reindex" \
    -H "Content-Type: application/json" \
    -d "{
      \"source\": {
        \"remote\": { \"host\": \"http://source-cluster:9200\" },
        \"index\": \"$index\"
      },
      \"dest\": { \"index\": \"$index\" }
    }"
done
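Before switching over, it can be worth verifying that the document counts match between the two clusters. A sketch, reusing the indices array and the host names from the steps above:

```shell
# Compare document counts per index between source and target
for index in "${indices[@]}"; do
  src=$(curl -s "http://source-cluster:9200/_cat/count/$index?h=count")
  dst=$(curl -s "http://target-cluster:9200/_cat/count/$index?h=count")
  echo "$index: source=$src target=$dst"
done
```

Counts may differ temporarily while the target is still refreshing; re-check after a refresh before concluding that documents are missing.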
Jahia migration
- Install elasticsearch-connector 4.0.0 and augmented-search-connector 4.0.0
- Uninstall the previous versions of the modules
- Download the connection migration script (from here)
- Run provisioning script:
curl -XPOST http://<url>/modules/api/provisioning \
--user root:root1234 \
--form 'script=[{"executeScript": "01-migrateConnectionNode.started.groovy"}];type=application/json' \
--form "file=@01-migrateConnectionNode.started.groovy"
- Test the Elasticsearch connection with GraphQL; it should return true (check the server logs if it returns false):
query testES {
  admin { elasticsearch { testConnection } }
}