Index & Searchable content

June 9, 2022

This section explains how to configure the index so that search results return the expected pages and content. See the Search relevance and boosting page to learn how to fine-tune the search experience for your needs and dataset.

Indexing Pages

In a site, pages can be regular Jahia pages (of type jnt:page) or specific content types that can be displayed as a full page; such content types have what is called a content template. The best practice is to add the jmix:mainResource mixin to such content types so that they are easier to identify.

By default, Jahia pages and contents with the jmix:mainResource mixin are indexed as pages, and thus can be returned as search results.

If you need more, or different, content types to appear in the search results, edit this list by declaring the content types hierarchically (the only constraint is that jnt:page must always be first) in the org.jahia.modules.augmentedsearch.content.indexedMainResourceTypes property, as shown in the following example:

org.jahia.modules.augmentedsearch.content.indexedMainResourceTypes = jnt:page,jmix:mainResource,jnt:myCustomType

Note: if jmix:mainResource is declared in the indexedMainResourceTypes property, content types that carry this mixin do not need to be declared individually.

Indexing the page content

By default, only editorial content (content whose node type has jmix:editorialContent as a supertype) created inside a page is indexed as part of the page.

If your pages contain other types of content that must be indexed so that the page can be found through them, declare those types in the org.jahia.modules.augmentedsearch.content.indexedSubNodeTypes property. For instance:

org.jahia.modules.augmentedsearch.content.indexedSubNodeTypes = jmix:editorialContent, jnt:myNonEditorialContentType

Indexing nodetype properties

By default, all the text properties of the indexed content (see the Indexing Pages section above) are searchable through full-text search.

Additionally, the following default properties can be used when building specific queries, e.g. when building an advanced search form or when implementing facets:

  • tags: jgql:tags
  • keyword: jgql:keywords
  • categories: jgql:categorized
  • displayableName: jgql:displayableName
  • nodetype: jgql:nodeType
  • creation date: jgql:created
  • creator: jgql:createdBy
  • last modification date: jgql:lastModified
  • last contributor: jgql:lastModifiedBy
  • last publication date: jgql:lastPublished
  • last publisher: jgql:lastPublishedBy
  • mimetype: jgql:mimeType

To add custom node type properties to this list, declare them in the org.jahia.modules.augmentedsearch.content.mappedNodeTypes configuration property:

  • using the node type only will add all the content type properties to the previous list
  • using the {contentType}.{propertyName} notation will only add the designated properties

In the following example, all the properties of the jnt:news content type are available to build queries and facets, as well as the eventsType property of jnt:event. The other jnt:event properties are not available:

org.jahia.modules.augmentedsearch.content.mappedNodeTypes = jnt:news, jnt:event.eventsType

Similarly, if you need to add additional properties to this list when searching for files, you need to declare them in the org.jahia.modules.augmentedsearch.file.mappedNodeTypes configuration property.
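
As a minimal sketch, a file-side declaration could look like the following. The jmix:myFileMetadata type and its documentReference property are hypothetical names used only for illustration; replace them with node types and properties that exist in your own definitions. The notation is assumed to be the same as for content: a bare node type, or {contentType}.{propertyName}.

```properties
# Hypothetical example: expose a single custom property of files to queries and facets.
# jmix:myFileMetadata and documentReference are placeholder names, not Jahia built-ins.
org.jahia.modules.augmentedsearch.file.mappedNodeTypes = jmix:myFileMetadata.documentReference
```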

All the node types declared in org.jahia.modules.augmentedsearch.content.mappedNodeTypes must also be included in the types declared in the org.jahia.modules.augmentedsearch.content.indexedMainResourceTypes configuration property.

Elasticsearch limits the number of mapped fields in an index to 1000 by default (the index.mapping.total_fields.limit index setting). This limit can be increased, but doing so may affect performance.

Indexing files

By default, only PDF files are indexed and searchable.

You can configure which files are indexed, based on their file extension, using the org.jahia.modules.augmentedsearch.content.indexedFileExtensions property:

  • Provide all the extension types in a comma-separated list:
    org.jahia.modules.augmentedsearch.content.indexedFileExtensions = pdf,docx,doc
  • Use * to index all files
  • Leave the property empty to not index files at all
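
The three options above can be written as follows. Only one line should be active at a time, so the alternatives are shown commented out; the comment syntax assumes a standard properties-style configuration file.

```properties
# Index only the listed extensions
org.jahia.modules.augmentedsearch.content.indexedFileExtensions = pdf,docx,doc

# Index all files
# org.jahia.modules.augmentedsearch.content.indexedFileExtensions = *

# Do not index files at all (empty value)
# org.jahia.modules.augmentedsearch.content.indexedFileExtensions =
```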

As with any setting that defines which content is indexed, this setting has a direct impact on the size of your Elasticsearch indices, and therefore on the resource requirements of your entire Elasticsearch cluster. Only list the file types that you actually want to be searchable on your platform.

Workspace indexation

By default, content is indexed in both the live and staging workspaces. This means that you can search for content in preview mode.

If you do not need to search for content in preview, or if the index size is a concern, you can index only live content by setting the org.jahia.modules.augmentedsearch.workspaces property to LIVE:

org.jahia.modules.augmentedsearch.workspaces = LIVE

Use ALL to return to the default behavior of indexing both staging and live content.
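
For reference, the default behavior described above corresponds to the following setting:

```properties
# Index both staging and live content (default behavior)
org.jahia.modules.augmentedsearch.workspaces = ALL
```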