Index & Searchable content
This section explains how to configure the index so that search results return the expected pages and content. See the Search relevance and boosting page to learn how to fine-tune the search experience for your needs and dataset.
Indexing pages
In a site, pages can be regular Jahia pages (of type jnt:page) or specific content types that can be displayed as a full page; such content types have what is called a content template. The best practice is to add the jmix:mainResource mixin to these content types to identify them more easily.
By default, Jahia pages and content items with the jmix:mainResource mixin are indexed as pages and can therefore be returned as search results.
If you need additional or different content types to appear in search results, edit this list by declaring the content types hierarchically in the org.jahia.modules.augmentedsearch.content.indexedMainResourceTypes property. The only constraint is that jnt:page must always come first, as shown in the following example:
org.jahia.modules.augmentedsearch.content.indexedMainResourceTypes = jnt:page,jmix:mainResource,jnt:myCustomType
Note: if jmix:mainResource is declared in the indexedMainResourceTypes property, you do not need to declare the content types that have this mixin.
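The ordering constraint above is easy to break when editing the configuration by hand. The following Python sketch is a hypothetical pre-deployment check, not something Augmented Search provides: the parse_main_resource_types helper splits the comma-separated property value and rejects it when jnt:page is not the first entry.

```python
def parse_main_resource_types(value: str) -> list:
    """Split an indexedMainResourceTypes value and enforce that jnt:page comes first.

    Hypothetical validation helper; it is not part of Augmented Search.
    """
    # Split on commas, trimming whitespace and dropping empty entries.
    types = [t.strip() for t in value.split(",") if t.strip()]
    if not types or types[0] != "jnt:page":
        raise ValueError("jnt:page must be the first entry of indexedMainResourceTypes")
    return types


print(parse_main_resource_types("jnt:page,jmix:mainResource,jnt:myCustomType"))
# ['jnt:page', 'jmix:mainResource', 'jnt:myCustomType']
```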
Indexing the page content
By default, only editorial content (content types that have jmix:editorialContent as a supertype) created inside a page is indexed with the page.
If your pages contain other types of content that must be indexed so that they can be used to find the page, declare them in the org.jahia.modules.augmentedsearch.content.indexedSubNodeTypes property. For example:
org.jahia.modules.augmentedsearch.content.indexedSubNodeTypes = jmix:editorialContent, jnt:myNonEditorialContentType
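Conceptually, a subnode contributes to its page when its type, or one of its supertypes or mixins, matches an entry of indexedSubNodeTypes. The Python sketch below models that rule as a set intersection. It only illustrates the rule stated above and assumes the caller supplies the node's full type hierarchy; it does not reproduce how Augmented Search performs the check internally, and the helper name and node types (other than those in the example configuration) are hypothetical.

```python
def is_indexed_with_page(node_types: set, indexed_sub_node_types: str) -> bool:
    """Return True if any of the node's types matches an indexedSubNodeTypes entry.

    node_types must contain the node's primary type plus all its supertypes and
    mixins (assumption: the caller has already resolved the hierarchy).
    """
    configured = {t.strip() for t in indexed_sub_node_types.split(",") if t.strip()}
    return not configured.isdisjoint(node_types)


config = "jmix:editorialContent, jnt:myNonEditorialContentType"
# Hypothetical node types, each listed with its resolved supertypes.
print(is_indexed_with_page({"jnt:myText", "jnt:content", "jmix:editorialContent"}, config))  # True
print(is_indexed_with_page({"jnt:myNonEditorialContentType", "jnt:content"}, config))      # True
print(is_indexed_with_page({"jnt:myOtherType", "jnt:content"}, config))                   # False
```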
Indexing node type properties
By default, all the text properties of the indexed content (see the Indexing pages section above) are searchable through full-text search.
Additionally, the following default properties can be used to build specific queries, for example for an advanced search form or for facets:
- tags: jgql:tags
- keyword: jgql:keywords
- categories: jgql:categorized
- displayableName: jgql:displayableName
- nodetype: jgql:nodeType
- creation date: jgql:created
- creator: jgql:createdBy
- last modification date: jgql:lastModified
- last contributor: jgql:lastModifiedBy
- last publication date: jgql:lastPublished
- last publisher: jgql:lastPublishedBy
- mimetype: jgql:mimeType
To add custom node type properties to this list, declare them in the org.jahia.modules.augmentedsearch.content.mappedNodeTypes configuration property:
- Declaring a node type alone adds all the properties of that content type to the list.
- Using the {contentType}.{propertyName} notation adds only the designated property.
In the following example, all the properties of the jnt:news content type are available to build queries and facets, as well as the eventsType property of jnt:event. The other jnt:event properties are not available:
org.jahia.modules.augmentedsearch.content.mappedNodeTypes = jnt:news, jnt:event.eventsType
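To illustrate how the two notations combine, the Python sketch below splits a mappedNodeTypes value into node types mapped as a whole and individually mapped properties. It assumes that entries are comma-separated and that node type names contain no dot, so the first dot separates the type from the property; the helper names and the location property used in the example are hypothetical.

```python
def parse_mapped_node_types(value: str):
    """Split a mappedNodeTypes value into whole node types and per-type properties.

    Assumes node type names contain no dot, so the first dot separates the
    type from the property name.
    """
    whole_types = set()
    properties = {}
    for entry in (e.strip() for e in value.split(",")):
        if not entry:
            continue
        node_type, dot, prop = entry.partition(".")
        if dot:
            properties.setdefault(node_type, set()).add(prop)  # {contentType}.{propertyName}
        else:
            whole_types.add(node_type)                         # all properties of the type
    return whole_types, properties


def is_mapped(node_type: str, prop: str, whole_types: set, properties: dict) -> bool:
    """Return True if the property can be used to build queries or facets."""
    return node_type in whole_types or prop in properties.get(node_type, set())


whole, props = parse_mapped_node_types("jnt:news, jnt:event.eventsType")
print(is_mapped("jnt:event", "eventsType", whole, props))  # True
print(is_mapped("jnt:event", "location", whole, props))    # False ("location" is hypothetical)
```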
Similarly, to add properties to this list when searching for files, declare them in the org.jahia.modules.augmentedsearch.file.mappedNodeTypes configuration property.
Indexing files
By default, only PDF files are indexed and searchable.
You can configure which files are indexed, based on their file extension, using the org.jahia.modules.augmentedsearch.content.indexedFileExtensions property:
- Provide all the extension types in a comma-separated list:
org.jahia.modules.augmentedsearch.content.indexedFileExtensions = pdf,docx,doc
- Use * to index all files.
- Leave the property empty to not index any files.
As with any setting that defines which content is indexed, this setting directly affects the size of your Elasticsearch indices and therefore the resource requirements of your entire Elasticsearch cluster. Only specify the file types that you need to be searchable on your platform.
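The three accepted forms of indexedFileExtensions can be summarized with the following Python sketch. It is a hypothetical helper, not the module's implementation, and it assumes that extensions are compared case-insensitively and that only the last extension of a file name counts.

```python
import os


def is_file_indexed(filename: str, indexed_file_extensions: str) -> bool:
    """Hypothetical model of the indexedFileExtensions rules described above."""
    setting = indexed_file_extensions.strip()
    if not setting:
        return False  # empty property: no file is indexed
    if setting == "*":
        return True   # "*": every file is indexed
    # Comma-separated list; case-insensitive comparison is an assumption.
    extensions = {e.strip().lstrip(".").lower() for e in setting.split(",") if e.strip()}
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension in extensions


for name in ("report.pdf", "notes.DOCX", "slides.pptx"):
    print(name, is_file_indexed(name, "pdf,docx,doc"))
```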
Workspace indexation
By default, content is indexed in both the live and staging workspaces. This means that you can also search for content in preview mode.
If you do not need to search for content in preview, or if the index size is a concern, you can index only live content by setting the org.jahia.modules.augmentedsearch.workspaces property to LIVE:
org.jahia.modules.augmentedsearch.workspaces = LIVE
Use ALL to return to the default behavior of indexing both staging and live content.
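In Jahia, staging content is stored in the default JCR workspace (used for preview) and published content in the live workspace. Assuming that mapping, the following Python sketch shows which workspaces end up indexed for each value of org.jahia.modules.augmentedsearch.workspaces. The helper is hypothetical, and treating an unset or blank value as ALL is an assumption based on ALL being the documented default.

```python
from typing import Optional, Tuple

# Jahia JCR workspace names: "default" holds staging content (shown in preview),
# "live" holds published content.
WORKSPACES = {"ALL": ("default", "live"), "LIVE": ("live",)}


def workspaces_to_index(setting: Optional[str]) -> Tuple[str, ...]:
    """Map an org.jahia.modules.augmentedsearch.workspaces value to JCR workspaces.

    Treating an unset or blank value as ALL is an assumption.
    """
    key = (setting or "").strip() or "ALL"
    if key not in WORKSPACES:
        raise ValueError(f"unsupported workspaces value: {setting!r}")
    return WORKSPACES[key]


print(workspaces_to_index("LIVE"))  # ('live',)
print(workspaces_to_index(None))    # ('default', 'live')
```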
Preventing specific content from appearing in search results
You can exclude a specific section, page, content item, folder, or file from the index so that it does not appear in search results.
Using Content Editor
To do so, when editing the page or content item, enable the Remove From Augmented Search Results mixin in the Options section.
Two options are available:
- Current content only
Removes only the selected content from the index.
If this content has subcontent whose type is declared in indexedMainResourceTypes in the Augmented Search configuration file, the subcontent is still indexed. This is useful when a site contains a tree of documents, for example news organized by year: you might not want the page listing the news for 2018 to appear in search results, but you would want each individual news item to be indexed.
- Current content and subcontent items
Removes both the content and its subcontent from the index, even if the subcontent type is declared in indexedMainResourceTypes. When you use this option on a page, all its subpages are excluded from indexing.
If the document was previously indexed:
- it will be removed from Augmented Search in preview upon save.
- it will be removed from Augmented Search in live upon publication.
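The difference between the two options can be modeled as a tree walk. In the Python sketch below, which is a simplified model rather than the module's actual logic, each hypothetical Node records whether its type is listed in indexedMainResourceTypes and which exclusion option, if any, is enabled on it: current-only drops the node but keeps visiting its children, while current-subtree prunes the whole branch. The paths reproduce the news-by-year example with hypothetical node names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    path: str
    main_resource: bool = True   # type listed in indexedMainResourceTypes
    skip: Optional[str] = None   # None, "current-only" or "current-subtree"
    children: List["Node"] = field(default_factory=list)


def indexed_paths(node: Node) -> List[str]:
    """Return the paths indexed as search results under this simplified model."""
    if node.skip == "current-subtree":
        return []  # the node and its whole subtree are excluded
    paths = []
    if node.main_resource and node.skip != "current-only":
        paths.append(node.path)
    for child in node.children:  # current-only still visits the children
        paths.extend(indexed_paths(child))
    return paths


news = Node("/sites/digitall/home/news", children=[
    Node("/sites/digitall/home/news/2018", skip="current-only", children=[
        Node("/sites/digitall/home/news/2018/item-a"),
        Node("/sites/digitall/home/news/2018/item-b"),
    ]),
])
print(indexed_paths(news))  # the 2018 page is skipped, its news items are kept
news.children[0].skip = "current-subtree"
print(indexed_paths(news))  # only the news root remains
```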
Programmatically
To remove a specific node from the index, add the jmix:skipESIndexation mixin to this node, for example with a GraphQL mutation. By default, only the content itself is removed (current-only). To remove its subcontent as well, set the skipIndexationString property to current-subtree:
mutation excludeContent {
  jcr(workspace: EDIT) {
    mutateNode(pathOrId: "/sites/digitall/home/example") {
      addMixins(mixins: "jmix:skipESIndexation")
      mutateProperty(name: "skipIndexationString") {
        setValue(type: STRING, value: "current-subtree")
      }
    }
  }
}
This mutation can also be executed on the LIVE workspace.
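To run this mutation from a script, the following Python sketch builds the request body and posts it with the standard library. The endpoint URL (/modules/graphql on localhost:8080), basic authentication, the placeholder credentials, and the helper names are assumptions to adapt to your environment. Omitting the mutateProperty call keeps the default current-only behavior.

```python
import base64
import json
import urllib.request

# Assumption: adjust the URL to your Jahia instance.
GRAPHQL_URL = "http://localhost:8080/modules/graphql"

MUTATION = """mutation excludeContent {
  jcr(workspace: %(workspace)s) {
    mutateNode(pathOrId: %(path)s) {
      addMixins(mixins: "jmix:skipESIndexation")
      %(subtree)s
    }
  }
}"""

SUBTREE = ('mutateProperty(name: "skipIndexationString") '
           '{ setValue(type: STRING, value: "current-subtree") }')


def exclude_content_payload(path: str, subtree: bool = False, workspace: str = "EDIT") -> dict:
    """Build the GraphQL request body; without subtree, current-only applies."""
    if workspace not in ("EDIT", "LIVE"):
        raise ValueError("workspace must be EDIT or LIVE")
    query = MUTATION % {
        "workspace": workspace,
        "path": json.dumps(path),  # quotes and escapes the path; fine for ordinary JCR paths
        "subtree": SUBTREE if subtree else "",
    }
    return {"query": query}


def post_graphql(payload: dict, user: str, password: str, url: str = GRAPHQL_URL) -> dict:
    """POST the payload as JSON with basic authentication and return the decoded response."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires a running instance; "root"/"root" are placeholder credentials):
# post_graphql(exclude_content_payload("/sites/digitall/home/example", subtree=True), "root", "root")
```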
Reindexing specific content
You can trigger the reindexing of specific nodes with a GraphQL mutation:
mutation indexNode($nodePaths: [String!], $workspace: Workspace, $inclDescendants: Boolean = false) {
  admin {
    search {
      startNodeIndex(nodePaths: $nodePaths, inclDescendants: $inclDescendants, workspace: $workspace) {
        jobs {
          id
          status
        }
      }
    }
  }
}
Run it with the following variables:
{
  "nodePaths": ["/sites/digitall/home/example"],
  "workspace": "LIVE",
  "inclDescendants": true
}
Set the inclDescendants parameter to true to also reindex the child nodes of the nodes listed in nodePaths, or to false to reindex only the given nodes. Note that when inclDescendants is set to true while reindexing pages, the content of each page is reindexed, but not its subpages.
The node will be indexed in all the languages of the site.
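If you trigger reindexing from a script, the request body is the indexNode mutation plus its variables. The Python sketch below assembles that body; the reindex_payload helper and the example path are hypothetical, and the result is meant to be posted as JSON to your Jahia GraphQL endpoint with an authenticated user.

```python
import json

REINDEX_MUTATION = """mutation indexNode($nodePaths: [String!], $workspace: Workspace, $inclDescendants: Boolean = false) {
  admin {
    search {
      startNodeIndex(nodePaths: $nodePaths, inclDescendants: $inclDescendants, workspace: $workspace) {
        jobs {
          id
          status
        }
      }
    }
  }
}"""


def reindex_payload(node_paths, workspace: str = "LIVE", include_descendants: bool = False) -> dict:
    """Build a GraphQL request body for the indexNode mutation (hypothetical helper)."""
    paths = list(node_paths)
    if not paths:
        raise ValueError("at least one node path is required")
    return {
        "query": REINDEX_MUTATION,
        "operationName": "indexNode",
        "variables": {
            "nodePaths": paths,
            "workspace": workspace,
            "inclDescendants": include_descendants,
        },
    }


body = json.dumps(reindex_payload(["/sites/digitall/home/example"], include_descendants=True))
print(body)
```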