Queries, search, and indexing in Jahia

October 8, 2024

Introduction

Augmented Search is now the recommended way to implement any advanced requirement regarding full-text search. For more information, see Augmented Search overview and architecture. Augmented Search significantly improves the performance of search features compared to the functionality described in this topic.

Jahia integrates a mix of search frameworks and uses a selected set of features from each to offer a flexible search solution.
This topic gives an overview of the query and search features in Jahia, including:

  • a comparison between query and search tags
  • the basics about using queries
  • an explanation of the search/indexing configuration options

More information about the search module, along with examples of how to create simple and advanced search forms, is available in the search module topic. That topic also provides more detailed information about using the facet module, and you can refer to the taglibDoc for details about the provided tags.

Query and search features

Indexing

  • No separate index data schema configuration is needed (unlike in Solr), because Jahia derives it automatically from the nodetype definitions (CND).
  • Content is indexed immediately after editing or publishing, except for document text extraction, which is done asynchronously.
  • Supports Solr-based filtering for faceted search.
  • Ability to use different score boosts and text analyzers per field and language (different tokenizers, stemmers, stopwords).
  • Jahia automatically reconfigures the index schema on the fly when modules/templates are deployed or updated on a running server, without requiring a server restart.
  • Since Jahia uses Jackrabbit, each cluster node has to maintain its own local index. Also because of Jackrabbit, re-indexing without restarting the server is no longer supported: you have to shut down the server and manually delete the index folders.
  • If required, Jahia can create and update a spellcheck dictionary from a site's index at configurable intervals.
  • There is no longer a distinction between content and document repositories: both are now located in the same Jackrabbit-based repository.
  • Rich document parsing (text extraction) is supported by Apache Tika.

Search

  • Structured search based on any number of constraints on any content and metadata field
      • Fully supports the query object model as specified in JSR-283, with the ability to define a selector and multiple constraints and orderings, or even multiple selectors using joins
      • Query in a content list by using the child-node constraint
      • Query in multiple lists with the same node or mixin type
          • in multiple specified sections (descendant, childnode)
          • in multiple sites
          • in a different site
          • in the current site
      • Automatically queries content using the current language and workflow state (live/edit mode)
      • Allows queries on content in languages or workflow states other than the current one
      • Sort by any number of fields (also by relevance)
      • Support for faceted search and browsing based on indexed fields, dates, ranges and arbitrary queries
  • Unstructured search (fulltext)
      • Supports the fulltext query syntax as specified in JSR-283 (optional and mandatory words, phrase search, excluded words, use of wildcards)
      • Search through all site content
          • but references are not automatically followed (yet)
      • Search in the internal document repository and/or external repositories (via Jahia's UCH)
      • Search term highlighting in results
      • Sort by relevance
      • Search in multiple sites
      • Ability to configure returning one or multiple hits per page, as well as to use query parameters in the search hit's URL by using rules
      • Ability to exclude fields from fulltext search by configuration
  • Respects content restrictions, so that hits the current user cannot access are not returned
  • Specify a limit for the number of search results for better performance, for instance to display just the top ten related content objects
  • Return the total number of hits

Easy template development

  • Powerful JSP tags to remove the necessity for scriptlets
  • Tag libraries for creation of simple and advanced search forms (site and document search) with support for unstructured query and metadata search
  • Tag library for structured queries
  • Tag library for faceted search support (show hits per facet, display already selected facets in the path and allow removing them again)
  • Template developer can create different result snippets per found content type and for instance display the most relevant excerpt of a larger content object (with highlighted search terms) or display other data related to the found content object
  • Offer URL to link to the page displaying the found content with rule based query parameter manipulation (e.g. container details view or correct page in pagination)

Advanced features

  • Uses a Solr-based spellchecker, which better supports multiple terms per query in the "Did you mean?" feature
  • Automatically creates OpenSearch tags to be detectable by OpenSearch clients (like Firefox)

Administration/Development

  • Possibility of writing the created queries into the Jahia log
  • Luke can be used to view the index and make sample queries
  • Configure index settings and analyzers, highlighting, spellchecking via Jackrabbit's configuration

Used frameworks

Index/search in Jahia works out-of-the-box by using just the default configuration settings, but you can customize and tune settings in these configurations (mainly based on Jackrabbit, Lucene, Solr and some custom configurations).

Jahia has integrated multiple search frameworks and uses a selected set of features from each in order to offer a powerful and flexible search solution.

  • Apache Jackrabbit using Apache Lucene (integrated library)
      • Jahia content is indexed and searched via Jackrabbit, which phased out the custom Jahia index and Compass integration
      • The abstract query object model is used as the basis for our tags to create container queries
      • Rich document parsing (text extraction) and indexing
      • Luke can be used to view the index and make sample queries
  • Apache Solr (integrated solr-common-*.jar and some classes)
      • Filtering for faceted search
      • Using Spellchecker for "Did you mean?" support
  • OpenSearch (integrated library)
      • Supports inbound and outbound OpenSearch calls
      • Jahia automatically creates OpenSearch tags to be detectable by OpenSearch clients (like Firefox)
  • EntropySoft (library can be integrated via Jahia Unified Content Hub)
      • Allows searching in external repositories mounted in Jahia via the Unified Content Hub
      • Automatically converts queries to the EntropySoft query language
     

Query versus search

Jahia offers two distinct tag libraries: the search tags and the query tags. The following detailed comparison helps you find out which one is better suited to a specific use case.

Feature | Query-tags | Search-tags
--- | --- | ---
Main purpose | Retrieving homogeneous objects of one type (e.g. news items, event, etc.) including filtering and sorting by fields | Site and document repository search using full text queries and metadata search
Result objects | Any content node | Hit object, which refers to content nodes, giving access to excerpt, score etc.
Automatic creation of HTML input form | No | Yes
Can do programmatic search without input form | Yes | Yes
Search any content object by fulltext search or constraints on metadata fields | It is possible to search through all fulltext content of any node type | Yes
Limit search to certain content definition | Yes | No (only via SearchCriteria API)
Limit search to multiple content definitions | Only when using mixins or inheritance or by using joins | No
Ability to search in entire site | Yes | Yes
Ability to search in multiple or all sites | Yes | Yes
Ability to limit search to children or descendants of content objects | Yes | Yes, but just one node or descendant node constraint is available per search
Automatically dereference linked files (from internal or external mounted document repositories) in content objects within the searched scope | No | No
Search for files in internal or external document repositories (if supported by connector) | Yes | Yes
Ability to search in current, specific and multiple languages | Yes | Yes
Ability to search in current, specific and multiple workflow states | Not in multiple | Not in multiple
Support of fulltext query syntax as specified in JSR-283 | Yes | Yes
Ability to build complex queries | Yes | No
Highlighting search term in search result excerpts | No | Yes
Set limit for number of search results | Yes | Yes
Sort by relevance | Yes | Yes
Sort by fields | Yes | Yes
Consider ACL to not return hits, which cannot be accessed by current user | Yes | Yes
Ability to configure returning one or multiple hits per page | No | Yes (by rules)
Integrated with faceted search/browsing tags | Yes | No (not yet)
Integrated with frontend HTML cache | Yes | No
OpenSearch integration | No | Yes

Queries

You have three alternatives to implement queries:

JCR Java Query Object Model (JQOM) versus SQL-2 versus XPATH

The Java Content Repository (JCR) 2.0 specification (JSR-283) deprecated the SQL and XPATH query languages and replaced them with the Abstract Query Model (whose provided implementation is called the JCR Java Query Object Model) and the SQL-2 query syntax (similar to SQL). Jackrabbit nevertheless continues to support the XPATH query language, which was specified in version 1.0 of the JCR.

Jahia supports all of them and provides convenient JSP tags for each possibility to ease and speed up development of modules, which use queries.

JQOM and SQL-2 are both based on the new Abstract Query Model (AQM), whereas XPATH is using legacy code in the backend. Therefore you will find functional and performance differences between the three approaches.

Jahia recommends SQL-2 for hardcoded queries and JQOM wherever a query is built dynamically depending on user input. Use XPATH only as a last resort: because it has existed longer and is better optimized, some queries may only work or perform well with XPATH (until Jackrabbit also tunes the queries based on AQM). On the other hand, AQM allows more complex queries (like joins) that are not possible with XPATH, but such queries often carry a performance penalty.

JQOM

The following example shows how to query all nodes that have the node type "nt:base" and that are either the current node or a descendant of the current node. The example immediately executes the query and writes the results into the pageContext variable named sitemaps. The resulting object is of type javax.jcr.query.QueryResult, which lets you access the resulting nodes or row objects.

<%@ taglib prefix="query" uri="http://www.jahia.org/tags/queryLib" %>
<%@ taglib prefix="jcr" uri="http://www.jahia.org/tags/jcr" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>

<jcr:jqom var="sitemaps">
    <query:selector nodeTypeName="nt:base" selectorName="stmp"/>
    <query:or>
        <query:descendantNode path="${currentNode.path}" selectorName="stmp"/>
        <query:sameNode path="${currentNode.path}" selectorName="stmp"/>
    </query:or>
</jcr:jqom>

<c:forEach items="${sitemaps.nodes}" varStatus="status" var="sitemapEL">
    <c:url value="${url.base}${sitemapEL.path}.html" />
</c:forEach>   

 

You can also define a query first, store it in a variable, and refer to that variable in <jcr:jqom>, as shown in the following example. The example defines a query on the base nodetype and, depending on the value of the relative property of the current node, finds either all descendants of the main resource node or all nodes of the current site.

<query:definition var="listQuery" scope="request">
    <query:selector nodeTypeName="nt:base"/>
    <query:descendantNode path="${currentNode.properties['relative'].boolean ? renderContext.mainResource.node.path : renderContext.site.path}"/>
    <query:column columnName="rep:facet(nodeType=jmix:tagged&key=j:tags&facet.mincount=${usageThreshold}&facet.limit=${numberOfTagsLimit}&facet.sort=true)" propertyName="j:tags"/>
</query:definition>

<jcr:jqom var="result" qomBeanName="listQuery" scope="request"/>

The column in this case specifies that Solr-based faceting should run (this is a Jahia-specific extension to Jackrabbit) and return the facet counts. You will find more information in the description of the facet module. The query object model is stored in the listQuery variable, which the <jcr:jqom> tag then references to run the query.

To learn about the different tags, please look directly at the query tag library (query.tld). We made sure that for all query elements described in the specification of the Abstract Query Model we provide a convenient JSP tag.

You can also create JQOM queries directly using the API, as shown in the specification. For example:

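// Obtain the QOM factory from the query manager of the session's workspace
QueryObjectModelFactory factory = session.getWorkspace().getQueryManager().getQOMFactory();
// Builder that assembles the query object model step by step
QOMBuilder qomBuilder = new QOMBuilder(factory, session.getValueFactory());

// Select all jnt:event nodes under the selector name "event" ...
qomBuilder.setSource(factory.selector("jnt:event", "event"));
// ... restricted to the descendants of /sites/eventTest
qomBuilder.andConstraint(factory.descendantNode("event", "/sites/eventTest"));

// Create the query object model and execute it
QueryObjectModel qom = qomBuilder.createQOM();
QueryResultWrapper res = (QueryResultWrapper) qom.execute();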

 

As you can see, JQOM requires more lines of code, but it is handy if you need to set up dynamic and conditional queries. For hardcoded queries, however, we recommend using SQL-2.

SQL-2

SQL-2 is a query language that is also mapped to the Abstract Query Model. Jackrabbit provides a parser that translates SQL-2 queries into a JQOM, so both approaches use the same backend query engine in Jackrabbit.

The SQL-2 grammar is heavily based on SQL, with some extensions for hierarchical queries as used in modern content repositories, but also with some limitations, so that there is common standard support across all Java content repository vendors.

You can learn more about SQL-2 at these sources: SQL-2 Grammar (railroad diagrams), SQL-2 examples, Specification.

The advantage of SQL-2 over JQOM is that the query definition is shorter and easier to read. However, there is one additional layer, as the query needs to be parsed into a JQOM, so there may still be some hidden bugs that cause a query to be translated incorrectly. If you have complex dynamic or conditional queries, JQOM may be the better choice.

Here is an example of how to use an SQL-2 query in modules by using <jcr:sql> and other tags:

<jcr:sql var="receivedMessages"
    sql="select * from [jnt:socialMessage] where isdescendantnode(['${user.path}/messages/inbox']) order by [jcr:lastModified] desc"/>

<ul class="userMessagesList">
    <c:forEach items="${receivedMessages.nodes}" var="userMessage">
        <li id="social-message-${userMessage.identifier}">
            <template:module path="${userMessage.path}" />
        </li>
    </c:forEach>
</ul>     

 

And here is an example of how to use the API to run SQL-2 queries:

QueryManager queryManager = session.getWorkspace().getQueryManager();
String query = "SELECT * FROM [jnt:news] as news WHERE contains(news.*, 'ACME') ORDER BY news.[date]";
Query q = queryManager.createQuery(query, Query.JCR_SQL2);
QueryResultWrapper queryResult = (QueryResultWrapper) q.execute();  

 

XPATH

XPATH has been deprecated as of JCR 2.0, but Jackrabbit and Jahia still support it. Treat it as a last resort, for cases where you cannot make a query work with SQL-2 or JQOM, or where it performs badly. As XPATH has existed much longer and has been optimized a lot, some queries may work much better with XPATH, so you could temporarily implement such queries with XPATH. Jahia also internally implemented some queries with XPATH (e.g. the search tags), as we found that some functionality did not work with SQL-2. Over time, Jahia will also migrate these remaining XPATH queries to SQL-2.

Note, however, that when using XPATH queries you may need more knowledge of how Jahia stores nodes internally (e.g. translation nodes), as SQL-2 and JQOM queries are automatically rewritten to mask the internal storage structure. For JQOM and SQL-2, Jahia also internally modifies the query and adds the current language as a constraint (if no other constraint exists on the language property). This is not done for XPATH queries, which means that if you do not modify the query or filter the results, you may get duplicated results: one for the main node and one for the translation node, as both are returned when querying on a nodetype.

More information about the XPATH query syntax can be found in the JCR 1.0 specification or in these examples.

For using XPATH queries in your JSP modules, you can use the <jcr:xpath> tag similarly as the <jcr:sql> tag in the above example.

Here is an example of how to use XPATH queries with the API:

QueryResult result = session.getWorkspace().getQueryManager()
    .createQuery("/jcr:root"
        + (StringUtils.isEmpty(site) ? "" : "/sites/"
            + JCRContentUtils.stringToJCRPathExp(site))
        + "//element(*, jnt:vanityUrl)[@j:url = "
        + JCRContentUtils.stringToQueryLiteral(url)
        + "]", Query.XPATH).execute();
List<VanityUrl> existingUrls = new ArrayList<VanityUrl>();
for (NodeIterator it = result.getNodes(); it.hasNext();) {
    JCRNodeWrapper node = (JCRNodeWrapper) it.next();
    existingUrls.add(populateJCRData(node, new VanityUrl()));
}

 

Fulltext search expression

The fulltext search expression supported with JQOM and SQL-2 is defined in the JCR 2.0 specification.

This same expression is also supported in the simple search components' input field, where users can enter fulltext search criteria.

Terms can either be single words or phrases within double quotes. Here is a description of the expression syntax from the specification:

  • A term not preceded with "-" (minus sign) is satisfied only if the value contains that term.
  • A term preceded with "-" (minus sign) is satisfied only if the value does not contain that term.
  • Terms separated by whitespace are implicitly "ANDed".
  • Terms separated by "OR" are "ORed".
  • "AND" has higher precedence than "OR".
  • Within a term, each double quote ("), "-" (minus sign), and "\" (backslash) must be escaped by a preceding "\" (backslash).
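
As an illustration of the escaping rule above, here is a minimal sketch that escapes user input before it is placed into a fulltext search expression. The class and method names (FulltextTermEscaper, escapeTerm, phrase) are hypothetical helpers written for this example and are not part of the Jahia or JCR API.

```java
// Hypothetical helper for illustration only; not part of the Jahia or JCR API.
final class FulltextTermEscaper {

    /** Prefixes each double quote, minus sign, and backslash within a term with a backslash. */
    static String escapeTerm(String term) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (char c : term.toCharArray()) {
            if (c == '"' || c == '-' || c == '\\') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Escapes the text and wraps it in double quotes so that it is treated as one phrase. */
    static String phrase(String text) {
        return "\"" + escapeTerm(text) + "\"";
    }

    public static void main(String[] args) {
        // A phrase that must occur, a word that must not occur, and an OR alternative.
        // The unescaped "-" in front of "legacy" is the exclusion operator itself.
        String expression = phrase("content management")
                + " -" + escapeTerm("legacy")
                + " OR " + escapeTerm("e-commerce");
        System.out.println(expression);
    }
}
```

Because "AND" binds more tightly than "OR", the expression built in main reads as: (the phrase is present AND "legacy" is absent) OR "e-commerce" is present. If such an expression is embedded in an SQL-2 string literal, the literal's own quoting rules still have to be applied separately.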

Jahia's content datamodel

Jahia stores nodes in a particular way, mainly with regard to multilingual nodes and properties. Generally this is hidden from the Content manager UI and even from users of the JCR API. For JQOM and SQL-2 queries, Jahia also modifies the query in the backend in order to handle multilingual content correctly, but if you use XPATH, you need to know this data structure.

Basically, when a node is based on a node type that defines multilingual properties (i18n attribute), Jahia automatically creates a subnode of type jnt:translation for each language into which the content node is translated. For instance, when a node uses mixins like mix:title or jmix:description, jnt:translation nodes are created for each language, because these mixins contain internationalized properties:

[mix:title]
  mixin
  - jcr:title (STRING) i18n boost=2.0 

This means that all non-internationalized properties are stored in the main node, and all internationalized properties are stored in the jnt:translation subnodes (one subnode per language), which also contain the property jcr:language with the locale name (the ISO 639 language code, optionally followed by the ISO 3166 country code, delimited with an underscore).

In order to make queries perform fast and to remove the necessity to use joins, we are indexing all non internationalized properties also within the index document of the translation nodes. We made modifications to Jackrabbit's query engine (for JQOM and SQL-2) in order to gracefully handle this design decision and let Jahia module developers and users create queries as if all properties are stored in the main node. Only if you decide to use XPATH you may have to take care about this.

Query using language(s)

If your query does not contain a constraint with the jcr:language property and if the selector node type contains multilingual properties, Jahia will automatically extend the query to return just objects of the current language.

If you want to also query in other languages than the current, you have to add the constraint on jcr:language yourself.

For instance this is an example query to retrieve French news sorted by date:

SELECT * FROM [jnt:news] as news
    WHERE ISCHILDNODE(news, [/sites/mySite/home/page8/news]) AND news.[jcr:language] = 'fr'
    ORDER BY news.[date]
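
To query several languages at once, the jcr:language constraint can be repeated with OR. The following is a minimal sketch of building such a constraint from java.util.Locale values, whose string form matches the locale name format described earlier ("fr", "fr_CH"). The class and method names are hypothetical and only assemble the query string; executing it is done as in the SQL-2 API example.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the Jahia or JCR API.
final class LanguageConstraintSketch {

    /** Builds a parenthesized, OR-ed constraint on jcr:language for the given selector. */
    static String languageConstraint(String selector, List<Locale> locales) {
        if (locales.isEmpty()) {
            throw new IllegalArgumentException("At least one locale is required");
        }
        return locales.stream()
                .map(Locale::toString) // "fr" or "fr_CH": language, underscore, country
                .map(code -> selector + ".[jcr:language] = '" + code + "'")
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        List<Locale> locales = List.of(Locale.FRENCH, Locale.forLanguageTag("fr-CH"));
        String query = "SELECT * FROM [jnt:news] AS news"
                + " WHERE ISCHILDNODE(news, [/sites/mySite/home/page8/news])"
                + " AND " + languageConstraint("news", locales)
                + " ORDER BY news.[date]";
        System.out.println(query);
    }
}
```

The parentheses around the generated constraint keep the OR-ed languages together when the constraint is combined with other conditions using AND.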

 

Query using site(s)

If you have multiple sites in your workspace and you want to limit the queries to particular sites, you have to use the descendantNode constraint on the site's root node, which is /sites/<yourSiteName>.

For instance if you want to query all news nodes in the current language in the sites myFirstSite or mySecondSite you can use the constraint as in the following example:

SELECT * FROM [jnt:news] as news
    WHERE ISDESCENDANTNODE(news, [/sites/myFirstSite]) OR ISDESCENDANTNODE(news, [/sites/mySecondSite])
    ORDER BY news.[date]

Query using workspaces

Jahia has two workspaces: "live" and "default" (= edit mode). Get the Query Manager from one of these two workspaces to query either all published "live" objects or the objects in the edit workspace.

This is done automatically depending on the workspace key in the URL of the request.

Query using categories or tags

Categories and tags are stored as references, so you need to specify the UUID of the category or tag node in the query, like this:

SELECT * FROM [jnt:news] as news
    WHERE news.[j:defaultCategory] = 'adfb06e2-f0bd-45c5-b0e8-15691f6bbd37'

SELECT * FROM [jnt:news] as news
    WHERE news.[j:tags] = 'af0f02f8-1c4a-4026-9d79-4939e373db2c'

 

Usually you will first get the category or tag node and then use getIdentifier() to retrieve its UUID.

Suggested typing

See integration of auto-completion for a description of how to use the indexed data for suggested typing.

Performance hints

  • If a query could return thousands of result nodes, always try to set a limit on the number of results, as the user cannot view all of them anyway.
  • Avoid joins where they are not necessary, until they perform well.
  • Look at the Jackrabbit mailing list archive for advice regarding query performance.

Search and indexing configuration

Repository-level search configuration

Jahia uses Apache Jackrabbit as an underlying JCR-compliant content repository and thus all content objects are indexed and maintained in the Jackrabbit index using the Apache Lucene search and indexing engine.

Jackrabbit uses one index for the version store and one index for each workspace (default and live). In the standard Jahia configuration, the corresponding index folders are:

  • digital-factory-data/repository/index: version store index
  • digital-factory-data/repository/workspaces/default/index: default (edit mode) workspace index
  • digital-factory-data/repository/workspaces/live/index: live content index

Configuring how the repository data is indexed can be done at several levels. First, you can specify the general behavior of Jahia when it comes to searching and indexing. This is accomplished using the index configuration files and, more specifically, their <SearchIndex> section. The main (version store) index configuration can be found at {jahia-web-app-dir}/WEB-INF/etc/repository/jackrabbit/repository.xml, while workspace-specific configuration files are found at digital-factory-data/repository/workspaces/{workspace name}/workspace.xml.

Note that the workspace-specific configuration files are only created after the workspace in question is effectively created. The configuration uses the main index configuration file as template.

Besides the default Jackrabbit search configuration options, which you can use, Jahia uses the following default and custom settings:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.JahiaSearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="analyzer" value="org.jahia.services.search.analyzer.DefaultLanguageAnalyzer"/>
    <param name="supportHighlighting" value="true"/>
    <param name="excerptProviderClass" value="org.jahia.services.search.jcr.HTMLExcerpt"/>
    <param name="indexingConfiguration" value="${jahia.jackrabbit.searchIndex.workspace.config}"/>
    <param name="indexingConfigurationClass" value="org.apache.jackrabbit.core.query.lucene.JahiaIndexingConfigurationImpl"/>
    <param name="spellCheckerClass" value="org.jahia.services.search.spell.CompositeSpellChecker"/>

    <!-- This next parameter allows to specify that we want to limit the initially loaded result size -->
    <param name="resultFetchSize" value="100"/>

    <!-- The maxClauseCount setting will need to be increased to perform queries on large sub-trees -->
    <param name="maxClauseCount" value="65000" />

    <!-- Increase the cache size if the doc number cache hits are below 90% -->
    <!--param name="cacheSize" value="100000" /-->

    <param name="enableConsistencyCheck" value="${jahia.jackrabbit.searchIndex.enableConsistencyCheck}"/>
    <param name="forceConsistencyCheck" value="${jahia.jackrabbit.searchIndex.forceConsistencyCheck}"/>
    <param name="autoRepair" value="${jahia.jackrabbit.searchIndex.autoRepair}"/>
</SearchIndex>

 

Parameter | Meaning
--- | ---
path | Specifies where the index is stored.
analyzer | Specifies which Analyzer is used by default. The default configuration uses org.jahia.services.search.analyzer.DefaultLanguageAnalyzer, which attempts to use an Analyzer specific to the language used by default in the Jahia installation. The language-specific Analyzer also filters tokens using org.apache.lucene.analysis.ASCIIFoldingFilter, which converts accented characters into their non-accented equivalents.
supportHighlighting | If set to true, additional information is stored in the index to support highlighting using the rep:excerpt() function.
excerptProviderClass | Specifies which class is used to decorate excerpts that are returned by the search engine. By default, we use org.jahia.services.search.jcr.HTMLExcerpt.
indexingConfiguration | Specifies where the indexing configuration file is located.
indexingConfigurationClass | Specifies which class is used to gather the indexing configuration information. By default, we use org.apache.jackrabbit.core.query.lucene.JahiaIndexingConfigurationImpl, which is a Jahia extension of org.apache.jackrabbit.core.query.lucene.IndexingConfigurationImpl. This allows us to enrich Jackrabbit's default configuration with Jahia-specific settings.
spellCheckerClass | Specifies which implementation of org.apache.jackrabbit.core.query.lucene.SpellChecker to use to check spelling. By default, Jahia uses org.jahia.services.search.spell.CompositeSpellChecker, which is based on Apache Solr and is optimized for sentences. We also added functionality to just take the terms of the current locale and site. The refresh interval is set by choosing an inner class, like in Jackrabbit's SpellChecker.
maxClauseCount | Specifies the maximum number of clauses that can occur for a given query. The default value (if none is specified here) is 1024. You might need to increase this value to perform queries on large sub-trees that will expand to many terms.
resultFetchSize | Specifies how many results the query handler should initially fetch when a query is executed. We set it much lower than the default to increase performance.
cacheSize | Specifies the size of the document number cache. This cache maps node identifiers to Lucene document numbers. Increase the cache size if the doc number cache hits are below 90%.
enableConsistencyCheck | Specifies, when set to true, that consistency checks should be performed on the Jackrabbit search indices depending on the value of the forceConsistencyCheck parameter. If set to false, no consistency check is performed on startup, regardless of other parameters.
forceConsistencyCheck | Specifies, when set to true, that a consistency check for search indices is performed on every startup. If false, a consistency check is only performed when the search index detects a prior forced shutdown, provided enableConsistencyCheck is set to true.
autoRepair | Specifies that errors detected by a consistency check of search indices should be automatically repaired if set to true. Otherwise, an error is logged.
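
The interplay of enableConsistencyCheck and forceConsistencyCheck described above can be summarized as a small decision function. This is only a sketch of the documented behavior, not the Jackrabbit or Jahia implementation; the class, method, and the priorForcedShutdown flag are hypothetical names introduced for this example.

```java
// Hypothetical sketch of the documented startup behavior; not Jackrabbit or Jahia code.
final class ConsistencyCheckSketch {

    /** Returns true if, per the parameter descriptions, a consistency check runs on startup. */
    static boolean runsOnStartup(boolean enableConsistencyCheck,
                                 boolean forceConsistencyCheck,
                                 boolean priorForcedShutdown) {
        if (!enableConsistencyCheck) {
            // Disabled: no check on startup, regardless of the other parameters.
            return false;
        }
        // Enabled: always check when forced, otherwise only after a prior forced shutdown.
        return forceConsistencyCheck || priorForcedShutdown;
    }

    public static void main(String[] args) {
        System.out.println("enabled, not forced, clean shutdown:  " + runsOnStartup(true, false, false));
        System.out.println("enabled, not forced, forced shutdown: " + runsOnStartup(true, false, true));
        System.out.println("enabled, forced, clean shutdown:      " + runsOnStartup(true, true, false));
        System.out.println("disabled, forced, forced shutdown:    " + runsOnStartup(false, true, true));
    }
}
```

Whether detected errors are then repaired or only logged is controlled separately by autoRepair.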

 

Rebuilding indices

In case of an abnormal server process termination or a severe I/O error, it is possible for the indices to become inconsistent. Jahia allows you to rebuild the search indices by using the Search Engine Management page of the Jahia Tools. Clicking on Repository re-indexing forces the index to be rebuilt during the next Jahia startup. You can then restart your server to proceed with the re-indexing. More details on the behavior regarding search index consistency can be found in How to check Lucene index.

Fine tuning: Jahia specific indexing configuration

For finer tuning, you can also specify how the properties of each node type are indexed and handled by the search engine. This is accomplished using the digital-factory-data/repository/indexing_configuration.xml file. Besides the default Jackrabbit indexing configuration options, Jahia uses the following default and custom settings, as detailed below.

Index aggregate on jnt:file

Sometimes it is useful to include the contents of descendant nodes into a single node to ease search on content that is scattered across multiple nodes. Jackrabbit allows you to define index aggregates based on relative path patterns and primary node types. We use such an index aggregate on jnt:file that includes the content of the jcr:content node:

<aggregate primaryType="jnt:file">
    <include>jcr:content</include>
</aggregate>

 

Excluding specific nodetypes from being indexed

Excluding a nodetype from indexing means that nodes of this type are no longer found by search. To do so, add an exclude rule to the indexing_configuration.xml file, which can be found in your digital-factory-data/repository folder.

<exclude nodeType="[nodeType]" path="[path]" isRegexp="[boolean]" />

This will exclude the matching nodetype, but it will not check the inherited types.

Details about the attributes of the rule:

  • nodeType (mandatory): nodetype to exclude from the index
  • path: path of the content to be excluded from indexing
  • isRegexp: boolean, false by default; if true, the path is read as a regular expression

Examples:

<exclude nodeType="jnt:page" />
<exclude nodeType="jnt:file" path="/sites/digitall/files/documents" />
<exclude nodeType="jmix:image" path="^/sites/digitall/files/images/.*" isRegexp="true" />
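
Before adding a regular-expression rule, it can help to check which node paths the pattern selects. The sketch below tests the pattern from the third example with java.util.regex, under the assumption that the rule is evaluated against the full node path and must match the whole path; how Jahia applies the pattern internally is not described here, so treat this purely as a way to reason about the expression. The class and method names and the sample paths are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical check for illustration only; not the Jahia implementation.
final class ExcludePathSketch {

    /** Assumption: a regexp rule applies when the pattern matches the entire node path. */
    static boolean matchesRegexpRule(String rulePath, String nodePath) {
        return Pattern.compile(rulePath).matcher(nodePath).matches();
    }

    public static void main(String[] args) {
        String rule = "^/sites/digitall/files/images/.*";
        String[] samplePaths = {
            "/sites/digitall/files/images/banner.png",    // below the images folder
            "/sites/digitall/files/images",               // the folder itself, no trailing slash
            "/sites/digitall/files/documents/report.pdf"  // a sibling folder
        };
        for (String path : samplePaths) {
            System.out.println(path + " -> " + matchesRegexpRule(rule, path));
        }
    }
}
```

Under this assumption the pattern selects everything below the images folder, but neither the folder node itself nor content in sibling folders.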

 

Exclude some node types from search results using search API

Sometimes you want the search to return only specific node types, or to exclude some node types from the results. To do that, you can use a specific JSP form tag in your search form:

<%@ taglib prefix="s" uri="http://www.jahia.org/tags/search" %>

<.../>

<s:nodeProperty display="false" match="no_exact_property_value" name="jcr:primaryType" nodeType="nt:base" value="jnt:press" />

<.../>

The nodeProperty tag adds a search constraint directly on custom node properties. Here, we filter out "jnt:press" nodes from the search results. To get only "jnt:press" nodes instead, use:

<s:nodeProperty display="false" match="exact_property_value" name="jcr:primaryType" nodeType="nt:base" value="jnt:press" />

This tag generates the appropriate HTML hidden input with the correct syntax and structure, so that the search API can understand these constraints.

Excluding properties from internationalized nodes

Using the i18ncopy section of the configuration, you can exclude properties from being copied from the main node to the index documents of the translated nodes. This allows us to exclude properties that are not translated so that the indices don't grow too big needlessly.

<i18ncopy>
    <exclude-property>j:tags</exclude-property>
    <exclude-property>j:locktoken</exclude-property>
    <exclude-property>jcr:language</exclude-property>
    <exclude-property>jcr:lastModified</exclude-property>
    <exclude-property>jcr:lastModifiedBy</exclude-property>
    <exclude-property>j:lastPublished</exclude-property>
    <exclude-property>j:lastPublishedBy</exclude-property>
    <exclude-property>j:published</exclude-property>
</i18ncopy>

 

Handling of hierarchical facetable node types

Nodes used for hierarchical faceting need to be handled specifically because their path is indexed. The subtree and all references to the nodes therefore have to be reindexed whenever a node of such a nodetype is renamed, moved, or deleted. To specify that a nodetype is used for hierarchical faceting, use the hierarchical section of the configuration as shown below.

<hierarchical>
    <nodetype>jnt:category</nodetype>
</hierarchical>

 

Using Analyzers

Analyzers specify how properties are parsed and analyzed. It is possible to assign a specific Analyzer to handle the content of a particular property. Different Analyzers process the content of properties differently, so search results depend on which Analyzer is used. It is therefore important to choose the proper Analyzer for your properties. Jahia already specifies appropriate Analyzers for common properties. However, you might want to change the default Analyzer depending on your specific needs or specify Analyzers for your custom properties. See Jackrabbit's documentation on Analyzers for more details.

An Analyzer (org.jahia.services.search.analyzer.DefaultLanguageAnalyzer) is configured by default at the repository level. This is the Analyzer that will be used to parse properties that do not specify an Analyzer of their own. This Analyzer is implemented in such a way that it should actually defer to a language-appropriate Analyzer automatically based on which default language is defined in jahia.properties, if such an Analyzer can be found.

Generally speaking, Jahia will attempt to use the most appropriate Analyzer if we can detect a language when looking at the property or during queries, meaning that internationalized properties should use a language-specific Analyzer (if one exists for that particular language) and properties in queries will also be parsed using the detected query language. Naturally, this is performed on a best-effort basis since not all cases can be automatically covered.

To do so, Jahia uses an Analyzer registry which associates a language (via its language code String) to a specific Analyzer. This registry comes configured by default with the following Analyzers:

Language code   Associated Analyzer class
ar              org.apache.lucene.analysis.ar.ArabicAnalyzer
br              org.apache.lucene.analysis.br.BrazilianAnalyzer
cjk             org.apache.lucene.analysis.cjk.CJKAnalyzer
cn              org.apache.lucene.analysis.cn.ChineseAnalyzer
cz              org.apache.lucene.analysis.cz.CzechAnalyzer
de              org.apache.lucene.analysis.de.GermanAnalyzer
el              org.apache.lucene.analysis.el.GreekAnalyzer
en              org.apache.lucene.analysis.snowball.SnowballAnalyzer
                (configured as SnowballAnalyzer(Version.LUCENE_30, "English", StopAnalyzer.ENGLISH_STOP_WORDS_SET))
fa              org.apache.lucene.analysis.fa.PersianAnalyzer
fr              org.apache.lucene.analysis.fr.FrenchAnalyzer
nl              org.apache.lucene.analysis.nl.DutchAnalyzer
ru              org.apache.lucene.analysis.ru.RussianAnalyzer
th              org.apache.lucene.analysis.th.ThaiAnalyzer

 

If you need to support another language or you are not satisfied with the behavior provided by the default mapping, you can change which language is associated with which Analyzer using the analyzer-registry section of the indexing configuration file, as shown below. To add a language/Analyzer mapping, add an entry using the language code as the element name with a class attribute specifying which Analyzer class to use. The Analyzer class implementation must provide either a no-argument constructor or one with an org.apache.lucene.util.Version argument. You can also specify that this Analyzer's output is further filtered by the ASCIIFoldingFilter by setting the useASCIIFoldingFilter attribute to true.

Finally, you can specify that your Analyzer is further customized by an implementation of org.apache.jackrabbit.core.query.lucene.AnalyzerCustomizer using the customizer element, with the class attribute naming your implementation class. You can pass key/value pairs to your customizer using child elements of the customizer: the element name is the key and the element content is the value associated with that key. If a key appears multiple times, its values are put into a List. A possible use for an AnalyzerCustomizer would be to specify a list of stop words. Refer to the javadoc for both AnalyzerRegistry and AnalyzerCustomizer for more details.

<analyzer-registry>
    <en_US class="org.apache.lucene.analysis.standard.StandardAnalyzer" useASCIIFoldingFilter="true">
        <customizer class="org.apache.jackrabbit.core.query.lucene.AnalyzerCustomizer$NoOpAnalyzerCustomizer">
            <foo>bar</foo>
            <foo>bar2</foo>
            <key>value</key>
        </customizer>
    </en_US>
</analyzer-registry>

 

If you find that the default behavior is not appropriate, you can configure which Analyzer should be used for your properties using the analyzers section of the configuration file, adding an analyzer entry for the specific Analyzer to use and then add a property entry containing the name of the property this Analyzer needs to handle, as shown below, where we specify that the org.apache.lucene.analysis.KeywordAnalyzer should be used for the jcr:primaryType property.

<analyzers>
    <analyzer class="org.apache.lucene.analysis.KeywordAnalyzer">
        <property>jcr:primaryType</property>
    ...
    </analyzer>
    ...
</analyzers>

Spellchecking options

If you look at the default configuration, you will see that Jahia specifies Analyzers for many properties by default. You might also notice a special configuration for a SPELLCHECK property. There is no actual SPELLCHECK property; this is just Jahia's way to specify which Analyzer should be used to write spellchecked fields into the index. By default, Jahia uses Apache's org.apache.lucene.analysis.standard.StandardAnalyzer because the Analyzer used for the spellchecking engine must not perform stemming (as this removes characters from words) and needs to split words on whitespace. It also should not use any kind of folding filter (like the ASCIIFoldingFilter) so that language-specific features such as accents are preserved. If you find that the StandardAnalyzer doesn't perform according to your needs for spellchecking, this is where you can change which Analyzer to use.

<analyzers>
    ...
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
        <property>SPELLCHECK</property>
    </analyzer>
    ...
</analyzers>

Jahia has the ability to specify that only some properties should be spellchecked. This is accomplished using the spellchecker section of the indexing configuration, using an include-property entry for each property to include. Below, we show how to only consider the jcr:title property during spellchecking.

<spellchecker>
    <include-property>jcr:title</include-property>
</spellchecker>
Important: If a spellchecker section exists, only the properties defined using an include-property entry are considered for spellchecking, so you must list all the properties you want to be spellchecked here if you decide to use this feature. To maintain backward compatibility, if no spellchecker section is found, then all properties might be spellchecked.

Indexing and search options at the Compact Node Definition (CND) level

Finally, some indexing options can be specified directly when the node type is declared. This is particularly convenient to specify some of the indexing behavior directly on properties declarations. Below are some options you can use when declaring your node types using Compact Node Definition (CND) files.

Disabling indexing completely

If you don't want a property to be indexed for any reason, specify indexed=no in its definition.

In the following example, the date property of the jnt:publication node type won't be indexed at all:

[jnt:publication] > jnt:content, jmix:editorialContent, mix:title, jmix:structuredContent
...
 - date (string) i18n indexed=no
...

Disabling full text indexing

If a particular property doesn't have any specific meaning for full text searches, it is recommended to exclude it from being indexed for full text search by attaching the nofulltext option to the property's definition.

The following example shows how this option is used to exclude the width and height properties of a Flash object from being indexed for full text search:

[jnt:flash] > jnt:content, mix:title, jmix:multimediaContent
 - flashSource (weakreference, picker[type='file'])
 - width (string) analyzer='keyword' nofulltext mandatory
 - height (string) analyzer='keyword' nofulltext mandatory
...
Note that a property with the nofulltext option is still indexed and can still be used in JCR queries. To completely prevent a property from being indexed, use indexed=no in its definition as previously shown.

Boosting a property's appearance in search results

If you want to increase the relevance of a given property in search results, use the scoreboost (or boost or b) option in its definition. This option takes a value between 1.0 (the default value) and 5.0; higher values mean greater relevance. If you do not specify this option, Jahia also checks the default indexing configuration. See Jackrabbit index rules for more info.

In the following example, the jcr:title property is boosted for relevance in search results:

[jnt:publication] > jnt:content, jmix:editorialContent, mix:title, jmix:structuredContent
...
 - jcr:title (string) mandatory scoreboost=2.0
...

Enabling faceted navigation

You can enable faceted navigation on a property by adding the facetable option, as shown below:

[jnt:job] > jnt:content, mix:title, jmix:editorialContent, jmix:structuredContent
 - reference (string)
 - businessUnit (string) i18n facetable
 - contract (string, choicelist[resourceBundle]) facetable  < contract1, contract2, contract3, contract4
 - country (string, choicelist[country,flag]) facetable
...

Deprecated legacy options

The queryops, noqueryorder and analyzer options that were supported in earlier Jahia xCM versions are no longer supported. To specify different analyzers per field, use the indexing configuration file as explained in Jackrabbit analyzers configuration.

Facets

Faceted search has become a popular technique for narrowing down search results and improving the navigation of a site. With facets, you define how to classify content objects or query results. You can display all or only the most interesting values within the classification. Each value shows the number of results that would remain if the query were narrowed by using that value as a filter.

Apache Jackrabbit does not yet provide out-of-the-box faceting support. In order to provide this functionality for our users, the Jahia team looked for ways to integrate Apache Solr - one of the most popular Apache Lucene based open source solutions offering faceting support - into Jackrabbit. This turned out to be a difficult endeavor as Solr was not designed to easily be integrated with other frameworks, in particular to work with indices that were not created with Solr. We nevertheless were able to implement a solution, in the form of the SimpleJahiaJcrFacets class, to bridge Jackrabbit and Solr. SimpleJahiaJcrFacets is very similar to Solr's SimpleFacets class and reuses classes from solr-common to work with exactly the same facet query syntax and result objects as Solr does. It is therefore highly recommended to learn more about Solr faceting to work efficiently with facets in Jahia.

Support for faceting in Jahia is provided by the Facets module. It is therefore important to activate the module for your site if you intend to use facets.

Configuring properties for faceting

Simple faceting

Node properties that are used for faceting will, in most cases, require a second indexing, different from the normal one, to support faceting. This is needed because analyzers usually perform stemming, lowercasing and tokenizing before indexing a field. However, in order for a value to be useful for faceting, it:

  • should not be tokenized into separate words
  • should not be lowercased
  • should not have punctuation removed
  • should not be stored in the index

Jahia extended the Compact Node Definition (CND) format to make it possible to easily specify that a given property should be used for faceting. This is accomplished by adding the facetable attribute to the property you want to mark for faceting indexing. Below is an example, where the startDate, endDate, location and eventsType properties are marked as facetable and will therefore be indexed specifically for faceting (in addition to the regularly performed indexing): 

[jnt:event] > jnt:content, jmix:editorialContent, mix:title, jmix:structuredContent
- startDate (date) facetable
- endDate (date) facetable
- location (string) i18n facetable
- eventsType (string, choicelist[resourceBundle]) facetable  < meeting,consumerShow,roadShow,conference,show,pressConference
- body (string, richtext) i18n 
Note that if you decide to make a property facetable after content has already been created for that particular nodetype, you will need to rebuild the indices as explained in the Rebuild indices section.

Facetable fields are prefixed by FACET: in the index.

Hierarchical faceting

Jahia supports hierarchical field value faceting. Hierarchical faceting, as its name implies, means that facets can leverage an existing hierarchy in the data to offer more intuitive filtering. Using hierarchical faceting, you can filter your data to see items at a specific hierarchical level along with all of their children. For example, when you filter on a specific category, you get not only the nodes categorized with exactly this category but also the nodes categorized with any of its child categories.

Marking a property as being a target for hierarchical faceting is as easy as using the hierarchical attribute for this property in the associated nodetype definition. Here is how this is done for the jmix:categorized mixin containing the hierarchical facetable field j:defaultCategory:

[jmix:categorized] mixin
extends = nt:hierarchyNode, jnt:content, jnt:page
itemtype = classification
- j:defaultCategory (weakreference, category[autoSelectParent=false]) facetable hierarchical multiple

Facet types

As Jahia leverages Apache Solr for faceting, it supports all the facet types (field, date, range, queries) available in Apache Solr. See Facet types and their parameters to get a description of all supported facet types and parameters.

Operation

Jahia not only offers backend access to run facet queries, but also provides a facet module with several components to help in creating templates using faceting support simply with drag and drop in the Studio.

Facet types and their parameters

Jahia supports the same facet types supported in Solr 1.3, which are also described in SimpleFacetParameters, where you will find examples and more details about the different types and their parameters supported in that Solr release.

This section first describes the parameters (properties) that are common to all facet types, and then describes the specific facet types and their parameters.

Common parameters for all facet types

field (in the query: key)

This is an id that you can choose for the facet classification. Elements with the same facet id will be grouped together.

For field and date facets, you will mostly have just one facet element, which will automatically create all the different facet values.
For range and query facets, you will probably have many elements with the same facet id, but different query constraints, where each query results in a facet value. In future versions, the range facet will become more powerful and you will no longer have to create one element per facet value.

This is a mandatory field.

label

This is a multi-lingual property holding the label for the facet, which will be displayed as facet classification title above the facet values.

field (in the query the nodetype is set with nodetype, but the property name is passed in the column name)

This property allows you to specify a field, which should be treated as a facet. In Jahia this is a property definition in a nodetype (e.g. jnt:event;eventsType or jmix:categorized;j:defaultCategory).

For field facets, we will iterate over all terms in that field and generate a facet count using that term as constraint.
For date and range facets, we will count how many values in that field are matching the range.
For query facets, the property is not mandatory as the arbitrary query can be based on multiple fields, but if the query is based just on one field, it is recommended to also specify it in this property.

mincount

This parameter indicates the minimum counts for facet values, which should be included in the response.

The default value in the Solr backend is 0, whereas when using the Jahia user interface, the default will be 1, which means that facet values resulting in no hits will not be returned and displayed.

labelRenderer

Sometimes the values of a field may be simple keys, which are meaningless to the end user. For such cases, you can register ChoiceListRenderer implementations in Jahia. Set the labelRenderer field to the key of the ChoiceListRenderer implementation in order to display readable labels.

Field Value Faceting

In Solr, fields have to be defined in a schema. With Jahia, there is no need to create a Solr schema, because Jahia already defines nodetypes and properties, as is usual for Java content repositories.

For field value faceting, you select a property definition in a nodetype, and the facet values are created from the terms indexed for that field. Note that in most cases you need to add the facetable attribute to the property definition, so that the values are indexed in the way required for faceting.

Jahia supports hierarchical field value faceting. With categories, for instance, it is thus possible to drill down a category tree.

In addition to the common parameters, field value faceting also supports the following parameters:

sort

This parameter determines the ordering of the facet values.

  • true - sort the constraints by count (highest count first)
  • false - return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ASCII range, this will be alphabetically sorted.

Notice that in the engine we already provide mapped text descriptions instead of true and false. But when using the API or setting up facet queries yourself, you need to use true or false until we upgrade to a newer Solr version where this has been changed.

prefix / Root path

Limits the terms on which to facet to those starting with the given string prefix.

On field hierarchical facets, this parameter is used to drill down on the levels of a hierarchy.

limit

This parameter indicates the maximum number of facet value counts that should be returned for the facet fields. A negative value means unlimited.

The default value is 100.

offset

This parameter indicates an offset into the list of constraints to allow paging.

The default value is 0.

missing

When set to true, this parameter indicates that, in addition to the term-based constraints of a facet field, a count of all matching results which have no value for the field should be computed.

The default value is false.
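To make the interplay of sort, mincount, offset and limit more concrete, here is a minimal sketch in plain Java that applies them to an in-memory map of facet counts. It only illustrates the semantics described above under the assumption that mincount is applied first, then the ordering, then offset and limit. The class FacetValueSelection is hypothetical and is not how Jahia or Solr compute facets internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the sort, mincount, offset and limit parameters.
final class FacetValueSelection {

    static List<Map.Entry<String, Integer>> select(Map<String, Integer> counts,
            boolean sortByCount, int mincount, int offset, int limit) {
        // A TreeMap yields the terms in index order (lexicographic by term).
        List<Map.Entry<String, Integer>> values =
                new ArrayList<>(new TreeMap<String, Integer>(counts).entrySet());
        // mincount: drop facet values with too few hits.
        values.removeIf(value -> value.getValue() < mincount);
        if (sortByCount) {
            // sort=true: highest count first; the sort is stable, so ties keep index order.
            values.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        }
        // offset and limit: page through the remaining values; a negative limit means unlimited.
        int from = Math.min(Math.max(offset, 0), values.size());
        int to = (limit < 0 || limit > values.size() - from) ? values.size() : from + limit;
        return new ArrayList<>(values.subList(from, to));
    }
}
```

For example, with the counts conference=5, meeting=3, roadShow=3 and show=0, calling select(counts, true, 1, 0, -1) returns conference, meeting and roadShow in that order, and select(counts, true, 1, 1, 1) returns only meeting.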

Date Faceting

For date faceting, you also need to choose a property definition of type date from your nodetype definitions in Jahia. For these fields, faceting on individual terms rarely makes sense; faceting on ranges is more useful. Several parameters can be used to trigger faceting based on date ranges computed using simple DateMathParser expressions. You can also look at Solr Date Format for more expression examples.

When using Date Faceting, the start, end and gap parameters are all mandatory.

date.start

The lower bound for the first date range for all Date Faceting on this field. This should be a single date expression which may use the DateMathParser syntax.

date.end

The minimum upper bound for the last date range for all Date Faceting on this field (see hardend for an explanation of why the actual end value may be greater). This should be a single date expression which may use the DateMathParser syntax.

date.gap

The size of each date range expressed as an interval to be added to the lower bound using the DateMathParser syntax.

Example: gap=+1DAY

date.hardend

A Boolean parameter instructing Solr what to do in the event that gap does not divide evenly between start and end. If this is true, the last date range constraint will have an upper bound of end; if false, the last date range will have the smallest possible upper bound greater than end, such that the range is exactly gap wide.

The default is false.
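The effect of start, end, gap and hardend can be illustrated with a small sketch in plain Java. It is not the Solr implementation: the class DateFacetRanges is hypothetical, the gap is passed as a java.time.Period instead of a DateMathParser expression (so only calendar gaps such as days, months or years are covered), and time zones are ignored.

```java
import java.time.LocalDateTime;
import java.time.Period;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how date ranges follow from start, end, gap and hardend.
final class DateFacetRanges {

    // Returns the ranges as {lowerBound, upperBound} pairs.
    static List<LocalDateTime[]> ranges(LocalDateTime start, LocalDateTime end,
            Period gap, boolean hardend) {
        if (gap.isZero() || gap.isNegative()) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<LocalDateTime[]> result = new ArrayList<>();
        LocalDateTime lower = start;
        while (lower.isBefore(end)) {
            LocalDateTime upper = lower.plus(gap);
            // hardend=true cuts the last range at end; otherwise it stays exactly gap wide.
            if (hardend && upper.isAfter(end)) {
                upper = end;
            }
            result.add(new LocalDateTime[] {lower, upper});
            lower = upper;
        }
        return result;
    }
}
```

With start 2000-01-01, end 2000-02-15 and a gap of one month, the sketch produces two ranges. The second range ends on 2000-03-01 when hardend is false and on 2000-02-15 when hardend is true.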

date.other

This parameter indicates that in addition to the counts for each date range constraint between start and end, counts should also be computed for...

  • before - all records with field values lower than the lower bound of the first range
  • after - all records with field values greater than the upper bound of the last range
  • between - all records with field values between the start and end bounds of all ranges
  • none - compute none of this information
  • all - shortcut for before, between, and after

In addition to the all option, this is a multiple choice parameter -- but none will override all other options.

date.labelFormat

In order to print nice labels for the dates representing the facet values, this property is here to define how to format the date object into a label. The patterns, which can be used are the same as the ones used and described in the SimpleDateFormat class.

Example: MMMM yyyy=April 2021
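The following sketch shows how such a pattern turns a date into a label with the standard SimpleDateFormat class. The class FacetDateLabel is a hypothetical helper, and fixing the time zone to UTC is an assumption made here so that the output does not depend on the machine running the code.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper showing how a labelFormat pattern is applied to a date.
final class FacetDateLabel {

    static String label(Date value, String pattern, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        // Assumption: format in UTC to get the same label on every machine.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(value);
    }

    public static void main(String[] args) {
        Date april = Date.from(Instant.parse("2021-04-01T00:00:00Z"));
        System.out.println(label(april, "MMMM yyyy", Locale.ENGLISH));   // April 2021
        System.out.println(label(april, "MMMM (yyyy)", Locale.ENGLISH)); // April (2021)
    }
}
```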

Facet by Range

Jahia provides its own range support, which is more similar to the query facet. It means that you will need to create multiple elements with the same facet id, all having different start and end parameters for the different facet values. Range queries only work when used via the Jahia Facet module and not when directly calling the backend.

These are the range specific parameters:

range.start

The lower bound of the range.

range.end

The upper bound of the range.

range.include

The parameter can be a combination of the following options (more to come in future versions):

  • lower = the range includes the lower bound
  • upper = the range includes the upper bound
  • all = shorthand for lower, upper
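As a rough illustration of these options, the sketch below checks whether a numeric value falls into a range. The class RangeInclude is hypothetical, and two points are assumptions rather than documented behavior: that several options are combined as a comma-separated list, and that a bound is exclusive whenever its option is absent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the lower, upper and all options of range.include.
final class RangeInclude {

    static boolean contains(double value, double start, double end, String include) {
        // Assumption: options are given as a comma-separated list, e.g. "lower,upper".
        Set<String> options = new HashSet<>(
                Arrays.asList(include.trim().toLowerCase(Locale.ROOT).split("\\s*,\\s*")));
        boolean all = options.contains("all");
        boolean includeLower = all || options.contains("lower");
        boolean includeUpper = all || options.contains("upper");
        // Assumption: a bound that is not included is treated as exclusive.
        boolean aboveStart = includeLower ? value >= start : value > start;
        boolean belowEnd = includeUpper ? value <= end : value < end;
        return aboveStart && belowEnd;
    }
}
```

For a range from 10 to 20 with include set to lower, the value 10 is inside the range and the value 20 is not. With all, both bounds are inside.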

range.valueLabel

This is a multilingual property, in which you can set the label, which should be displayed for this facet value.

Arbitrary Query Faceting

Last but not least, you can also specify any arbitrary query in Lucene default syntax to generate a facet count. This type is the most flexible, but also the most complex for the template or page designer who sets up the facets, as it requires some technical knowledge. The other facet types (field, date and range) are more user friendly and provide more logic out-of-the-box. But sometimes there are more complex requirements, for instance when a facet is based on two properties, as is the case in the jnt:event example above. If you want to provide a date facet showing all events per month, you need to query on the startDate as well as on the endDate.

query

This parameter holds the query in Lucene syntax to generate a facet count.

The syntax of Lucene queries is described on the Query Parser Syntax WIKI page.

To use fields in the queries, you need to know the exact field names used in the index. To find them, install Luke and open the repository index, which you will find under \WEB-INF\var\repository\workspaces\default\index. Click on the Documents tab to see the Term combobox with all the available field names. Facetable properties are prefixed with FACET:. Note that the names contain special characters, which need to be escaped with a "\" according to Escaping Special Characters.

For date range queries, you can also use the dynamic Solr Date Format or DateMathParser expressions, like in the Date faceting.

Example for retrieving events from now until the end of the current month:
0\:FACET\:startDate:[NOW/DAY TO NOW/MONTH+1MONTH] OR 0\:FACET\:endDate:[NOW/DAY TO NOW/MONTH+1MONTH]
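Escaping field names by hand is error-prone, so a small helper can be useful. The sketch below is a self-contained approximation that puts a backslash in front of each character listed as special in the Lucene query parser syntax. The class LuceneFieldNames is hypothetical. If Lucene is on your classpath, its QueryParser.escape method serves the same purpose.

```java
// Hypothetical helper that escapes Lucene query syntax characters in a field name.
final class LuceneFieldNames {

    // Characters with a special meaning in the Lucene query parser syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String fieldName) {
        StringBuilder escaped = new StringBuilder(fieldName.length() + 8);
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // Prints 0\:FACET\:startDate
        System.out.println(escape("0:FACET:startDate"));
    }
}
```

Only the field name is passed to the helper; the colon that separates the field name from the range expression in the query is added afterwards and stays unescaped.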

valueLabel

This is a multi-lingual property in which you can set the label to display for this facet value. For the label, you can also make use of Jahia macros. For instance, if you used dynamic DateMathParser expressions, you can use a macro in the label to print the current month for the above example (for instance, December). The first parameter of the macro is the date expression; the optional second and third parameters use the same FormatType and FormatStyle as described in the Java MessageFormat class.

Using faceting in the Jahia backend

Integration of Solr faceting in Jackrabbit

Jahia extends Jackrabbit and allows the use of the custom functions rep:facet() and rep:filter() within SQL-2 or QOM queries. These functions are not (yet) supported for the deprecated XPATH queries.

Using the API

The following code snippet shows how to create a query using the query object model (QOM), specifying two facets: one on eventsType and the other on startDate. The parameters within the rep:facet() function are described in Facet types and their parameters. You can prefix the parameters with facet.; if you do not, Jahia adds the prefix internally.

QueryObjectModelFactory factory = session.getWorkspace().getQueryManager().getQOMFactory();
QOMBuilder qomBuilder = new QOMBuilder(factory, session.getValueFactory());

qomBuilder.setSource(factory.selector("jnt:event", "event"));
qomBuilder.andConstraint(factory.descendantNode("event", "/sites/jcrFacetTest"));

qomBuilder.getColumns().add(factory.column("event", "eventsType", "rep:facet(facet.mincount=1&key=1)"));
qomBuilder.getColumns().add(factory.column("event", "startDate", "rep:facet(facet.mincount=1&date.start=2000-01-01T00:00:00Z&date.end=2000-03-01T00:00:00Z&date.gap=+1MONTH&key=2)"));

QueryObjectModel qom = qomBuilder.createQOM();
QueryResultWrapper res = (QueryResultWrapper) qom.execute();
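The column name passed to factory.column() is a plain string, so long parameter lists are easy to mistype. One possible way to assemble it is sketched below. The class RepFacet and its method are hypothetical helpers, not part of the Jahia API; the sketch only joins the parameters in the key=value&key=value form used above and does no validation or escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds the rep:facet(...) column name from parameters.
final class RepFacet {

    static String column(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&", "rep:facet(", ")");
        for (Map.Entry<String, String> param : params.entrySet()) {
            joiner.add(param.getKey() + "=" + param.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("facet.mincount", "1");
        params.put("key", "1");
        // Prints rep:facet(facet.mincount=1&key=1)
        System.out.println(column(params));
    }
}
```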

From the result object, you can then retrieve the facet values for each facet, like this (the example is simply printing the facet values into the output console):

FacetField field = res.getFacetField("eventsType");
// Print each facet value together with the number of matching results
for (FacetField.Count count : field.getValues()) {
    System.out.println(count.getName() + ": " + count.getCount());
}

To apply a facet value as a filter that narrows down a query, use count.getFilterQuery() within a rep:filter() function constraint, as shown below. To apply multiple filters, add further constraints by calling andConstraint again:

QOMBuilder qomBuilderForDrilldown = new QOMBuilder(factory, session.getValueFactory());

qomBuilderForDrilldown.setSource(factory.selector("jnt:event", "event"));
qomBuilderForDrilldown.andConstraint(factory.descendantNode("event", "/sites/jcrFacetTest"));

qomBuilderForDrilldown.getColumns().add(factory.column("event", "startDate", "rep:facet(facet.mincount=1&date.start=2000-01-01T00:00:00Z&date.end=2000-03-01T00:00:00Z&date.gap=+1MONTH&key=2)"));

qomBuilderForDrilldown.andConstraint(factory.fullTextSearch("event", "rep:filter("
             + Text.escapeIllegalJcrChars(count.getFilterQuery()) + ")",
             factory.literal(session.getValueFactory().createValue("eventsType"))));

QueryObjectModel qomForDrilldown = qomBuilderForDrilldown.createQOM();
QueryResultWrapper drilldownRes = (QueryResultWrapper) qomForDrilldown.execute();

In web applications, there will obviously be an interaction step in between where the user will choose a facet value. This means that the filter-query either needs to be passed to the client and back, or you find some caching solution (e.g. in the session). Jahia's facet component passes the filter-query to the browser by encoding it, which reduces its size and hides the real names of the fields used in the Lucene index from the end-user. The encoding is done with org.jahia.utils.Url.encodeUrlParam and the decoding with org.jahia.utils.Url.decodeUrlParam.
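If you want to experiment with this round trip outside of Jahia, the following stand-in shows the general idea of turning a filter-query into an opaque, URL-safe value and back. It is explicitly not the implementation of org.jahia.utils.Url: the class FilterQueryCodec is hypothetical, and the combination of DEFLATE compression with URL-safe Base64 is only an assumption chosen for illustration. Inside Jahia, use encodeUrlParam and decodeUrlParam.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in, NOT Jahia's Url.encodeUrlParam/decodeUrlParam.
final class FilterQueryCodec {

    static String encode(String filterQuery) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(filterQuery.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        // URL-safe alphabet, so the value can be used as a request parameter as is.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
    }

    static String decode(String encoded) {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getUrlDecoder().decode(encoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int read = inflater.inflate(buffer);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or invalid value");
                }
                out.write(buffer, 0, read);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not a valid encoded filter-query", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Note that such an encoding only hides the field names from casual inspection. It is not encryption, so the server still has to treat the decoded query as untrusted input.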

Using the JSP tags

Jahia provides convenient JSP tags and functions to display the facets and facet values with the right labels, to create the URLs (with the encoding mentioned above) for drilling down on facet values, to display the facet values currently applied to the query, and to create the URL for removing an applied facet. You will find them in the Jahia taglib documentation.

Facet module and components

Jahia provides a facet module, with a Facets list and a Site Tag cloud component, which makes use of the faceting support in the backend. With this, an integrator can create templates with faceting support in next to no time.

Facets list

This component can be found under Site components>Facets list.

If, for instance, you want to add a Facets list on the News template, simply open the Studio, select your template-set and then navigate to the news template in the left side-panel. Find the Site components>Facets list component and drag and drop it to the right column area. In the popup, click on Save.

Now you need to bind the component to the area that displays the list or the query results. Click on the Click this to link button. Release the mouse button (this is not a drag and drop), move the mouse pointer (which should display a chain icon) to the Maincontent area and click on the area. You should now see the message facets - Linked to: maincontent in the component. Note that when you drop the component in edit mode, the component shows that it is automatically bound to "main resource". For the facet component to work, it must be bound to the resulting list component rather than to the main resource, so you need to change the automatic binding.

If you bind the facets list to a list component for which you provide your own main view (e.g. a JSP), you may see problems in live mode: facet filters might not work correctly, and the list might display all items unfiltered. This is because the Jahia cache system needs to be told that this list may be filtered by query parameters in the request, so that these parameters become part of the generated cache key. The request query parameters in the facet module start with N-. To inform the Jahia cache system of this filter for your list, create a properties file next to the custom main view (see cache configuration section) and add this property:

cache.requestParameters=N-*

Now let's create a facet on the news date. For that, we first need to make the property facetable. Modify the definition of jnt:news and add the attribute facetable to the date property.

[jnt:news] > jnt:content, mix:title, jmix:editorialContent, jmix:structuredContent
- jcr:title (string) i18n mandatory
- desc (string, richtext) i18n
- image (weakreference, picker[type='image'])
- date (date) = now() facetable   

After that, you need to re-index the site: shut down the server and manually delete all indexes, as described here. When the server restarts, the index is recreated. Then continue working in the Studio, modifying the news template.

Now, in the facets list component you dropped, select the button in Add to facets: Abstract facet. In the popup, choose the Date facet.

In order to display a facet with news starting from 1.1.2021 until now grouped by month with a label of "month-name (year)", fill out the form with the following values (more info about all parameters can be found here):

facet: date
label: Date
field: Date (News entry)
mincount: 1
start: 2021-01-01T00:00:00Z
end: NOW
hardend: true
format: MMMM (yyyy)
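As a rough illustration of what these values ask for, the following Java sketch counts a list of dates per month within the start and end bounds, drops buckets below mincount, and labels each bucket with the MMMM (yyyy) pattern. Jahia computes facets in the search index, not like this; the class DateFacetSketch and its method are hypothetical, and English month names are an assumption.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not how Jahia computes facets internally.
class DateFacetSketch {

    // Same pattern as the "format" value in the form above.
    static final DateTimeFormatter LABEL =
            DateTimeFormatter.ofPattern("MMMM (yyyy)", Locale.ENGLISH);

    // Counts dates per month between start and end (both inclusive)
    // and keeps only the months with at least mincount entries.
    static Map<String, Integer> countByMonth(List<LocalDate> dates, LocalDate start,
                                             LocalDate end, int mincount) {
        Map<YearMonth, Integer> counts = new TreeMap<>();
        for (LocalDate d : dates) {
            if (!d.isBefore(start) && !d.isAfter(end)) {
                counts.merge(YearMonth.from(d), 1, Integer::sum);
            }
        }
        Map<String, Integer> labelled = new LinkedHashMap<>();
        for (Map.Entry<YearMonth, Integer> e : counts.entrySet()) {
            if (e.getValue() >= mincount) {
                labelled.put(e.getKey().atDay(1).format(LABEL), e.getValue());
            }
        }
        return labelled;
    }

    public static void main(String[] args) {
        List<LocalDate> newsDates = List.of(
                LocalDate.of(2021, 1, 15), LocalDate.of(2021, 1, 20),
                LocalDate.of(2021, 3, 2), LocalDate.of(2020, 12, 31));
        System.out.println(countByMonth(newsDates,
                LocalDate.of(2021, 1, 1), LocalDate.now(), 1));
    }
}
```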

Click on Save.

Now you can deploy the template by choosing Deploy templates>ACME in the top menu.

Go to the news page, for instance in Preview mode (or reload the page if you already had it open in parallel), and you should see the facet in the right area.

Now, when you select a facet value, it is included as a filter when querying the list in the main area, and the selected facet appears in the active facets area, where you can remove it again if you want.

Drill-down urls

To apply facet values as filters, Jahia uses a query parameter named "N-boundcomponent.name". The value of the parameter is encoded because, for query facets, it is a Lucene query string. For some customers, it could be a security hole if end users could see the Lucene query and try to modify it in order to get access to other data. If you would like to have shorter URLs, you can also implement your own mapping of certain keys to facet values or queries.
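One way such a custom mapping could look is sketched below in Java: the URL carries only a short, opaque key, and the server looks up the real facet query in a fixed table, so end users never see or edit a Lucene query. This is not part of Jahia; the class FacetKeyMappingSketch, the keys, and the example queries are all invented for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; keys and queries are invented examples.
class FacetKeyMappingSketch {

    // Server-side table from short keys to the actual facet queries.
    static final Map<String, String> QUERIES = Map.of(
            "y2021", "date:[2021-01-01T00:00:00Z TO 2022-01-01T00:00:00Z]",
            "y2022", "date:[2022-01-01T00:00:00Z TO 2023-01-01T00:00:00Z]");

    // Resolves a key taken from the URL. Unknown keys (including raw
    // queries typed by a user) resolve to nothing and are never executed.
    static Optional<String> resolve(String key) {
        return Optional.ofNullable(key).map(QUERIES::get);
    }

    public static void main(String[] args) {
        System.out.println(resolve("y2021").orElse("(no filter)"));
        System.out.println(resolve("date:[* TO *]").orElse("(no filter)"));
    }
}
```

The design choice here is an allow-list: only queries defined on the server can be applied, which addresses the concern about modified query strings and shortens the URLs at the same time.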

Applying and displaying facets

The applied facets act as filters on top of the regular query that creates the list. This functionality is implemented in Jahia's default module in the file jmix_list\html\list.hidden.header.jsp, where the list and the facets are loaded.

The display of the applied facet values and the remaining facets is then implemented in the facets module's jnt_facets\html\facets.jsp.

Site Tag cloud

The second out-of-the-box component using faceting in the backend is the tag cloud, which can be found under Site components>Site Tag cloud.

This component can be dropped either on a page in Edit mode or on a template in the Studio. If you do the latter, you will still have to link the site's search results page to the component in Edit mode.

Setting the search results page is needed to create the URLs on the tags in the cloud, so that they point to a page that is able to display the resulting components for the selected tag. Besides this setting, you can also specify a threshold, which defines the minimum number of references a tag must have to be displayed in the cloud. The limit property sets the maximum number of tags displayed in the cloud.

To use the tag cloud in the ACME demo, you first need to apply some tags to content objects. The font size of a tag in the cloud depends on how often the tag is used in the site: the bigger the font, the higher the usage, and the more content objects appear in the result list when you click the tag.
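The interplay of threshold, limit, and font size can be sketched in Java as follows. This is not the code from the facets module: the class TagCloudSketch is hypothetical, and the linear scaling between a minimum and maximum pixel size is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the facets module's implementation.
class TagCloudSketch {

    // Keeps tags used at least `threshold` times, at most `limit` tags
    // (most used first), and scales each count linearly to a font size.
    static Map<String, Integer> fontSizes(Map<String, Integer> counts, int threshold,
                                          int limit, int minPx, int maxPx) {
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.add(e);
            }
        }
        // Sort by count descending, then by tag name for a stable order.
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<Map.Entry<String, Integer>> top =
                kept.subList(0, Math.min(limit, kept.size()));

        int min = Integer.MAX_VALUE;
        int max = 0;
        for (Map.Entry<String, Integer> e : top) {
            min = Math.min(min, e.getValue());
            max = Math.max(max, e.getValue());
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : top) {
            // If all counts are equal, every tag gets the maximum size.
            int px = (max == min) ? maxPx
                    : minPx + (e.getValue() - min) * (maxPx - minPx) / (max - min);
            sizes.put(e.getKey(), px);
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("java", 10, "cms", 4, "search", 1, "facets", 7);
        System.out.println(fontSizes(counts, 2, 2, 10, 30));
    }
}
```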

The tag cloud component is implemented in the facets module's jnt_tagCloud\html\tagCloud.jsp.