Optimizing URLs to improve SEO and user experience

November 14, 2023

URL rewriting

Out of the box Jahia adds support for the improved SEO and user experience through various way of URL rewriting. This includes the following features:

  • URL mapping - allows providing mappings (user-friendly URLs, vanity URLs etc.) for frequently used or so called common resources, e.g. the last news page, company's profile, contact page, special offers etc. It also eases a web site or content migration activities, e.g. in case an old resource is replaced with the new one, but the URL locations should remain the same for preventing "broken links" for bookmarks and search engines.
  • SEO URL rewriting - implies URL "shortening" by removing "technical" information (activated by default).
  • Controlling browser caching - through the URL rewrite engine rules Jahia also "controls" the client side (Web browser and proxy server) caching of the pages and static assets.
  • Forcing secure browsing for logged in users - there is an option for "forcing" a switch to an SSL (HTTPS protocol) for a user session

This section provides an overview of aforementioned features and possible use cases.

As an introduction a short description of the URL format definition in Jahia and rewriting rules scope is given in the next sections.

When the URL rewriting rules are referred to, the UrlRewriteFilter-based rule definitions are meant here.

The loading of rules is done for the following locations (using Ant-style patterns):

  • org/jahia/services/seo/urlrewrite/urlrewrite.xml file (packaged into WEB-INF/lib/jahia-impl-*.jar JAR) - core Jahia rules
  • WEB-INF/etc/config/urlrewrite-*.xml
  • modules/**/META-INF/urlrewrite*.xml

And for SEO-related rules in:

  • WEB-INF/etc/config/seo-urlrewrite*.xml
  • modules/**/META-INF/seo-urlrewrite*.xml

And finally for rules that need to be executed as last in:

  • org/jahia/services/seo/urlrewrite/last-urlrewrite.xml file (packaged into WEB-INF/lib/jahia-impl-*.jar JAR) - core Jahia rules
  • WEB-INF/etc/config/last-urlrewrite-*.xml
  • modules/**/META-INF/last-urlrewrite*.xml

This means that the URl rewriting rules can be also defined in custom modules.

URL format

The following is a "full" URL of a resource in Jahia, omitting the protocol, port and context path:

/cms/<rendering-mode>/<workspace>/<language>/<path>[.<template>].<template-type>

Where possible values for <rendering-mode> are:

  • render - preview mode
  • contribute - simplified editing mode
  • edit - full editing mode
  • studio - content studio mode

and <workspace> can be one of:

  • default - staged content
  • live - published content or content, created in the live mode directly

Some examples::

  • /cms/edit/default/en/sites/ACME/home/about-us.html
  • /cms/render/live/en/users/root.user-content.html - user-content is a rendering template here.

Rewriting scope

It is important to understand the scope the different URL mappings or URL rewriting is applied to.

The one exception to all rules is the case where the server is accessed using a localhost domain names. This mode is targeted for development or quick prototyping phases and in such cases neither URL mappings nor SEO URL rewrite rules are applied to the URLs and all links remain unchanged.

The following table summarizes the applicable conditions for URL rules, in case a domain name other than localhost or an IP address is used:

Rules/Mode Studio Edit Contribute Render/Preview Live
URL mappings (vanity URLs) no no no yes yes
SEO URL rules no no no no yes
 


Thus, in the Studio, Edit and Contribute modes the URL mappings and URL rewriting are not applied to make navigation and editing more consistent and predictable.

 

URL mapping

URL mappings are designed to provide language-dependent (if the site has multiple languages) custom URL (URL part) for a content resource.

Consider the following case, a HR department publishes new "Internship" page and would like to promote its URL in a more user-friendly way.

The internal page URL is e.g.:

http://127.0.0.1:8080/cms/render/live/en/sites/ACME/home/about-us/jobs/internship.html 

A HR manager adds the following URL mapping for the newly created internship page in English to be /internship and for the German (company site is multilingual with English and German versions) the /praktikum:

vanity-url.png

So the URLs for English and German versions of that page get "shortened" to:

http://127.0.0.1:8080/cms/render/live/internaship

http://127.0.0.1:8080/cms/render/live/praktikum

i.e. the language and path of the page are replaced with the custom URL - /internaship, that was defined.

The rest of the URL - the "technical" part - can be automatically removed by activating SEO URL rewriting as described further in this chapter.

Another use case could be in case of a content migration from a previous version of the site platform, say using an ASP.NET CMS system, and keeping important links still working and not leading to 404 - "Page not found" errors.

The old page was accessible under e.g. "about.aspx" name. Creating for a new About Us page the following mappings:

vanity-url-migration.png

Will still allow the old URL to /about.aspx to work, but a client-side redirect will be sent to the browser to /about, as this mapping is declared as the default one.

This follows the "single URL for resource principle" for the improved SEO. Nevertheless this redirection can be deactivated in digital-factory-config/jahia/jahia.properties (for versions, prior to 7.0.0.2 - WEB-INF/etc/config/jahia.properties) by setting the following flag to false:

# This parameter will control, that if vanity URLs exists for a node and if
# it has been accessed with a non-default vanity URL, we inform the client that
# the resource has permanently moved (HTTP status code 301)
permanentMoveForVanityURL                              = false

SEO URL rewriting

Jahia provides ready-to-use set of URl rewriting rules, located in the WEB-INF/etc/config/seo-urlrewrite.xml file, which are targeted to remove the "technical" part from the URLs to Jahia resources when rendering in Live mode.

The SEO URL rewriting rules are enabled by default and can be disabled by setting the following flag value to "false" in the digital-factory-config/jahia/jahia.properties (on versions prior to 7.0.0.2 - WEB-INF/etc/config/jahia.properties): 

# This option enables the URL rewrite engine to "shorten" content URLs
# in live mode, .e.g.:
#     http://my.acme.org/cms/render/live/en/site/myAcme/home.html
# can become:
#     http://my.acme.org/home.html
urlRewriteSeoRulesEnabled                              = false

Additionally, the removal of /cms is achieved by activating the following setting in digital-factory-config/jahia/jahia.properties:

# If set to true, the /cms prefix will be also removed from URLs
# Note, this option is only valid if the SEO URL rewriting is activated,
# i.e. urlRewriteSeoRulesEnabled is set to true
urlRewriteRemoveCmsPrefix                              = true

Note that a flush of Jahia output caches is required after changing settings of SEO rewriting.

If both options are activated it allows in most of the cases to "reduce", for example, the following URL:

http://my.acme.org/cms/render/live/en/sites/global/home.html

to:

http://my.acme.org/home.html

assuming that the global site has the my.acme.org defined as a Web project host name in site properties and English is the default language of the site.

For non-default site-language, the language remains in the URL after rewriting. E.g. for a German home page of the same site the URL will be:

http://my.acme.org/de/home.html

Browser caching

Controlling browser caching

The client side cache for pages and static assets (CSS, JavaScript etc.) is controlled using URL rewrite engine rules, defined in the org/jahia/services/seo/urlrewrite/last-urlrewrite.xml file (packaged into WEB-INF/lib/jahia-impl-*.jar JAR):

   <!-- Client-side caching -->
    <rule>
        <name>Set no cache headers</name>
        <note>Resources that should not be cached</note>
        <from>^/(welcome.*|start|validateTicket|administration.*|cms/.*|engines/.*\.jsp(\?.*)?|tools/.*|gwt/org\.(.*)/.*\.nocache\..*)$</from>
        <set type="response-header" name="Expires">Wed, 09 May 1979 05:30:00 GMT</set>
        <set type="response-header" name="Cache-Control">no-cache, no-store, must-revalidate, proxy-revalidate, max-age=0</set>
        <set type="response-header" name="Pragma">no-cache</set>
        <set type="request" name="jahiaCacheControlSet">done</set>        
    </rule>
    <rule>
        <name>JCR files</name>
        <note>Do not set any cache expiration for JCR files</note>
        <from>^/(files|repository)/.*$</from>
        <set type="response-header" name="Cache-Control">public, must-revalidate, max-age=1</set>
        <set type="request" name="jahiaCacheControlSet">done</set>        
    </rule>
    <rule>
        <name>Set cache expires headers</name>
        <note>Cache all other resources by default</note>
        <condition type="header" name="Expires" operator="equal">^$</condition>
        <condition type="attribute" name="jahiaCacheControlSet" operator="equal">^$</condition>
        <set type="expires">1 month</set>
        <set type="response-header" name="Cache-Control">public, max-age=2678400</set>
    </rule>
    <!-- end of client-side caching -->

Those rules can be extended to achieve fine-grained tuning of the caching strategy.

Forcing secure browsing for logged in users

There is an option for "forcing" a switch to an SSL (HTTPS protocol) for a user session, from login to logout. This allows sites with higher security concerns to force secured connections for logged in users.

For details see Enabling SSL for logged in users.