Managing sitemap files

November 14, 2023
Note: The following documentation applies to version 4+ of the Jahia sitemap module, which is compatible with Jahia 8.0.3+.

What are sitemap files?

A sitemap is a file in which you provide search engines with information about the pages of your site. For more information about sitemaps in general, refer to the Google documentation on sitemaps. The sitemap module automatically generates the sitemap files for you.

How does Jahia help you with sitemaps?

Jahia provides a dedicated module to manage sitemap files. Ask the administrators or developers you work with to deploy that module to your platform. The module is available on the Jahia Store.

Once the module is installed and activated on your site, refresh your Jahia page and go to jContent => Additional => SEO => Sitemap.

Understanding which sitemap files will be generated

When the sitemap module is enabled, several sitemap files will be provided: 

  • Main sitemap file: located at the root of the site, at <mydomain>/sitemap.xml. This file lists the language sitemap files and the dedicated sitemap files. It does not list any pages; it only references other sitemap files.
  • Language sitemap files: located at <mydomain>/cms/render/live/<lang>/sites/<mysite>/sitemap-lang.xml. There is one language sitemap file per language active in live on the site. For the default language, the file is located at <site>/sitemap-lang.xml.
  • Dedicated sitemap files: if a page has been marked as "dedicated sitemap", then this page and its subpages (the pages below it in the tree) are listed in a dedicated sitemap file. This file is referenced from the main sitemap. This is useful for managing the size and organization of the available sitemaps.
    Dedicated sitemap files are located at <mydomain>/cms/render/live/<lang>/sites/<mysite>/<path-of-page>/sitemap-lang.xml
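
The URL patterns above can be sketched as a small Python helper, which may be handy when checking a site by hand. The function names and the example domain, site key, and page path are hypothetical and only used for illustration; the special location of the default-language file is not covered.

```python
from urllib.parse import quote


def main_sitemap_url(domain: str) -> str:
    """URL of the main sitemap file at the root of the site."""
    return f"{domain.rstrip('/')}/sitemap.xml"


def language_sitemap_url(domain: str, site: str, lang: str, page_path: str = "") -> str:
    """URL of a language sitemap, or of a dedicated sitemap when page_path is given."""
    # page_path stands for <path-of-page> in the pattern above (hypothetical
    # example: "home/blog"); leave it empty for the site-wide language sitemap.
    segments = ["cms", "render", "live", lang, "sites", site]
    segments += [s for s in page_path.split("/") if s]
    segments.append("sitemap-lang.xml")
    return domain.rstrip("/") + "/" + "/".join(quote(s) for s in segments)
```

For example, language_sitemap_url("https://www.example.com", "mysite", "fr") follows the language sitemap pattern, and passing a page path as the fourth argument follows the dedicated sitemap pattern.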

Below is an example of which sitemap files could be generated for jahia.com, assuming there is a dedicated sitemap for the blog entries:

jahia-cms-sitemap-new.png

Understanding the content of the sitemap files 

  • Content types: Jahia pages (jnt:page) and content items marked with the mixin "jmix:mainResource" are listed in the sitemap file. If some content items that can be displayed in full page are not listed in the sitemap file, ask the developers you work with to verify that the jmix:mainResource mixin is correctly added to the related content type.
  • Live role / Accessible to "guest": only pages that are accessible to "guest" (public pages, not protected by login in live) are listed in the sitemap files.
  • Valid display language: if a page has been defined as not visible for a given set of languages, the page isn't listed in the sitemaps for those languages.
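
The three rules above combine into a single test per page and per language. The following Python sketch illustrates that logic only; the Node class is a hypothetical, simplified stand-in and not the Jahia API, and "jnt:news" is just an example type name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified view of a content node (not the Jahia API)."""
    primary_type: str
    mixins: set[str] = field(default_factory=set)
    guest_readable: bool = True
    hidden_languages: set[str] = field(default_factory=set)


def is_listed(node: Node, lang: str) -> bool:
    """Apply the three rules above for the sitemap of one language."""
    # Rule 1: a page, or a content item carrying jmix:mainResource.
    is_main_resource = node.primary_type == "jnt:page" or "jmix:mainResource" in node.mixins
    # Rule 2: readable by "guest" in live. Rule 3: visible in this language.
    return is_main_resource and node.guest_readable and lang not in node.hidden_languages
```

A content item without the mixin, such as Node("jnt:news"), is rejected by the first rule, which matches the troubleshooting advice given for content types.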

For each page, a <url> XML element is added, with the following information:

  • <loc>: the URL of the page, UTF-8 encoded and entity-escaped. If a default vanity URL exists for the resource, it is used.
  • <lastmod>: the last modification date of the page, in W3C Datetime format
  • For each language active on the site, the alternate language URL is also included using an <xhtml:link rel="alternate"> tag.
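
To make the structure of a <url> entry concrete, here is a minimal Python sketch that builds one with the standard library. It is not the module's implementation; build_url_entry and the example URLs are hypothetical, and the namespace URIs are the standard sitemaps.org and XHTML ones, assumed here rather than taken from the module.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize sitemap elements without a prefix and alternates as xhtml:link.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_url_entry(loc: str, lastmod: str, alternates: dict[str, str]) -> str:
    """Serialize one <url> element; ElementTree entity-escapes text and attributes."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # One alternate link per active language, keyed by language code.
    for lang, href in alternates.items():
        ET.SubElement(url, f"{{{XHTML_NS}}}link",
                      {"rel": "alternate", "hreflang": lang, "href": href})
    return ET.tostring(url, encoding="unicode")
```

Entity-escaping means that a character such as "&" in a URL is written as "&amp;" in the file. The sketch does not percent-encode non-ASCII characters in URLs; that step is assumed to have happened before the URL is passed in.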
     

Activating the sitemap module

jahia-sitemap-activate.png

In the field "sitemap root URL", enter the base URL of your site, for example https://www.amazon.com
Click "Activate" at the top right.
The sitemap module is now active for your site.
The sitemap file is generated and cached the first time the URL <sitemap root URL>/sitemap.xml is accessed.
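
One way to check the result after activation is to read the main sitemap file and list the sitemap files it references. The Python sketch below assumes the main file uses the standard sitemaps.org index format (<sitemapindex> with <sitemap>/<loc> entries), which this page does not confirm; the sample document and its URLs are made up.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical content of <sitemap root URL>/sitemap.xml.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/cms/render/live/en/sites/mysite/sitemap-lang.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/cms/render/live/fr/sites/mysite/sitemap-lang.xml</loc></sitemap>
</sitemapindex>"""


def referenced_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> value of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
```

Against a live site, the text passed to referenced_sitemaps would be the body returned by <sitemap root URL>/sitemap.xml. Note that requesting that URL is also what triggers generation and caching.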

Understanding how sitemap files are generated and cached

Sitemap files are generated when their URL is accessed. Once a file has been accessed, it is cached for several hours. This means that if you make changes to your web pages (creating, deleting, or moving pages), these changes won't be reflected in the sitemap until the cache expires. This behaviour is required to avoid performance issues when the sitemap file is accessed. If you want to make sure that search engines access the latest version of the sitemap files, click "Flush cache" in the toolbar of the header.
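
The behaviour described above (generate on first access, serve from cache until expiry, regenerate after a flush) can be illustrated with a small time-to-live cache. This Python sketch is only a model of the idea, not the module's code; the class name, the generate callback, and the four-hour duration used in the example are assumptions.

```python
import time
from typing import Callable


class SitemapCache:
    """Generate-on-access cache with a time-to-live and a manual flush."""

    def __init__(self, generate: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generate = generate
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: recent page changes are not visible yet
        content = self._generate(url)  # first access or expired entry: regenerate
        self._entries[url] = (now, content)
        return content

    def flush(self) -> None:
        """Drop every cached file, similar in spirit to clicking "Flush cache"."""
        self._entries.clear()
```

With SitemapCache(generate, ttl_seconds=4 * 3600), two requests made within four hours return the same content even if pages changed in between, while a request made after flush() triggers a new generation.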

Ensuring a page isn't listed in the sitemap

When editing a page or a content item with jmix:mainResource, you'll find a mixin called NOINDEX in the SEO section. This mixin both adds the <meta name="robots" content="noindex"> metadata to the HTML head and ensures that the page isn't listed in any sitemap file.

jahia-sitemap-no-index.png

Creating a dedicated sitemap

To ensure that a section of a website (a blog, for instance) has its own sitemap, select the "dedicated sitemap" mixin in the SEO section of the page at the root of that section.

jahia-sitemap-no-index.png

Comparison of Sitemap v2 and v3 with Sitemap version 4+

 

  • Use cache to ensure stability and good performance
    Sitemap v2 and v3: Yes
    Sitemap v4 and above: Yes
  • UI to flush cache and manually submit
    Sitemap v2 and v3: No
    Sitemap v4 and above: Yes
  • Adding a page to a sitemap
    Sitemap v2 and v3: Manually activate the mixin on each page and content.
    Sitemap v4 and above: Pages and content defined as "main resource" (viewable in full page) are automatically added. Note: Pages that are not accessible to guest (protected by login) are not added to the sitemap file.
  • Sitemap per language
    Sitemap v2 and v3: No, single sitemap file
    Sitemap v4 and above: Yes (referenced from a main sitemap file)
  • List alternate version of the page (list other languages available)
    Sitemap v2 and v3: No
    Sitemap v4 and above: Yes
  • Dedicated sitemap file for a section
    Sitemap v2 and v3: No
    Sitemap v4 and above: Yes (referenced from a main sitemap file)

Debugging sitemaps

Starting with sitemap v5.0.0, a new sitemap.debug parameter is available in the sitemap configuration file.

When enabled, additional data about each node (path, URL, type, UUID) is added, as a block of XML comments, to the generated sitemap file.

<url>
 <lastmod>2023-11-03</lastmod>
 <loc>http://localhost:8080/jahia/sites/digitall/home/mypage.html</loc> 
 <xhtml:link href="http://localhost:8080/jahia/sites/digitall/home/mypage.html" hreflang="en" rel="alternate"> </xhtml:link>
</url> 
<!-- nodePath: /sites/digitall/home/mypage--> 
<!-- nodeUrl: http://localhost:8080/jahia/sites/digitall/home/mypage.html--> 
<!-- type: jnt:page--> 
<!-- uuid: 2c388961-cad7-439c-8a5a-d808e8db7167--> 

This parameter shouldn't be left enabled in production.
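
As a quick safeguard, a generated sitemap can be scanned for these debug comments, for example to confirm that the parameter has been switched off on a production site. The Python sketch below relies only on the comment format shown in the sample above; the function name is hypothetical.

```python
import re

# Matches the comment lines shown in the sample above, e.g. "<!-- nodePath: ... -->".
DEBUG_COMMENT = re.compile(r"<!--\s*(nodePath|nodeUrl|type|uuid):\s*(.*?)\s*-->")


def debug_entries(sitemap_xml: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs for every debug comment found in the sitemap text."""
    return DEBUG_COMMENT.findall(sitemap_xml)
```

An empty result suggests that the sitemap was generated without sitemap.debug; any returned pair indicates that debug output is still present.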