Monitoring your servers

The Server Availability Manager (SAM) module supports monitoring in complex containerized environments. SAM extends Jahia's GraphQL API and provides server monitoring, server availability, and maintenance operation functionality. Learn more below about how to use and extend Jahia APIs with complex containerized environments.

Note: SAM is compatible with Jahia 7.3.3.0 and higher.

Migrating from the Healthcheck module

Server Availability Manager improves on and replaces the Healthcheck module and adds more server availability functionality. As with Healthcheck, you can develop new probes for SAM, which ships with four probes by default.

When migrating from Healthcheck to SAM, remember that the data object has been removed from the health probes. These data elements (such as system load, node list, and URLs) are now provided by dedicated nodes of the API rather than the health probe. SAM actually includes nodes dedicated to system monitoring, but those nodes are located outside of the probes.

Monitoring system health

The SAM module provides insights into your platform's health and triggers alerts when key components need particular attention. The module is available through GraphQL or REST, and you can query it at will with minimal impact on the platform load. You can also develop additional probes to provide more information to your monitoring systems.

Probes are categorized by severity and report a status:

GREEN (Nominal status)
YELLOW (Non-critical issue)
RED (Critical issue)

The query below fetches the status of all of the probes with severity LOW or above:

query {
  admin {
    jahia {
      healthCheck(severity: LOW) {  # Minimum severity to return
        status          # Highest reported status across all probes
        probes {
          name          # Name of the probe
          status        # Status reported by the probe (GREEN to RED)
          severity      # Severity of the probe (LOW to CRITICAL)
          description   # Description specified by the developer of the probe
        }
      }
    }
  }
}
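As a rough illustration of how a monitoring script might consume this query, the Python sketch below builds the request body and summarizes a response. It assumes that the response mirrors the query shape under a top-level data key (standard GraphQL behavior) and that status is returned as a plain string, as the query above suggests. The /modules/graphql endpoint path and the way headers are passed are assumptions, and authentication is left to the caller. Check both against your installation.

```python
import json
import urllib.request

HEALTH_QUERY = """
query {
  admin {
    jahia {
      healthCheck(severity: LOW) {
        status
        probes { name status severity description }
      }
    }
  }
}
"""

# Statuses from best to worst, as listed in this document.
STATUS_ORDER = ["GREEN", "YELLOW", "RED"]


def fetch_health(base_url, headers=None):
    """POST the query and return the decoded JSON response.

    The /modules/graphql path is an assumption; pass any
    authentication headers your server requires in `headers`.
    """
    body = json.dumps({"query": HEALTH_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/modules/graphql",
        data=body,
        headers={"Content-Type": "application/json", **(headers or {})},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def failing_probes(response):
    """Return (name, status) for every probe that is not GREEN."""
    health = response["data"]["admin"]["jahia"]["healthCheck"]
    return [(p["name"], p["status"]) for p in health["probes"] if p["status"] != "GREEN"]


def worst_status(response):
    """Recompute the worst status across the returned probes."""
    health = response["data"]["admin"]["jahia"]["healthCheck"]
    statuses = [p["status"] for p in health["probes"]]
    return max(statuses, key=STATUS_ORDER.index, default="GREEN")
```

A script could call fetch_health on a schedule and raise an alert whenever failing_probes returns a non-empty list.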

As mentioned in About monitoring, this module replaces the previous Healthcheck module.

Available probes

Server Availability Manager ships by default with four probes that you can configure by editing the org.jahia.modules.sam.healthcheck.ProbesRegistry.cfg file.

DBConnectivity
Verifies that the database is configured with a valid connection. This probe has a default severity of CRITICAL and only returns GREEN or RED.
Datastore
Verifies the connection to the JCR Datastore and the ability to write to it. This probe has a default severity of CRITICAL and only returns GREEN or RED.
ServerLoad
Verifies both request and session loads over one minute. This probe has a default severity of HIGH and can return a GREEN, YELLOW, or RED status based on configurable thresholds. The configuration file above contains examples of such configuration.
ModuleState
Verifies whether any of the modules are in an inactive state. This probe has a default severity of MEDIUM and can return GREEN or RED depending on module states. You can configure this probe with either a whitelist (only monitor specific modules) or a blacklist (monitor all modules except a provided list). The configuration file above contains examples of such configurations.

About the REST API

The module also contains a servlet that provides REST (GET) capabilities to help you transition from the Healthcheck module to the Server Availability Manager module. The servlet is accessible at https://[YOUR_JAHIA_HOST]/modules/healthcheck?severity=low. A GET request to this URL returns the list of probes and their values.

You can further customize the configuration by editing the org.jahia.modules.sam.healthcheck.HealthCheckServlet.cfg file.

# Default severity level when "?severity=LEVEL" is not provided
severity.default=MEDIUM

# Threshold above which an HTTP error code will be returned
status.threshold=RED
# Error code to be returned if above threshold
status.code=503
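To make the interaction between these settings more concrete, here is a small Python sketch of how they might combine. It is an interpretation, not the servlet's actual code: it reads "above" as "at or above" (because RED is the highest status and is also the default threshold), and it assumes that 200 is returned when the threshold is not reached. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Statuses from best to worst, as listed in this document.
STATUS_ORDER = ["GREEN", "YELLOW", "RED"]


def healthcheck_url(host, severity=None):
    """Build the servlet URL; omit the parameter to use severity.default."""
    base = f"https://{host}/modules/healthcheck"
    if severity is None:
        return base
    return f"{base}?{urlencode({'severity': severity})}"


def expected_http_code(overall_status, threshold="RED", error_code=503):
    """Assumed mapping: the error code applies at or above the threshold."""
    if STATUS_ORDER.index(overall_status) >= STATUS_ORDER.index(threshold):
        return error_code
    return 200  # assumption: regular success code otherwise
```

Under this reading, lowering status.threshold to YELLOW would let a load balancer take a node out of rotation before a probe turns RED.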

Developing your own health probes

You can develop your own probes in much the same way as with the Healthcheck module, taking inspiration from the existing probes available on GitHub.

Monitoring background tasks

During its regular lifecycle, Jahia performs actions that shouldn't be interrupted by maintenance activities, such as server shutdown and database maintenance. By exposing such tasks, Jahia makes third-party platforms (or individuals) aware of when to avoid such interruptions.

The following query returns the list of critical tasks currently running on the server. The query returns the tasks running at the time the query was made. The Server Availability Manager does not keep a log of previously running tasks.

query {
  admin {
    jahia {
      tasks {
        service # Name of the service holding the task
        name    # Name of the task that should not be interrupted
        started # Datetime at which the task started (if available)
      }
    }
  }
}
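A maintenance script could use the result of this query to decide whether it is safe to proceed. The Python sketch below only interprets an already decoded response. It assumes the response mirrors the query shape under a top-level data key and that started may be missing or null, as the "if available" comment suggests. The function names are hypothetical.

```python
def running_tasks(response):
    """Extract the list of tasks from a decoded GraphQL response."""
    return response["data"]["admin"]["jahia"]["tasks"]


def safe_to_interrupt(response):
    """Maintenance is considered safe only when no task is running."""
    return len(running_tasks(response)) == 0


def describe(task):
    """Format one task for a log line, tolerating a missing start time."""
    started = task.get("started") or "unknown start time"
    return f"{task['service']}: {task['name']} (started: {started})"
```

Because SAM does not keep a log of past tasks, a script like this would need to poll again right before the maintenance operation rather than rely on an earlier result.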

Core

To provide backward compatibility with older versions of Jahia, the module implements two approaches for identifying running tasks:

For tasks triggered by modules (or Jahia versions) released prior to the availability of SAM, SAM uses pattern matching on the running threads to identify critical tasks. When listing tasks, those will appear grouped within the core service.
For future releases of Jahia and its modules, a registry is made available to allow such tasks to be registered when starting and unregistered once completed.

The following tasks are monitored under the core service:

IMPORT_ZIP
IMPORT_XML
BACKGROUND_JOB
BUNDLE_START
BUNDLE_INSTALL

Registering tasks

The module has a registry of running tasks that allows modules outside the Jahia default distribution to declare their own tasks. External services that need to prevent a server from being restarted can also use it.

Registering tasks from a Java module

To register long-running tasks, we use the Jahia FrameworkService and TaskRegisterEventHandler of Server Availability Manager.

To register tasks from a Java module:

Before registering the task, we recommend that you unregister it first, in case a previous run failed before cleaning it up. Use the following command:
FrameworkService.sendEvent(UNREGISTER_EVENT, constructTaskDetailsEvent(workspace, WORKSPACE_INDEXATION), true);
where UNREGISTER_EVENT represents the event topic org/jahia/modules/sam/TaskRegistryService/UNREGISTER, and constructTaskDetailsEvent creates a map with three event properties:
- name
- service
- started
This example registers the workspace indexation task of Augmented Search. An example can be seen in the ESService.java class.
Register the task when it starts, following the same approach as for unregistering. Use REGISTER instead of UNREGISTER.
Unregister the task when it ends.

Important: Tasks are identified by the combination of name and service, so make sure that this combination is unique.
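The control flow of the steps above can be sketched in a language-neutral way. The Python below is only a model of the sequence, not module code: send_event stands in for FrameworkService.sendEvent, the REGISTER topic is derived from the UNREGISTER topic as described above, and the ISO 8601 format used for started is an assumption. Unregistering in a finally block is one way to make sure the task is removed even when the work fails.

```python
from datetime import datetime, timezone

REGISTER_EVENT = "org/jahia/modules/sam/TaskRegistryService/REGISTER"
UNREGISTER_EVENT = "org/jahia/modules/sam/TaskRegistryService/UNREGISTER"


def task_details(service, name):
    """Build the three event properties; the date format is an assumption."""
    return {
        "name": name,
        "service": service,
        "started": datetime.now(timezone.utc).isoformat(),
    }


def run_as_registered_task(send_event, service, name, work):
    """Unregister leftovers, register, run the work, then always unregister.

    `send_event(topic, details)` is a hypothetical stand-in for
    FrameworkService.sendEvent in a real Java module.
    """
    details = task_details(service, name)
    send_event(UNREGISTER_EVENT, details)  # clear a stale entry from a failed run
    send_event(REGISTER_EVENT, details)
    try:
        return work()
    finally:
        send_event(UNREGISTER_EVENT, details)
```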

Registering tasks through the GraphQL API

You can use the GraphQL API to create and delete tasks when external services need to inform a server that it should not be restarted. This example shows how to use createTask. Using deleteTask with the same parameters deletes that particular task.

mutation {
  admin {
    jahia {
      createTask(service: "DevOps Team" name: "Network maintenance on Core VPC")
    }
  }
}

The registry is shared between GraphQL and Java modules, so you can create a task in a Java module and delete it using the GraphQL API.
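An external service would typically send these mutations from a script. The Python sketch below builds the request body for either operation using GraphQL variables, which avoids escaping problems in task names. It assumes that both arguments are of type String in the schema and that deleteTask takes the same arguments as createTask, as described above. The helper name is hypothetical.

```python
import json


def task_mutation(operation, service, name):
    """Return a GraphQL request body for createTask or deleteTask."""
    if operation not in ("createTask", "deleteTask"):
        raise ValueError(f"unsupported operation: {operation}")
    query = (
        "mutation($service: String!, $name: String!) { admin { jahia { "
        f"{operation}(service: $service, name: $name) "
        "} } }"
    )
    return {"query": query, "variables": {"service": service, "name": name}}


if __name__ == "__main__":
    # Print the body that would be POSTed to the GraphQL endpoint.
    print(json.dumps(task_mutation("createTask", "DevOps Team", "Network maintenance on Core VPC")))
```

Building the deleteTask body with the same service and name gives the matching cleanup call once the external operation is finished.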

Performing other operations

Jahia also supports a set of additional operations associated with monitoring.

Server shutdown

Working jointly with the tasks registry, a shutdown service is exposed via the GraphQL API. This API node should be used with care as it shuts down the Jahia server.

This mutation is provided as an example. Don’t use timeout and force together because force triggers an immediate shutdown without considering the timeout value.

mutation {
  admin {
    jahia {
      shutdown(
        # When dryRun is provided, the server is not shut down but
        # still returns the expected API response (true or false)
        dryRun: true,
        timeout: 25,    # In seconds, maximum time to wait for the server to be ready to shut down (empty list of tasks)
        force: true     # Force immediate shutdown even if tasks are running
      )
    }
  }
}
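Since timeout and force should not be combined, a client can enforce that rule before sending anything. The following Python sketch builds the mutation text and rejects the conflicting combination. It defaults to a dry run as a safety measure, and it omits the parentheses when no argument is set, assuming the schema accepts shutdown without arguments. The helper name is hypothetical.

```python
def shutdown_mutation(dry_run=True, timeout=None, force=False):
    """Build the shutdown mutation, refusing timeout and force together."""
    if force and timeout is not None:
        raise ValueError("use either timeout or force, not both")
    args = []
    if dry_run:
        args.append("dryRun: true")
    if timeout is not None:
        args.append(f"timeout: {int(timeout)}")  # seconds
    if force:
        args.append("force: true")
    # GraphQL does not allow empty parentheses, so drop them if unused.
    call = f"shutdown({', '.join(args)})" if args else "shutdown"
    return "mutation { admin { jahia { " + call + " } } }"
```

For example, shutdown_mutation(timeout=25) produces a dry run that reports whether the server would be ready to shut down within 25 seconds.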

Server load metrics

Load metrics are also available through the GraphQL API, as per the query below:

query {
  admin {
    jahia {
      load {
        requests {
          count
          # Interval can be ONE, FIVE, or FIFTEEN minutes
          average(interval: ONE)
        }
        sessions {
          count
          average(interval: FIVE)
        }
      }
    }
  }
}
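As a brief illustration, the Python sketch below reads a decoded response to this query and compares the averages with limits chosen by the caller. It assumes the response mirrors the query shape under a top-level data key and that count and average are numeric. The limits and function names are hypothetical and are not SAM defaults.

```python
def load_summary(response):
    """Return (count, average) pairs for requests and sessions."""
    load = response["data"]["admin"]["jahia"]["load"]
    return {
        kind: (load[kind]["count"], load[kind]["average"])
        for kind in ("requests", "sessions")
    }


def is_overloaded(response, max_request_average, max_session_average):
    """True when either average exceeds the caller-supplied limit."""
    summary = load_summary(response)
    return (
        summary["requests"][1] > max_request_average
        or summary["sessions"][1] > max_session_average
    )
```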