Monitoring your servers
The Server Availability Manager (SAM) module supports monitoring in complex containerized environments. SAM extends Jahia's GraphQL API with server monitoring, server availability, and maintenance operation functionality. The sections below explain how to use and extend these APIs in such environments.
Migrating from the Healthcheck module
Server Availability Manager replaces the Healthcheck module, improving on it and adding more server availability functionality. As with Healthcheck, you can develop new probes for SAM, which ships with four probes by default.
When migrating from Healthcheck to SAM, note that the data object has been removed from the health probes. These data elements (such as system load, node list, and URLs) are now provided by dedicated nodes of the API, located outside of the probes.
Monitoring system health
The SAM module provides insights into your platform's health and triggers alerts when key components need attention. The module is available through GraphQL or REST, and you can query it at will with minimal impact on platform load. You can also develop additional probes to provide more information to your monitoring systems.
Each probe has a severity (LOW, MEDIUM, HIGH, or CRITICAL) and reports one of the following statuses:
- GREEN (Nominal status)
- YELLOW (Non-critical issue)
- RED (Critical issue)
The query below fetches the status of all of the probes with severity LOW or above:
query {
  admin {
    jahia {
      healthCheck(severity: LOW) { # Minimum severity of the probes to return
        status # Highest reported status across all probes
        probes {
          name # Name of the probe
          status # Status reported by the probe (GREEN to RED)
          severity # Severity of the probe (LOW to CRITICAL)
          description # Description specified by the developer of the probe
        }
      }
    }
  }
}
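The top-level `status` field reports the highest status across the returned probes. The following Java sketch models that aggregation on the client side. It assumes that the `severity` argument is an inclusive minimum and that levels are ordered LOW < MEDIUM < HIGH < CRITICAL and GREEN < YELLOW < RED; the `HealthSummary` class and its nested types are hypothetical and not part of SAM.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client-side model of the healthCheck response; not part of SAM's API.
final class HealthSummary {
    // Severities in ascending order (LOW to CRITICAL).
    enum Severity { LOW, MEDIUM, HIGH, CRITICAL }

    // Statuses from nominal to critical (GREEN to RED).
    enum Status { GREEN, YELLOW, RED }

    record Probe(String name, Status status, Severity severity) { }

    private HealthSummary() { }

    // Keeps the probes whose severity is at or above the requested minimum (assumed inclusive).
    static List<Probe> filter(List<Probe> probes, Severity minimum) {
        return probes.stream()
                .filter(p -> p.severity().compareTo(minimum) >= 0)
                .collect(Collectors.toList());
    }

    // Returns the highest status reported; assumed to be GREEN when no probe matches.
    static Status overall(List<Probe> probes) {
        return probes.stream()
                .map(Probe::status)
                .max(Comparator.naturalOrder())
                .orElse(Status.GREEN);
    }
}
```

For example, with a CRITICAL probe reporting GREEN and a MEDIUM probe reporting RED, filtering at HIGH yields GREEN, while filtering at LOW yields RED.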
As mentioned in About monitoring, this module replaces the previous Healthcheck module.
Available probes
Server Availability Manager ships by default with four probes that you can configure by editing the org.jahia.modules.sam.healthcheck.ProbesRegistry.cfg file.
- DBConnectivity: Verifies that the database is configured with a valid connection. This probe has a default severity of CRITICAL and only returns GREEN or RED.
- Datastore: Verifies the connection to the JCR Datastore and the ability to write to it. This probe has a default severity of CRITICAL and only returns GREEN or RED.
- ServerLoad: Verifies both request and session loads over one minute. This probe has a default severity of HIGH and can return GREEN, YELLOW, or RED based on configurable thresholds. The configuration file above contains examples of such configuration.
- ModuleState: Checks whether any modules are in an inactive state. This probe has a default severity of MEDIUM and returns GREEN or RED depending on module states. You can configure this probe with both a whitelist (only monitor specific modules) and a blacklist (monitor all modules except a provided list). The configuration file above contains examples of such configurations.
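To make the ServerLoad and ModuleState rules concrete, here is a hypothetical Java sketch of how such probes could derive their status. The threshold parameters, the use of `>=` comparisons, and the rule that a non-empty whitelist takes precedence over the blacklist are assumptions; the actual behavior and property names are defined by the module and its ProbesRegistry.cfg file.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical status rules inspired by the ServerLoad and ModuleState probes.
// Threshold semantics and whitelist precedence are assumptions, not SAM's code.
final class ProbeRules {
    enum Status { GREEN, YELLOW, RED }

    private ProbeRules() { }

    // ServerLoad-style rule: compare one load value (requests or sessions)
    // against a warning threshold and a critical threshold.
    static Status loadStatus(double load, double yellowThreshold, double redThreshold) {
        if (load >= redThreshold) {
            return Status.RED;
        }
        if (load >= yellowThreshold) {
            return Status.YELLOW;
        }
        return Status.GREEN;
    }

    // ModuleState-style rule: RED as soon as one monitored module is inactive.
    // A non-empty whitelist limits monitoring to its modules; otherwise every
    // module is monitored except those listed in the blacklist.
    static Status moduleStatus(Map<String, Boolean> activeByModule,
                               Set<String> whitelist, Set<String> blacklist) {
        for (Map.Entry<String, Boolean> entry : activeByModule.entrySet()) {
            String module = entry.getKey();
            boolean monitored = whitelist.isEmpty()
                    ? !blacklist.contains(module)
                    : whitelist.contains(module);
            if (monitored && !entry.getValue()) {
                return Status.RED;
            }
        }
        return Status.GREEN;
    }
}
```

The same load comparison would apply to request and session loads, possibly with different thresholds for each.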
About the REST API
The module also contains a servlet that provides REST (GET) access to help you transition from the Healthcheck module to the Server Availability Manager module. A GET request to https://[YOUR_JAHIA_HOST]/modules/healthcheck?severity=low returns the list of probes and their values.
You can customize the servlet's behavior by editing the org.jahia.modules.sam.healthcheck.HealthCheckServlet.cfg file.
# Default severity level used when "?severity=LEVEL" is not provided
severity.default=MEDIUM
# Status at or above which an HTTP error code is returned
status.threshold=RED
# HTTP status code returned when the threshold is reached
status.code=503
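The following Java sketch shows one plausible reading of these settings: the servlet uses `?severity=` when present and `severity.default` otherwise, and returns `status.code` once the overall status reaches `status.threshold`. The `HealthCheckResponse` class is hypothetical, and returning HTTP 200 below the threshold is an assumption.

```java
import java.util.Locale;

// Hypothetical model of the decisions configured in HealthCheckServlet.cfg.
final class HealthCheckResponse {
    enum Severity { LOW, MEDIUM, HIGH, CRITICAL }
    enum Status { GREEN, YELLOW, RED }

    private final Severity defaultSeverity; // severity.default
    private final Status threshold;         // status.threshold
    private final int errorCode;            // status.code

    HealthCheckResponse(Severity defaultSeverity, Status threshold, int errorCode) {
        this.defaultSeverity = defaultSeverity;
        this.threshold = threshold;
        this.errorCode = errorCode;
    }

    // Uses the ?severity= parameter when present (case-insensitive, e.g. "low"),
    // otherwise severity.default. Unknown values throw IllegalArgumentException.
    Severity effectiveSeverity(String severityParam) {
        if (severityParam == null || severityParam.isBlank()) {
            return defaultSeverity;
        }
        return Severity.valueOf(severityParam.trim().toUpperCase(Locale.ROOT));
    }

    // Returns status.code once the overall status reaches the threshold; 200 otherwise (assumed).
    int httpStatus(Status overall) {
        return overall.compareTo(threshold) >= 0 ? errorCode : 200;
    }
}
```

With the default configuration shown above, a request without `?severity=` uses MEDIUM, and only an overall RED status produces a 503.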
Developing your own health probes
You can develop your own probes much as you did with the Healthcheck module, using the existing probes available on GitHub as a starting point.
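As a starting point, the Java sketch below shows the general shape of a probe: a name, a description, and a status computed on demand. The `HealthProbe` interface is a hypothetical stand-in, not SAM's actual probe contract, and registering the probe with SAM is not shown; check the existing probes for both. The example checks free disk space against two byte thresholds chosen by the caller.

```java
import java.io.File;
import java.util.function.LongSupplier;

// Hypothetical probe contract; SAM's real interface may differ.
interface HealthProbe {
    enum Status { GREEN, YELLOW, RED }

    String getName();

    String getDescription();

    Status getStatus();
}

// Example custom probe: compares the usable space of a directory with two thresholds.
final class DiskSpaceProbe implements HealthProbe {
    private final LongSupplier usableBytes;
    private final long yellowBelowBytes;
    private final long redBelowBytes;

    DiskSpaceProbe(File directory, long yellowBelowBytes, long redBelowBytes) {
        this(directory::getUsableSpace, yellowBelowBytes, redBelowBytes);
    }

    // Accepts any source of free-space values, which keeps the logic testable.
    DiskSpaceProbe(LongSupplier usableBytes, long yellowBelowBytes, long redBelowBytes) {
        this.usableBytes = usableBytes;
        this.yellowBelowBytes = yellowBelowBytes;
        this.redBelowBytes = redBelowBytes;
    }

    @Override
    public String getName() {
        return "DiskSpace";
    }

    @Override
    public String getDescription() {
        return "Reports YELLOW or RED when free disk space drops below the configured thresholds";
    }

    @Override
    public Status getStatus() {
        long free = usableBytes.getAsLong();
        if (free < redBelowBytes) {
            return Status.RED;
        }
        if (free < yellowBelowBytes) {
            return Status.YELLOW;
        }
        return Status.GREEN;
    }
}
```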
Monitoring background tasks
During its regular lifecycle, Jahia performs actions that shouldn't be interrupted by maintenance activities, such as server shutdown and database maintenance. By exposing such tasks, Jahia makes third-party platforms (or individuals) aware of when to avoid such interruptions.
The following query returns the critical tasks running on the server at the time of the query. The Server Availability Manager does not keep a log of previously running tasks.
query {
  admin {
    jahia {
      tasks {
        service # Name of the service holding the task
        name # Name of the task that should not be interrupted
        started # Datetime at which the task started (if available)
      }
    }
  }
}
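GraphQL clients typically send this query as an HTTP POST with a JSON body. The Java sketch below uses the JDK's `java.net.http` client. It assumes that Jahia's GraphQL endpoint is `/modules/graphql` and that the account used, authenticated with HTTP basic authentication, is allowed to run `admin` queries; adjust both to your environment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: POST the tasks query to Jahia's GraphQL endpoint (endpoint path and auth are assumptions).
final class TasksQueryClient {
    static final String TASKS_QUERY =
            "query { admin { jahia { tasks { service name started } } } }";

    private TasksQueryClient() { }

    // Wraps a GraphQL document in the usual {"query": ...} JSON envelope.
    // TASKS_QUERY contains no quotes or backslashes, so no JSON escaping is done here.
    static String jsonBody(String query) {
        return "{\"query\":\"" + query + "\"}";
    }

    static HttpRequest buildRequest(String baseUrl, String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/modules/graphql"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(TASKS_QUERY)))
                .build();
    }

    // Sends the request and returns the raw JSON response body.
    static String fetch(String baseUrl, String user, String password)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(baseUrl, user, password), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

An empty `tasks` list in the response indicates that no critical task was running when the query executed.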
Core
To provide backward compatibility with older versions of Jahia, the module implements two approaches for identifying running tasks:
- For tasks triggered by modules (or Jahia versions) released before SAM was available, SAM uses pattern matching on the running threads to identify critical tasks. When listing tasks, these appear grouped under the core service.
- For newer releases of Jahia and its modules, a registry allows such tasks to be registered when they start and unregistered once they complete.
The following tasks are monitored under the core service:
- IMPORT_ZIP
- IMPORT_XML
- BACKGROUND_JOB
- BUNDLE_START
- BUNDLE_INSTALL
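The thread-based approach can be illustrated with the following Java sketch, which maps task names from the list above to regular expressions and reports the tasks whose pattern matches a live thread name. Matching on thread names alone and any patterns you pass in are assumptions; SAM's actual patterns are not documented here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative thread-name matcher; the patterns are supplied by the caller
// and are not SAM's actual patterns.
final class ThreadTaskMatcher {
    private final Map<String, Pattern> patternsByTask;

    ThreadTaskMatcher(Map<String, Pattern> patternsByTask) {
        this.patternsByTask = Map.copyOf(patternsByTask);
    }

    // Returns the task names (e.g. IMPORT_ZIP) whose pattern matches at least one thread name.
    Set<String> match(List<String> threadNames) {
        Set<String> running = new LinkedHashSet<>();
        for (Map.Entry<String, Pattern> entry : patternsByTask.entrySet()) {
            for (String threadName : threadNames) {
                if (entry.getValue().matcher(threadName).find()) {
                    running.add(entry.getKey());
                    break;
                }
            }
        }
        return running;
    }

    // Names of the threads currently alive in this JVM.
    static List<String> liveThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .collect(Collectors.toList());
    }
}
```

For example, a matcher mapping IMPORT_ZIP to the hypothetical pattern `zip-import` reports IMPORT_ZIP while a thread named `zip-import-worker-2` is alive.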
Registering tasks
The module has a registry of running tasks that allows modules that are not part of the Jahia default distribution to declare their own tasks. External services that need to prevent a server from being restarted can also use it.
Registering tasks from a Java module
To register long-running tasks, we use the Jahia FrameworkService and TaskRegisterEventHandler of Server Availability Manager.
To register tasks from a Java module:
- Before registering the task, we recommend unregistering it first, in case it was not cleaned up after a failure:
  FrameworkService.sendEvent(UNREGISTER_EVENT, constructTaskDetailsEvent(workspace, WORKSPACE_INDEXATION), true);
  where UNREGISTER_EVENT represents the event topic org/jahia/modules/sam/TaskRegistryService/UNREGISTER, and constructTaskDetailsEvent creates a map with three event properties: name, service, and started. In this example, we register the workspace indexation task of Augmented Search; the full example is in the ESService.java class.
- Register the task when it starts, following the same approach as for unregistering but using REGISTER instead of UNREGISTER.
- Unregister the task when it ends.
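The three steps above can be wrapped so that the final unregistration always happens, even when the task fails. In this Java sketch, the event sender is injected; in a Jahia module it would delegate to `FrameworkService.sendEvent(topic, properties, true)` as in the example above. The REGISTER topic string, the use of `Calendar` for `started`, and the `TrackedTask` class are assumptions for illustration.

```java
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of the unregister / register / unregister sequence.
// In a module, the sender could be: (topic, props) -> FrameworkService.sendEvent(topic, props, true)
// It is injected here so the sequence runs without an OSGi container.
final class TrackedTask {
    // REGISTER topic assumed by analogy with the documented UNREGISTER topic.
    static final String REGISTER_EVENT = "org/jahia/modules/sam/TaskRegistryService/REGISTER";
    static final String UNREGISTER_EVENT = "org/jahia/modules/sam/TaskRegistryService/UNREGISTER";

    private final BiConsumer<String, Map<String, Object>> sender;

    TrackedTask(BiConsumer<String, Map<String, Object>> sender) {
        this.sender = sender;
    }

    // Builds the three event properties: name, service and started (Calendar type assumed).
    static Map<String, Object> details(String service, String name) {
        Map<String, Object> props = new HashMap<>();
        props.put("service", service);
        props.put("name", name);
        props.put("started", Calendar.getInstance());
        return props;
    }

    // Runs the work while the task is registered and always unregisters it afterwards.
    void run(String service, String name, Runnable work) {
        sender.accept(UNREGISTER_EVENT, details(service, name)); // clean up a stale entry
        sender.accept(REGISTER_EVENT, details(service, name));
        try {
            work.run();
        } finally {
            sender.accept(UNREGISTER_EVENT, details(service, name));
        }
    }
}
```

The `finally` block is the design choice that matters here: without it, a failing task would stay registered and block maintenance until the next cleanup.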
Registering tasks through the GraphQL API
You can use the GraphQL API to create and delete tasks when external services need to inform a server that it should not be restarted. The following example uses createTask; calling deleteTask with the same parameters deletes that task.
mutation {
  admin {
    jahia {
      createTask(service: "DevOps Team" name: "Network maintenance on Core VPC")
    }
  }
}
The registry is shared between GraphQL and Java modules, so you can create a task in a Java module and delete it using the GraphQL API.
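When a script or external service builds these mutations dynamically, the `service` and `name` arguments must be valid GraphQL string literals. The Java sketch below (hypothetical `TaskMutations` class) builds `createTask` and `deleteTask` mutations matching the example above and escapes quotes, backslashes, and common control characters.

```java
// Sketch: build createTask / deleteTask mutations with escaped string arguments.
final class TaskMutations {
    private TaskMutations() { }

    static String createTask(String service, String name) {
        return build("createTask", service, name);
    }

    static String deleteTask(String service, String name) {
        return build("deleteTask", service, name);
    }

    private static String build(String field, String service, String name) {
        return "mutation { admin { jahia { " + field
                + "(service: " + quote(service) + ", name: " + quote(name) + ") } } }";
    }

    // Produces a GraphQL string literal, escaping backslashes, quotes and control characters.
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

When the mutation is embedded in a JSON request body, it must additionally be JSON-escaped. A typical pattern is to send `createTask` before the maintenance starts and `deleteTask` with the same arguments once it ends.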
Performing other operations
Jahia also supports a set of additional operations associated with monitoring.
Server shutdown
Working jointly with the tasks registry, a shutdown service is exposed via the GraphQL API. This API node should be used with care as it shuts down the Jahia server.
The following mutation is provided as an example and shows all available parameters. In practice, don't use timeout and force together, because force triggers an immediate shutdown without considering the timeout value.
mutation {
  admin {
    jahia {
      shutdown(
        # When dryRun is provided, the server is not shut down but
        # still returns the expected API response (true or false)
        dryRun: true,
        timeout: 25, # In seconds, maximum time to wait for the server to be ready to shut down (empty list of tasks)
        force: true # Force immediate shutdown even if tasks are running
      )
    }
  }
}
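The interplay of `timeout` and `force` can be modeled as follows. This Java sketch assumes the mutation returns true when the server may shut down: immediately with `force`, otherwise as soon as the task list is empty, and false if the timeout expires first. It is a client-side model with a hypothetical `ShutdownGate` class and an arbitrary poll interval, not SAM's implementation.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

// Client-side model of the shutdown decision (assumed semantics, not SAM's code).
final class ShutdownGate {
    private ShutdownGate() { }

    // Returns true when shutdown may proceed: immediately with force, otherwise
    // as soon as no task is running. Returns false if the timeout expires first.
    static boolean canShutdown(Supplier<List<String>> runningTasks, Duration timeout,
                               boolean force, Duration pollInterval) {
        if (force) {
            return true; // force ignores the timeout and any running task
        }
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (runningTasks.get().isEmpty()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
    }
}
```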
Server load metrics
Load metrics are also available through the GraphQL API, as per the query below:
query {
  admin {
    jahia {
      load {
        requests {
          count
          # Interval can be ONE, FIVE, or FIFTEEN minutes
          average(interval: ONE)
        }
        sessions {
          count
          average(interval: FIVE)
        }
      }
    }
  }
}
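One-, five-, and fifteen-minute averages are classically computed as exponentially damped moving averages, as in Unix load averages. The Java sketch below implements that technique; whether Jahia computes its request and session averages exactly this way, and the 5-second sampling interval used in the usage note, are assumptions.

```java
// Sketch of a Unix-style exponentially damped moving average (assumed technique).
final class LoadAverage {
    private final double decay; // weight kept from the previous average at each sample
    private double value;

    LoadAverage(double sampleIntervalSeconds, double periodSeconds) {
        this.decay = Math.exp(-sampleIntervalSeconds / periodSeconds);
    }

    // Blends the current count (e.g. active requests or sessions) into the running average.
    double sample(double currentCount) {
        value = value * decay + currentCount * (1.0 - decay);
        return value;
    }

    double value() {
        return value;
    }
}
```

For example, `new LoadAverage(5, 60)` fed a constant count of 10 every 5 seconds reaches about 9.93 after five minutes (10 * (1 - e^-5)), while `new LoadAverage(5, 900)` is still around 2.8, which is why the longer intervals react more slowly to load spikes.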