Monitoring your servers
The Server Availability Manager (SAM) module supports monitoring in complex containerized environments. SAM extends Jahia's GraphQL API and provides server monitoring, server availability, and maintenance operation functionality. Learn more below about how to use and extend Jahia APIs with complex containerized environments.
Migrating from the Healthcheck module
Server Availability Manager improves on and replaces the Healthcheck module and adds more server availability functionality. As with Healthcheck, SAM offers the ability to develop new probes (it ships by default with 5).
When migrating from Healthcheck to SAM, remember that the data object has been removed from the health probes. These data elements (such as system load, node list, and URLs) are now provided by dedicated nodes of the API rather than the health probe. SAM actually includes nodes dedicated to system monitoring, but those nodes are located outside of the probes.
Monitoring system health
The SAM module provides insights into your platform's health and triggers alerts when key components need particular attention. The module is available through GraphQL or REST, and you can query it at will with minimal impact on the platform load. You can also develop additional probes to provide more information to the monitoring systems.
Probes are categorized by severity and report a status:
- GREEN (Nominal status)
- YELLOW (Non-critical issue)
- RED (Critical issue)
The query below fetches the status of all of the probes with severity LOW or above:
query {
  admin {
    jahia {
      healthCheck(severity: LOW) { # The minimum severity to return
        status { # Highest reported status across all probes
          health # GREEN, YELLOW or RED
        }
        probes {
          name # Name of the probe
          status { # Status reported by the probe
            health # GREEN, YELLOW or RED
            message # Explanation for the health level
          }
          severity # Severity of the probe (LOW to CRITICAL)
          description # Description specified by the developer of the probe
        }
      }
    }
  }
}
The healthCheck GraphQL query also takes an optional includes parameter that filters the probes returned, and the reported status, by specifying one or more probe names. The probes node supports a health argument, which filters the probes by their status (e.g. health=YELLOW returns all probes that are YELLOW and RED).
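The filters described above can be sketched as a small query builder. This is only an illustration: the class and method names are hypothetical, and the exact argument shapes (includes as a list of strings, health as an enum value on the probes node) are assumptions to check against the GraphQL schema of your SAM version.

```java
import java.util.List;
import java.util.stream.Collectors;

final class HealthCheckQuery {

    /** Builds a healthCheck query; probeNames may be empty and minHealth may be null. */
    static String build(String severity, List<String> probeNames, String minHealth) {
        StringBuilder args = new StringBuilder("severity: ").append(severity);
        if (!probeNames.isEmpty()) {
            // Assumption: "includes" accepts a list of probe names.
            String names = probeNames.stream()
                    .map(name -> "\"" + name + "\"")
                    .collect(Collectors.joining(", "));
            args.append(", includes: [").append(names).append("]");
        }
        // Assumption: "health" is an enum argument on the probes node.
        String probesArgs = minHealth == null ? "" : "(health: " + minHealth + ")";
        return "query { admin { jahia { healthCheck(" + args + ") { "
                + "status { health } "
                + "probes" + probesArgs + " { name severity status { health message } } "
                + "} } } }";
    }

    /** Wraps a GraphQL document in the JSON body commonly used for GraphQL over HTTP. */
    static String toJsonBody(String query) {
        String escaped = query.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":\"" + escaped + "\"}";
    }

    public static void main(String[] args) {
        String query = build("LOW", List.of("DBConnectivity", "Datastore"), "YELLOW");
        System.out.println(toJsonBody(query));
    }
}
```

The resulting body can then be sent in a POST request to the GraphQL endpoint of your Jahia instance, using credentials that hold the required permission.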
As mentioned in About monitoring, this module replaces the previous healthcheck module.
Permissions
By default, only the root user is able to perform GraphQL Queries against Server Availability Manager. Although not directly associated with Server Availability Manager, a dedicated permission called "Graphql admin query" is available for querying GraphQL nodes under "query.admin".
Using this permission makes it possible to create a dedicated monitoring user (or role) requiring fewer privileges than the root user.
This permission is visible under "Roles and Permissions" > "Other Permissions" > "Admin".
Available probes
Server Availability Manager ships by default with probes that you can configure by editing the org.jahia.modules.sam.healthcheck.ProbesRegistry.cfg file.
- DBConnectivity
Verifies that the database is configured with a valid connection. This probe has a default severity of CRITICAL and only returns GREEN or RED.
The following messages are available:
1 Connection established (everything is OK)
2 Could not connect (connection was not successful)
3 Encountered exception while connecting (the server produced SQLException)
- Datastore
Verifies the connection to the JCR Datastore and the ability to write to it. This probe has a default severity of CRITICAL and only returns GREEN or RED.
The following messages are available:
1 Datastore is healthy (everything is OK)
2 Could not perform write operation (could not write to jahia.jackrabbit.datastore.path)
- ClusterConsistency
Verifies that module states are consistent across a given cluster group and reports modules that have an inconsistent state. It also checks the cluster node count against the local revision and reports on the cluster sync status. This probe has a default severity of HIGH and returns GREEN, YELLOW, or RED.
By default, the check is performed against the default cluster group. You can change the group name via configuration by assigning a value to the clusterGroup property of the probe. The cellar section is added if the probe detects differences between the local and the cellar (from the default group) states.
The nodeNumberCheck and revisionCheck sections are added if there is an inconsistency with the node count or if the cluster is out of sync, respectively.
The following messages are available:
1 Cluster is not activated (everything is OK)
2 Failed to find ClusteredBundleService (this indicates that the probe cannot function properly due to internal issues, status for this is RED)
3 No issues found on this node (everything is OK, all cluster nodes should have this message if cluster is consistent)
4 Message giving, for each cluster node, the module(s) which caused the issue (this indicates there is an issue and the status is YELLOW):
{
  "dx-cluster-node1-id": [
    { "clusterNodeId": "dx-cluster-node1-id", "module": "org.jahia.modules/database-connector/1.5.0", "osgiState": "RESOLVED", "moduleState": "RESOLVED" }
  ],
  "dx-cluster-node2-id": [
    { "clusterNodeId": "dx-cluster-node2-id", "module": "org.jahia.modules/database-connector/1.5.0", "osgiState": "STARTING", "moduleState": "RESOLVED" }
  ],
  "dx-cluster-node3-id": [
    { "clusterNodeId": "dx-cluster-node3-id", "module": "org.jahia.modules/database-connector/1.5.0", "osgiState": "ACTIVE", "moduleState": "ACTIVE" }
  ],
  "cellar": [
    { "module": "org.jahia.modules/database-connector/1.5.0", "clusterState": "ACTIVE" }
  ],
  "nodeNumberCheck": [
    { "ehcacheNodeNumber": "The number of nodes in Ehcache is 4 while the number of local revisions is 3" },
    { "clusterManagerNodeNumber": "The number of nodes in ClusterManager is 4 while the number of local revisions is 3" }
  ],
  "revisionCheck": { "syncStatus": "The cluster is out of sync", "globalRevision": 1075, "localRevision": 1073 }
}
- ServerLoad
Verifies both request and session loads over one minute. This probe has a default severity of HIGH and can return a GREEN, YELLOW, or RED status based on configurable thresholds. The configuration file above contains examples of such configuration.
The following messages are available (Note that request and session thresholds are set at 40 for YELLOW and 70 for RED):
1 Serverload is normal (indicates normal load)
2 Serverload is above normal (indicates above normal load)
3 Serverload is high (indicates high load)
- ModuleState
Verifies if any of the modules are in an inactive state. This probe has a default severity of MEDIUM and can return GREEN, YELLOW or RED depending on module states. You can configure this module with both a whitelist (only monitor specific modules) and a blacklist (monitor all modules except a provided list). The configuration file above contains examples of such configurations.
The following messages are available:
1 All modules are started (every module in the system is started)
2 At least one module is not started. Module <module name> is in <state description> state (this can indicate either YELLOW or RED health, depending on whether the module has another version in started state)
3 At least one modules has an invalid start-level. Module <module name> has start-level <start level> (this is a RED indicator)
You can correct the state of a module by navigating to either https://[YOUR_JAHIA_HOST]/jahia/administration/manageModules or https://[YOUR_JAHIA_HOST]/tools/osgi/console/bundles, locating the desired module, and stopping, starting, or undeploying it.
To correct the start level of a bundle, you can either log in to the Karaf console or use the provisioning API to send a Karaf command. For example, you can check and correct the start level by sending a request to https://[YOUR_JAHIA_HOST]/modules/api/provisioning with the required level of authorization and the following JSON body:
[ { "karafCommand": "bundle:start-level article 90" } ]
In the example above, the article module is given a start level of 90. To simply check the start level, execute the same command without the start level at the end (i.e. without 90). You can also send bundle:start-level --help for more information. Note that the results are written to the Tomcat console.
- PatchFailures
Verifies that the migration was successful. The probe has CRITICAL severity and can be in GREEN or RED state.
The following messages are available:
1 No patching to report on (didn't find patches folder, everything is OK)
2 Patch applied successfully (there are no failed patches in the patches folder, everything is OK)
3 Following patches failed: <patch name1>, <patch name2> ... (some patches failed, migration was not successful)
- ModulesDefinitions
Verifies that the modules are compatible with the current deployed definitions (more details about Module Definition checks here). The probe has HIGH severity and can be in GREEN or RED state.
The following messages are available:
1 The definitions used by the started <module name1>, <module name2>, ... modules correspond to the definitions of higher, non started, versions of these modules (some modules have their definitions modified, there are compatibility issues)
2 All modules are OK (there are no invalid modules, everything is OK)
- SearchIndex
Verifies if search indices are too fragmented (default values: 5ms for the yellow threshold and 50ms for the red one). The probe has HIGH severity and can be in GREEN, YELLOW or RED state.
The following messages are available:
1 Query AVG (... ms) is lower than ... ms over the last minute. All good here. (average query execution time in the last minute is under the yellow threshold, everything is OK)
2 Query AVG (... ms) is greater than ... ms over the last minute. (average query execution time over the last minute is above the yellow threshold, this needs to be monitored)
3 Query AVG (... ms) is greater than ... ms over the last minute. It might be time to reindex. (average query execution time over the last minute is above the red threshold, reindexing is needed)
- JExperienceConnections
Checks the status of the connection between jExperience and jCustomer. This probe is only available when jExperience 2.6.0+ is installed on the platform. The probe has HIGH severity and can be in GREEN, YELLOW or RED state.
The following messages are available:
1 <Number> JExperience connection(s) in error out of <Number>: related to the following config key <Config Key> (YELLOW)
2 All JExperience connection(s) are OK (GREEN)
3 All JExperience connection(s) are KO (RED)
- Snapshot
Checks if any of the modules on a Jahia instance are versioned as snapshot. This probe is only available on a production environment. The probe has MEDIUM severity and can be in GREEN or YELLOW state.
The following messages are available:
1 There are no snapshots (GREEN)
2 {\"snapshots\":[{\"module\":\"module_name1\",\"state\":\"Started\",\"version\":\"x.x.x-SNAPSHOT\"},{\"module\":\"module_name2\",\"state\":\"Started\",\"version\":\"x.x.x-SNAPSHOT\"}],\"snapshotCount\":2} (YELLOW)
- RenderChain
Checks whether the Jahia render chain is working correctly. The probe has CRITICAL severity and can be in GREEN or RED state.
The following messages are available:
1 Rendering Chain works properly {rendering result} (GREEN)
2 RenderService/JahiaSitesService not found (RED)
3 Rendering Chain test result should have contained x but was y (RED)
4 Rendering Chain test returns and error {error message} (RED)
- JahiaErrors
Counts the number of errors encountered by Jahia. The probe has DEBUG severity and can be in GREEN or YELLOW state. This probe is not recommended for use in production; it is useful in a CI/CD context during initial startup and provisioning phases.
The following messages are available:
1 No errors are present on the platform (GREEN)
2 A total of {error number} errors are present on the platform, errors are not expected in a production environment and we recommend reviewing these (YELLOW)
3 Jahia errors cannot be checked (YELLOW)
- ModuleSpringUsage
Detects whether any modules are using Spring. The probe has MEDIUM severity and can be in GREEN or YELLOW state.
- MultipleBundleVersions
Checks whether multiple versions of the same bundle are present on a Jahia instance. The probe has HIGH severity and can be in GREEN or RED state. If multiple bundles are detected, they are listed in the probe message.
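To make the threshold logic of the ServerLoad probe above more concrete, here is a minimal sketch of how a load value could map to a status with the quoted thresholds (40 for YELLOW, 70 for RED). It is not the probe's actual implementation: the class name is hypothetical, and treating the thresholds as inclusive and reporting the worse of the request and session statuses are both assumptions.

```java
final class LoadThresholds {

    // Declaration order doubles as severity order: GREEN < YELLOW < RED.
    enum Health { GREEN, YELLOW, RED }

    /** Maps a load value to a status; thresholds are treated as inclusive (assumption). */
    static Health classify(double load, double yellowThreshold, double redThreshold) {
        if (load >= redThreshold) {
            return Health.RED;
        }
        if (load >= yellowThreshold) {
            return Health.YELLOW;
        }
        return Health.GREEN;
    }

    /** Returns the more severe of two statuses, relying on the enum declaration order. */
    static Health worst(Health a, Health b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        Health requests = classify(35.0, 40, 70); // below the YELLOW threshold
        Health sessions = classify(72.5, 40, 70); // above the RED threshold
        System.out.println(worst(requests, sessions)); // prints RED
    }
}
```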
About the REST API
The module also contains a servlet that provides REST (GET) capabilities to help you transition from the Healthcheck to the Server Availability Manager module. A GET request to https://[YOUR_JAHIA_HOST]/modules/healthcheck?severity=low returns the list of probes and their values.
You can further customize the configuration by editing the org.jahia.modules.sam.healthcheck.HealthCheckServlet.cfg file.
# default severity level when "?severity=LEVEL" is not provided
severity.default=MEDIUM
# default includes filter (one or more probe names separated by comma) to filter subset of available probes
includes.default=
# Threshold above which an HTTP error code will be returned
status.threshold=RED
# Error code to be returned if above threshold
status.code=503
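A monitoring client can rely on the HTTP status code alone. The sketch below, using the JDK's java.net.http client, shows one possible way to do that. The class, enum, and method names are hypothetical, authentication is left out, and it assumes that the servlet answers with 200 when the overall status stays below the configured threshold.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class RestHealthCheck {

    enum Outcome { OK, ABOVE_THRESHOLD, UNKNOWN }

    /** Builds the servlet URL for a given base URL and minimum severity. */
    static URI endpoint(String baseUrl, String severity) {
        return URI.create(baseUrl + "/modules/healthcheck?severity=" + severity);
    }

    /** Interprets the HTTP status against the configured status.code value. */
    static Outcome interpret(int httpStatus, int configuredStatusCode) {
        if (httpStatus == configuredStatusCode) {
            return Outcome.ABOVE_THRESHOLD; // status.threshold was reached
        }
        if (httpStatus == 200) {
            return Outcome.OK; // assumption: 200 is returned below the threshold
        }
        return Outcome.UNKNOWN; // e.g. authentication or routing problem
    }

    /** Performs the GET request; 503 matches the status.code value shown above. */
    static Outcome check(HttpClient client, String baseUrl, String severity)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint(baseUrl, severity)).GET().build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        return interpret(response.statusCode(), 503);
    }
}
```

Keeping the interpretation separate from the network call makes it easy to adapt if you change status.code in the servlet configuration.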
Developing your own health probes
You can develop your own probes in much the same way as with the Healthcheck module, taking inspiration from the existing probes available on GitHub.
The module containing your probe has a dependency on SAM. This dependency can be mandatory or optional.
To add a mandatory dependency on SAM, add server-availability-manager to the jahia-depends property in the pom.xml of your module.
<jahia-depends>server-availability-manager</jahia-depends>
Because your module depends on the Server Availability Manager module, it is reloaded each time Server Availability Manager is modified, which keeps your probe available at all times.
With such a dependency, your module cannot be deployed without SAM installed.
If you instead want an optional dependency on the Server Availability Manager module, so that your module can be deployed even when SAM is not present in your environment, follow these two steps.
First, declare the dependency as optional in your pom.xml:
<dependency>
    <groupId>org.jahia.modules</groupId>
    <artifactId>server-availability-manager</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
    <optional>true</optional>
</dependency>
Then, you will have to add a bundle listener to activate your probe when SAM is present on the platform.
Example of the bundle listener:
// Your package and imports
...
@Component(immediate = true)
public class SAMBundleListener implements SynchronousBundleListener {
    ...
    @Activate
    public void activate(BundleContext context, ComponentContext componentContext) {
        ...
        context.addBundleListener(this);
        // If SAM is available
        if (isServerAvailabilityManagerActive()) {
            // Enable my probe
            enableMyProbe();
        }
    }

    @Deactivate
    public void deactivate() {
        // Remove the bundle listener
        context.removeBundleListener(this);
    }

    @Override
    public void bundleChanged(BundleEvent event) {
        Bundle bundle = event.getBundle();
        // If SAM is detected
        if (SERVER_AVAILABILITY_MANAGER.equals(bundle.getSymbolicName())) {
            // And it's started or updated
            if (event.getType() == BundleEvent.STARTED || event.getType() == BundleEvent.UPDATED) {
                Bundle currentBundle = BundleUtils.getBundleBySymbolicName("my-module", null);
                FrameworkWiring frameworkWiring = context.getBundle(0).adapt(FrameworkWiring.class);
                if (frameworkWiring != null) {
                    // Refresh the current bundle
                    frameworkWiring.refreshBundles(Collections.singleton(currentBundle));
                } else {
                    logger.info("FrameworkWiring service not available. Cannot refresh my module.");
                }
            } else if (event.getType() == BundleEvent.STOPPED) {
                // Disable my probe if SAM is stopped
                componentContext.disableComponent(MyProbe.class.getName());
            }
        }
    }

    private void enableMyProbe() {
        componentContext.enableComponent(MyProbe.class.getName());
    }

    private boolean isServerAvailabilityManagerActive() {
        return Arrays.stream(context.getBundles())
                .anyMatch(bundle -> SERVER_AVAILABILITY_MANAGER.equals(bundle.getSymbolicName())
                        && bundle.getState() == Bundle.ACTIVE);
    }
}
Monitoring background tasks
During its regular lifecycle, Jahia performs actions that shouldn't be interrupted by maintenance activities, such as server shutdown and database maintenance. By exposing such tasks, Jahia makes third-party platforms (or individuals) aware of when to avoid such interruptions.
The following query returns the list of critical tasks currently running on the server. The query returns the tasks running at the time the query was made. The Server Availability Manager does not keep a log of previously running tasks.
query {
  admin {
    jahia {
      tasks {
        service # Name of the service holding the task
        name # Name of the task that should not be interrupted
        started # Datetime at which the task started (if available)
      }
    }
  }
}
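On the consuming side, a maintenance script typically only needs to know whether the returned list is empty. The sketch below illustrates that decision on already-parsed results. The Task class and the method names are hypothetical, the sample task names are illustrative, and parsing the GraphQL response is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MaintenanceGate {

    /** Minimal holder for the service and name fields returned by the tasks query. */
    static final class Task {
        final String service;
        final String name;

        Task(String service, String name) {
            this.service = service;
            this.name = name;
        }
    }

    /** Maintenance may proceed only when no critical task is currently running. */
    static boolean safeToInterrupt(List<Task> runningTasks) {
        return runningTasks.isEmpty();
    }

    /** Counts running tasks per service, sorted by service name for stable output. */
    static Map<String, Integer> countByService(List<Task> runningTasks) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Task task : runningTasks) {
            counts.merge(task.service, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Task> running = List.of(
                new Task("core", "IMPORT_ZIP"),
                new Task("core", "BACKGROUND_JOB"));
        // prints: false {core=2}
        System.out.println(safeToInterrupt(running) + " " + countByService(running));
    }
}
```

Because SAM does not keep a log of past tasks, a check like this only reflects the moment the query was made and should be repeated right before the maintenance operation starts.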
Core
To provide backward compatibility with older versions of Jahia, the module implements two approaches for identifying running tasks:
- For tasks triggered by modules (or Jahia versions) released prior to the availability of SAM, SAM uses pattern matching on the running threads to identify critical tasks. When listing tasks, those will appear grouped within the core service.
- For future releases of Jahia and its modules, a registry is made available to allow such tasks to be registered when starting and unregistered once completed.
The following tasks are monitored under the core service:
- IMPORT_ZIP
- IMPORT_XML
- BACKGROUND_JOB
- BUNDLE_START
- BUNDLE_INSTALL
Registering tasks
The module has a registry of running tasks allowing modules that are not part of the Jahia default distribution to declare their own tasks. This can also be extended to external services willing to prevent a server from being restarted.
Registering tasks from a Java module
To register long-running tasks, we use the Jahia FrameworkService and the TaskRegisterEventHandler of Server Availability Manager.
To register tasks from a Java module:
- Before registering the task, we recommend that you unregister it first, in case it wasn't cleaned up after a failure, with the following command:
FrameworkService.sendEvent(UNREGISTER_EVENT, constructTaskDetailsEvent(workspace, WORKSPACE_INDEXATION), true);
where UNREGISTER_EVENT represents the event topic org/jahia/modules/sam/TaskRegistryService/UNREGISTER, and constructTaskDetailsEvent creates a map with three event properties:
  - name
  - service
  - started
In this example, the task is the workspace indexation of Augmented Search; the full example can be seen in the ESService.java class.
- Register the task when starting, following the same approach as unregister. Just use REGISTER instead of UNREGISTER.
- Unregister the task when ended.
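As a rough illustration of the event payload described in the steps above, the following sketch builds a map with the three properties. It is not the code from ESService.java: the class name and method signature are hypothetical, the REGISTER topic is inferred from the UNREGISTER one, and the type expected for started is an assumption (an ISO-8601 string is used here).

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class TaskEvents {

    static final String UNREGISTER_EVENT = "org/jahia/modules/sam/TaskRegistryService/UNREGISTER";
    // Inferred from the text ("use REGISTER instead of UNREGISTER"); verify for your SAM version.
    static final String REGISTER_EVENT = "org/jahia/modules/sam/TaskRegistryService/REGISTER";

    /** Builds the event properties: name, service and started. */
    static Map<String, Object> constructTaskDetailsEvent(String service, String name, Instant started) {
        Map<String, Object> event = new HashMap<>();
        event.put("name", name);
        event.put("service", service);
        // Assumption: the value type expected by SAM for "started" may differ.
        event.put("started", started.toString());
        return event;
    }

    public static void main(String[] args) {
        Map<String, Object> event =
                constructTaskDetailsEvent("my-module", "Nightly export", Instant.now());
        System.out.println(REGISTER_EVENT + " " + event.keySet());
    }
}
```

Inside a Jahia module, the resulting map would be passed to FrameworkService.sendEvent together with the relevant topic, as in the command shown in the first step.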
Registering tasks through the GraphQL API
You can use the GraphQL API to create and delete tasks when external services need to inform a server that it should not be restarted. This example shows how to use createTask. Using deleteTask with the same parameters deletes that particular task.
mutation {
  admin {
    jahia {
      createTask(service: "DevOps Team" name: "Network maintenance on Core VPC")
    }
  }
}
The registry is shared between GraphQL and Java modules, so you can create a task in a Java module and delete it using the GraphQL API.
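An external service usually creates a task before its maintenance window and deletes it afterwards. A minimal sketch of building both mutations from the same parameters follows. The class and method names are hypothetical; only the createTask and deleteTask fields and their service and name arguments come from the example above.

```java
final class TaskMutations {

    /** Builds the mutation that registers a task in the shared registry. */
    static String createTask(String service, String name) {
        return mutation("createTask", service, name);
    }

    /** Builds the mutation that removes the task created with the same parameters. */
    static String deleteTask(String service, String name) {
        return mutation("deleteTask", service, name);
    }

    private static String mutation(String field, String service, String name) {
        return "mutation { admin { jahia { " + field
                + "(service: " + quote(service) + " name: " + quote(name) + ") } } }";
    }

    /** Quotes a value as a GraphQL string, escaping backslashes and double quotes. */
    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(createTask("DevOps Team", "Network maintenance on Core VPC"));
        System.out.println(deleteTask("DevOps Team", "Network maintenance on Core VPC"));
    }
}
```

Generating both mutations from one helper reduces the risk of a task being left in the registry because the delete call used slightly different parameters.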
Performing other operations
Jahia also supports a set of additional operations associated with monitoring.
Server shutdown
Working jointly with the tasks registry, a shutdown service is exposed via the GraphQL API. This API node should be used with care as it shuts down the Jahia server.
This query is provided as an example. Don't use timeout and force together because force triggers an immediate shutdown without considering the timeout value.
mutation {
  admin {
    jahia {
      shutdown(
        # When dryRun is provided, the server will not be shut down but
        # still return the expected API response (true or false)
        dryRun: true,
        timeout: 25, # In seconds, maximum time to wait for server to be ready (empty list of tasks) to shut down
        force: true # Force immediate shutdown even if tasks are running
      )
    }
  }
}
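The rule about not combining timeout and force can be enforced on the client side. The sketch below builds the shutdown mutation and rejects that combination. The class and method names are hypothetical, and it assumes that timeout and force can each be omitted from the mutation.

```java
final class ShutdownMutation {

    /**
     * Builds the shutdown mutation. timeoutSeconds may be null when no waiting
     * period is wanted; combining it with force is rejected.
     */
    static String build(boolean dryRun, Integer timeoutSeconds, boolean force) {
        if (force && timeoutSeconds != null) {
            throw new IllegalArgumentException("force ignores timeout; pass only one of them");
        }
        StringBuilder args = new StringBuilder("dryRun: ").append(dryRun);
        if (timeoutSeconds != null) {
            args.append(", timeout: ").append(timeoutSeconds);
        }
        if (force) {
            args.append(", force: true");
        }
        return "mutation { admin { jahia { shutdown(" + args + ") } } }";
    }

    public static void main(String[] args) {
        // Dry run: wait up to 25 seconds for the task list to be empty.
        System.out.println(build(true, 25, false));
    }
}
```

Starting with dryRun set to true lets you confirm the expected API response before issuing a mutation that actually stops the server.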
Server load metrics
Load metrics are also available through the GraphQL API, as per the query below:
query {
  admin {
    jahia {
      load {
        requests {
          count
          # Interval can be ONE, FIVE, or FIFTEEN minutes
          average(interval: ONE)
        }
        sessions {
          count
          average(interval: FIVE)
        }
      }
    }
  }
}