6 - Cluster setup

  Written by The Jahia Team
   Estimated reading time:

6.1 Discovery

6.1.1 Overview

Discovery must be configured on every layer of the cluster architecture. In this document we will only detail the discovery configuration that is possible for the Marketing Factory elements, that is to say the Apache Unomi and ElasticSearch components. For DX cluster discovery setup, please refer to the DX "Configuration and Fine Tuning Guide" documentation. Apache Unomi relies on Apache Karaf Cellar, which in turn uses Hazelcast to discover and configure its cluster. You just need to install multiple Apache Unomis on the same network, and then (optionally) change the Hazelcast configuration in the following file: <cxs-install-dir>etc/hazelcast.xml
All nodes on the same network, sharing the same cluster name will be part of the same cluster. You can find more information about how to configure Hazelcast here: http://docs.hazelcast.org/docs/3.4/manual/html/networkconfiguration.html For the actual ElasticSearch configuration however, this must be done using the following file:
<elasticsearch-install-dir>/config/elasticsearch.yml
The documentation for the various discovery options and how they may be configured is available here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html Depending on the cluster size, you will want to adjust the following parameters to make sure your setup is optimal in terms of performance and safety. Here is an example of a typical ElasticSearch cluster configuration:

script.engine.groovy.inline.update: on
# Protect against accidental close/delete operations
# on all indices. You can still close/delete individual
# indices
#action.disable_close_all_indices: true
#action.disable_delete_all_indices: true
#action.disable_shutdown: true
discovery.zen.ping.unicast.hosts: ["es1-unomi.apache.org","es2-unomi.apache.org","es3-unomi.apache.org", "es4-unomi.apache.org"]
network.host: es1-unomi.apache.org
transport.tcp.compress: true
cluster.name: contextElasticSearchExample
http.cors.enabled: true
http.cors.allow-origin: "*" (edited)
6.1.2 Multicast

Multicast makes it easier to setup cluster node in an "automatic" discovery, which can be very useful for setting up environments quickly such as test environments. However it is not recommended for production or even environments where multiple installs might co-exist in the same network, unless the installers have some solid experience with multicast network setups. Also it is only available on Apache Unomi, as ElasticSearch no longer supports multicast setups.

6.1.2.1 Apache Unomi

For multicast configuration in Apache Unomi, you will need to modify the <cxs-install-dir>/etc/hazelcast.xml file, more specifically the following section:

<join>
    <multicast enabled="true">
        <multicast-group>224.2.2.3</multicast-group>
        <multicast-port>54327</multicast-port>
    </multicast>
6.1.2.2 ElasticSearch

Multicast is not supported by ElasticSearch anymore, you must use Unicast or one of the other cloud discovery modules (Amazon EC2, Microsoft Azure, Google Cloud Compute Engine)

6.1.3 Unicast
6.1.3.1 Apache Unomi

The configuration for Unicast must be done in the <cxs-install-dir>/etc/hazelcast.xml file on all nodes:

<join>
    <multicast enabled="false">
        <multicast-group>224.2.2.3</multicast-group>
        <multicast-port>54327</multicast-port>
    </multicast>
    <tcp-ip enabled="true">
        <interface>127.0.0.1</interface>
    </tcp-ip>
6.1.3.2 ElasticSearch

In production, it is recommended to use unicast instead of multicast (see https://www.elastic.co/guide/en/elasticsearch/guide/current/_important_configuration_changes.html#_prefer_unicast_over_multicast). This works by providing ElasticSearch a list of nodes that it should try to contact. Once the node contacts a member of the unicast list, it will receive a full cluster state that lists all nodes in the cluster. It will then proceed to contact the master and join. The unicast protocol is activated by disabling the multicast and providing a list of hosts (IP + port number) to contact. The following changes have to be applied to the <elasticsearch-install-dir>/config/elasticsearch.yml file:

discovery.zen.ping.unicast.hosts=["es1-unomi.apache.org:9300", "es2-unomi.apache.org:9300"]

6.2 Recommended configurations

In this section we provide examples of some Apache Unomi ElasticSearch settings for different cluster sizes. All of these must be configured in the following file:

org.apache.unomi.persistence.elasticsearch.cfg

This concerns mostly replication, which makes the setup more resistant to failures. For more information about ElasticSearch replication, please consult the following resource: https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html

6.2.1 - 2 node cluster

In this setup we don’t have enough ElasticSearch nodes to properly benefit from replicas (notably because of their impact on performance) so we deactivate them.

Node A:

numberOfReplicas=0
monthlyIndex.numberOfReplicas=0

Node B:

numberOfReplicas=0
monthlyIndex.numberOfReplicas=0
6.2.2 - 3 node cluster

This setup sets up one replica since we have enough nodes to be able to use replicas without affecting performance too much. Node A:

numberOfReplicas=1
monthlyIndex.numberOfReplicas=1

Node B:

numberOfReplicas=1
monthlyIndex.numberOfReplicas=1

Node C:

numberOfReplicas=1
monthlyIndex.numberOfReplicas=1

6.3 Cluster troubleshooting

In Apache Unomi, you might want to look at the cluster command lines available in Apache Karaf. Depending on how you launched Karaf, you may either use them directly on the console, or through an SSH connection to the server. To connect through SSH simply do:

ssh –p 8102 karaf@unomi

The default password is "karaf". You should really change this upon installation by modifying the <cxs-install-dir>/etc/users.properties file. To get a list of available commands, simply type on the command line:

help

In order to monitor the state of your Unomi ElasticSearch Cluster, 2 URLs are quite useful: - Error! Hyperlink reference not valid. : This command retrieves the health of your ElasticSearch cluster (green is good). If you face communication issues between your cluster nodes, it will be orange or red. Make sure that all the nodes of your cluster are started in order to get a better idea of your cluster health. - Error! Hyperlink reference not valid. : This command gives you a detailed state of your Unomi node in the cluster.

6.4 Clustering on Amazon Web Service

One critical thing when setting up ElasticSearch clusters on EC2 is the value of network.host in <elasticsearch-install-dir>/config/elasticsearch.yml. The trick is to use the VPN/internal IP as network host (For instance, network.host: _eth0:ipv4_" - see https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-network.html) to be sure to get the right IP if it’s dynamic and the public IP in the unicast discovery. The default value "192.168.0.1" doesn’t work on AWS.

6.5 Apache HTTP server setup

Here is an example of how to setup an Apache HTTP server to add as a load-balancer in front of 3 Apache Unomi nodes:

<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName unomi.acme.com
        ServerAdmin monitor@acme.com
        DocumentRoot /var/www/html
        CustomLog /var/log/apache2/access-unomi.acme.com.log combined
        ErrorLog /var/log/apache2/error-unomi.acme.com.log
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        <Directory /var/www/html>
                Options FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
        </Directory>
        <Location /cxs>
                #localhost subnet, add your own IPs for your DX servers here or MF might not work properly.
                Require ip 127.0.0.1 10.100
        </Location>
        RewriteEngine On
        RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
        RewriteRule .* - [F]
        ProxyPreserveHost On
        ProxyPass /server-status !
        ProxyPass /robots.txt !
        RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
        RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
        RewriteCond %{HTTP_USER_AGENT} Slurp
        RewriteRule ^.* - [F,L]

        ProxyPass / balancer://unomi_cluster/
        ProxyPassReverse / balancer://unomi_cluster/
        Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
        <Proxy balancer://unomi_cluster>
                BalancerMember http://unomi-node01.int.acme.com:8181 route=1 connectiontimeout=20 timeout=300 ttl=120
                BalancerMember http://unomi-node02.int.acme.com:8181 route=2 connectiontimeout=20 timeout=300 ttl=120
                BalancerMember http://unomi-node03.int.acme.com:8181 route=3 connectiontimeout=20 timeout=300 ttl=120
                ProxySet lbmethod=bytraffic stickysession=ROUTEID
        </Proxy>

        RemoteIPHeader X-Forwarded-For

        Include ssl-common.conf

        BrowserMatch "MSIE [2-6]" \
        nokeepalive ssl-unclean-shutdown \
        downgrade-1.0 force-response-1.0
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

        # HSTS (mod_headers is required) (15768000 seconds = 6 months)
        Header always set Strict-Transport-Security "max-age=15768000"
    </VirtualHost>
</IfModule>

As you can see the mod_proxy module is used to perform load balancing using cookies for sticky sessions.