6 - Cluster setup

6.1 Discovery

6.1.1 Overview

Discovery must be configured on every layer of the cluster architecture. In this document we will only detail the discovery configuration that is possible for the Marketing Factory elements, that is to say the Apache Unomi and ElasticSearch components. For DX cluster discovery setup, please refer to the DX "Configuration and Fine Tuning Guide" documentation.

Apache Unomi relies on Apache Karaf Cellar, which in turn uses Hazelcast to discover and configure its cluster. You just need to install multiple Apache Unomis on the same network, and then (optionally) change the Hazelcast configuration in the following file: <cxs-install-dir>etc/hazelcast.xml
All nodes on the same network, sharing the same cluster name will be part of the same cluster. You can find more information about how to configure Hazelcast here: http://docs.hazelcast.org/docs/3.4/manual/html/networkconfiguration.html

For the actual ElasticSearch configuration however, this must be done using the following file:
<elasticsearch-install-dir>/config/elasticsearch.yml
The documentation for the various discovery options and how they may be configured is available here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html Depending on the cluster size, you will want to adjust the following parameters to make sure your setup is optimal in terms of performance and safety. Here is an example of a typical ElasticSearch cluster configuration:

script.engine.groovy.inline.update: on
# Protect against accidental close/delete operations
# on all indices. You can still close/delete individual
# indices
#action.disable_close_all_indices: true
#action.disable_delete_all_indices: true
#action.disable_shutdown: true
discovery.zen.ping.unicast.hosts: ["es1-unomi.apache.org","es2-unomi.apache.org","es3-unomi.apache.org", "es4-unomi.apache.org"]
network.host: es1-unomi.apache.org
transport.tcp.compress: true
cluster.name: contextElasticSearchExample
http.cors.enabled: true
http.cors.allow-origin: "*" (edited)

6.1.2 Multicast

Multicast makes it easier to setup cluster node in an "automatic" discovery, which can be very useful for setting up environments quickly such as test environments. However it is not recommended for production or even environments where multiple installs might co-exist in the same network, unless the installers have some solid experience with multicast network setups. Also it is only available on Apache Unomi, as ElasticSearch no longer supports multicast setups.

6.1.2.1 Apache Unomi

For multicast configuration in Apache Unomi, you will need to modify the <cxs-install-dir>/etc/hazelcast.xml file, more specifically the following section:

<join>
    <multicast enabled="true">
        <multicast-group>224.2.2.3</multicast-group>
        <multicast-port>54327</multicast-port>
    </multicast>

6.1.2.2 ElasticSearch

Multicast is not supported by ElasticSearch anymore, you must use Unicast or one of the other cloud discovery modules (Amazon EC2, Microsoft Azure, Google Cloud Compute Engine)

6.1.3 Unicast

6.1.3.1 Apache Unomi

The configuration for Unicast must be done in the <cxs-install-dir>/etc/hazelcast.xml file on all nodes:

<join>
    <multicast enabled="false">
        <multicast-group>224.2.2.3</multicast-group>
        <multicast-port>54327</multicast-port>
    </multicast>
    <tcp-ip enabled="true">
        <interface>127.0.0.1</interface>
    </tcp-ip>

6.1.3.2 ElasticSearch

In production, you must use unicast zen discovery. This works by providing ElasticSearch a list of nodes that it should try to contact. Once the node contacts a member of the unicast list, it will receive a full cluster state that lists all nodes in the cluster. It will then proceed to contact the master and join. The unicast protocol is activated by providing a list of hosts (IP + port number) to contact. The following changes have to be applied to the <elasticsearch-install-dir>/config/elasticsearch.yml file:

discovery.zen.ping.unicast.hosts=["es1-unomi.apache.org:9300", "es2-unomi.apache.org:9300"]

6.2 Unomi key and event security

As documented in section 3.2.5 of the Installation guide, in a cluster the event security IP address list must be updated to include all the IPs of all the DX servers that will be talking to it. For example you must modify the org.apache.unomi.thirdparty.cfg file to look something like this:

thirdparty.provider1.key=<generated-unomi-key>
thirdparty.provider1.ipAddresses=127.0.0.1,::1,127.0.0.2,127.0.0.3

6.3 Recommended configurations

In this section we provide examples of some Apache Unomi ElasticSearch settings for different cluster sizes. All of these must be configured in the following file:

org.apache.unomi.persistence.elasticsearch.cfg

This concerns mostly replication, which makes the setup more resistant to failures. For more information about ElasticSearch replication, please consult the following resource: https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html

The recommended number of nodes for an ElasticSearch cluster is 3-nodes, which strikes the perfect balance between reliability, scalability and performance. It is possible to run smaller clusters (even with a single node), but those will require downtime should anything happen to a node. Also since replicas are only recommended above 2 nodes, no redundence will exist in the system and only backups (using ElasticSearch snapshots) will protect again any failure.

6.3.1 - 3 node cluster

This setup sets up one replica since we have enough nodes to be able to use replicas without affecting performance too much. For all three nodes the configuration should look like this:

numberOfReplicas=1
numberOfShards=5
monthlyIndex.numberOfReplicas=1
monthlyIndex.numberOfShards=5

6.4 Clustering on Amazon Web Service

One critical thing when setting up ElasticSearch clusters on EC2 is the value of network.host in <elasticsearch-install-dir>/config/elasticsearch.yml. The trick is to use the VPN/internal IP as network host (For instance, network.host: _eth0:ipv4_" - see https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-network.html) to be sure to get the right IP if it’s dynamic and the public IP in the unicast discovery. The default value "192.168.0.1" doesn’t work on AWS.

6.5 Validating the cluster installation

6.5.1 ElasticSearch cluster validation

First, make sure you check all the logs of all the ElasticSearch cluster nodes and that there are no errors, especially errors or warnings about node to node communication. If you see any messages about trouble finding master (see troubleshooting section below), there is probably a problem with the installation.

You can then check the status of the ElasticSearch cluster by accessing the following URL:

http://ES_NODE_IP_ADDRESS:9200/_cat/health?v

You should see a green status. If not, check your installation because something is not set up correctly.

6.5.2 Apache Unomi cluster validation

First, check the logs in the data/logs directory and make sure there are no errors. If you find any errors, you should check your setup and fix any problems before going any further.

You can then perform the following request to test the status of the Apache Unomi cluster:

https://UNOMI_NODE_IP_ADDRESS:9443/cxs/cluster

If you get a warning about the site certificate that is normal because by default Apache Unomi ships with a self-signed certificate (which you should replace with your own once going in production). You will be prompted for a user name and password (by default karaf/karaf which you should also change for any permanent installation). If everything went well, you should get back a JSON structure indicating the active nodes in the cluster. Make sure that everything looks right before continuing the cluster install.

6.6 Cluster troubleshooting

In Apache Unomi, you might want to look at the cluster command lines available in Apache Karaf. Depending on how you launched Karaf, you may either use them directly on the console, or through an SSH connection to the server. To connect through SSH simply do:

ssh –p 8102 karaf@unomi

The default password is "karaf". You should really change this upon installation by modifying the <cxs-install-dir>/etc/users.properties file. To get a list of available commands, simply type on the command line:

help

In order to monitor the state of your Unomi ElasticSearch Cluster, 2 URLs are quite useful: - Error! Hyperlink reference not valid. : This command retrieves the health of your ElasticSearch cluster (green is good). If you face communication issues between your cluster nodes, it will be orange or red. Make sure that all the nodes of your cluster are started in order to get a better idea of your cluster health. - Error! Hyperlink reference not valid. : This command gives you a detailed state of your Unomi node in the cluster.

ATTENTION: When you start the first node of your ElasticSearch cluster, if the replica is set to more than one it is normal to get an error in the log. The first node is not able to replicate the data into other nodes and you get the following error message:

Caused by: org.elasticsearch.action.UnavailableShardsException: [context][1] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: index

ATTENTION: The following WARN could also be displayed:

[2017-02-08T10:15:23,571][WARN ][o.e.d.z.ElectMasterService] [RlyE1OZ] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [-1], total number of master-eligible nodes used for publishing in this round: [2])

This happens if the cluster could not find enough nodes that are master eligible. Please note that when this warning appears queries to ElasticSearch will not work properly and so this problem should be resolved before setting up Marketing Factory on top of Apache Unomi. Please refer directly to ElasticSearch documentation for more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html

6.7 Apache HTTP server setup

Here is an example of how to setup an Apache HTTP server to add as a load-balancer in front of 3 Apache Unomi nodes:

<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName unomi.acme.com
        ServerAdmin monitor@acme.com
        DocumentRoot /var/www/html
        CustomLog /var/log/apache2/access-unomi.acme.com.log combined
        ErrorLog /var/log/apache2/error-unomi.acme.com.log
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        <Directory /var/www/html>
                Options FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
        </Directory>
        <Location /cxs>
                #localhost subnet, add your own IPs for your DX servers here or MF might not work properly.
                Require ip 127.0.0.1 10.100
        </Location>
        RewriteEngine On
        RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
        RewriteRule .* - [F]
        ProxyPreserveHost On
        ProxyPass /server-status !
        ProxyPass /robots.txt !
        RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
        RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
        RewriteCond %{HTTP_USER_AGENT} Slurp
        RewriteRule ^.* - [F,L]

        ProxyPass / balancer://unomi_cluster/
        ProxyPassReverse / balancer://unomi_cluster/
        Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
        <Proxy balancer://unomi_cluster>
                BalancerMember http://unomi-node01.int.acme.com:8181 route=1 connectiontimeout=20 timeout=300 ttl=120
                BalancerMember http://unomi-node02.int.acme.com:8181 route=2 connectiontimeout=20 timeout=300 ttl=120
                BalancerMember http://unomi-node03.int.acme.com:8181 route=3 connectiontimeout=20 timeout=300 ttl=120
                ProxySet lbmethod=bytraffic stickysession=ROUTEID
        </Proxy>

        RemoteIPHeader X-Forwarded-For

        Include ssl-common.conf

        BrowserMatch "MSIE [2-6]" \
        nokeepalive ssl-unclean-shutdown \
        downgrade-1.0 force-response-1.0
        BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

        # HSTS (mod_headers is required) (15768000 seconds = 6 months)
        Header always set Strict-Transport-Security "max-age=15768000"
    </VirtualHost>
</IfModule>

As you can see the mod_proxy module is used to perform load balancing using cookies for sticky sessions.