- 1 Installation as Add-on
- 2 Challenges for Geo Clusters
- 3 Conceptual Overview
- 4 Requirements
- 5 Example Scenario and Basic Steps—Overview
- 6 Setting Up the Booth Services
- 7 Setting Up DRBD
- 8 Synchronizing Configuration Files Across All Sites and Arbitrators
- 9 Configuring Cluster Resources and Constraints
- 10 Setting Up IP Relocation via DNS Update
- 11 Managing Geo Clusters
- 12 Troubleshooting
- 13 Upgrading to the Latest Product Version
- A GNU Licenses
Apart from local clusters and metro area clusters, SUSE® Linux Enterprise High Availability Extension
12 also supports Geo clusters. That means you can have
multiple, geographically dispersed sites with a local cluster each. Failover
between these clusters is coordinated by a higher level entity: the booth
daemon (boothd). Support for Geo clusters is available as
a separate extension to High Availability Extension, called Geo Clustering for SUSE Linux Enterprise High Availability Extension.
1 Installation as Add-on #
For using the High Availability Extension and Geo Clustering for SUSE Linux Enterprise High Availability Extension, you need the packages included in the following installation patterns:
- High Availability
- Geo Clustering for High Availability
Note: Package Requirements for Arbitrators
If your Geo cluster setup includes one or more arbitrators (see
Arbitrator), those only need the
pattern Geo Clustering for High Availability. For instructions on
how to install this pattern, see
Section 1.2, “Installing the Packages on Arbitrators”.
Both patterns are only available if you have registered your system at SUSE Customer Center (or a local registration server) and have added the respective product channels or installation media as add-ons. For information on how to install add-on products, see the SUSE Linux Enterprise 12 Deployment Guide, available at http://www.suse.com/documentation/. Refer to chapter Installing Add-On Products.
1.1 Installing the Packages on Cluster Nodes #
In case both High Availability Extension and Geo Clustering for SUSE Linux Enterprise High Availability Extension have been added as add-on products, but the packages are not installed yet, proceed as follows:
To install the packages from both patterns via command line, use Zypper:

```shell
sudo zypper in -t pattern ha_sles ha_geo
```

Alternatively, use YaST for a graphical installation:
Start YaST as the root user and open the software management module. Activate the following patterns:

- High Availability
- Geo Clustering for High Availability

Confirm your selection to start installing the packages.
Important: Installing Software Packages on all Parties
The software packages needed for High Availability and Geo clusters are not automatically copied to the cluster nodes.
Install SUSE Linux Enterprise Server 12 and the High Availability and Geo Clustering for High Availability patterns on all machines that will be part of your Geo cluster.

If you do not want to install the packages manually on all nodes that will be part of your cluster, use AutoYaST to clone existing nodes. Find more information in the Administration Guide for SUSE Linux Enterprise High Availability Extension 12, available from http://www.suse.com/documentation/. Refer to chapter Installation and Basic Setup, section Mass Deployment with AutoYaST.
For all machines that need the Geo Clustering for SUSE Linux Enterprise High Availability Extension add-on, you currently need to install the packages for Geo clusters manually. AutoYaST support for Geo Clustering for SUSE Linux Enterprise High Availability Extension is not yet available.
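To check whether the required patterns are already present on a machine, you can query Zypper for installed patterns. This is a quick sanity check, not part of the official procedure:

```shell
# List installed patterns and filter for the High Availability
# and Geo Clustering patterns
zypper search -i -t pattern | grep -E 'ha_sles|ha_geo'
```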
1.2 Installing the Packages on Arbitrators #
Make sure that Geo Clustering for SUSE Linux Enterprise High Availability Extension has been added as an add-on product to the machines that will serve as arbitrators.
Log in to each arbitrator and install the packages with the following command:
```shell
sudo zypper in -t pattern ha_geo
```

Alternatively, use YaST to install the Geo Clustering for High Availability pattern.
2 Challenges for Geo Clusters #
Typically, Geo environments are too far apart to support synchronous communication between the sites. That leads to the following challenges:
How to make sure that a cluster site is up and running?
How to make sure that resources are only started once?
How to make sure that quorum can be reached between the different sites and a split-brain scenario can be avoided?
How to keep the CIB up to date on all nodes and sites?
How to manage failover between the sites?
How to deal with high latency in case of resources that need to be stopped?
In the following sections, learn how to meet these challenges with SUSE Linux Enterprise High Availability Extension.
3 Conceptual Overview #
Geo clusters based on SUSE Linux Enterprise High Availability Extension can be considered “overlay” clusters where each cluster site corresponds to a cluster node in a traditional cluster. The overlay cluster is managed by the booth mechanism. It guarantees that the cluster resources will be highly available across different cluster sites. This is achieved by using cluster objects called tickets, which define a failover domain between cluster sites in case a site goes down. Booth guarantees that every ticket is owned by only one site at a time.
The following list explains the individual components and mechanisms that were introduced for Geo clusters in more detail.
Components and Ticket Management #
- Ticket
A ticket grants the right to run certain resources on a specific cluster site. A ticket can only be owned by one site at a time. Initially, none of the sites has a ticket—each ticket must be granted once by the cluster administrator. After that, tickets are managed by the booth for automatic failover of resources. But administrators may also intervene and grant or revoke tickets manually.
After a ticket is administratively revoked, it is no longer managed by booth. For booth to start managing the ticket again, the ticket must be granted to a site again.
Resources can be bound to a certain ticket by dependencies. Only if the defined ticket is available at a site, the respective resources are started. Vice versa, if the ticket is removed, the resources depending on that ticket are automatically stopped.
The presence or absence of tickets for a site is stored in the CIB as a cluster status. With regard to a certain ticket, there are only two states for a site:
true (the site has the ticket) or false (the site does not have the ticket). The absence of a certain ticket (during the initial state of the Geo cluster) is not treated differently from the situation after the ticket has been revoked. Both are reflected by the value false.

A ticket within an overlay cluster is similar to a resource in a traditional cluster. But in contrast to traditional clusters, tickets are the only type of resource in an overlay cluster. They are primitive resources that do not need to be configured or cloned.
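As a sketch of how ticket states can be inspected at runtime (assuming booth and the cluster stack are running; ticketA is a placeholder name):

```shell
# Show all tickets known to this booth instance, including their owner
booth list

# Query the granted state of a specific ticket in the CIB
crm_ticket -t ticketA -G granted
```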
- Booth Cluster Ticket Manager
Booth is the instance managing the ticket distribution, and thus, the failover process between the sites of a Geo cluster. Each of the participating clusters and arbitrators runs a service, the
boothd. It connects to the booth daemons running at the other sites and exchanges connectivity details. After a ticket has been granted to a site, the booth mechanism can manage the ticket automatically: if the site that holds the ticket is out of service, the booth daemons will vote which of the other sites will get the ticket. To protect against brief connection failures, sites that lose the vote (either explicitly or implicitly by being disconnected from the voting body) need to relinquish the ticket after a time-out. Thus, it is made sure that a ticket will only be redistributed after it has been relinquished by the previous site. See also Dead Man Dependency (loss-policy="fence").

- Arbitrator
Each site runs one booth instance that is responsible for communicating with the other sites. If you have a setup with an even number of sites, you need an additional instance to reach consensus about decisions such as failover of resources across sites. In this case, add one or more arbitrators running at additional sites. Arbitrators are single machines that run a booth instance in a special mode. As all booth instances communicate with each other, arbitrators help to make more reliable decisions about granting or revoking tickets. Arbitrators cannot hold any tickets.
An arbitrator is especially important for a two-site scenario: for example, if site A can no longer communicate with site B, there are two possible causes for that:

- A network failure between A and B.
- Site B is down.

However, if site C (the arbitrator) can still communicate with site B, site B must still be up and running.

- Ticket Failover
If the ticket gets lost, which means other booth instances do not hear from the ticket owner in a sufficiently long time, one of the remaining sites will acquire the ticket. This is what is called ticket failover. If the remaining members cannot form a majority, then the ticket cannot fail over.
- Dead Man Dependency (loss-policy="fence")

After a ticket is revoked, it can take a long time until all resources depending on that ticket are stopped, especially in case of cascaded resources. To cut that process short, the cluster administrator can configure a loss-policy (together with the ticket dependencies) for the case that a ticket gets revoked from a site. If the loss-policy is set to fence, the nodes that are hosting dependent resources are fenced.
Warning: Potential Loss of Data
On the one hand, loss-policy="fence" considerably speeds up the recovery process of the cluster and makes sure that resources can be migrated more quickly.

On the other hand, it can lead to loss of all unwritten data, such as:

- Data lying on shared storage (for example, DRBD).
- Data in a replicating database (for example, MariaDB or PostgreSQL) that has not yet reached the other site because of a slow network link.
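A ticket dependency with a loss-policy is configured as an rsc_ticket constraint. As a minimal sketch in the crm shell (ticketA and rsc1 are placeholder names):

```shell
# Bind resource rsc1 to ticketA; if the ticket is revoked,
# fence the node hosting the resource instead of waiting for a clean stop
crm configure rsc_ticket rsc1-req-ticketA ticketA: rsc1 loss-policy=fence
```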
Figure 1: Two-Site Cluster (4 Nodes + Arbitrator) #
The most common scenario is probably a Geo cluster with two sites and a single arbitrator on a third site. This requires three booth instances, see Figure 1, “Two-Site Cluster (4 Nodes + Arbitrator)”. The upper limit is (currently) 16 booth instances.
As usual, the CIB is synchronized within each cluster, but it is not automatically synchronized across sites of a Geo cluster. However, as of SUSE Linux Enterprise High Availability Extension 12, transferring resource configurations to other cluster sites is easier than before. For details see Section 9.3, “Transferring the Resource Configuration to Other Cluster Sites”.
4 Requirements #
Software Requirements #
All clusters that will be part of the Geo cluster must be based on SUSE Linux Enterprise High Availability Extension 12.
SUSE® Linux Enterprise Server 12 must be installed on all arbitrators.
The Geo Clustering for SUSE Linux Enterprise High Availability Extension add-on must be installed on all cluster nodes and on all arbitrators that will be part of the Geo cluster.
Network Requirements #
The sites must be reachable on one UDP and TCP port per booth instance. That means any firewalls or IPsec tunnels in between must be configured accordingly.
Other setup decisions may require opening more ports (for example, for DRBD or database replication).
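On SUSE Linux Enterprise 12, this can be done, for example, via SuSEfirewall2. The following sketch assumes the booth port 9929 used in the examples below; adapt it to the ports of your setup:

```shell
# In /etc/sysconfig/SuSEfirewall2, allow the booth port for UDP and TCP
# in the external zone:
#   FW_SERVICES_EXT_UDP="9929"
#   FW_SERVICES_EXT_TCP="9929"
# Then reload the firewall:
SuSEfirewall2
```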
Other Requirements and Recommendations #
All cluster nodes on all sites should synchronize to an NTP server outside the cluster. For more information, see the Administration Guide for SUSE Linux Enterprise Server 12, available at http://www.suse.com/documentation/. Refer to the chapter Time Synchronization with NTP.
If nodes are not synchronized, log files and cluster reports are very hard to analyze.
5 Example Scenario and Basic Steps—Overview #
In the following sections, we will use an example scenario as outlined below:
Example 1: Scenario with a Two-Site Cluster, one Arbitrator, and Data Replication via DRBD #
A Geo cluster with two sites, amsterdam and berlin, and one arbitrator.

Each site has a private network routed to the other site:

- amsterdam: 192.168.201.x
- berlin: 192.168.202.x

Each site runs a two-node cluster:

- cluster amsterdam consists of the nodes alice and bob
- cluster berlin consists of the nodes charly and doro

Data is replicated across the sites with DRBD in asynchronous mode for disaster recovery.

The booth configuration and other important configuration files are synchronized across the cluster sites and the arbitrator using Csync2.
Setting up this scenario takes the following basic steps:
- Setting Up the Booth Services
Choosing whether to use the Default Booth Setup or a Booth Setup for Multiple Tenants.
Synchronizing the Booth Configuration to All Sites and Arbitrators.
Configuring the cluster resources for booth as explained in Ticket Dependencies, Constraints and Resources for booth.
Transferring the Resource Configuration to Other Cluster Sites.
- Setting Up DRBD
Configuring DRBD as described in DRBD Configuration.
Configuring the cluster resources for DRBD as explained in Resources and Constraints for DRBD.
Transferring the Resource Configuration to Other Cluster Sites.
Synchronizing DRBD configuration files as shown in Synchronizing Changes with Csync2.
- Synchronizing Configuration Files Across All Sites and Arbitrators
Setting up Csync2 as explained in Csync2 Setup for Geo Clusters.
Initially synchronizing all relevant configuration files across the sites and arbitrators as described in Synchronizing Changes with Csync2.
6 Setting Up the Booth Services #
The default booth configuration is /etc/booth/booth.conf. This file must be
the same on all sites of your Geo cluster, including the
arbitrator or arbitrators. To keep the booth configuration synchronous
across all sites and arbitrators, use Csync2, as described in Section 6.3, “Synchronizing the Booth Configuration to All Sites and Arbitrators”.
For setups including multiple Geo clusters, it is possible to “share” the same arbitrator (as of SUSE Linux Enterprise High Availability Extension 12). By providing several booth configuration files, you can start multiple booth instances on the same arbitrator, with each booth instance running on a different port. That way, you can use one machine to serve as arbitrator for different Geo clusters. For details on how to configure booth for multiple Geo clusters, refer to Section 6.2, “Booth Setup for Multiple Tenants”.
6.1 Default Booth Setup #
To configure all parameters needed for booth, either edit the booth configuration files manually or use the YaST Geo Cluster module. To access it, start it from the command line with yast2 geo-cluster, or start YaST and select the Geo Cluster module.
Example 2: A Booth Configuration File #
```
transport = UDP                # 1
port = 9929                    # 2
arbitrator = 147.2.207.14      # 3
site = 192.168.201.151         # 4
site = 192.168.202.151         # 4

ticket = "ticket-nfs"          # 5
    expire = 600               # 6
    timeout = 10               # 7
    retries = 5                # 8
    renewal-freq = 30          # 9
    before-acquire-handler = /usr/share/booth/service-runnable ms_drbd_nfs  # 10 11 12
    acquire-after = 60         # 13

ticket = "ticketA"             # 5
    expire = 600               # 6
    timeout = 10               # 7
    retries = 5                # 8
    renewal-freq = 30          # 9
    before-acquire-handler = /usr/share/booth/service-runnable db-1  # 10 11 12
    acquire-after = 60         # 13

ticket = "ticketB"             # 5
    expire = 600               # 6
    timeout = 10               # 7
    retries = 5                # 8
    renewal-freq = 30          # 9
    before-acquire-handler = /usr/share/booth/service-runnable db-8  # 10 11 12
    acquire-after = 60         # 13
```

The numbers in the comments refer to the explanations below.
1. The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. Currently, this parameter can therefore be omitted.
2. The port to be used for communication between the booth instances at each site.
3. The IP address of the machine to use as arbitrator. Add an entry for each arbitrator you use in your Geo cluster setup.
4. The IP address used for the boothd on a site.
5. The ticket to be managed by booth. For each ticket, add a separate ticket entry.
6. Optional parameter. Defines the ticket's expiry time in seconds. A site that has been granted a ticket will renew the ticket regularly. If booth does not receive any information about renewal of the ticket within the defined expiry time, the ticket will be revoked and granted to another site.
7. Optional parameter. Defines a timeout period in seconds. After that time, booth will resend packets if it did not receive a reply within this period. The timeout defined should be long enough to allow packets to reach other booth members (all arbitrators and sites).
8. Optional parameter. Defines how many times booth retries sending packets before giving up waiting for confirmation by other sites.
9. Optional parameter. Sets the ticket renewal frequency period. By default, tickets are renewed every half expiry time. If the network reliability is often reduced over prolonged periods, it is advisable to renew more often. Before every renewal, the before-acquire-handler is run.
10. Optional parameter. If set, the specified command will be called before boothd tries to acquire or renew a ticket. The ticket is only granted or renewed if the command succeeds.
11. The service-runnable script shipped with booth. It checks whether the resource specified as its argument can be run on this site.
12. The resource to be tested by the before-acquire-handler.
13. Optional parameter. After a ticket is lost, booth will wait this additional time before acquiring the ticket. This is to allow the site that lost the ticket to relinquish the resources, by either stopping them or fencing a node. If you are unsure how long stopping or demoting the resources or fencing a node may take, it is safer to choose a higher value.
Procedure 1: Manually Editing The Booth Configuration File #
1. Log in to a cluster node as root or equivalent.
2. Copy the example booth configuration file /etc/booth/booth.conf.example to /etc/booth/booth.conf.
3. Edit /etc/booth/booth.conf according to Example 2, “A Booth Configuration File”.
4. Verify your changes and save the file.
5. On all cluster nodes and arbitrators, open the port in the firewall that you have configured for booth. See Example 2, “A Booth Configuration File”, position 2.
Procedure 2: Setting Up Booth with YaST #
Log in to a cluster node as root or equivalent.

Start the YaST Geo Cluster module.

Choose to edit an existing booth configuration file or create a new one. In the screen that appears, configure the following parameters:

- Configuration File. A name for the booth configuration file. YaST suggests booth by default. This results in the booth configuration being written to /etc/booth/booth.conf. Only change this value if you need to set up multiple booth instances for different Geo clusters as described in Section 6.2, “Booth Setup for Multiple Tenants”.
- Transport. The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. See also Example 2, “A Booth Configuration File”, position 1.
- Port. The port to be used for communication between the booth instances at each site. See also Example 2, “A Booth Configuration File”, position 2.
- Arbitrator. The IP address of the machine to use as arbitrator. Add an entry for each arbitrator and enter its IP address. See also Example 2, “A Booth Configuration File”, position 3.
- Site. The IP address used for the boothd on a site. Add an entry for each site of your Geo cluster and enter its IP address. See also Example 2, “A Booth Configuration File”, position 4.
- Ticket. The ticket to be managed by booth. Add an entry for each ticket and enter a unique name. If you need to define multiple tickets with the same parameters and values, save configuration effort by creating a “ticket template” that specifies the default parameters and values for all tickets. To do so, use __default__ as name. Additionally, you can specify optional parameters for your ticket. For an overview, see Example 2, “A Booth Configuration File”, positions 5 to 13.

Confirm your changes.
Figure 2: Example Ticket Dependency #
Close the current booth configuration screen. YaST shows the name of the booth configuration file that you have defined.

Before closing the YaST module, switch to the firewall settings category.

To open the port you have configured for booth, enable the respective firewall option.

Important: Firewall Setting for Local Machine Only
The firewall setting is only applied to the current machine. It will open the UDP/TCP ports for all ports that have been specified in /etc/booth/booth.conf or any other booth configuration files (see Section 6.2, “Booth Setup for Multiple Tenants”).

Make sure to open the respective ports on all other cluster nodes and arbitrators of your Geo cluster setup, too. Do so either manually or by synchronizing the following files with Csync2:

- /etc/sysconfig/SuSEfirewall2
- /etc/sysconfig/SuSEfirewall2.d/services/booth
Confirm all settings and close the YaST module. Depending on the name specified for the Configuration File, the configuration is written to /etc/booth/NAME.conf.
6.2 Booth Setup for Multiple Tenants #
For setups including multiple Geo clusters, it is possible to “share” the same arbitrator (as of SUSE Linux Enterprise High Availability Extension 12). By providing several booth configuration files, you can start multiple booth instances on the same arbitrator, with each booth instance running on a different port. That way, you can use one machine to serve as arbitrator for different Geo clusters.
Let us assume you have two Geo clusters, one in EMEA (Europe, the Middle East and Africa), and one in the Asia-Pacific region (APAC).
To use the same arbitrator for both Geo clusters, create two
configuration files in the /etc/booth directory:
/etc/booth/emea.conf and
/etc/booth/apac.conf. Both must minimally differ in
the following parameters:
The port used for the communication of the booth instances.
The sites belonging to the different Geo clusters that the arbitrator is used for.
Example 3: /etc/booth/apac.conf #

```
transport = UDP                # 1
port = 9133                    # 2
arbitrator = 147.2.207.14      # 3
site = 192.168.2.254           # 4
site = 192.168.1.112           # 4

ticket = "tkt-db-apac-intern"  # 5
    timeout = 10
    retries = 5
    renewal-freq = 60
    before-acquire-handler = /usr/share/booth/service-runnable db-apac-intern  # 10 11 12

ticket = "tkt-db-apac-cust"    # 5
    timeout = 10
    retries = 5
    renewal-freq = 60
    before-acquire-handler = /usr/share/booth/service-runnable db-apac-cust
```
Example 4: /etc/booth/emea.conf #

```
transport = UDP                # 1
port = 9150                    # 2
arbitrator = 147.2.207.14      # 3
site = 192.168.201.151         # 4
site = 192.168.202.151         # 4

ticket = "tkt-sap-crm"         # 5
    expire = 900
    renewal-freq = 60
    before-acquire-handler = /usr/share/booth/service-runnable sap-crm  # 10 11 12

ticket = "tkt-sap-prod"        # 5
    expire = 600
    renewal-freq = 60
    before-acquire-handler = /usr/share/booth/service-runnable sap-prod
```
1. The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. Currently, this parameter can therefore be omitted.
2. The port to be used for communication between the booth instances at each site. The configuration files use different ports to allow for starting multiple booth instances on the same arbitrator.
3. The IP address of the machine to use as arbitrator. In the examples above, we use the same arbitrator for different Geo clusters.
4. The IP address used for the boothd on a site.
5. The ticket to be managed by booth. Theoretically, the same ticket names can be defined in different booth configuration files: the tickets will not interfere because they are part of different Geo clusters that are managed by different booth instances. However, for a better overview, we advise using distinct ticket names for each Geo cluster as shown in the examples above.
10. Optional parameter. If set, the specified command will be called before boothd tries to acquire or renew a ticket. The ticket is only granted or renewed if the command succeeds.
11. The service-runnable script shipped with booth. It checks whether the resource specified as its argument can be run on this site.
12. The resource to be tested by the before-acquire-handler.
Procedure 3: Using the Same Arbitrator for Different Geo Clusters #
Create different booth configuration files in /etc/booth as shown in Example 3, “/etc/booth/apac.conf” and Example 4, “/etc/booth/emea.conf”. Do so either manually or with YaST, as outlined in Procedure 2, “Setting Up Booth with YaST”.

On the arbitrator, open the ports that are defined in any of the booth configuration files in /etc/booth.

On the nodes belonging to the individual Geo clusters that the arbitrator is used for, open the port that is used for the respective booth instance.
Synchronize the respective booth configuration files across all cluster nodes and arbitrators that use the same booth configuration. For details, see Section 6.3, “Synchronizing the Booth Configuration to All Sites and Arbitrators”.
On the arbitrator, start the individual booth instances as described in Starting the Booth Services on Arbitrators for multi-tenancy setups.
On the individual Geo clusters, start the booth service as described in Starting the Booth Services on Cluster Sites.
6.3 Synchronizing the Booth Configuration to All Sites and Arbitrators #
Note: Use the Same Booth Configuration On All Sites and Arbitrators
To make booth work correctly, all cluster nodes and arbitrators within one Geo cluster must use the same booth configuration.
You can use Csync2 to synchronize the booth configuration. For details, see Section 8.1, “Csync2 Setup for Geo Clusters” and Section 8.2, “Synchronizing Changes with Csync2”.
In case of any booth configuration changes, make sure to update the configuration files accordingly on all parties and to restart the booth services as described in Section 6.5, “Reconfiguring Booth While Running”.
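For example, after changing the booth configuration on one node, the updated file can be pushed to all other hosts in the synchronization group (assuming the file is already included in your Csync2 configuration):

```shell
# Examine which files changed and synchronize them to all Csync2 peers
csync2 -xv
```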
6.4 Enabling and Starting the Booth Services #
- Starting the Booth Services on Cluster Sites
The booth service for each cluster site is managed by the booth resource group configured in Procedure 7, “Configuring a Resource Group for boothd”. To start one instance of the booth service per site, start the respective booth resource group on each cluster site.

- Starting the Booth Services on Arbitrators

Starting with SUSE Linux Enterprise 12, booth arbitrators are managed with systemd. The unit file is named booth@.service. The @ denotes the possibility to run the service with a parameter, which is in this case the name of the configuration file.

To enable the booth service on an arbitrator, use the following command:

```shell
systemctl enable booth@booth
```

After the service has been enabled from the command line, YaST System Services (Runlevel) can be used to manage the service, as long as it is not disabled. In that case, it will disappear from the service list in YaST the next time systemd is restarted.
However, the command to start the booth service depends on your booth setup:
If you are using the default setup as described in Section 6.1, “Default Booth Setup”, only /etc/booth/booth.conf is configured. In that case, log in to each arbitrator and use the following command:

```shell
systemctl start booth@booth
```

If you are running booth in multi-tenancy mode as described in Section 6.2, “Booth Setup for Multiple Tenants”, you have configured multiple booth configuration files in /etc/booth. To start the services for the individual booth instances, use systemctl start booth@NAME, where NAME stands for the name of the respective configuration file /etc/booth/NAME.conf.

For example, if you have the booth configuration files /etc/booth/emea.conf and /etc/booth/apac.conf, log in to your arbitrator and execute the following commands:

```shell
systemctl start booth@emea
systemctl start booth@apac
```
This starts the booth service in arbitrator mode. It can communicate with all other booth daemons but in contrast to the booth daemons running on the cluster sites, it cannot be granted a ticket. Booth arbitrators take part in elections only. Otherwise, they are dormant.
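To verify that an arbitrator instance is up, you can check its service state (a sketch; booth is the default configuration name):

```shell
# Check the service state of the booth instance
systemctl status booth@booth

# List the tickets known to this booth daemon
booth list
```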
6.5 Reconfiguring Booth While Running #
In case you need to change the booth configuration while the booth services are already running, proceed as follows:
Adjust the booth configuration files as desired.
Synchronize the updated booth configuration files to all cluster nodes and arbitrators that are part of your Geo cluster. For details, see Section 8, “Synchronizing Configuration Files Across All Sites and Arbitrators”.
Restart the booth services on the arbitrators and cluster sites as described in Section 6.4, “Enabling and Starting the Booth Services”. This does not have any effect on tickets that have already been granted to sites.
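A minimal sketch of this restart step, assuming the default configuration name booth on the arbitrator and a booth resource group named g-booth on the cluster sites (the group name depends on your resource configuration):

```shell
# On each arbitrator:
systemctl restart booth@booth

# On one node of each cluster site:
# (g-booth is a placeholder for your booth resource group name)
crm resource restart g-booth
```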
7 Setting Up DRBD #
For a description of the overall scenario, see Section 5, “Example Scenario and Basic Steps—Overview”. Assuming that you have two cluster sites that are connected with a routed IPv4 or IPv6 connection and a transmission speed ranging from a few Mbit/s up to 10 Gbit/s, using a cluster file system across the sites will not be possible because of the high latency. However, you can use DRBD to replicate the data for a quick failover in case one of the sites goes down (active/passive setup). DRBD is software that replicates storage data by mirroring the content of block devices (hard disks, partitions, logical volumes, etc.) between hosts located on different sites. Failover is managed via the booth services, see Booth Cluster Ticket Manager.
7.1 DRBD Scenario and Basic Steps #
Figure 3, “DRBD Setup and Resources” shows a graphical representation of the setup and the resources that we will configure in the following.
Figure 3: DRBD Setup and Resources #
Scenario—Details #
A file system is to be served across the Geo cluster via NFS.
LVM is used as a storage layer below DRBD.
On site amsterdam, DRBD is running with protocol C, a synchronous replication protocol. It uses local IP addresses in a LAN.

The upper layer DRBD runs on one node per site and is responsible for replicating the data to the other site of the Geo cluster.

The lower layer DRBD is responsible for the local replication of data (between the nodes of one cluster site). After activating one of the lower DRBD devices on one node per site, the service IP (to be configured as a cluster resource) will be started.

The service IP is not only used for the service as such, but also as a fixed point that can be accessed by the upper DRBD device (which runs in secondary state) for replication.

On the site that should run the file system service, the upper layer DRBD gets set primary. This means that the file system therein can be mounted and used by applications.

Optionally, the DRBD connection to the other site of the Geo cluster may use a DRBD Proxy in between.
For this setup scenario, you need to execute the following basic steps:
Edit the DRBD configuration files to include configuration snippets for each Geo cluster site and the DRBD connection across sites. For details, see the examples in Section 7.2, “DRBD Configuration”.
Configure the cluster resources as explained in Section 9.1, “Resources and Constraints for DRBD”.
Configure booth as described in Section 6, “Setting Up the Booth Services”.
Configure synchronization of DRBD and booth configuration files within each local cluster and across the Geo cluster sites. For details, refer to Section 8, “Synchronizing Configuration Files Across All Sites and Arbitrators”.
7.2 DRBD Configuration #
Beginning with DRBD 8.3, the DRBD configuration file is split into separate files. They must
be located in the /etc/drbd.d/ directory. The following DRBD configuration
snippets show a basic DRBD configuration for the scenario mentioned in Scenario—Details. All snippets can be added to a single DRBD resource
configuration file, for example, /etc/drbd.d/nfs.res. This file can then be
synchronized using Csync2 as described in Section 8.1, “Csync2 Setup for Geo Clusters”. Note that the DRBD configuration snippets below are bare-bones—they do not include
any performance tuning options or similar. For details on how to tune DRBD, see the
DRBD chapter in the SUSE Linux Enterprise High Availability Extension Administration Guide, available from
http://www.suse.com/documentation/.
Example 5: DRBD Configuration Snippet for Site 1 (amsterdam) #
```
resource nfs-lower-amsterdam {                          # 1
    disk       /dev/volgroup/lv-nfs;                    # 2
    meta-disk  internal;                                # 3
    device     /dev/drbd0;                              # 4
    protocol   C;                                       # 5
    net {
        shared-secret "2a9702a6-8747-11e3-9ebb-782bcbd0c11c";  # 6
    }
    on alice {                                          # 7
        address 192.168.201.111:7900;                   # 8
    }
    on bob {                                            # 7
        address 192.168.201.112:7900;                   # 8
    }
}
```
1. A resource name that allows some association to the respective service (here: NFS). By including the site name too, the complete DRBD configuration can be synchronized across the sites without causing name conflicts.

2. The device that is replicated between the nodes. In our example, LVM is used as a storage layer below DRBD, and the volume group name is volgroup.

3. The meta-disk parameter usually contains the value internal, so that the DRBD metadata is kept on the same device as the data.

4. The device name for DRBD and its minor number. To differentiate between the lower layer DRBD for the local replication and the upper layer DRBD for replication between the Geo cluster sites, the device minor numbers 0 and 10 are used.

5. DRBD is running in protocol C, which replicates synchronously within the local cluster.

6. A shared-secret is used to validate connection pairs. You need a different shared-secret for each connection pair. You can get unique values with the UUID program.

7. The on sections list the host names of the two cluster nodes at this site (alice and bob).

8. The local IP address and port number of the respective node. Each DRBD resource needs an individual port.
Example 6: DRBD Configuration Snippet for Site 2 (berlin) #
The configuration for site 2 (berlin) is nearly identical to that for site 1: you can keep the values of most parameters, including the names of the volume group and the logical volume. However, the values of the following parameters need to be changed:
resource nfs-lower-berlin { 1
   disk       /dev/volgroup/lv-nfs; 2
   meta-disk  internal; 3
   device     /dev/drbd0; 4
   protocol   C; 5
   net {
      shared-secret "2e9290a0-8747-11e3-a28c-782bcbd0c11c"; 6
   }
   on charly { 7
      address 192.168.202.111:7900; 8
   }
   on doro { 7
      address 192.168.202.112:7900; 8
   }
}
1. A resource name that allows some association to the respective service (here: NFS). By including the site name too, the complete DRBD configuration can be synchronized across the sites without causing name conflicts.

2. The device that is replicated between the nodes. In our example, LVM is used as a storage layer below DRBD, and the volume group name is volgroup.

3. The meta-disk parameter usually contains the value internal, so that the DRBD metadata is kept on the same device as the data.

4. The device name for DRBD and its minor number. To differentiate between the lower layer DRBD for the local replication and the upper layer DRBD for replication between the Geo cluster sites, the device minor numbers 0 and 10 are used.

5. DRBD is running in protocol C, which replicates synchronously within the local cluster.

6. A shared-secret is used to validate connection pairs. You need a different shared-secret for each connection pair. You can get unique values with the UUID program.

7. The on sections list the host names of the two cluster nodes at this site (charly and doro).

8. The local IP address and port number of the respective node. Each DRBD resource needs an individual port.
Example 7: DRBD Configuration Snippet for Connection Across Sites #
resource nfs-upper { 1
   disk       /dev/drbd0; 2
   meta-disk  internal;
   device     /dev/drbd10; 3
   protocol   A; 4
   net {
      shared-secret "3105dd88-8747-11e3-a7fd-782bcbd0c11c"; 5
      ping-timeout  20; 6
   }
   stacked-on-top-of nfs-lower-amsterdam { 7
      address 192.168.201.151:7910; 8
   }
   stacked-on-top-of nfs-lower-berlin { 7
      address 192.168.202.151:7910; 8
   }
}
1. A resource name that allows some association to the respective service (here: NFS). This is the configuration for the upper layer DRBD, responsible for replicating the data to the other site of the Geo cluster.

2. The storage disk to replicate is the lower layer DRBD device, /dev/drbd0.

3. The device name for DRBD and its minor number. To differentiate between the lower layer DRBD for the local replication and the upper layer DRBD for replication between the Geo cluster sites, the device minor numbers 0 and 10 are used.

4. DRBD is running in protocol A, which replicates asynchronously and is therefore suited for the connection between the sites.

5. A shared-secret is used to validate connection pairs. You need a different shared-secret for each connection pair. You can get unique values with the UUID program.

6. Because of the higher latency between the sites, set the ping-timeout to 20 (the value is given in tenths of a second, so this corresponds to 2 seconds).

7. Instead of passing any host names, we tell DRBD to stack upon its lower device. This implies that the lower device must be in the Primary role on the node where the stacked resource is to run.

8. To allow TCP/IP connections to the other site of the Geo cluster without knowing which cluster node currently holds the lower DRBD device in the Primary role, a service IP address is used per site (192.168.201.151 and 192.168.202.151).
8 Synchronizing Configuration Files Across All Sites and Arbitrators #
To replicate important configuration files across all nodes in the cluster and across
Geo clusters, use Csync2.
Csync2 can handle any number of hosts, sorted into synchronization groups. Each
synchronization group has its own list of member hosts and its include/exclude patterns that
define which files should be synchronized in the synchronization group. The groups, the host names
belonging to each group, and the include/exclude rules for each group are specified in the
Csync2 configuration file, /etc/csync2/csync2.cfg.
For authentication, Csync2 uses the IP addresses and pre-shared keys within a synchronization group. You need to generate one key file for each synchronization group and copy it to all group members.
Csync2 will contact other servers via a TCP port (by default 6556),
and uses xinetd to start remote Csync2 instances. For detailed information about Csync2,
refer to http://oss.linbit.com/csync2/paper.pdf
8.1 Csync2 Setup for Geo Clusters #
How to set up Csync2 for individual clusters with YaST is explained in the Administration Guide for SUSE Linux Enterprise High Availability Extension, chapter Installation and Basic Setup, section Transferring the Configuration to All Nodes. However, YaST cannot handle more complex Csync2 setups, like those that are needed for Geo clusters. For the following setup, as shown in Figure 4, “Example Csync2 Setup for Geo Clusters”, configure Csync2 manually by editing the configuration files.
To adjust Csync2 for synchronizing files not only within local clusters but also across geographically dispersed sites, you need to define two synchronization groups in the Csync2 configuration:
A global group
ha_global(for the files that need to be synchronized globally, across all sites and arbitrators belonging to a Geo cluster).A group for the local cluster site
ha_local(for the files that need to be synchronized within the local cluster).
For an overview of the multiple Csync2 configuration files for the two synchronization groups, see Figure 4, “Example Csync2 Setup for Geo Clusters”.
Figure 4: Example Csync2 Setup for Geo Clusters #
Authentication key files and their references are displayed in red. The names of Csync2 configuration files are displayed in blue, and their references are displayed in green. For details, refer to Example Csync2 Setup: Configuration Files.
Example Csync2 Setup: Configuration Files #
/etc/csync2/csync2.cfg

The main Csync2 configuration file. It is kept short and simple on purpose and only contains the following:
The definition of the synchronization group ha_local. The group consists of two nodes (this-site-host-1 and this-site-host-2) and uses /etc/csync2/ha_local.key for authentication. A list of files to be synchronized for this group only is defined in another Csync2 configuration file, /etc/csync2/ha_local.cfg. It is included with the config statement.

A reference to another Csync2 configuration file, /etc/csync2/ha_global.cfg, included with the config statement.
/etc/csync2/ha_local.cfg

This file concerns only the local cluster. It specifies a list of files to be synchronized only within the ha_local synchronization group, as these files are specific per cluster. The most important ones are the following:

/etc/csync2/csync2.cfg, as this file contains the list of the local cluster nodes.

/etc/csync2/ha_local.key, the authentication key to be used for Csync2 synchronization within the local cluster.

/etc/corosync/corosync.conf, as this file defines the communication channels between the local cluster nodes.

/etc/corosync/authkey, the Corosync authentication key.
The rest of the file list depends on your specific cluster setup. The files listed in Figure 4, “Example Csync2 Setup for Geo Clusters” are only examples. If you also want to synchronize files for any site-specific applications, include them in ha_local.cfg, too. Even though ha_local.cfg is targeted at the nodes belonging to one site of your Geo cluster, the content may be identical on all sites. If you need different sets of hosts or different keys, adding extra groups may be necessary.
/etc/csync2/ha_global.cfg

This file defines the Csync2 synchronization group ha_global. The group spans all cluster nodes across multiple sites, including the arbitrator. As it is recommended to use a separate key for each Csync2 synchronization group, this group uses /etc/csync2/ha_global.key for authentication. The include statements define the list of files to be synchronized within the ha_global synchronization group. The most important ones are the following:

/etc/csync2/ha_global.cfg and /etc/csync2/ha_global.key (the configuration file for the ha_global synchronization group and the authentication key used for synchronization within the group).

/etc/booth/booth.conf, the default booth configuration file. In case you are using a booth setup for multiple tenants, replace this file with the different booth configuration files that you have created. See Section 6.2, “Booth Setup for Multiple Tenants” for details.

/etc/drbd.conf and /etc/drbd.d (if you are using DRBD within your cluster setup). The DRBD configuration can be globally synchronized, as it derives the configuration from the host names contained in the resource configuration file.

/etc/zypp/repos.d. The package repositories are likely to be the same on all cluster nodes.
The other files shown (
/etc/root/*) are examples that may be included for reasons of convenience (to make a cluster administrator's life easier).
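Putting the descriptions above together, the set of configuration files might look like the following sketch. The host names and file lists are the example values from Figure 4; the exact placement of the config and include statements is an assumption based on the descriptions above, so verify it against your installed csync2 version:

```
# /etc/csync2/csync2.cfg (site-specific, sketch)
group ha_local
{
    host this-site-host-1;
    host this-site-host-2;
    key /etc/csync2/ha_local.key;
    config /etc/csync2/ha_local.cfg;
}
config /etc/csync2/ha_global.cfg;

# /etc/csync2/ha_local.cfg (file list for the local site, sketch)
include /etc/csync2/csync2.cfg;
include /etc/csync2/ha_local.key;
include /etc/corosync/corosync.conf;
include /etc/corosync/authkey;

# /etc/csync2/ha_global.cfg (group spanning all sites and the arbitrator, sketch;
# host names are placeholders)
group ha_global
{
    host site-1-host-1;
    host site-1-host-2;
    host site-2-host-1;
    host site-2-host-2;
    host arbitrator;
    key /etc/csync2/ha_global.key;
    include /etc/csync2/ha_global.cfg;
    include /etc/csync2/ha_global.key;
    include /etc/booth/booth.conf;
    include /etc/drbd.conf;
    include /etc/drbd.d;
    include /etc/zypp/repos.d;
}
```

Remember that csync2.cfg and ha_local.cfg differ per site, whereas ha_global.cfg is identical everywhere.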
Note
The files csync2.cfg and ha_local.key are
site-specific, which means you need to create different ones for each cluster site. The files are
identical on the nodes belonging to the same cluster but different on another cluster. Each
csync2.cfg file needs to contain a list of hosts (cluster nodes)
belonging to the site, plus a site-specific authentication key.
The arbitrator needs a csync2.cfg file, too. It only needs to
reference ha_global.cfg though.
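Following the note above, the arbitrator's main configuration file can stay minimal. A sketch:

```
# /etc/csync2/csync2.cfg on the arbitrator (sketch):
# no local synchronization group, only the global one
config /etc/csync2/ha_global.cfg;
```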
8.2 Synchronizing Changes with Csync2 #
To successfully synchronize the files with Csync2, the following prerequisites must be met:
The same Csync2 configuration is available on all machines that belong to the same synchronization group.
The Csync2 authentication key for each synchronization group must be available on all members of that group.
Both Csync2 and
xinetdmust be running on all nodes and the arbitrator.
Before the first Csync2 run, you therefore need to make the following preparations:
Log in to one machine per synchronization group and generate an authentication key for the respective group:
csync2 -k NAME_OF_KEYFILE
However, do not regenerate the key file on any other member of the same group.
With regard to Figure 4, “Example Csync2 Setup for Geo Clusters”, this would result in the following key files:
/etc/csync2/ha_global.key and one local key (/etc/csync2/ha_local.key) per site.

Copy each key file to all members of the respective synchronization group. With regard to Figure 4, “Example Csync2 Setup for Geo Clusters”:
Copy /etc/csync2/ha_global.key to all parties (the arbitrator and all cluster nodes on all sites of your Geo cluster). The key file needs to be available on all hosts listed within the ha_global group that is defined in ha_global.cfg.

Copy the local key file for each site (/etc/csync2/ha_local.key) to all cluster nodes belonging to the respective site of your Geo cluster.
Copy the site-specific /etc/csync2/csync2.cfg configuration file to all cluster nodes belonging to the respective site of your Geo cluster and to the arbitrator.

Execute the following commands on all nodes and the arbitrator to make both the xinetd and csync2 services start automatically at boot time:

root # systemctl enable csync2.socket
root # systemctl enable xinetd.service

Execute the following commands on all nodes and the arbitrator to start both services now:

root # systemctl start csync2.socket
root # systemctl start xinetd.service
Procedure 4: Synchronizing Files with Csync2 #
To initially synchronize all files once, execute the following command on the machine that you want to copy the configuration from:
root # csync2 -xv

This will synchronize all the files once by pushing them to the other members of the synchronization groups. If all files are synchronized successfully, Csync2 will finish with no errors.
If one or several files that are to be synchronized have been modified on other machines (not only on the current one), Csync2 will report a conflict. You will get an output similar to the one below:
While syncing file /etc/corosync/corosync.conf:
ERROR from peer site-2-host-1: File is also marked dirty here!
Finished with 1 errors.
If you are sure that the file version on the current machine is the “best” one, you can resolve the conflict by forcing this file and resynchronizing:
root # csync2 -f /etc/corosync/corosync.conf
root # csync2 -x
For more information on the Csync2 options, run csync2 -help.
Note: Pushing Synchronization After Any Changes
Csync2 only pushes changes. It does not continuously synchronize files between the machines.
Each time you update files that need to be synchronized, you need to
push the changes to the other machines of the same synchronization group: Run csync2 -xv on the machine where you did the changes. If you run the command on any of the other machines with unchanged files, nothing will happen.
9 Configuring Cluster Resources and Constraints #
Apart from the resources and constraints that you need to define for your specific cluster setup, Geo clusters require additional resources and constraints as described below. You can either configure them with the crm shell (crmsh) as demonstrated in the examples below, or with the HA Web Konsole (Hawk).
This section focuses on tasks specific to Geo clusters. For an introduction to your preferred cluster management tool and general instructions on how to configure resources and constraints with it, refer to one of the following chapters in the Administration Guide for SUSE Linux Enterprise High Availability Extension, available from http://www.suse.com/documentation/:
Hawk: Chapter Configuring and Managing Cluster Resources (Web Interface)
crmsh: Chapter Configuring and Managing Cluster Resources (Command Line)
Important: No CIB Synchronization Across Sites
The CIB is not automatically synchronized across cluster sites of a Geo cluster. This means you need to configure all resources that must be highly available across the Geo cluster for each site accordingly.
To ease transfer of the configuration to other cluster sites, any resources with site-specific parameters can be configured in such a way that the parameters' values depend on the name of the cluster site where the resource is running.
To make this work, the cluster names for each site must be defined in
the respective /etc/corosync/corosync.conf files. For example, /etc/corosync/corosync.conf of
site 1 (amsterdam) must contain the following
entry:
totem {
   [...]
   cluster_name: amsterdam
}

After you have configured the resources on one site, you can tag the resources that are needed on all cluster sites, export them from the current CIB, and import them into the CIB of another cluster site. For details, see Section 9.3, “Transferring the Resource Configuration to Other Cluster Sites”.
9.1 Resources and Constraints for DRBD #
To complete the DRBD setup, you need to configure some resources and constraints as shown in Procedure 5 and transfer them to the other cluster sites as explained in Section 9.3, “Transferring the Resource Configuration to Other Cluster Sites”.
Procedure 5: Configuring Resources for a DRBD Setup #
On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

Enter crm configure to switch to the interactive crm shell.

Configure the (site-dependent) service IP for NFS as a basic primitive:

crm(live)configure# primitive ip_nfs ocf:heartbeat:IPaddr2 \
   params iflabel="nfs" nic="eth1" cidr_netmask="24" \
   params rule #cluster-name eq amsterdam ip="192.168.201.151" \
   params rule #cluster-name eq berlin ip="192.168.202.151" \
   op monitor interval=10

Configure a file system resource and a resource for the NFS server:
crm(live)configure# primitive nfs_fs ocf:heartbeat:Filesystem \
   params device="/dev/drbd/by-res/nfs/0" directory="/mnt/nfs" \
   fstype="ext4"
crm(live)configure# primitive nfs_service lsb:nfs

Configure the following primitives and multi-state resources for DRBD:
crm(live)configure# primitive drbd_nfs ocf:linbit:drbd \
   params drbd_resource="nfs-upper" \
   op monitor interval="31" role="Slave" \
   op monitor interval="30" role="Master"
crm(live)configure# primitive drbd_nfs_lower ocf:linbit:drbd \
   params rule #cluster-name eq amsterdam drbd_resource="nfs-lower-amsterdam" \
   params rule #cluster-name eq berlin drbd_resource="nfs-lower-berlin" \
   op monitor interval="31" role="Slave" \
   op monitor interval="30" role="Master"
crm(live)configure# ms ms_drbd_nfs drbd_nfs \
   meta master-max="1" master-node-max="1" \
   clone-max="1" clone-node-max="1" notify="true"
crm(live)configure# ms ms_drbd_nfs_lower drbd_nfs_lower \
   meta master-max="1" master-node-max="1" \
   clone-max="2" clone-node-max="1" notify="true"

Add a group with the following colocation and ordering constraints:
crm(live)configure# group g_nfs nfs_fs nfs_service
crm(live)configure# colocation col_nfs_ip_with_lower inf: ip_nfs:Started ms_drbd_nfs_lower:Master
crm(live)configure# colocation col_nfs_g_with_upper inf: g_nfs:Started ms_drbd_nfs:Master
crm(live)configure# colocation col_nfs_upper_with_ip inf: ms_drbd_nfs:Master ip_nfs:Started
crm(live)configure# order o_lower_drbd_before_ip_nfs inf: ms_drbd_nfs_lower:promote ip_nfs:start
crm(live)configure# order o_ip_nfs_before_drbd inf: ip_nfs:start ms_drbd_nfs:promote
crm(live)configure# order o_drbd_nfs_before_svc inf: ms_drbd_nfs:promote g_nfs:start

Review your changes with show.

If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

The configuration is saved to the CIB.
9.2 Ticket Dependencies, Constraints and Resources for booth #
To complete the booth setup, you need to execute the following steps to configure the resources and constraints needed for booth and failover of resources:
The resource configurations need to be available on each of the cluster sites. Transfer them to the other sites as described in Section 9.3, “Transferring the Resource Configuration to Other Cluster Sites”.
Procedure 6: Configuring Ticket Dependencies of Resources #
For Geo clusters, you can specify which resources depend on a
certain ticket. Together with this special type of constraint, you can
set a loss-policy that defines what should happen to
the respective resources if the ticket is revoked. The attribute
loss-policy can have the following values:
fence: Fence the nodes that are running the relevant resources.

stop: Stop the relevant resources.

freeze: Do nothing to the relevant resources.

demote: Demote relevant resources that are running in master mode to slave mode.
On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

Enter crm configure to switch to the interactive crm shell.

Configure constraints that define which resources depend on a certain ticket. For example, we need the following constraint for the DRBD scenario outlined in Section 7.1, “DRBD Scenario and Basic Steps”:

crm(live)configure# rsc_ticket nfs-req-ticket-nfs ticket-nfs: ms_drbd_nfs:Master \
   loss-policy=demote

This command creates a constraint with the ID nfs-req-ticket-nfs. It defines that the multi-state resource ms_drbd_nfs depends on ticket-nfs. However, only the resource's master mode depends on the ticket. In case ticket-nfs is revoked, ms_drbd_nfs is automatically demoted to slave mode, which in turn will put DRBD into Secondary mode. That way, it is ensured that DRBD replication is still running, even if a site does not have the ticket.
rsc_ticket.Review your changes with
show.If everything is correct, submit your changes with
commitand leave the crm live configuration withexit.The configuration is saved to the CIB.
Example 8: Ticket Dependency for Primitives #
Here is another example for a constraint that makes a primitive
resource rsc1 depend on
ticketA:
crm(live)configure# rsc_ticket rsc1-req-ticketA ticketA: rsc1 loss-policy="fence"
In case ticketA is revoked, the node running
the resource should be fenced.
Procedure 7: Configuring a Resource Group for boothd #
Each site needs to run one instance of
boothd that communicates
with the other booth daemons. The daemon can be started on any node,
therefore it should be configured as a primitive resource. To make the
boothd resource stay on the same node, if
possible, add resource stickiness to the configuration. As each daemon
needs a persistent IP address, configure another primitive with a
virtual IP address. Group both primitives:
On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

Enter crm configure to switch to the interactive crm shell.

Enter the following to create both primitive resources and to add them to one group, g-booth:

crm(live)configure# primitive ip-booth ocf:heartbeat:IPaddr2 \
   params iflabel="ha" nic="eth1" cidr_netmask="24" \
   params rule #cluster-name eq amsterdam ip="192.168.201.151" \
   params rule #cluster-name eq berlin ip="192.168.202.151"
crm(live)configure# primitive booth ocf:pacemaker:booth-site \
   meta resource-stickiness="INFINITY" \
   params config="nfs" op monitor interval="10s"
crm(live)configure# group g-booth ip-booth booth

With this configuration, each booth daemon will be available at its individual IP address, independent of the node the daemon is running on.
Review your changes with show.

If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

The configuration is saved to the CIB.
Procedure 8: Adding an Ordering Constraint #
If a ticket has been granted to a site but all nodes of that site
should fail to host the boothd
resource group for any reason, a “split-brain” situation
among the geographically dispersed sites may occur. In that case, no
boothd instance would be
available to safely manage failover of the ticket to another site. To
avoid a potential concurrency violation of the ticket (the ticket is
granted to multiple sites simultaneously), add an ordering constraint:
On one of the nodes of cluster amsterdam, start a shell and log in as root or equivalent.

Enter crm configure to switch to the interactive crm shell.

Create an ordering constraint:

crm(live)configure# order o-booth-before-nfs inf: g-booth ms_drbd_nfs:promote

The ordering constraint o-booth-before-nfs defines that the resource ms_drbd_nfs can only be promoted to master mode after the g-booth resource group has started.

For any other resources that depend on a certain ticket, define further ordering constraints.
Review your changes with show.

If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

The configuration is saved to the CIB.
Example 9: Ordering Constraint for Primitives #
If the resource that depends on a certain ticket is not a multi-state resource, but a primitive, the ordering constraint would look like the following:
crm(live)configure# order o-booth-before-rsc1 inf: g-booth rsc1
It defines that rsc1 (which depends on
ticketA) can only be started after the
g-booth resource group.
9.3 Transferring the Resource Configuration to Other Cluster Sites #
If you have configured resources for one cluster site as described in Section 9.1 and Section 9.2, you are not done yet. You need to transfer the resource configuration to the other sites of your Geo cluster.
To ease the transfer, you can tag any resources that are needed on all cluster sites, export them from the current CIB, and import them into the CIB of another cluster site. Procedure 9, “Transferring the Resource Configuration to Other Cluster Sites” gives an example of how to do so. It is based on the following prerequisites:
Prerequisites #
You have a Geo cluster with two sites: cluster amsterdam and cluster berlin.

The cluster names for each site are defined in the respective /etc/corosync/corosync.conf files:

totem {
   [...]
   cluster_name: amsterdam
}

This can either be done manually (by editing /etc/corosync/corosync.conf) or with the YaST cluster module as described in the Administration Guide for SUSE Linux Enterprise High Availability Extension 12, available at http://www.suse.com/documentation/. Refer to the chapter Installation and Basic Setup, procedure Defining the First Communication Channel.

You have configured the necessary resources for DRBD and booth as described in Section 9.1, “Resources and Constraints for DRBD” and Section 9.2, “Ticket Dependencies, Constraints and Resources for booth”.
Procedure 9: Transferring the Resource Configuration to Other Cluster Sites #
Log in to one of the nodes of cluster amsterdam.

Start the cluster with:

root # systemctl start pacemaker.service

Enter crm configure to switch to the interactive crm shell.

Tag the resources and constraints that are needed across the Geo cluster:

Review the current CIB configuration:

crm(live)configure# show

Enter the following command to group the Geo cluster-related resources with the tag geo_resources:

crm(live)configure# tag geo_resources: \
   ip_nfs nfs_fs nfs_service drbd_nfs drbd_nfs_lower ms_drbd_nfs ms_drbd_nfs_lower g_nfs 1 \
   col_nfs_ip_with_lower col_nfs_g_with_upper col_nfs_upper_with_ip 1 \
   o_lower_drbd_before_ip_nfs o_ip_nfs_before_drbd o_drbd_nfs_before_svc 1 \
   nfs-req-ticket-nfs ip-booth booth g-booth o-booth-before-nfs 2
   [...] 3

Tagging does not create any colocation or ordering relationship between the resources.
Resources and constraints for DRBD, see Section 9.1, “Resources and Constraints for DRBD”.
Resources and constraints for boothd, see Section 9.2, “Ticket Dependencies, Constraints and Resources for booth”.
Any other resources of your specific setup that you need on all sites of the Geo cluster.
Review your changes with show.

If the configuration is according to your wishes, submit your changes with submit and leave the crm live shell with exit.
Export the tagged resources and constraints to a file named exported.cib:

crm configure show tag:geo_resources geo_resources > exported.cib

The command crm configure show tag:TAGNAME shows all resources that belong to the tag TAGNAME.

Log in to one of the nodes of cluster berlin and proceed as follows:

Start the cluster with:

root # systemctl start pacemaker.service

Copy the file exported.cib from cluster amsterdam to this node.

Import the tagged resources and constraints from the file exported.cib into the CIB of cluster berlin:

crm configure load update PATH_TO_FILE/exported.cib

When using the update parameter for the crm configure load command, crmsh tries to integrate the contents of the file into the current CIB configuration (instead of replacing the current CIB with the file contents).

View the updated CIB configuration with the following command:

crm configure show
The imported resources and constraints will appear in the CIB.
This configuration will result in the following:
When granting ticket-nfs to cluster amsterdam, the node hosting the resource ip_nfs will get the IP address 192.168.201.151.

When granting ticket-nfs to cluster berlin, the node hosting the resource ip_nfs will get the IP address 192.168.202.151.
Example 10: Referencing Site-Dependent Parameters in Resources #
Based on the example in Procedure 5, you can also create resources that reference
site-specific parameters of another resource, for example, the IP parameters
of ip_nfs. Proceed as follows:
On cluster amsterdam create a dummy resource that references the IP parameters of ip_nfs and uses them as the value of its state parameter:

crm(live)configure# primitive dummy1 ocf:pacemaker:Dummy \
   params rule #cluster-name eq amsterdam \
     @ip_nfs-instance_attributes-0-ip:state \
   params rule #cluster-name eq berlin \
     @ip_nfs-instance_attributes-1-ip:state \
   op monitor interval=10

Add a constraint to make the dummy1 resource depend on ticket-nfs, too:

crm(live)configure# rsc_ticket dummy1-dep-ticket-nfs \
   ticket-nfs: dummy1 loss-policy=stop

Tag the resource and the constraint:

crm(live)configure# tag geo_resources_2: dummy1 dummy1-dep-ticket-nfs

Review your changes with show, submit your changes with submit, and leave the crm live shell with exit.

Export the resources tagged with geo_resources_2 from cluster amsterdam and import them into the CIB of cluster berlin, similar to Step 5 through Step 6.d of Procedure 9.
This configuration will result in the following:
When granting ticket-nfs to cluster amsterdam, the following file will be created on the node hosting the dummy1 resource: /var/lib/heartbeat/cores/192.168.201.151.

When granting ticket-nfs to cluster berlin, the following file will be created on the node hosting the dummy1 resource: /var/lib/heartbeat/cores/192.168.202.151.
10 Setting Up IP Relocation via DNS Update #
In case one site of your Geo cluster is down and a ticket failover occurs, you usually need to adjust the network routing accordingly (or you need to have configured a network failover for each ticket). Depending on the kind of service that is bound to a ticket, there is an alternative solution to reconfiguring the routing: you can use dynamic DNS update and instead change the IP address for a service.
The following prerequisites must be fulfilled for this scenario:
The service that needs to fail over is bound to a host name.
Your DNS server must be configured for dynamic DNS updates. For information on how to do so with BIND/named, see the named documentation, or refer to http://www.semicomplete.com/articles/dynamic-dns-with-dhcp/. More information on how to set up DNS, including dynamic update of zone data, can be found in the chapter The Domain Name System of the SUSE Linux Enterprise Administration Guide. It is available from http://www.suse.com/documentation/.
The following example assumes that the DNS updates are protected by a shared key (TSIG key) for the zone to be updated. The key can be created using dnssec-keygen:

root # dnssec-keygen -a hmac-md5 -b 128 -n USER geo-update

For more information, see the dnssec-keygen man page or the SUSE Linux Enterprise Administration Guide, available from http://www.suse.com/documentation/. Refer to chapter The Domain Name System, section Secure Transactions.
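Before putting the update under cluster control, it can be useful to verify the TSIG key and DNS server manually with nsupdate (the tool that the resource agent drives). The following transcript is a sketch using the example values from this section; the TTL of 600 seconds is an assumption:

```
root # nsupdate -k /etc/whereever/Kgeo-update*.key
> server 192.168.1.1 53
> update delete www.domain.com. A
> update add www.domain.com. 600 A 192.168.3.4
> send
> quit
```

If the update is accepted, a subsequent lookup of www.domain.com should return the new address.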
Example 11, “Resource Configuration for Dynamic DNS Update” illustrates how to use the
ocf:heartbeat:dnsupdate resource agent to manage the
nsupdate command. The resource agent supports both IPv4 and IPv6.
Example 11: Resource Configuration for Dynamic DNS Update #
primitive dns-update-ip ocf:heartbeat:dnsupdate \
   params hostname="www.domain.com" 1 ip="192.168.3.4" 2 \
   keyfile="/etc/whereever/Kgeo-update*.key" 3 \
   server="192.168.1.1" 4 serverport="53" 5
1. Host name bound to the service that needs to fail over together with the ticket. The IP address of this host name needs to be updated via dynamic DNS.

2. IP address of the server hosting the service to be migrated. The IP address specified here can be under cluster control, too. This does not handle local failover, but it ensures that outside parties will be directed to the right site after a ticket failover.

3. Path to the key file generated with dnssec-keygen.

4. IP address of the DNS server to send the updates to. If no server is provided, this defaults to the master server for the correct zone.

5. Port to use for communication with the DNS server. This option will only take effect if a DNS server is specified.
With the resource configuration above, the resource agent takes care of removing the failed Geo cluster site from the DNS record and changing the IP for a service via dynamic DNS update.
11 Managing Geo Clusters #
Before booth can manage a certain ticket within the Geo cluster, you initially need to grant it to a site manually.
11.1 From Command Line #
Use the booth client command line tool to grant, list, or
revoke tickets as described in Overview of booth client Commands. The
booth client commands can be run on any machine in the cluster, not
only the ones where boothd is running. The booth client
commands try to find the “local” cluster by looking at the booth configuration file
and the locally defined IP addresses. If you do not specify a site which the booth client should
connect to (using the -s option), it will always connect to the local site.
Note: Syntax Changes
The syntax of booth client commands has been simplified since SUSE Linux Enterprise High Availability Extension 11. For
example, the client keyword can be omitted for list,
grant, or revoke operations: booth list.
Also, the -t option can be omitted when specifying a ticket.
The former syntax is still supported. For detailed information, see the
Synopsis section in the booth man page. However, the examples in this
manual use the simplified syntax.
Overview of booth client Commands #
- Listing All Tickets

  root # booth list
  ticket: ticketA, leader: none
  ticket: ticketB, leader: 10.2.12.101, expires: 2014-08-13 10:28:57

  If you do not specify a certain site with -s, the information about the tickets will be requested from the local booth instance.

- Granting a Ticket to a Site

  root # booth grant -s 192.168.201.151 ticketA
  booth[27891]: 2014/08/13_10:21:23 info: grant request sent, waiting for the result ...
  booth[27891]: 2014/08/13_10:21:23 info: grant succeeded!

  In this case, ticketA will be granted to the site 192.168.201.151. If you omit the -s option, booth will automatically connect to the current site (the site you are running the booth client on) and request the grant operation.

  Before granting a ticket, the command executes a sanity check. If the same ticket is already granted to another site, you are warned about that and prompted to revoke the ticket from the current site first.

- Revoking a Ticket From a Site

  root # booth revoke ticketA
  booth[27900]: 2014/08/13_10:21:23 info: revoke succeeded!

  Booth checks to which site the ticket is currently granted and requests the revoke operation for ticketA. The revoke operation is executed immediately.
The grant and, under certain circumstances, revoke
operations may take a while to return a definite outcome. The client will wait for
the result up to the ticket's timeout value before it gives up
waiting—unless the -w option was used, in which case the client waits
indefinitely. Find the exact status in the log files or with the
crm_ticket -L command.
Warning: crm_ticket and
crm site ticket
In case the booth service is not running for any reason, you may
also manage tickets manually with crm_ticket or
crm site ticket. Both commands are
only available on cluster nodes. In case of intervention, use them
with great care as they cannot verify if the same
ticket is already granted elsewhere. For more information, read the man pages.
As long as booth is up and running, only use
booth client for manual intervention.
After you have initially granted a ticket to a site, the booth
mechanism will take over and manage the ticket automatically. If the site
holding a ticket should be out of service, the ticket will automatically be
revoked after the expiry time and granted to another site. The resources
that depend on that ticket will fail over to the new site holding the
ticket. The nodes that have run the resources before will be treated
according to the loss-policy you set within the
constraint.
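The loss-policy mentioned above is part of the ticket dependency configured in Section 9. As a minimal crm configure sketch, assuming a hypothetical resource group rg_nfs that depends on ticket-nfs:

```
rsc_ticket rg_nfs-req-ticket-nfs ticket-nfs: rg_nfs loss-policy="fence"
```

With loss-policy="fence", nodes that were running the dependent resources when the ticket was revoked are fenced; other values such as stop, demote, or freeze are possible, depending on how the nodes that lose the ticket should be treated.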
Procedure 10: Managing Tickets Manually #
Assuming that you want to manually move ticket-nfs from site
amsterdam (with the virtual IP 192.168.201.151) to
site berlin (with the virtual IP 192.168.202.151),
proceed as follows:
1. Set ticket-nfs to standby with the following command:

   root # crm_ticket -t ticket-nfs -s

2. Wait for any resources that depend on ticket-nfs to be stopped or demoted cleanly.

3. Revoke ticket-nfs from site amsterdam with:

   root # booth revoke -s 192.168.201.151 ticket-nfs

4. After the ticket has been revoked from its original site, grant it to the site berlin with:

   root # booth grant -s 192.168.202.151 ticket-nfs
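The steps above can be wrapped in a small script. This is a sketch, not part of the product: the ticket name and site addresses are the example values, and by default it only prints the commands instead of executing them (set DRY_RUN=0 to run them against a real cluster):

```shell
#!/bin/sh
# Sketch of the manual ticket move from Procedure 10.
TICKET="ticket-nfs"
FROM_SITE="192.168.201.151"
TO_SITE="192.168.202.151"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run crm_ticket -t "$TICKET" -s                # 1. set the ticket to standby
# 2. wait here until dependent resources are stopped or demoted cleanly
run booth revoke -s "$FROM_SITE" "$TICKET"    # 3. revoke from the old site
run booth grant -s "$TO_SITE" "$TICKET"       # 4. grant to the new site
```

Note that the script cannot detect step 2 automatically; before disabling dry-run mode, verify the resource state yourself (for example with crm status).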
11.2 With the HA Web Konsole (Hawk) #
You can use Hawk as a single point of administration for monitoring multiple clusters. Hawk's Cluster Dashboard allows you to view a summary of multiple clusters, with each summary listing the number of nodes, resources, and tickets, and their state. The summary also shows whether any failures have appeared in the respective cluster.
To manage cluster site tickets, switch from the Cluster Dashboard to the other Hawk functions that are available after logging in to an individual cluster. Hawk allows you to grant or revoke tickets, to view ticket details, and to test the impact of ticket failover with the Simulator.
11.2.1 Monitoring Multiple Clusters with the Cluster Dashboard #
The cluster information displayed in the Cluster Dashboard is stored in a persistent cookie. This means you need to decide which Hawk instance you want to view the Cluster Dashboard on, and always use that one. The machine you are running Hawk on does not even need to be part of any cluster for that purpose—it can be a separate, unrelated system.
Procedure 11: Monitoring Multiple Clusters with Hawk #
Prerequisites #
All clusters to be monitored from Hawk's Cluster Dashboard must be running SUSE Linux Enterprise High Availability Extension 12. It is not possible to monitor clusters that are running earlier versions of SUSE Linux Enterprise High Availability Extension.
If you did not replace the self-signed certificate for Hawk on every cluster node with your own certificate (or a certificate signed by an official Certificate Authority), log in to Hawk on every node in every cluster at least once. Verify the certificate (and add an exception in the browser to bypass the warning).
If you are using Mozilla Firefox, you must change its preferences to accept third-party cookies. Otherwise cookies from monitored clusters will not be set, thus preventing login to the clusters you are trying to monitor.
Start the Hawk Web service on a machine you want to use for monitoring multiple clusters.
Start a Web browser and as URL enter the IP address or host name of the machine that runs Hawk:
https://IPaddress:7630/
On the Hawk login screen, click the Dashboard link in the right upper corner.
The Add Cluster dialog appears.
Enter a custom cluster name with which to identify the cluster in the Cluster Dashboard.
Enter the IP address or host name of one of the cluster nodes and confirm your changes.
The Cluster Dashboard opens and shows a summary of the cluster that you have added.
To add more clusters to the dashboard, click the plus icon and enter the details for the next cluster.
Figure 5: Hawk—Cluster Dashboard #
To remove a cluster from the dashboard, click the x icon next to the cluster's summary.
To view more details about a cluster, click somewhere in the cluster's box on the dashboard.
This opens a new browser window or new browser tab. If you are not currently logged in to the cluster, this takes you to the Hawk login screen. After having logged in, Hawk shows the Cluster Status of that cluster in the summary view. From here, you can administer the cluster with Hawk as usual.
As the Cluster Dashboard stays open in a separate browser window or tab, you can easily switch between the dashboard and the administration of individual clusters in Hawk.
Any status changes for nodes or resources are reflected almost immediately within the Cluster Dashboard.
11.2.2 Managing Tickets with Hawk #
Note: Granting Tickets to Current Site
Though you can view tickets for all sites with Hawk, any grant operations triggered by Hawk only apply to the current site (that you are currently connected to with Hawk). To grant a ticket to another site of your Geo cluster, start Hawk on one of the cluster nodes belonging to the respective site.
Procedure 12: Granting, Revoking and Viewing with Hawk #
Tickets are visible in Hawk if they have been granted or revoked at
least once or if they are referenced in a ticket dependency. In case a
ticket is referenced in a ticket dependency, but has not been granted to
any site yet, Hawk displays it as revoked.
Start a Web browser and log in to the cluster.
In the left navigation bar, select Cluster Status.
Switch to the Summary View or the Tree View to view tickets. Along with information about cluster nodes and resources, Hawk also displays a Tickets category.
It shows the following information:
Granted: Tickets that are granted to the current site.
Elsewhere: Tickets that are granted to another site.
Revoked: Tickets that have been revoked.
Figure 6: Hawk Cluster Status (Summary View)—Ticket Overview #
To view more details, either click the title of the category or the individual ticket entries that are marked as links. Hover the cursor over the information icon next to the ticket to display the following information: time when the ticket was last granted, the leader, and the ticket expiry date.
Figure 7: Hawk Cluster Status (Summary View)—Ticket Details #
To revoke a ticket, click the wrench icon next to the ticket and select Revoke Ticket. Confirm your choice when Hawk prompts for a confirmation.
If the ticket cannot be revoked for any reason, Hawk shows an error message. After the ticket has been successfully revoked, Hawk will update the ticket status in the Tickets category.
You can only grant tickets that are not already given to any site. To grant a ticket to the current site:
Click the wrench icon next to a ticket with the current status revoked and select Grant Ticket.
Confirm your choice when Hawk prompts for a confirmation.
If the ticket cannot be granted for any reason, Hawk shows an error message. After the ticket has been successfully granted, Hawk will update the ticket status in the Tickets category.
Procedure 13: Simulating Granting and Revoking Tickets #
Hawk's Simulator allows you to explore failure scenarios before they happen. To check whether your resources that depend on a certain ticket behave as expected, you can also test the impact of granting or revoking tickets.
Start a Web browser and log in to Hawk.
Click the wrench icon next to the user name in the top-level row, and select Simulator.
Hawk's background changes color to indicate the simulator is active. A simulator dialog opens in the bottom right hand corner of the screen. Its title indicates that the screen still reflects the current state of the cluster.
To simulate the status change of a ticket:
In the simulator control dialog, add a ticket event.
Select the ticket and the action (grant or revoke) you want to simulate.
Confirm your changes to add them to the queue of events listed in the simulator control dialog.
To start the simulation, run the queued events from the simulator control dialog. The screen displays the impact of the simulated events, and the title of the simulator control dialog changes to indicate that the simulation is active.
To exit the simulation mode, close the simulator control dialog. The screen switches back to its normal color and displays the current cluster state.
Figure 8: Hawk Simulator—Tickets #
For more information about Hawk's Simulator (and which other scenarios can be explored with it), refer to the Administration Guide for SUSE Linux Enterprise High Availability Extension, available from http://www.suse.com/documentation/. Refer to chapter Configuring and Managing Cluster Resources (Web Interface), section Exploring Potential Failure Scenarios.
12 Troubleshooting #
Booth uses the same logging mechanism as the CRM. Thus, changing the log level will also take effect on booth logging. The booth log messages also contain information about any tickets.
Both the booth log messages and the booth configuration file are included in the
hb_report and crm_report.
In case of unexpected booth behavior or any problems, check the logging data with
sudo journalctl -n or create a detailed cluster report with either
hb_report or crm_report.
In case you can access the cluster nodes on all sites (plus the arbitrators) from one single
host via SSH, it is possible to collect log files from all of them within the same
hb_report. When calling hb_report with the
-n option, it gets the log files from all hosts that you specify with
-n. (Without -n, it would try to obtain the list of nodes from
the respective cluster). For example, to create a single hb_report that
includes the log files from two two-node clusters
(192.168.201.111|192.168.201.112 and
192.168.202.111|192.168.202.112) plus an arbitrator
(147.2.207.14), use the following command:
root # hb_report -n "147.2.207.14 192.168.201.111 192.168.201.112 \
  192.168.202.111 192.168.202.112" -f 10:00 -t 11:00 db-incident

If the issue is about booth only and you know on which cluster nodes (within a site) booth is running, then specify only those two nodes plus the arbitrator.
If there is no way to access all sites from one host, run
hb_report individually on the arbitrator, and on the cluster nodes of the
individual sites, specifying the same
period of time. To collect the log files on an arbitrator, you must use the -S option
for single node operation:
amsterdam # hb_report -f 10:00 -t 11:00 db-incident-amsterdam
berlin # hb_report -f 10:00 -t 11:00 db-incident-berlin
arbitrator # hb_report -S -f 10:00 -t 11:00 db-incident-arbitrator
However, it is preferable to produce one single hb_report for all
machines that you need log files from.
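The host list for such a combined report can be assembled from per-site variables, which keeps the invocation readable as clusters grow. A sketch using the example addresses from above (it only prints the command, so the list can be inspected before running it):

```shell
# Assemble the host list for one cross-site hb_report
# (addresses are the example values from this section).
ARBITRATOR="147.2.207.14"
SITE_A_NODES="192.168.201.111 192.168.201.112"
SITE_B_NODES="192.168.202.111 192.168.202.112"
ALL_HOSTS="$ARBITRATOR $SITE_A_NODES $SITE_B_NODES"

# Print the command instead of executing it:
echo hb_report -n "\"$ALL_HOSTS\"" -f 10:00 -t 11:00 db-incident
```

Remove the echo to actually collect the report; as noted above, this requires SSH access from the local host to every listed machine.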
13 Upgrading to the Latest Product Version #
For general instructions on how to upgrade a cluster, see the Administration Guide for SUSE Linux Enterprise High Availability Extension 12. It is available at http://www.suse.com/documentation/. The chapter Upgrading Your Cluster and Updating Software Packages also describes which preparations to take care of before starting the upgrade process.
13.1 Upgrading from SLE HA 11 SP3 to SLE HA 12 #
The former booth version (v0.1) was based on the Paxos algorithm. The
current booth version (v0.2) is loosely based on the Raft algorithm and is
incompatible with v0.1. Therefore, rolling upgrades are not
possible. Because of the new multi-tenancy feature, the new arbitrator init
script cannot stop or test the status of the Paxos v0.1 arbitrator.
On upgrade to v0.2, the arbitrator will be stopped, if running.
The OCF resource-agent
ocf:pacemaker:booth-site is capable of
stopping and monitoring the booth v0.1 site daemon.
For an upgrade of the cluster nodes from SUSE Linux Enterprise High Availability Extension 11 SP3 to SUSE Linux Enterprise High Availability Extension 12, follow the instructions in the Administration Guide for SUSE Linux Enterprise High Availability Extension 12, section Upgrading from SLE HA 11 SP3 to SLE HA 12.
If you use arbitrators outside of the cluster sites:
Upgrade them from SUSE Linux Enterprise Server 11 SP3 to SUSE Linux Enterprise Server 12, too.
Add the Geo Clustering for SUSE Linux Enterprise High Availability Extension add-on and install the packages as described in Section 1.2, “Installing the Packages on Arbitrators”.
As the syntax and the consensus algorithm for booth have changed, you need to update the booth configuration files to match the latest requirements. Previously you could optionally specify expiry time and weights by appending them to the ticket name with a semicolon (;) as separator. The new syntax has separate tokens for all ticket options. See Section 6, “Setting Up the Booth Services” for details. If you did not specify expiry time or weights different from the defaults, and do not want to use the multi-tenancy feature, you can still use the old /etc/booth/booth.conf.
Synchronize the updated booth configuration files across all cluster sites and arbitrators.
Start the booth service on the cluster sites and the arbitrators as described in Section 6.4, “Enabling and Starting the Booth Services”.
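The syntax change for ticket options can be sketched side by side in /etc/booth/booth.conf; the ticket name and expiry value are illustrative, and Section 6 remains the authoritative reference for the available keywords:

```
# Old (v0.1) syntax: options appended to the ticket name:
#   ticket = "ticketA;600"
# New (v0.2) syntax: one token per ticket option:
ticket = "ticketA"
    expire = 600
```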