GEO Clustering for SUSE Linux Enterprise High Availability Extension 12

Quick Start

Author: Tanja Roth
Publication date: 07/02/2017

Apart from local clusters and metro area clusters, SUSE® Linux Enterprise High Availability Extension 12 also supports GEO clusters. That means you can have multiple, geographically dispersed sites with a local cluster each. Failover between these clusters is coordinated by a higher level entity: the booth daemon (boothd). Support for GEO clusters is available as a separate extension to High Availability Extension, called GEO Clustering for SUSE Linux Enterprise High Availability Extension.

1 Installation as Add-on

For using the High Availability Extension and GEO Clustering for SUSE Linux Enterprise High Availability Extension, you need the packages included in the following installation patterns:

  • High Availability

  • GEO Clustering for High Availability

Note: Package Requirements for Arbitrators

If your GEO cluster setup includes one or more arbitrators (see Arbitrator), those only need the pattern GEO Clustering for High Availability. For instructions on how to install this pattern, see Section 1.2, “Installing the Packages on Arbitrators”.

Both patterns are only available if you have registered your system at SUSE Customer Center (or a local registration server) and have added the respective product channels or installation media as add-ons. For information on how to install add-on products, see the SUSE Linux Enterprise 12 Deployment Guide, available at http://www.suse.com/documentation/. Refer to chapter Installing Add-On Products.

1.1 Installing the Packages on Cluster Nodes

In case both High Availability Extension and GEO Clustering for SUSE Linux Enterprise High Availability Extension have been added as add-on products, but the packages are not installed yet, proceed as follows:

  1. To install the packages from both patterns via command line, use zypper:

    sudo zypper in -t pattern ha_sles ha_geo
  2. Alternatively, use YaST for a graphical installation:

    1. Start YaST as root user and select Software › Software Management.

    2. Click View › Patterns and activate the following patterns:

      • High Availability

      • GEO Clustering for High Availability

    3. Click Accept to start installing the packages.

Important

The software packages needed for High Availability and GEO clusters are not automatically copied to the cluster nodes.

  • Install SUSE Linux Enterprise Server 12 and the High Availability and GEO Clustering for High Availability patterns on all machines that will be part of your GEO cluster.

  • If you do not want to install the packages manually on all nodes that will be part of your cluster, use AutoYaST to clone existing nodes. Find more information in the Administration Guide for SUSE Linux Enterprise High Availability Extension 12, available from http://www.suse.com/documentation/. Refer to chapter Installation and Basic Setup, section Mass Deployment with AutoYaST.

    For all machines that need the GEO Clustering for SUSE Linux Enterprise High Availability Extension add-on, you currently need to install the packages for GEO clusters manually as AutoYaST support for GEO Clustering for SUSE Linux Enterprise High Availability Extension is not yet available.

1.2 Installing the Packages on Arbitrators

  1. Make sure that GEO Clustering for SUSE Linux Enterprise High Availability Extension has been added as add-on product to the machines to serve as arbitrators.

  2. Log in to each arbitrator and install the packages with the following command:

    sudo zypper in -t pattern ha_geo

    Alternatively, use YaST to install the GEO Clustering for High Availability pattern.

2 Challenges for GEO Clusters

Typically, GEO environments are too far apart to support synchronous communication between the sites. That leads to the following challenges:

  • How to make sure that a cluster site is up and running?

  • How to make sure that resources are only started once?

  • How to make sure that quorum can be reached between the different sites and a split brain scenario can be avoided?

  • How to manage failover between the sites?

  • How to deal with high latency in case of resources that need to be stopped?

In the following sections, learn how to meet these challenges with SUSE Linux Enterprise High Availability Extension.

3 Conceptual Overview

GEO clusters based on SUSE Linux Enterprise High Availability Extension can be considered overlay clusters where each cluster site corresponds to a cluster node in a traditional cluster. The overlay cluster is managed by the booth mechanism, which guarantees that the cluster resources will be highly available across different cluster sites. This is achieved by using cluster objects called tickets, which are treated as a failover domain between cluster sites in case a site goes down. Booth guarantees that every ticket is owned by only one site at a time.

The following list explains the individual components and mechanisms that were introduced for GEO clusters in more detail.

Components and Ticket Management
Ticket

A ticket grants the right to run certain resources on a specific cluster site. A ticket can only be owned by one site at a time. Initially, none of the sites has a ticket. Each ticket must be granted once by the cluster administrator. After that, tickets are managed by booth for automatic failover of resources. But administrators may also intervene and grant or revoke tickets manually.

After a ticket is administratively revoked, it is not managed by booth anymore. For booth to start managing the ticket again, the ticket must be again granted to a site.

Resources can be bound to a certain ticket by dependencies. The respective resources are started only if the defined ticket is available at a site. Conversely, if the ticket is removed, the resources depending on that ticket are automatically stopped.

The presence or absence of tickets for a site is stored in the CIB as a cluster status. With regards to a certain ticket, there are only two states for a site: true (the site has the ticket) or false (the site does not have the ticket). The absence of a certain ticket (during the initial state of the GEO cluster) is not treated differently from the situation after the ticket has been revoked: both are reflected by the value false.

A ticket within an overlay cluster is similar to a resource in a traditional cluster. But in contrast to traditional clusters, tickets are the only type of resource in an overlay cluster. They are primitive resources that do not need to be configured nor cloned.

Booth

Booth is the instance managing the ticket distribution and thus, the failover process between the sites of a GEO cluster. Each of the participating clusters and arbitrators runs a service, the boothd. It connects to the booth daemons running at the other sites and exchanges connectivity details. Once a ticket is granted to a site, the booth mechanism can manage the ticket automatically: If the site which holds the ticket is out of service, the booth daemons will vote which of the other sites will get the ticket. To protect against brief connection failures, sites that lose the vote (either explicitly or implicitly by being disconnected from the voting body) need to relinquish the ticket after a time-out. This ensures that a ticket is only redistributed after the previous site has relinquished it. See also Dead Man Dependency (loss-policy="fence").

Arbitrator

Each site runs one booth instance that is responsible for communicating with the other sites. If you have a setup with an even number of sites, you need an additional instance to reach consensus about decisions such as failover of resources across sites. In this case, add one or more arbitrators running at additional sites. Arbitrators are single machines that run a booth instance in a special mode. As all booth instances communicate with each other, arbitrators help to make more reliable decisions about granting or revoking tickets. Arbitrators cannot hold any tickets.

An arbitrator is especially important for a two-site scenario: For example, if site A can no longer communicate with site B, there are two possible causes for that:

  • A network failure between A and B.

  • Site B is down.

However, if site C (the arbitrator) can still communicate with site B, site B must still be up and running.

Ticket Failover

If the ticket gets lost, that is, if the other booth instances have not heard from the ticket owner for a sufficiently long time, one of the remaining sites acquires the ticket. This is called ticket failover. If the remaining members cannot form a majority, the ticket cannot fail over.
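The majority rule can be illustrated with a short shell sketch. It is not part of booth and does not reflect booth's actual voting protocol. The function name has_majority and its arguments are hypothetical, and the sketch assumes that a majority means more than half of all booth instances (sites plus arbitrators).

```shell
#!/bin/sh
# Illustrative only: models the counting behind ticket failover, not booth itself.
# has_majority REACHABLE TOTAL
#   REACHABLE: booth instances (sites and arbitrators) that can still reach each other
#   TOTAL:     all booth instances in the GEO cluster
has_majority() {
    reachable=$1
    total=$2
    # A majority means more than half of all instances.
    [ $((reachable * 2)) -gt "$total" ]
}

# Two sites plus one arbitrator: one site is down, the other site and the
# arbitrator still see each other.
if has_majority 2 3; then
    echo "2 of 3 instances: majority, the ticket can fail over"
fi

# Two sites without an arbitrator: a single remaining site is not a majority.
if ! has_majority 1 2; then
    echo "1 of 2 instances: no majority, the ticket cannot fail over"
fi
```

This also shows why a setup with an even number of sites benefits from an arbitrator: without it, a split into two equal halves leaves neither side with a majority.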

Dead Man Dependency (loss-policy="fence")

After a ticket is revoked, it can take a long time until all resources depending on that ticket are stopped, especially in case of cascaded resources. To cut that process short, the cluster administrator can configure a loss-policy (together with the ticket dependencies) for the case that a ticket gets revoked from a site. If the loss-policy is set to fence, the nodes that are hosting dependent resources are fenced.

Warning: Potential Loss of Data

On the one hand, loss-policy="fence" considerably speeds up the recovery process of the cluster and makes sure that resources can be migrated more quickly.

On the other hand, it can lead to loss of all unwritten data, such as:

  • Data lying on shared storage (for example, DRBD).

  • Data in a replicating database (for example, MariaDB or PostgreSQL) that has not yet reached the other site, due to a slow network link.

Figure 1: Example Scenario: A Two-Site Cluster (4 Nodes + Arbitrator)

The most common scenario is probably a GEO cluster with two sites and a single arbitrator on a third site. The upper limit is (currently) 16 booth instances.

As usual, the CIB is synchronized within each cluster, but it is not synchronized across cluster sites of a GEO cluster. You have to configure the resources that will be highly available across the GEO cluster for every site accordingly.

4 Requirements

Software Requirements
  • All clusters that will be part of the GEO cluster must be based on SUSE Linux Enterprise High Availability Extension 12.

  • SUSE® Linux Enterprise Server 12 must be installed on all arbitrators.

  • The GEO Clustering for SUSE Linux Enterprise High Availability Extension add-on must be installed on all cluster nodes and on all arbitrators that will be part of the GEO cluster.

Network Requirements
  • The sites must be reachable on one UDP and TCP port per booth instance. That means any firewalls or IPSec tunnels in between must be configured accordingly.

  • Other setup decisions may require opening additional ports (for example, for DRBD or database replication).

Other Requirements and Recommendations
  • All cluster nodes on all sites should synchronize to an NTP server outside the cluster. For more information, see the Administration Guide for SUSE Linux Enterprise Server 12, available at http://www.suse.com/documentation/. Refer to the chapter Time Synchronization with NTP.

    If nodes are not synchronized, log files and cluster reports are very hard to analyze.

5 Basic Setup— Overview

Configuring a GEO cluster takes the following basic steps:

Setting Up the Booth Services
Configuring Cluster Resources and Constraints

Use either crmsh or Hawk for the following steps:

  1. Configuring Ticket Dependencies

  2. Configuring a Resource Group for boothd

  3. Adding an Ordering Constraint for boothd and the Resource Group

6 Setting Up the Booth Services

The default booth configuration is /etc/booth/booth.conf. This file must be the same on all sites of your GEO cluster, including the arbitrator or arbitrators. To keep the booth configuration synchronous across all sites and arbitrators, use Csync2, as described in Section 6.3, “Synchronizing the Booth Configuration Across All Sites and Arbitrators”.

For setups including multiple GEO clusters, it is possible to share the same arbitrator (as of SUSE Linux Enterprise High Availability Extension 12). By providing several booth configuration files, you can start multiple booth instances on the same arbitrator, with each booth instance running on a different port. That way, you can use one machine to serve as arbitrator for different GEO clusters. For details on how to configure booth for multiple GEO clusters, refer to Section 6.2, “Booth Setup for Multiple Tenants”.

6.1 Default Booth Setup

To configure all parameters needed for booth, either edit the booth configuration files manually or use the YaST Geo Cluster module. To access the YaST module, start it from command line with yast2 geo-cluster (or start YaST and select High Availability › Geo Cluster).

Example 1: A Booth Configuration File
transport = UDP 1
port = 9929 2
arbitrator = 147.2.207.14 3
site= 147.4.215.19 4
site= 147.18.2.1 4
ticket="ticketA" 5
     expire = 600 6
     timeout = 10 7
     retries = 5 8
     renewal-freq = 30 9
     before-acquire-handler 10 = /usr/share/booth/service-runnable 11 db-1 12
     acquire-after = 60 13
ticket="ticketB" 5
     expire = 600 6
     timeout = 10 7
     retries = 5 8
     renewal-freq = 30 9
     before-acquire-handler 10 = /usr/share/booth/service-runnable 11 db-8 12
     acquire-after = 60 13
    

1

The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. Currently, this parameter can therefore be omitted.

2

The port to be used for communication between the booth instances at each site. When not using the default port (9929), choose a port that is not already used for different services. Make sure to open the port in the nodes' and arbitrators' firewalls. The booth clients use TCP to communicate with the boothd. Booth will always bind and listen to both UDP and TCP ports.

3

The IP address of the machine to use as arbitrator. Add an entry for each arbitrator you use in your GEO cluster setup.

4

The IP address used for the boothd on a site. Add an entry for each site you use in your GEO cluster setup. Make sure to insert the correct virtual IP addresses (IPaddr2) for each site, otherwise the booth mechanism will not work correctly. Booth works with both IPv4 and IPv6 addresses.

5

The ticket to be managed by booth. For each ticket, add a ticket entry.

6

Optional parameter. Defines the ticket's expiry time in seconds. A site that has been granted a ticket will renew the ticket regularly. If booth does not receive any information about renewal of the ticket within the defined expiry time, the ticket will be revoked and granted to another site. If no expiry time is specified, the ticket will expire after 600 seconds by default. The parameter should not be set to a value less than 120 seconds.

7

Optional parameter. Defines a timeout period in seconds. If booth does not receive a reply within this period, it resends the packets. The timeout should be long enough to allow packets to reach the other booth members (all arbitrators and sites).

8

Optional parameter. Defines how many times booth retries sending packets before giving up waiting for confirmation by other sites. Values smaller than 3 are invalid and will prevent booth from starting.

9

Optional parameter. Sets the ticket renewal interval. By default, a ticket is renewed after half of its expiry time. If network reliability is often reduced over prolonged periods, it is advisable to renew more often. The before-acquire-handler is run before every renewal.

10

Optional parameter. If set, the specified command will be called before boothd tries to acquire or renew a ticket. On exit code other than 0, boothd relinquishes the ticket.

11

The service-runnable script referenced here is included in the product as an example. It is a simple script based on crm_simulate. It can be used to test if a particular cluster resource can be run on the current cluster site (if the cluster is healthy enough to run the resource, if all resource dependencies are fulfilled etc.). For instance, if a service in the dependency-chain has a failcount of INFINITY on all available nodes, the service cannot be run on that site. In that case, it is of no use to claim the ticket.

12

The resource to be tested by the before-acquire-handler (in this case, by the service-runnable script). You need to reference the resource which is protected by the respective ticket. In this example, resource db-1 is protected by ticketA whereas db-8 is protected by ticketB.

13

Optional parameter. After a ticket is lost, booth waits this additional time before acquiring the ticket. This allows the site that lost the ticket to relinquish the resources, either by stopping them or by fencing a node. A typical delay might be 60 seconds, but ultimately it depends on the protected resources and the fencing configuration. The default value is 0.

If you are unsure how long stopping or demoting the resources or fencing a node may take (depending on the loss-policy), use this parameter to prevent resources from running on two sites at the same time.
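The two limits named in the parameter descriptions (retries must not be smaller than 3, expire should not be less than 120 seconds) can be checked before a configuration is distributed. The following is a minimal sketch, not a tool shipped with booth: the function name check_booth_conf is hypothetical, and the parsing assumes plain key = value lines without the callout numbers used in Example 1.

```shell
#!/bin/sh
# check_booth_conf FILE
# Hypothetical helper (not shipped with booth): reports ticket parameters
# that conflict with the parameter descriptions above.
check_booth_conf() {
    awk -F= '
        # Remove surrounding whitespace and double quotes.
        function trim(s) { gsub(/^[ \t"]+|[ \t"]+$/, "", s); return s }
        BEGIN { ticket = "(no ticket)" }
        /^[ \t]*(#|$)/ { next }    # skip comment lines and blank lines
        {
            key = trim($1); val = trim($2)
            if (key == "ticket") ticket = val
            if (key == "retries" && val + 0 < 3)
                printf "ERROR: %s: retries = %s, values smaller than 3 are invalid\n", ticket, val
            if (key == "expire" && val + 0 < 120)
                printf "WARNING: %s: expire = %s, should not be less than 120\n", ticket, val
        }
    ' "$1"
}

# Demonstration with a deliberately faulty configuration in a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
port = 9929
arbitrator = 147.2.207.14
site= 147.4.215.19
site= 147.18.2.1
ticket="ticketA"
     expire = 60
     retries = 2
EOF
check_booth_conf "$conf"
rm -f "$conf"
```

The sketch only covers the two limits stated above. It does not validate IP addresses, handler paths, or any other booth syntax.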

Procedure 1: Manually Editing The Booth Configuration File
  1. Log in to a cluster node as root or equivalent.

  2. Copy the example booth configuration file /etc/booth/booth.conf.example to /etc/booth/booth.conf.

  3. Edit /etc/booth/booth.conf according to Example 1, “A Booth Configuration File”.

  4. Verify your changes and save the file.

  5. On all cluster nodes and arbitrators, open the port in the firewall that you have configured for booth. See Example 1, “A Booth Configuration File”, position 2.

Procedure 2: Setting Up Booth with YaST
  1. Log in to a cluster node as root or equivalent.

  2. Start the YaST Geo Cluster module.

  3. Choose to Edit an existing booth configuration file or click Add to create a new booth configuration file:

    1. In the screen that appears configure the following parameters:

      • Configuration File. A name for the booth configuration file. YaST suggests booth by default. This results in the booth configuration being written to /etc/booth/booth.conf. Only change this value if you need to set up multiple booth instances for different GEO clusters as described in Section 6.2, “Booth Setup for Multiple Tenants”.

      • Transport. The transport protocol used for communication between the sites. Only UDP is supported, but other transport layers will follow in the future. See also Example 1, “A Booth Configuration File”, position 1.

      • Port. The port to be used for communication between the booth instances at each site. See also Example 1, “A Booth Configuration File”, position 2.

      • Arbitrator.  The IP address of the machine to use as arbitrator. See also Example 1, “A Booth Configuration File”, position 3.

        To specify an Arbitrator, click Add. In the dialog that opens, enter the IP address of your arbitrator and click OK.

      • Site. The IP address used for the boothd on a site. See also Example 1, “A Booth Configuration File”, position 4.

        To specify a Site of your GEO cluster, click Add. In the dialog that opens, enter the IP address of one site and click OK.

      • Ticket. The ticket to be managed by booth. See also Example 1, “A Booth Configuration File”, position 5.

        To specify a Ticket, click Add. In the dialog that opens, enter a unique Ticket name. If you need to define multiple tickets with the same parameters and values, you can save configuration effort by creating a ticket template which specifies the default parameters and values for all tickets. To do so, enter __default__ as Ticket name.

        Additionally, you can specify optional parameters for your ticket. For an overview, see Example 1, “A Booth Configuration File”, positions 6 to 13.

        Click OK to confirm your changes.

      Figure 2: Example Ticket Dependency
    2. Click OK to close the current booth configuration screen. YaST shows the name of the booth configuration file that you just defined.

  4. Before closing the YaST module, switch to the Firewall Configuration category.

  5. To open the port you have configured for booth, enable Open Port in Firewall.

    Important: Firewall Setting for Local Machine Only

    The firewall setting is only applied to the current machine. It opens the UDP and TCP ports that have been specified in /etc/booth/booth.conf or any other booth configuration files (see Section 6.2, “Booth Setup for Multiple Tenants”).

    Make sure to open the respective ports on all other cluster nodes and arbitrators of your GEO cluster setup, too. Do so either manually or by synchronizing the following files with Csync2:

    • /etc/sysconfig/SuSEfirewall2

    • /etc/sysconfig/SuSEfirewall2.d/services/booth

  6. Click Finish to confirm all settings and close the YaST module. Depending on the NAME of the Configuration File specified in Step 3.a, the configuration is written to /etc/booth/NAME.conf.

6.2 Booth Setup for Multiple Tenants

For setups including multiple GEO clusters, it is possible to share the same arbitrator (as of SUSE Linux Enterprise High Availability Extension 12). By providing several booth configuration files, you can start multiple booth instances on the same arbitrator, with each booth instance running on a different port. That way, you can use one machine to serve as arbitrator for different GEO clusters.

Let us assume you have two GEO clusters, one in EMEA (Europe, the Middle East and Africa), and one in the Asia-Pacific region (APAC).

To use the same arbitrator for both GEO clusters, create two configuration files in the /etc/booth directory: /etc/booth/emea.conf and /etc/booth/apac.conf. The two files must differ at least in the following parameters:

  • The port used for the communication of the booth instances.

  • The sites belonging to the different GEO clusters that the arbitrator is used for.

Example 2: /etc/booth/apac.conf


port = 9133 2
arbitrator = 147.2.207.14 3
site= 192.168.2.254 4
site= 192.168.1.112 4
ticket="tkt-db-apac-intern" 5
     timeout = 10
     retries = 5
     renewal-freq = 60
     before-acquire-handler 10 = /usr/share/booth/service-runnable 11 db-apac-intern 12
ticket="tkt-db-apac-cust" 5
     timeout = 10
     retries = 5
     renewal-freq = 60
     before-acquire-handler 10 = /usr/share/booth/service-runnable 11 db-apac-cust 12
Example 3: /etc/booth/emea.conf


port = 9150 2
arbitrator = 147.2.207.14 3
site= 192.168.4.113 4
site= 192.168.6.113 4
ticket="tkt-sap-crm" 5
     expire = 900
     renewal-freq = 60
     before-acquire-handler 10 = /usr/share/booth/service-runnable 11 sap-crm 12
ticket="tkt-sap-prod" 5
     expire = 600
     renewal-freq = 60
     before-acquire-handler 10 = /usr/share/booth/service-runnable 11 sap-prod 12

2

The port to be used for communication between the booth instances at each site. The configuration files use different ports so that multiple booth instances can be started on the same arbitrator.

3

The IP address of the machine to use as arbitrator. In the examples above, we use the same arbitrator for different GEO clusters.

4

The IP address used for the boothd on a site. The sites defined in both booth configuration files are different, because they belong to two different GEO clusters.

5

The ticket to be managed by booth. Theoretically, the same ticket names can be defined in different booth configuration files. The tickets will not interfere because they are part of different GEO clusters that are managed by different booth instances. However, for a better overview, we advise using distinct ticket names for each GEO cluster as shown in the examples above.
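Because every booth instance on a shared arbitrator needs its own port, it can be useful to compare the ports of all configuration files before starting the instances. The following sketch is an illustration under stated assumptions: booth_port and ports_are_distinct are hypothetical helper names, and the parsing assumes plain port = VALUE lines without callout numbers. A file that sets no port is treated as using the default port 9929.

```shell
#!/bin/sh
# booth_port FILE
# Print the port set in a booth configuration file, or the default
# port 9929 if the file does not set one.
booth_port() {
    awk -F= '
        /^[ \t]*port[ \t]*=/ { gsub(/[ \t]/, "", $2); port = $2 }
        END { print (port == "" ? 9929 : port) }
    ' "$1"
}

# ports_are_distinct FILE...
# Succeed only if no two of the given configuration files use the same port.
ports_are_distinct() {
    dupes=$(for conf in "$@"; do booth_port "$conf"; done | sort | uniq -d)
    [ -z "$dupes" ]
}
```

On the arbitrator, a call such as ports_are_distinct /etc/booth/apac.conf /etc/booth/emea.conf would succeed for Example 2 and Example 3, which use ports 9133 and 9150.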

Procedure 3: Using the Same Arbitrator for Different GEO Clusters
  1. Create different booth configuration files in /etc/booth as shown in Example 2, “/etc/booth/apac.conf” and Example 3, “/etc/booth/emea.conf”. Do so either manually or with YaST, as outlined in Procedure 2, “Setting Up Booth with YaST”.

  2. On the arbitrator, open the ports that are defined in any of the booth configuration files in /etc/booth.

  3. On the nodes belonging to the individual GEO clusters that the arbitrator is used for, open the port that is used for the respective booth instance.

  4. Synchronize the respective booth configuration files across all cluster nodes and arbitrators that use the same booth configuration. For details, see Section 6.3, “Synchronizing the Booth Configuration Across All Sites and Arbitrators”.

  5. On the arbitrator, start the individual booth instances as described in Starting the Booth Services on Arbitrators for multi-tenancy setups.

  6. On the individual GEO clusters, start the booth service as described in Starting the Booth Services on Cluster Sites.

6.3 Synchronizing the Booth Configuration Across All Sites and Arbitrators

To make booth work correctly, all cluster nodes and arbitrators within one GEO cluster must use the same booth configuration. In case of any booth configuration changes, make sure to update the configuration files accordingly on all parties and to restart the booth services as described in Section 6.5, “Reconfiguring Booth While Running”.

Note: Synchronize Booth Configuration to All Sites and Arbitrators

All cluster nodes and arbitrators within the GEO cluster must use the same booth configuration. While you need to copy the configuration files manually to the arbitrators, you can use Csync2 to synchronize the booth configuration across the cluster nodes on all sites as described in Section 6.3.1, “Csync2 Setup for GEO Clusters” and Section 6.3.2, “Synchronizing Changes with Csync2”.

6.3.1 Csync2 Setup for GEO Clusters

Csync2 is a synchronization tool that can be used to replicate configuration files across all nodes in the cluster, and even across GEO clusters. Csync2 can handle any number of hosts, sorted into synchronization groups. Each synchronization group has its own list of member hosts and its include/exclude patterns that define which files should be synchronized in the synchronization group. The groups, the hostnames belonging to each group, and the include/exclude rules for each group are specified in the Csync2 configuration file, /etc/csync2/csync2.cfg.

For authentication, Csync2 uses the IP addresses and pre-shared keys within a synchronization group. You need to generate one key file for each synchronization group and copy it to all group members.

For detailed information about Csync2, refer to http://oss.linbit.com/csync2/paper.pdf

Csync2 contacts other servers via a TCP port (by default, 6556) and uses xinetd to start remote Csync2 instances.

How to set up Csync2 for individual clusters with YaST is explained in the Administration Guide for SUSE Linux Enterprise High Availability Extension, chapter Installation and Basic Setup, section Transferring the Configuration to All Nodes. However, YaST cannot handle more complex Csync2 setups, like those that are needed for GEO clusters. For the following setup, configure Csync2 manually by editing the configuration files.

To adjust Csync2 for synchronizing files not only within local clusters but also across geographically dispersed sites, you need to define two synchronization groups in the Csync2 configuration:

  • A global group ha_global (for the files that need to be synchronized globally, across all sites and arbitrators belonging to a GEO cluster).

  • A group for the local cluster site ha_local (for the files that need to be synchronized within the local cluster).

For an overview of the multiple Csync2 configuration files for the two synchronization groups, see Figure 3, “Example Setup of Csync2 for GEO Clusters”.

Figure 3: Example Setup of Csync2 for GEO Clusters

Authentication key files and their references are displayed in red. The names of Csync2 configuration files are displayed in blue, their references are displayed in green.

/etc/csync2/csync2.cfg

The main Csync2 configuration file. It is kept short and simple on purpose and only contains the following:

  • The definition of the synchronization group ha_local. The group consists of two nodes (this-site-host-1 and this-site-host-2) and uses /etc/csync2/ha_local.key for authentication. A list of files to be synchronized for this group only is defined in another Csync2 configuration file, /etc/csync2/ha_local.cfg. It is included with the config statement.

  • A reference to another Csync2 configuration file, /etc/csync2/ha_global.cfg, included with the config statement.

/etc/csync2/ha_local.cfg

This file concerns only the local cluster. It specifies a list of files to be synchronized only within the ha_local synchronization group, as these files are specific to each cluster. The most important ones are the following:

  • /etc/csync2/csync2.cfg, as this file contains the list of the local cluster nodes.

  • /etc/csync2/ha_local.key, the authentication key to be used for Csync2 synchronization within the local cluster.

  • /etc/corosync/corosync.conf, as this file defines the communication channels between the local cluster nodes.

  • /etc/corosync/authkey, the Corosync authentication key.

The rest of the file list depends on your specific cluster setup. The files listed in Figure 3, “Example Setup of Csync2 for GEO Clusters” are only examples.

/etc/csync2/ha_global.cfg

This file defines the Csync2 synchronization group ha_global. The group spans all cluster nodes across multiple sites. As it is recommended to use a separate key for each Csync2 synchronization group, this group uses /etc/csync2/ha_global.key for authentication. The include statements define the list of files to be synchronized within the ha_global synchronization group. The most important ones are the following:

  • /etc/csync2/ha_global.cfg and /etc/csync2/ha_global.key (the configuration file for the ha_global synchronization group and the authentication key used for synchronization within the group)

  • /etc/booth/booth.conf, the default booth configuration file. In case you are using a booth setup for multiple tenants, replace this file with the different booth configuration files that you have created. See Section 6.2, “Booth Setup for Multiple Tenants” for details.

  • /etc/drbd.conf and /etc/drbd.d (if you are using DRBD within your cluster setup). The DRBD configuration can be globally synchronized, as it derives the configuration from the host names contained in the resource configuration file.

  • /etc/zypp/repos.d. The package repositories are likely to be the same on all cluster nodes.

The other files shown (/etc/root/*) are examples that may be included for convenience reasons (to make a cluster administrator's life easier).

6.3.2 Synchronizing Changes with Csync2

To successfully synchronize the files with Csync2, the following prerequisites must be met:

  • The same Csync2 configuration is available on all nodes.

  • The Csync2 authentication key for each synchronization group must be available on all members of that group.

  • Both Csync2 and xinetd must be running on all nodes.

Before the first Csync2 run, you therefore need to make the following preparations:

  1. Copy the Csync2 configuration files manually to the respective nodes.

  2. Generate one authentication key for each synchronization group, using the following command:

    csync2 -k NAME_OF_KEYFILE
  3. Copy the resulting key files to all members of the respective synchronization group. However, do not regenerate the key files on any other members of the same group.

  4. Execute the following commands on all nodes to make both xinetd and csync2 services start automatically at boot time:

    root # systemctl enable csync2.socket
    root # systemctl enable xinetd.service
  5. Execute the following commands to start both services now:

    root # systemctl start csync2.socket
    root # systemctl start xinetd.service
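
The preparation steps 2 to 5 above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not part of the product: the helper names run and prepare_csync2_group, the DRY_RUN switch, and the use of scp and ssh as root are all hypothetical. Only csync2 -k and the systemctl calls are taken from the steps above. Copying the Csync2 configuration files (step 1) is not covered.

```shell
#!/bin/bash
# Sketch only: prepare one Csync2 synchronization group (steps 2 to 5 above).
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_csync2_group() {
    # $1: key file; remaining arguments: the other members of the group
    [ "$#" -ge 1 ] || return 1
    local keyfile="$1"
    shift
    local node

    # Generate the key once, on this node only.
    if [ ! -e "$keyfile" ]; then
        run csync2 -k "$keyfile"
    fi

    # Enable and start the services on the local node.
    run systemctl enable csync2.socket xinetd.service
    run systemctl start csync2.socket xinetd.service

    for node in "$@"; do
        # Copy the existing key; never regenerate it on another member.
        run scp "$keyfile" "root@${node}:${keyfile}"
        run ssh "root@${node}" systemctl enable csync2.socket xinetd.service
        run ssh "root@${node}" systemctl start csync2.socket xinetd.service
    done
}
```

For example, prepare_csync2_group /etc/csync2/ha_global.key site-2-host-1 site-3-host-1 prints the planned commands for two other group members (the host names are placeholders). Review the output before running the script again with DRY_RUN=0.
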
Procedure 4: Synchronizing Files with Csync2
  1. To initially synchronize all files once, execute the following command on the node that you want to copy the configuration from:

    root # csync2 -xv

    This will synchronize all the files once by pushing them to the other members of the synchronization groups. If all files are synchronized successfully, Csync2 will finish with no errors.

    If one or several files that are to be synchronized have been modified on other nodes (not only on the current one), Csync2 will report a conflict. You will get an output similar to the one below:

    While syncing file /etc/corosync/corosync.conf:
    ERROR from peer site-3-host-1: File is also marked dirty here!
    Finished with 1 errors.
  2. If you are sure that the file version on the current node is the best one, you can resolve the conflict by forcing this file and resynchronizing:

    root # csync2 -f /etc/corosync/corosync.conf
    root # csync2 -x
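
If many files are involved, it can help to extract the conflicting file names from the Csync2 output before deciding which ones to force. The following is a minimal sketch that assumes the message format shown in the example output above. The function name list_csync2_conflicts is hypothetical, and other Csync2 versions may word their messages differently.

```shell
#!/bin/bash
# Sketch: list the files for which a Csync2 run reported a "dirty" conflict.
# Reads Csync2 output on stdin; assumes the message format shown above.
list_csync2_conflicts() {
    awk '
        /^[ \t]*While syncing file / {
            # Remember the file name of the current "While syncing" block.
            file = $0
            sub(/^[ \t]*While syncing file /, "", file)
            sub(/:[ \t]*$/, "", file)
            next
        }
        /ERROR from peer .*File is also marked dirty here!/ && file != "" {
            print file
            file = ""
        }
    '
}
```

A possible use is csync2 -xv 2>&1 | list_csync2_conflicts. The redirection merges both output streams, because this sketch does not assume on which stream the messages are printed.
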

For more information on the Csync2 options, run csync2 -help.

Note
Note: Pushing Synchronization After Any Changes

Csync2 only pushes changes. It does not continuously synchronize files between the nodes.

Each time you update files that need to be synchronized, you have to push the changes to the other nodes: Run csync2 -xv on the node where you made the changes. If you run the command on any of the other nodes with unchanged files, nothing will happen.

6.4 Enabling and Starting the Booth Services

Starting the Booth Services on Cluster Sites

The booth service for each cluster site is managed by the booth resource group configured in Procedure 6, “Configuring a Resource Group for boothd”. To start one instance of the booth service per site, start the respective booth resource group on each cluster site.

Starting the Booth Services on Arbitrators

Starting with SUSE Linux Enterprise 12, booth arbitrators are managed with systemd. The unit file is named booth@.service. The @ denotes the possibility to run the service with a parameter, which is in this case the name of the configuration file.

To enable the booth service on an arbitrator, use the following command:

root # systemctl enable booth@booth

After the service has been enabled from the command line, you can use YaST System Services (Runlevel) to manage it, as long as the service stays enabled. If you disable the service, it disappears from the service list in YaST the next time systemd is restarted.

However, the command to start the booth service depends on your booth setup:

  • If you are using the default setup as described in Section 6.1, “Default Booth Setup”, only /etc/booth/booth.conf is configured. In that case, log in to each arbitrator and use the following command:

    root # systemctl start booth@booth
  • If you are running booth in multi-tenancy mode as described in Section 6.2, “Booth Setup for Multiple Tenants”, you have configured multiple booth configuration files in /etc/booth. To start the services for the individual booth instances, use systemctl start booth@NAME, where NAME stands for the name of the respective configuration file /etc/booth/NAME.conf.

    For example, if you have the booth configuration files /etc/booth/emea.conf and /etc/booth/apac.conf, log in to your arbitrator and execute the following commands:

    root # systemctl start booth@emea
    root # systemctl start booth@apac

This starts the booth service in arbitrator mode. It can communicate with all other booth daemons but in contrast to the booth daemons running on the cluster sites, it cannot be granted a ticket. Booth arbitrators take part in elections only. Otherwise, they are dormant.
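
In a multi-tenancy setup, the unit names can be derived from the configuration files instead of being typed individually. The following is a minimal sketch that relies only on the booth@NAME naming scheme described above. The function name booth_units and its optional directory argument are hypothetical.

```shell
#!/bin/bash
# Sketch: derive one booth@NAME unit per configuration file NAME.conf.
# The directory defaults to /etc/booth; other files in it are ignored.
booth_units() {
    local dir="${1:-/etc/booth}"
    local conf
    for conf in "$dir"/*.conf; do
        # Without any *.conf file the glob stays literal, so skip it.
        [ -e "$conf" ] || continue
        echo "booth@$(basename "$conf" .conf)"
    done
}

# Possible use on an arbitrator (commented out on purpose):
# for unit in $(booth_units); do
#     systemctl enable "$unit" && systemctl start "$unit"
# done
```

With /etc/booth/emea.conf and /etc/booth/apac.conf in place, booth_units prints booth@apac and booth@emea, one per line.
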

6.5 Reconfiguring Booth While Running

In case you need to change the booth configuration while the booth services are already running, proceed as follows:

  1. Adjust the booth configuration files as desired.

  2. Synchronize the updated booth configuration files to all cluster nodes and arbitrators that are part of your GEO cluster. For details, see Section 6.3, “Synchronizing the Booth Configuration Across All Sites and Arbitrators”.

  3. Restart the booth services on the arbitrators and cluster sites as described in Section 6.4, “Enabling and Starting the Booth Services”. This does not have any effect on tickets that have already been granted to sites.

7 Configuring Cluster Resources and Constraints

Apart from the resources and constraints that you need to define for your specific cluster setup, GEO clusters require additional resources and constraints as described below. You can either configure them with the crm shell (crmsh), or with the HA Web Konsole (Hawk).

7.1 From Command Line

This section focuses on tasks specific to GEO clusters. For an introduction to the crm shell and general instructions on how to configure resources and constraints with crmsh, refer to the Administration Guide for SUSE Linux Enterprise High Availability Extension, chapter Configuring and Managing Cluster Resources (Command Line).

Procedure 5: Configuring Ticket Dependencies

For GEO clusters, you can specify which resources depend on a certain ticket. Together with this special type of constraint, you can set a loss-policy that defines what should happen to the respective resources if the ticket is revoked. The attribute loss-policy can have the following values:

  • fence: Fence the nodes that are running the relevant resources.

  • stop: Stop the relevant resources.

  • freeze: Do nothing to the relevant resources.

  • demote: Demote relevant resources that are running in master mode to slave mode.

  1. On one of the cluster nodes, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive crm shell.

  3. Configure a constraint that defines which resources depend on a certain ticket. For example:

    crm(live)configure# rsc_ticket rsc1-req-ticketA ticketA: rsc1 \
      loss-policy="fence"

    This creates a constraint with the ID rsc1-req-ticketA. It defines that the resource rsc1 depends on ticketA and that the node running the resource should be fenced in case ticketA is revoked.

    Alternatively, you can configure resource rsc1 not as a primitive, but a multi-state resource that can run in master or slave mode. In that case, make only rsc1's master mode depend on ticketA. With the following configuration, rsc1 is automatically demoted to slave mode if ticketA is revoked:

    crm(live)configure# rsc_ticket rsc1-req-ticketA ticketA: rsc1:Master \
      loss-policy="demote"
  4. If you want other resources to depend on further tickets, create as many constraints as necessary with rsc_ticket.

  5. Review your changes with show.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

    The constraints are saved to the CIB.
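
The same constraint can also be passed to crm as a single non-interactive command. The following is a minimal sketch of a helper that validates the loss-policy value against the four values listed above and prints the resulting command for review. The function name rsc_ticket_cmd and its argument order are hypothetical.

```shell
#!/bin/bash
# Sketch: build a one-shot crm command for a ticket dependency.
# It only prints the command so that you can review it before running it.
rsc_ticket_cmd() {
    # $1: constraint ID, $2: ticket, $3: resource (optionally rsc:Master),
    # $4: loss-policy
    local id="$1" ticket="$2" rsc="$3" policy="$4"
    case "$policy" in
        fence|stop|freeze|demote) ;;
        *) echo "unknown loss-policy: $policy" >&2; return 1 ;;
    esac
    echo "crm configure rsc_ticket $id $ticket: $rsc loss-policy=$policy"
}
```

For example, rsc_ticket_cmd rsc1-req-ticketA ticketA rsc1 fence prints crm configure rsc_ticket rsc1-req-ticketA ticketA: rsc1 loss-policy=fence.
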

Procedure 6: Configuring a Resource Group for boothd

Each site needs to run one instance of boothd that communicates with the other booth daemons. The daemon can be started on any node, therefore it should be configured as a primitive resource. To make the boothd resource stay on the same node, if possible, add resource stickiness to the configuration. As each daemon needs a persistent IP address, configure another primitive with a virtual IP address. Group both primitives:

  1. On one of the cluster nodes, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive crm shell.

  3. Enter the following to create both primitive resources and to add them to one group, g-booth:

    crm(live)configure# primitive booth-ip ocf:heartbeat:IPaddr2 \
      params ip="IP_ADDRESS"
    crm(live)configure# primitive booth ocf:pacemaker:booth-site \
      meta resource-stickiness="INFINITY" \
      op monitor interval="10s"
    crm(live)configure# group g-booth booth-ip booth
  4. Review your changes with show.

  5. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

  6. Repeat the resource group configuration on the other cluster sites, using a different IP address for each boothd resource group.

    With this configuration, each booth daemon will be available at its individual IP address, independent of the node the daemon is running on.
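
Because the configuration differs between sites only in the IP address, it can be generated from a small template. The following is a minimal sketch. The function name booth_group_config is hypothetical, and loading the result with crm configure load update is an assumption about your workflow that you should verify against the crmsh documentation for your version.

```shell
#!/bin/bash
# Sketch: print the crm configuration for the boothd resource group of one
# site. $1 is the virtual IP address used by boothd on that site.
booth_group_config() {
    local ip="$1"
    if [ -z "$ip" ]; then
        echo "usage: booth_group_config IP_ADDRESS" >&2
        return 1
    fi
    # In an unquoted here-document, \\ produces a single backslash, which
    # the crm shell treats as a line continuation.
    cat <<EOF
primitive booth-ip ocf:heartbeat:IPaddr2 \\
  params ip="$ip"
primitive booth ocf:pacemaker:booth-site \\
  meta resource-stickiness="INFINITY" \\
  op monitor interval="10s"
group g-booth booth-ip booth
EOF
}

# Possible use on one node of a site (commented out on purpose):
# booth_group_config 192.168.201.100 > /tmp/g-booth.crm
# crm configure load update /tmp/g-booth.crm
```

The address 192.168.201.100 is a placeholder. Use the address that the booth configuration file defines for the respective site.
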

Procedure 7: Adding an Ordering Constraint

If a ticket has been granted to a site but all nodes of that site should fail to host the boothd resource group for any reason, a split-brain situation among the geographically dispersed sites could occur. In that case, no boothd instance would be available to safely manage fail-over of the ticket to another site. To avoid a potential concurrency violation of the ticket (the ticket is granted to multiple sites simultaneously), add an ordering constraint:

  1. On one of the cluster nodes, start a shell and log in as root or equivalent.

  2. Enter crm configure to switch to the interactive crm shell.

  3. Create an ordering constraint:

    crm(live)configure# order order-booth-rsc1 inf: g-booth rsc1

    This defines that rsc1 (that depends on ticketA) can only be started after the g-booth resource group.

    In case rsc1 is not a primitive, but a multi-state resource (a special clone resource) configured as described in Step 3 of Procedure 5, “Configuring Ticket Dependencies”, the ordering constraint should be configured as follows:

    crm(live)configure# order order-booth-rsc1 inf: g-booth rsc1:promote

    This defines that rsc1 can only be promoted to master mode after the g-booth resource group has started.

  4. Review your changes with show.

  5. For any other resources that depend on a certain ticket, define further ordering constraints.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.

7.2 With the HA Web Konsole (Hawk)

This section focuses on tasks specific to GEO clusters. For an introduction to Hawk and general instructions on how to configure resources and constraints with Hawk, refer to the Administration Guide for SUSE Linux Enterprise High Availability Extension, chapter Configuring and Managing Cluster Resources (Web Interface).

Procedure 8: Configuring Ticket Dependencies

For GEO clusters, you can specify which resources depend on a certain ticket. Together with this special type of constraint, you can set a loss-policy that defines what should happen to the respective resources if the ticket is revoked. The attribute loss-policy can have the following values:

  • fence: Fence the nodes that are running the relevant resources.

  • stop: Stop the relevant resources.

  • freeze: Do nothing to the relevant resources.

  • demote: Demote relevant resources that are running in master mode to slave mode.

The following example shows two alternatives to configure the constraint: One with the resource being a primitive and loss-policy="fence", the other one with the resource being a multi-state resource that can run in master or slave mode and with loss-policy="demote".

  1. Start a Web browser and log in to Hawk.

  2. In the left navigation bar, select Constraints. The Constraints screen shows categories for all types of constraints and lists all defined constraints.

  3. To add a new ticket dependency, click the plus icon in the Ticket category.

    To modify an existing constraint, click the wrench icon next to the constraint and select Edit Constraint.

  4. Enter a unique Constraint ID. When modifying existing constraints, the ID is already defined.

  5. Set a Loss Policy.

  6. Enter the ID of the ticket that the resources should depend on.

  7. Select a resource from the list Add resource to constraint. The list shows the IDs of all resources and all resource templates configured for the cluster.

  8. To add the selected resource, click the plus icon next to the list. A new list appears beneath, showing the remaining resources. Add as many resources to the constraint as you would like to depend on the ticket.

    Figure 4: Hawk—Ticket Dependency with loss-policy="fence"

    Figure 4 shows a constraint with the ID rsc1-req-ticketA. It defines that the resource rsc1 depends on ticketA and that the node running the resource should be fenced in case ticketA is revoked.

    If resource rsc1 is not a primitive, but a multi-state resource, define that only rsc1's master mode depends on ticketA. With the configuration shown in Figure 5, rsc1 is automatically demoted to slave mode if ticketA is revoked:

    Figure 5: Hawk—Ticket Dependency with loss-policy="demote"
  9. Click Create Constraint to finish the configuration. A message at the top of the screen shows if the constraint was successfully created.

Procedure 9: Configuring a Resource Group for boothd

Each site needs to run one instance of boothd that communicates with the other booth daemons. The daemon can be started on any node, therefore it should be configured as a primitive resource. To make the boothd resource stay on the same node, if possible, add resource stickiness to the configuration. As each daemon needs a persistent IP address, configure another primitive with a virtual IP address. Group both primitives:

  1. Start a Web browser and log in to Hawk.

  2. In the left navigation bar, select Resources. The Resources screen shows categories for all types of resources. It lists any resources that are already defined.

  3. Select the Primitive category and click the plus icon.

  4. To specify the resource for the virtual IP address:

    1. Enter a unique Resource ID, for example: booth-ip.

    2. Set Class to ocf, Provider to heartbeat and Type to IPaddr2.

      Hawk automatically shows any required parameters for the resource plus an empty drop-down list that you can use to specify additional parameters.

    3. Define the following Parameters (instance attributes) for the resource and enter values for them:

      • ip

      • cidr_netmask

    4. Click Create Resource to finish the configuration. A message at the top of the screen shows if the resource was successfully created or not.

  5. Click Back to return to the list of configured resources.

  6. Select the Primitive category and click the plus icon.

  7. To specify the resource for boothd:

    1. Enter a unique Resource ID, for example: booth.

    2. Set Class to ocf, Provider to pacemaker and Type to booth-site.

      Hawk automatically shows any required parameters for the resource plus an empty drop-down list that you can use to specify additional parameters.

    3. In the Operations category, select monitor. Hawk proposes a timeout value of 20 and an interval of 10 seconds. Keep the proposed values and add this monitoring operation by clicking the plus icon next to it.

    4. In the Meta-Attributes category, select resource-stickiness and enter INFINITY as value. Click the plus icon next to the value to add this meta attribute.

    5. Click Create Resource to finish the configuration. A message at the top of the screen shows if the resource was successfully created or not.

  8. Click Back to return to the list of configured resources.

  9. To create the group and add both primitives to it:

    1. Select the Group category and click the plus icon.

    2. Enter a unique Group ID, for example: g-booth.

    3. To define the group members, select booth-ip and booth in the list of Available Primitives and click the < icon to add them to the Group Children list. To define the order of the group members, you currently need to add and remove them in the order you desire.

    4. Hawk automatically proposes the meta attribute target-role. Set its value to Started.

      Figure 6: Hawk—Resource Group for boothd
    5. Click Create Group to finish the configuration. A message at the top of the screen shows if the group was successfully created.

  10. Repeat the resource group configuration on the other cluster sites, using a different IP address for each boothd resource group.

    With this configuration, each booth daemon will be available at its individual IP address, independent of the node the daemon is running on.

Procedure 10: Adding an Ordering Constraint

If a ticket has been granted to a site but all nodes of that site should fail to host the boothd resource group for any reason, a split-brain situation among the geographically dispersed sites could occur. In that case, no boothd instance would be available to safely manage fail-over of the ticket to another site. To avoid a potential concurrency violation of the ticket (the ticket is granted to multiple sites simultaneously), add an ordering constraint:

  1. Start a Web browser and log in to Hawk.

  2. In the left navigation bar, select Constraints. The Constraints screen shows categories for all types of constraints and lists all defined constraints.

  3. Select the Order category and click the plus icon to create a new ordering constraint.

  4. Enter a unique Constraint ID, for example order-booth-rsc1.

  5. Set the Score to INFINITY.

    For colocation constraints, the score determines the location relationship between the resources. Setting the score to INFINITY forces the resources to run on the same node. For order constraints, the constraint is mandatory if the score is greater than zero, otherwise it is only a suggestion. The default value is INFINITY.

  6. Keep the option Symmetrical enabled. This specifies that resources are stopped in reverse order.

  7. To define the resources for the constraint:

    1. Select the resource group g-booth from the list Add resource to constraint and click the plus icon next to the list to add the resource to the ordering constraint.

      Figure 7: Hawk—Ordering Constraint with Multi-state Resource
    2. Select the resource rsc1 from the list Add resource to constraint and click the plus icon next to the list to add the resource to the ordering constraint.

      Now you have both resources in a dependency chain. The topmost (g-booth) will start first, then the next one (rsc1). Usually the resources will be stopped in reverse order.

    3. In case rsc1 is not a primitive, but a multi-state resource and configured as described in Step 8 of Procedure 8, “Configuring Ticket Dependencies”, select the following entry from the empty drop-down box next to rsc1: promote. This defines that rsc1 can only be promoted to master mode after the g-booth resource group has started.

    4. Click Create Constraint.

      A message at the top of the screen shows if the constraint was successfully created.

  8. Click Back to return to the list of constraints.

  9. For any other resources that depend on a certain ticket, define further ordering constraints.

8 Managing GEO Clusters

Before booth can manage a certain ticket within the GEO cluster, you initially need to grant it to a site manually.

8.1 From Command Line

Use the booth client command line tool to grant, list, or revoke tickets as described in Overview of booth client Commands. The booth client commands can be run on any machine in the cluster, not only on those running boothd. The booth client commands try to find the local cluster by looking at the booth configuration file and the locally defined IP addresses. If you do not specify a site for the booth client to connect to (using the -s option), it always connects to the local site.

Note
Note: Syntax Changes

The syntax of booth client commands has been simplified since SUSE Linux Enterprise High Availability Extension 11: For example, the client keyword can be omitted for list, grant, or revoke operations: booth list. Also, the -t option can be omitted when specifying a ticket.

The former syntax is still supported. For detailed information, see the Synopsis section in the booth man page. However, the examples in this manual use the simplified syntax.

Overview of booth client Commands
Listing All Tickets
root #  booth list
ticket: ticketA, leader: none
ticket: ticketB, leader: 10.2.12.101, expires: 2014-08-13 10:28:57
      

If you do not specify a certain site with -s, the information about the tickets will be requested from the local booth instance.

Granting a Ticket to a Site
root #  booth grant -s 147.2.207.14 ticketA
booth[27891]: 2014/08/13_10:21:23 info: grant request sent, waiting for the result ...
booth[27891]: 2014/08/13_10:21:23 info: grant succeeded!

In this case, ticketA will be granted to the site 147.2.207.14. If you omit the -s option, booth will automatically connect to the current site (the site you are running the booth client on) and will request the grant operation.

Before granting a ticket, the command will execute a sanity check. If the same ticket is already granted to another site, you will be warned about that and be prompted to revoke the ticket from the current site first.

Revoking a Ticket From a Site
root #  booth revoke ticketA
booth[27900]: 2014/08/13_10:21:23 info: revoke succeeded!

Booth will check to which site the ticket is currently granted and will request the revoke operation for ticketA. The revoke operation will be executed immediately.

The grant operation and, under certain circumstances, the revoke operation may take a while to return a definite outcome. The client waits for the result up to the ticket's timeout value before it gives up, unless the -w option was used, in which case the client waits indefinitely. Find the exact status in the log files or with the crm_ticket -L command.
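
For scripted checks, the leader of a ticket can be extracted from the booth list output. The following is a minimal sketch that assumes the line format shown under Listing All Tickets above. The function name ticket_leader is hypothetical, and the format may differ in other booth versions.

```shell
#!/bin/bash
# Sketch: print the leader of a ticket from "booth list" output on stdin.
# Assumes lines of the form "ticket: NAME, leader: VALUE[, expires: ...]".
ticket_leader() {
    local ticket="$1"
    awk -v t="$ticket" -F', *' '
        { sub(/^[ \t]+/, "") }          # tolerate indented lines
        $1 == ("ticket: " t) {
            sub(/^leader: */, "", $2)
            print $2
        }
    '
}
```

A possible use is booth list | ticket_leader ticketA. With the example output above, this prints none, because ticketA has not been granted to any site.
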

Warning
Warning: crm_ticket and crm site ticket

In case the booth service is not running for any reason, you may also manage tickets manually with crm_ticket or crm site ticket. Both commands are only available on cluster nodes. In case of intervention, use them with great care as they cannot verify if the same ticket is already granted elsewhere. For more information, read the man pages.

As long as booth is up and running, only use booth client for manual intervention.

After you have initially granted a ticket to a site, the booth mechanism will take over and manage the ticket automatically. If the site holding a ticket should be out of service, the ticket will automatically be revoked after the expiry time and granted to another site. The resources that depend on that ticket will fail over to the new site holding the ticket. The nodes that have run the resources before will be treated according to the loss-policy you set within the constraint.

Procedure 11: Managing Tickets Manually

Assuming that you want to manually move ticketA from site 147.2.207.14 to 192.168.1.110, proceed as follows:

  1. Set ticketA to standby with the following command:

    root # crm_ticket -t ticketA -s
  2. Wait for any resources that depend on ticketA to be stopped or demoted cleanly.

  3. Revoke ticketA from its current site with:

    root # booth revoke -s 147.2.207.14 ticketA
  4. After the ticket has been revoked from its original site, grant it to the new site with:

    root # booth grant -s 192.168.1.110 ticketA
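
The procedure above can be turned into a reviewable plan for any ticket and pair of sites. The following is a minimal sketch. The function name move_ticket_plan is hypothetical. It deliberately prints the commands instead of executing them, because step 2 requires you to check manually that the dependent resources have stopped or been demoted.

```shell
#!/bin/bash
# Sketch: print the commands of Procedure 11 for moving a ticket.
move_ticket_plan() {
    # $1: ticket, $2: current site, $3: new site
    local ticket="$1" from="$2" to="$3"
    if [ -z "$ticket" ] || [ -z "$from" ] || [ -z "$to" ] || [ "$from" = "$to" ]; then
        echo "usage: move_ticket_plan TICKET FROM_SITE TO_SITE" >&2
        return 1
    fi
    echo "crm_ticket -t $ticket -s"
    echo "# wait until resources depending on $ticket are stopped or demoted"
    echo "booth revoke -s $from $ticket"
    echo "booth grant -s $to $ticket"
}
```

For example, move_ticket_plan ticketA 147.2.207.14 192.168.1.110 prints the three commands used above, separated by a reminder comment for the waiting step.
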

8.2 With the HA Web Konsole (Hawk)

You can use Hawk as a single point of administration for monitoring multiple clusters. Hawk's Cluster Dashboard shows a summary of each cluster that you add to it.

From the Cluster Dashboard, you can easily switch to the other Hawk functions that are available after logging in to an individual cluster. There, Hawk allows you to grant or revoke tickets, to view ticket details, and to test the impact of ticket failover with the Simulator.

8.2.1 Monitoring Multiple Clusters with the Cluster Dashboard

You can use Hawk as a single point of administration for monitoring multiple clusters. Hawk's Cluster Dashboard allows you to view a summary of multiple clusters, with each summary listing the number of nodes, resources, tickets (if you use GEO clusters), and their state. The summary also shows if any failures have appeared in the respective cluster.

The cluster information displayed in the Cluster Dashboard is stored in a persistent cookie. This means you need to decide which Hawk instance you want to view the Cluster Dashboard on, and always use that one. The machine you are running Hawk on does not even have to be part of any cluster for that purpose—it can be a separate, unrelated system.

Procedure 12: Monitoring Multiple Clusters with Hawk
Prerequisites
  • All clusters to be monitored from Hawk's Cluster Dashboard must be running SUSE Linux Enterprise High Availability Extension 12. It is not possible to monitor clusters that are running earlier versions of SUSE Linux Enterprise High Availability Extension.

  • If you did not replace the self-signed certificate for Hawk on every cluster node with your own certificate (or a certificate signed by an official Certificate Authority), you must log in to Hawk on every node in every cluster at least once. Verify the certificate (and add an exception in the browser to bypass the warning).

  • If you are using Mozilla Firefox, you must change its preferences to Accept third-party cookies. Otherwise cookies from monitored clusters will not be set, thus preventing login to the clusters you are trying to monitor.

  1. Start the Hawk Web service on a machine you want to use for monitoring multiple clusters.

  2. Start a Web browser and as URL enter the IP address or hostname of the machine that runs Hawk:

    https://IPaddress:7630/
  3. On the Hawk login screen, click the Dashboard link in the upper right corner.

    The Add Cluster dialog appears.

  4. Enter a custom Cluster Name with which to identify the cluster in the Cluster Dashboard.

  5. Enter the Host Name of one of the cluster nodes and confirm your changes.

    The Cluster Dashboard opens and shows a summary of the cluster you just added.

  6. To add more clusters to the dashboard, click the plus icon and enter the details for the next cluster.

    Figure 8: Hawk—Cluster Dashboard
  7. To remove a cluster from the dashboard, click the x icon next to the cluster's summary.

  8. To view more details about a cluster, click anywhere in the cluster's box on the dashboard.

    This opens a new browser window or new browser tab. If you are not currently logged in to the cluster, this takes you to the Hawk login screen. After having logged in, Hawk shows the Cluster Status of that cluster in the summary view. From here, you can administer the cluster with Hawk as usual.

  9. As the Cluster Dashboard stays open in a separate browser window or tab, you can easily switch between the dashboard and the administration of individual clusters in Hawk.

Any status changes for nodes or resources are reflected almost immediately within the Cluster Dashboard.

8.2.2 Managing Tickets with Hawk

Note
Note: Granting Tickets to Current Site

Though you can view tickets for all sites with Hawk, any grant operations triggered by Hawk only apply to the current site, that is, the site of the cluster node that you are currently connected to with Hawk. To grant a ticket to another site of your GEO cluster, start Hawk on one of the cluster nodes belonging to the respective site.

Procedure 13: Granting, Revoking and Viewing Tickets with Hawk

Tickets are visible in Hawk if they have been granted or revoked at least once or if they are referenced in a ticket dependency—see Procedure 8, “Configuring Ticket Dependencies”. In case a ticket is referenced in a ticket dependency, but has not been granted to any site yet, Hawk displays it as revoked.

  1. Start a Web browser and log in to the cluster.

  2. In the left navigation bar, select Cluster Status.

  3. Switch to the Summary View or the Tree View to view tickets. Along with information about cluster nodes and resources, Hawk also displays a Tickets category.

    It shows the following information:

    • Granted: Tickets that are granted to the current site.

    • Elsewhere: Tickets that are granted to another site.

    • Revoked: Tickets that have been revoked.

    Figure 9: Hawk Cluster Status (Summary View)—Ticket Overview
  4. To view more details, either click the title of the Tickets category or the individual ticket entries that are marked as links. Hover the cursor over the information icon next to the ticket to display the following information: time when the ticket has been last granted, the leader, and the ticket expiry date.

    Figure 10: Hawk Cluster Status (Summary View)—Ticket Details
  5. To revoke a ticket, click the wrench icon next to the ticket and select Revoke. Confirm your choice when Hawk prompts for a confirmation.

    If the ticket cannot be revoked for any reason, Hawk shows an error message. After the ticket has been successfully revoked, Hawk will update the ticket status in the Tickets category.

  6. You can only grant tickets that are not already given to any site. To grant a ticket to the current site:

    1. Click the wrench icon next to a ticket with the current status Revoked and select Grant.

    2. Confirm your choice when Hawk prompts for a confirmation.

      If the ticket cannot be granted for any reason, Hawk shows an error message. After the ticket has been successfully granted, Hawk will update the ticket status in the Tickets category.

Procedure 14: Simulating Granting and Revoking Tickets

Hawk's Simulator allows you to explore failure scenarios before they happen. To explore if your resources that depend on a certain ticket behave as expected, you can also test the impact of granting or revoking tickets.

  1. Start a Web browser and log in to Hawk.

  2. Click the wrench icon next to the username in the top-level row, and select Simulator.

    Hawk's background changes color to indicate that the simulator is active. A simulator dialog opens in the bottom right-hand corner of the screen. Its title Simulator (initial state) indicates that the Cluster Status screen still reflects the current state of the cluster.

  3. To simulate a status change of a ticket:

    1. Click +Ticket in the simulator control dialog.

    2. Select the Action you want to simulate.

    3. Confirm your changes to add them to the queue of events listed in the simulator control dialog below Injected State.

  4. To start the simulation, click Run in the simulator control dialog. The Cluster Status screen displays the impact of the simulated events. The simulator control dialog changes to Simulator (final state).

  5. To exit the simulation mode, close the simulator control dialog. The Cluster Status screen switches back to its normal color and displays the current cluster state.

Figure 11: HawkSimulator—Tickets

For more information about Hawk's Simulator (and which other scenarios can be explored with it), refer to the Administration Guide for SUSE Linux Enterprise High Availability Extension, available from http://www.suse.com/documentation/. Refer to chapter Configuring and Managing Cluster Resources (Web Interface), section Exploring Potential Failure Scenarios.

9 Troubleshooting

Booth uses the same logging mechanism as the CRM. Thus, changing the log level also affects booth logging. The booth log messages also contain information about any tickets.

Both the booth log messages and the booth configuration file are included in the hb_report and crm_report.

In case of unexpected booth behavior or any problems, check the logging data with sudo journalctl -n or create a detailed cluster report with either hb_report or crm_report.
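As a first check, the journal output can be narrowed down to booth-related lines. The following is a minimal sketch only: the function name filter_booth_lines is hypothetical, and it assumes that the relevant messages contain the word "booth".

```shell
# Hypothetical helper: keep only the lines that mention booth
# (case-insensitive match on the assumed keyword "booth").
filter_booth_lines() {
    grep -i 'booth'
}

# Usage on a cluster node or arbitrator (not executed here):
#   sudo journalctl -n 200 | filter_booth_lines
```

Here, -n 200 limits the output to the 200 most recent journal entries. Adjust the number as needed.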

In case you can access the cluster nodes on all sites (plus the arbitrators) from one single host via SSH, it is possible to collect log files from all of them within the same hb_report. When calling hb_report with the -n option, it gets the log files from all hosts that you specify with -n (instead of trying to obtain the list of nodes from the respective cluster). For example, to create a single hb_report including the log files from two two-node clusters (192.168.2.190|192.168.2.191 and 192.168.1.90|192.168.1.91) and an arbitrator (147.2.207.14), use the following command:

root #  hb_report -n "147.2.207.14 192.168.2.190 192.168.1.90 192.168.2.191
  192.168.1.91" -f 10:00 -t 11:00 db-incident

If the issue is about booth only and you know on which cluster nodes (within a site) booth is running, then specify only those two nodes plus the arbitrator.
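If the list of hosts changes between incidents, the hb_report call can be assembled from its parts. The following sketch only prints the command instead of running it, so it can be reviewed first. The function name build_hb_report_cmd is hypothetical; the addresses and options are the ones from the example above.

```shell
# Hypothetical helper: print (do not run) an hb_report command line.
# Arguments: start time, end time, report name, then one or more hosts.
build_hb_report_cmd() {
    local from="$1" to="$2" dest="$3"
    shift 3
    # "$*" joins the remaining arguments (the hosts) with single spaces.
    local hosts="$*"
    printf 'hb_report -n "%s" -f %s -t %s %s\n' "$hosts" "$from" "$to" "$dest"
}

# Arbitrator first, then the nodes of both sites (addresses from the text).
build_hb_report_cmd 10:00 11:00 db-incident \
    147.2.207.14 192.168.2.190 192.168.1.90 192.168.2.191 192.168.1.91
```

To restrict the report to the booth nodes and the arbitrator, pass only those addresses as host arguments.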

If there is no way to access all sites from one host, you need to run hb_report individually on the arbitrator and on the cluster nodes of the individual sites, specifying the same period of time. To collect the logs on an arbitrator, you must use the -S option for single node operation:

site1# hb_report -f 10:00 -t 11:00 db-incident-site1
site2# hb_report -f 10:00 -t 11:00 db-incident-site2
arbitrator# hb_report -S -f 10:00 -t 11:00 db-incident-arb

However, it is preferable to produce one single hb_report for all machines that you need log files from.

10 Upgrading to the Latest Product Version

For general instructions on how to upgrade a cluster, see the Administration Guide for SUSE Linux Enterprise High Availability Extension 12. It is available at http://www.suse.com/documentation/. The chapter Upgrading Your Cluster and Updating Software Packages also describes which preparations to take care of before starting the upgrade process.

10.1 Upgrading from SLE HA 11 SP3 to SLE HA 12

The former booth version (v0.1) was based on the Paxos algorithm. The current booth version (v0.2) is loosely based on Raft and is incompatible with v0.1. Therefore, rolling upgrades are not possible. Due to the new multi-tenancy feature, the new arbitrator init script can neither stop nor test the status of the Paxos v0.1 arbitrator. On upgrade to v0.2, the arbitrator will be stopped if it is running. The OCF resource agent ocf:pacemaker:booth-site is capable of stopping and monitoring the booth v0.1 site daemon.

  1. For an upgrade of the cluster nodes from SUSE Linux Enterprise High Availability Extension 11 SP3 to SUSE Linux Enterprise High Availability Extension 12, follow the instructions in the Administration Guide for SUSE Linux Enterprise High Availability Extension 12, section Upgrading from SLE HA 11 SP3 to SLE HA 12.

  2. If you use arbitrators outside of the cluster sites:

    1. Upgrade them from SUSE Linux Enterprise Server 11 SP3 to SUSE Linux Enterprise Server 12, too.

    2. Add the GEO Clustering for SUSE Linux Enterprise High Availability Extension add-on and install the packages as described in Section 1.2, “Installing the Packages on Arbitrators”.

  3. Because the syntax and the consensus algorithm for booth have changed, you need to update the booth configuration files to match the latest requirements. Previously, the optional expiry time and weights could be specified by appending them to the ticket name with a semicolon (;) as separator. The new syntax has separate tokens for all ticket options. See Section 6, “Setting Up the Booth Services” for details. If you did not specify expiry time or weights different from the defaults and do not want to make use of the multi-tenancy feature, you can still use the old /etc/booth/booth.conf.

  4. Synchronize the updated booth configuration files across all cluster sites and arbitrators.

  5. Start the booth service on the cluster sites and the arbitrators as described in Section 6.4, “Enabling and Starting the Booth Services”.
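The syntax change mentioned in step 3 can be illustrated with a small conversion sketch. It rests on assumptions that need to be checked against Section 6, “Setting Up the Booth Services”: that an old ticket line has the form ticket="NAME;EXPIRY", and that the new option for the expiry time is named expire. The function name convert_ticket_lines is hypothetical. Lines that carry weights or do not match the assumed form are passed through unchanged and must be converted manually.

```shell
# Hypothetical converter for old-style booth ticket lines.
# ASSUMPTION: old form is   ticket="NAME;EXPIRY"
# ASSUMPTION: new form is   ticket = "NAME"  followed by  expire = EXPIRY
# Reads a configuration from stdin and writes the result to stdout.
convert_ticket_lines() {
    sed -E 's/^[[:space:]]*ticket[[:space:]]*=[[:space:]]*"([^";]+);([0-9]+)"[[:space:]]*$/ticket = "\1"\
    expire = \2/'
}

# Usage (writes to a new file, the original is left untouched):
#   convert_ticket_lines < /etc/booth/booth.conf > /tmp/booth.conf.new
```

Review the generated file before synchronizing it across the cluster sites and arbitrators.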
