Fencing is a very important concept in computer clusters for HA (High Availability). A cluster sometimes detects that one of the nodes is behaving strangely and needs to remove it. This is called fencing and is commonly done with a STONITH resource. Fencing may be defined as a method to bring an HA cluster to a known state.
Every resource in a cluster has a state attached. For example: “resource r1 is started on alice”. In an HA cluster, such a state implies that “resource r1 is stopped on all nodes except alice”, because an HA cluster must make sure that every resource may be started on only one node. Every node must report every change that happens to a resource. The cluster state is thus a collection of resource states and node states.
When the state of a node or resource cannot be established with certainty, fencing comes in. Even when the cluster is not aware of what is happening on a given node, fencing can ensure that the node does not run any important resources.
There are two classes of fencing: resource level and node level fencing. The latter is the primary subject of this chapter.
With resource level fencing, the cluster can ensure that a node cannot access one or more resources. One typical example is a SAN, where a fencing operation changes rules on a SAN switch to deny access from the node.
Resource level fencing can be achieved by using normal resources on which the resource you want to protect depends. Such a resource would simply refuse to start on this node and therefore resources which depend on it will not run on the same node.
Node level fencing ensures that a node does not run any resources at all. This is usually done in a simple if brutal way: reset or power off the node.
In SUSE® Linux Enterprise High Availability Extension, the fencing implementation is STONITH (Shoot The
Other Node in the Head). It provides node level fencing. The High Availability Extension
includes the stonith command line tool, an extensible
interface for remotely powering down a node in the cluster. For an
overview of the available options, run stonith --help
or refer to the man page of stonith for more
information.
To use node level fencing, you first need to have a fencing device. To
get a list of STONITH devices which are supported by the High Availability Extension, run
the following command as root on any of the nodes:
stonith -L
STONITH devices may be classified into the following categories:
Power Distribution Units are an essential element in managing power capacity and functionality for critical network, server and data center equipment. They can provide remote load monitoring of connected equipment and individual outlet power control for remote power recycling.
An Uninterruptible Power Supply (UPS) provides emergency power to connected equipment by supplying power from a separate source in the event of utility power failure.
If you are running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers.
Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular and may even become standard in off-the-shelf computers. However, they are inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node loses power, the device that is supposed to control it is just as useless. In that case, the CRM would continue its attempts to fence the node indefinitely while all other resource operations would wait for the fencing/STONITH operation to complete.
Testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices.
The choice of the STONITH device depends mainly on your budget and the kind of hardware you use.
The STONITH implementation of SUSE® Linux Enterprise High Availability Extension consists of two components:
stonithd is a daemon which can be accessed by local processes or over the network. It accepts the commands which correspond to fencing operations: reset, power-off, and power-on. It can also check the status of the fencing device.
The stonithd daemon runs on every node in the CRM HA cluster. The stonithd instance running on the DC node receives a fencing request from the CRM. It is up to this and other stonithd programs to carry out the desired fencing operation.
For every supported fencing device there is a STONITH plug-in which
is capable of controlling said device. A STONITH plug-in is the
interface to the fencing device. On each node, all STONITH plug-ins
reside in /usr/lib/stonith/plugins (or in
/usr/lib64/stonith/plugins for 64-bit
architectures). All STONITH plug-ins look the same to stonithd, but
are quite different on the other side reflecting the nature of the
fencing device.
Some plug-ins support more than one device. A typical example is
ipmilan (or external/ipmi)
which implements the IPMI protocol and can control any device which
supports this protocol.
To set up fencing, you need to configure one or more STONITH
resources—the stonithd daemon requires no configuration. All
configuration is stored in the CIB. A STONITH resource is a resource of
class stonith (see
Section 4.2.2, “Supported Resource Agent Classes”). STONITH
resources are a representation of STONITH plug-ins in the CIB. Apart
from the fencing operations, the STONITH resources can be started,
stopped and monitored, just like any other resource. Starting or stopping
STONITH resources means loading and unloading the STONITH device
driver on a node. Starting and stopping are thus only administrative
operations and do not translate to any operation on the fencing device
itself. However, monitoring does translate to logging in to the device
(to verify that the device will work in case it is needed). When a
STONITH resource fails over to another node it enables the current node
to talk to the STONITH device by loading the respective driver.
STONITH resources can be configured just like any other resource. For more information about configuring resources, see Section 5.3.3, “Creating STONITH Resources”, or Section 6.4.3, “Creating a STONITH Resource”.
The list of parameters (attributes) depends on the respective STONITH
type. To view a list of parameters for a specific device, use the
stonith command:
stonith -t stonith-device-type -n
For example, to view the parameters for the ibmhmc
device type, enter the following:
stonith -t ibmhmc -n
To get a short help text for the device, use the -h
option:
stonith -t stonith-device-type -h
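When comparing several device types, the parameter query can be wrapped in a small loop. The following is a minimal sketch only: the function name show_stonith_params is hypothetical and not part of the product, and the function uses nothing beyond the stonith -t ... -n call described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the High Availability Extension):
# print the parameter names of one or more STONITH device types.
# Usage: show_stonith_params TYPE...
show_stonith_params() {
    for type in "$@"; do
        echo "== $type =="
        # -n lists the parameter names for the given device type
        stonith -t "$type" -n || echo "could not query $type" >&2
    done
}
```

For example, show_stonith_params ibmhmc external/ibmrsa-telnet prints one labeled parameter list per device type.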
In the following, find some example configurations written in the syntax
of the crm command line tool. To apply them, put the
sample in a text file (for example, sample.txt) and
run:
root # crm < sample.txt
For more information about configuring resources with the
crm command line tool, refer to
Chapter 6, Configuring and Managing Cluster Resources (Command Line).
Some of the examples below are for demonstration and testing purposes
only. Do not use any of the Testing Configuration
examples in real-life cluster scenarios.
configure
primitive st-null stonith:null \
    params hostlist="alice bob"
clone fencing st-null
commit
An alternative configuration:
configure
primitive st-alice stonith:null \
    params hostlist="alice"
primitive st-bob stonith:null \
    params hostlist="bob"
location l-st-alice st-alice -inf: alice
location l-st-bob st-bob -inf: bob
commit
This configuration example is perfectly valid as far as the cluster software is concerned. The only difference from a real-world configuration is that no fencing operation takes place.
A more realistic example (but still only for testing) is the following external/ssh configuration:
configure
primitive st-ssh stonith:external/ssh \
    params hostlist="alice bob"
clone fencing st-ssh
commit
This one can also reset nodes. The configuration is similar to the
first one which features the null STONITH device. In this example,
clones are used. They are a CRM/Pacemaker feature. A clone is basically
a shortcut: instead of defining n identical, yet
differently named resources, a single cloned resource suffices. By far
the most common use of clones is with STONITH resources, as long as
the STONITH device is accessible from all nodes.
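To illustrate what a clone abbreviates, the following sketch prints the long form: one primitive and one location constraint per node, in the same shape as the alternative null configuration shown earlier. The function name gen_per_node_stonith is hypothetical, and the st-NODE and l-st-NODE naming simply follows the earlier examples.

```shell
#!/bin/sh
# Hypothetical helper: print per-node STONITH primitives plus location
# constraints, that is, the long form a single cloned resource replaces.
# Usage: gen_per_node_stonith PLUGIN NODE...
gen_per_node_stonith() {
    plugin=$1
    shift
    echo "configure"
    for node in "$@"; do
        # One primitive per node, each targeting exactly that node
        printf 'primitive st-%s stonith:%s \\\n' "$node" "$plugin"
        printf '    params hostlist="%s"\n' "$node"
    done
    for node in "$@"; do
        # Keep each resource off the node it is meant to fence
        printf 'location l-st-%s st-%s -inf: %s\n' "$node" "$node" "$node"
    done
    echo "commit"
}
```

Running gen_per_node_stonith null alice bob > sample.txt produces a file that can be reviewed and then applied with crm < sample.txt. With many nodes the output grows by three lines per node, whereas the cloned variant stays the same length.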
The real device configuration is not much different, though some devices may require more attributes. An IBM RSA lights-out device might be configured like this:
configure
primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
    params nodename=alice ipaddr=192.168.0.101 \
    userid=USERID passwd=PASSW0RD
primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \
    params nodename=bob ipaddr=192.168.0.102 \
    userid=USERID passwd=PASSW0RD
location l-st-alice st-ibmrsa-1 -inf: alice
location l-st-bob st-ibmrsa-2 -inf: bob
commit
In this example, location constraints are used for the following reason: There is always a certain probability that the STONITH operation is going to fail. Therefore, a STONITH operation is not reliable when the node to be fenced is also the node that executes it. If the node is reset, it cannot send the notification about the outcome of the fencing operation. The only way to do that would be to assume that the operation is going to succeed and send the notification beforehand. But if the operation then fails, problems could arise. Therefore, by convention, stonithd refuses to kill its host.
The configuration of a UPS type fencing device is similar to the examples above. The details are not covered here. All UPS devices employ the same mechanics for fencing. How the device is accessed varies. Old UPS devices only had a serial port, in most cases connected at 1200baud using a special serial cable. Many new ones still have a serial port, but often they also use a USB or Ethernet interface. The kind of connection you can use depends on what the plug-in supports.
For example, compare the apcmaster with the
apcsmart device by using the stonith -t
stonith-device-type -h command:
stonith -t apcmaster -h
returns the following information:
STONITH Device: apcmaster - APC MasterSwitch (via telnet)
NOTE: The APC MasterSwitch accepts only one (telnet)
connection/session a time. When one session is active,
subsequent attempts to connect to the MasterSwitch will fail.
For more information see http://www.apc.com/
List of valid parameter names for apcmaster STONITH device:
    ipaddr
    login
    password
With
stonith -t apcsmart -h
you get the following output:
STONITH Device: apcsmart - APC Smart UPS
(via serial port - NOT USB!).
Works with higher-end APC UPSes, like
Back-UPS Pro, Smart-UPS, Matrix-UPS, etc.
(Smart-UPS may have to be >= Smart-UPS 700?).
See http://www.networkupstools.org/protocols/apcsmart.html
for protocol compatibility details.
For more information see http://www.apc.com/
List of valid parameter names for apcsmart STONITH device:
    ttydev
    hostlist
The first plug-in supports APC UPS with a network port and telnet protocol. The second plug-in uses the APC SMART protocol over the serial line, which is supported by many different APC UPS product lines.
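As a rough sketch, a resource for the apcmaster plug-in might be configured with the three parameters listed in its help output. The resource name, IP address, login and password below are placeholders chosen for illustration, not values from a real setup.

```
configure
primitive st-apcmaster stonith:apcmaster \
    params ipaddr=192.168.0.110 login=LOGIN password=PASSWORD
commit
```

The sketch deliberately uses a single resource rather than a clone, because the help text states that the device accepts only one telnet session at a time.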
As explained in Section 8.3.1, “Example STONITH Resource Configurations”, there are several ways to configure a STONITH resource: using constraints, clones, or both. The choice of which construct to use for configuration depends on several factors: nature of the fencing device, number of hosts managed by the device, number of cluster nodes, or personal preference.
If clones are safe to use with your configuration and they make it shorter, use cloned STONITH resources.
Just like any other resource, the STONITH class agents also support the monitoring operation for checking status.
Monitor STONITH resources regularly, yet sparingly. For most devices a monitoring interval of at least 1800 seconds (30 minutes) should suffice.
Fencing devices are an indispensable part of an HA cluster, but the less you need to use them, the better. Power management equipment is often affected by too much broadcast traffic. Some devices cannot handle more than ten or so connections per minute. Some get confused if two clients try to connect at the same time. Most cannot handle more than one session at a time.
Checking the status of fencing devices once every few hours should be enough in most cases. The probability that a fencing operation needs to be performed and the power switch fails is low.
For detailed information on how to configure monitor operations, refer to Section 6.4.8, “Configuring Resource Monitoring” for the command line approach.
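For illustration, a monitor operation with the suggested 30-minute interval could be added to the IBM RSA example from above as follows. This is a sketch only: the timeout value of 60 seconds is an assumption and needs to be adjusted to how long your device takes to answer.

```
configure
primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
    params nodename=alice ipaddr=192.168.0.101 \
    userid=USERID passwd=PASSW0RD \
    op monitor interval=1800s timeout=60s
commit
```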
In addition to plug-ins which handle real STONITH devices, there are special purpose STONITH plug-ins.
Some of the STONITH plug-ins mentioned below are for demonstration and testing purposes only. Do not use any of the following devices in real-life scenarios because this may lead to data corruption and unpredictable results:
external/ssh
ssh
null
external/kdumpcheck
This plug-in checks if a Kernel dump is in progress on a node. If so,
it returns true, and acts as if the node has been
fenced. The node cannot run any resources during the dump anyway. This
avoids fencing a node that is already down but doing a dump, which
takes some time. The plug-in must be used in concert with another,
real STONITH device. For more details, see
/usr/share/doc/packages/cluster-glue/README_kdumpcheck.txt.
external/sbd
This is a self-fencing device. It reacts to a so-called “poison pill” which can be inserted into a shared disk. On shared-storage connection loss, it stops the node from operating. Learn how to use this STONITH agent to implement storage-based fencing in Chapter 17, Storage Protection. See also http://www.linux-ha.org/wiki/SBD_Fencing for more details.
external/sbd and DRBD
The external/sbd fencing mechanism requires that
the SBD partition is readable directly from each node. Thus, a DRBD*
device must not be used for an SBD partition.
However, you can use the fencing mechanism for a DRBD cluster, provided the SBD partition is located on a shared disk that is not mirrored or replicated.
external/ssh
Another software-based “fencing” mechanism. The nodes
must be able to log in to each other as root without passwords.
It takes a single parameter, hostlist, specifying
the nodes that it will target. As it is not able to reset a truly
failed node, it must not be used for real-life clusters—for
testing and demonstration purposes only. Using it for shared storage
would result in data corruption.
meatware
meatware requires help from the user to operate.
Whenever invoked, meatware logs a CRIT severity
message which shows up on the node's console. The operator then
confirms that the node is down and issues a
meatclient(8) command. This tells
meatware to inform the cluster that the node should
be considered dead. See
/usr/share/doc/packages/cluster-glue/README.meatware
for more information.
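A minimal test configuration for meatware might look like the following sketch. It assumes that the plug-in takes a hostlist parameter like the other testing devices; check this with stonith -t meatware -n before using it. The resource name st-meat is hypothetical.

```
configure
primitive st-meat stonith:meatware \
    params hostlist="alice bob"
clone fencing st-meat
commit
```

After the operator has verified that the node in question (for example, alice) is really down, the confirmation is typically given with meatclient -c alice. Refer to meatclient(8) for the exact syntax on your system.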
null
This is a fake device used in various testing scenarios. It always claims that it has shot a node, but never does anything. Do not use it unless you know what you are doing.
suicide
This is a software-only device, which can reboot a node it is running
on, using the reboot command. This requires action
by the node's operating system and can fail under certain
circumstances. Therefore avoid using this device whenever possible.
However, it is safe to use on one-node clusters.
suicide and null are the only
exceptions to the “I do not shoot my host” rule.
Check the following list of recommendations to avoid common mistakes:
Do not configure several power switches in parallel.
To test your STONITH devices and their configuration, pull the plug once from each node and verify that fencing the node does take place.
Test your resources under load and verify the timeout values are appropriate. Setting timeout values too low can trigger (unnecessary) fencing operations. For details, refer to Section 4.2.9, “Timeout Values”.
Use appropriate fencing devices for your setup. For details, also refer to Section 8.5, “Special Fencing Devices”.
Configure one or more STONITH resources. By default, the global
cluster option stonith-enabled is set to
true. If no STONITH resources have been defined,
the cluster will refuse to start any resources.
Do not set the global cluster option
stonith-enabled to false
for the following reasons:
Clusters without STONITH enabled are not supported.
DLM/OCFS2 will block forever waiting for a fencing operation that will never happen.
Do not set the global cluster option
startup-fencing to false.
By default, it is set to true for the following
reason: If a node is in an unknown state during cluster startup, the
node will be fenced once to clarify its status.
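If you want both options to appear explicitly in the configuration instead of relying on the defaults, they can be set with the property command of the crm shell. This is a sketch of one possible way to do it. Both values are the defaults named above, so applying it does not change the cluster behavior.

```
configure
property stonith-enabled=true startup-fencing=true
commit
```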
/usr/share/doc/packages/cluster-glue
In your installed system, this directory contains README files for many STONITH plug-ins and devices.
Information about STONITH on the home page of the High Availability Linux Project.
Fencing and Stonith: Information about fencing on the home page of the Pacemaker Project.
Pacemaker Explained (Pacemaker 1.1 for Corosync 2.x and crmsh): Explains the concepts used to configure Pacemaker. Contains comprehensive and very detailed information for reference.
Article explaining the concepts of split brain, quorum and fencing in HA clusters.