To configure and manage cluster resources, either use HA Web Konsole (Hawk), a Web-based user interface, or the crm shell (crmsh) command line utility.
Hawk allows you to monitor and administer your Linux cluster from non-Linux machines as well. Furthermore, it is the ideal solution in case your system only provides a minimal graphical user interface.
This chapter introduces Hawk and covers basic tasks for configuring and
managing cluster resources: modifying global cluster options, creating
basic and advanced types of resources (groups and clones), configuring
constraints, specifying failover nodes and failback nodes, configuring
resource monitoring, starting, cleaning up or removing resources, and
migrating resources manually. For detailed analysis of the cluster
status, Hawk generates a cluster report (hb_report).
You can view the cluster history or explore potential failure scenarios
with the simulator.
The Web interface is included in the hawk
package. It must be installed on all cluster nodes you want to connect to with Hawk. On
the machine from which you want to access a cluster node using Hawk, you only need a
(graphical) Web browser with JavaScript and cookies enabled to establish the connection.
To use Hawk, the respective Web service must be started on the node that you want to connect to via the Web interface.
If you have set up your cluster with the scripts from the ha-cluster-bootstrap package, the Hawk service is already started. In that
case, skip Procedure 5.1, “Starting Hawk Services” and proceed with Procedure 5.2, “Logging In to the Hawk Web Interface”.
On the node you want to connect to, open a shell and log in as
root.
Check the status of the service by entering
root # systemctl status hawk.service
If the service is not running, start it with
root # systemctl start hawk.service
If you want Hawk to start automatically at boot time, execute the following command:
root # systemctl enable hawk.service
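The three steps above can be combined into one small shell function. This is a minimal sketch, not part of the product: the function name ensure_hawk is made up for illustration, and it only uses the systemctl subcommands is-active, start, and enable. Run it as root on the node you want to connect to.

```shell
# Start the Hawk service if it is not active yet, then enable it so that
# it also starts at boot time. ensure_hawk is a hypothetical helper name.
ensure_hawk() {
    if ! systemctl is-active --quiet hawk.service; then
        systemctl start hawk.service || return 1
    fi
    systemctl enable hawk.service
}
```

After defining the function, call ensure_hawk without arguments. It returns a non-zero status if the service cannot be started.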
Hawk users must be members of the haclient group. The installation creates a Linux user named hacluster, who is added to the haclient group. When using the
ha-cluster-init script for setup, a default password is set for the
hacluster user.
Before starting Hawk, set or change the password for the hacluster
user. Alternatively, create a new user which is a member of the haclient group.
Do this on every node you will connect to with Hawk.
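To check from a shell whether an account may log in to Hawk, you can test its group membership. The following sketch assumes the standard id utility. The function name in_haclient and the user name alice in the usage note are examples only.

```shell
# Return 0 if the given user is a member of the haclient group.
# in_haclient is a hypothetical helper name.
in_haclient() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx haclient
}
```

For example, in_haclient alice || usermod -a -G haclient alice adds the example user alice to the group if needed. To set or change the password of the hacluster user, run passwd hacluster as root.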
The Hawk Web interface uses the HTTPS protocol and port 7630.
On any machine, start a Web browser and make sure that JavaScript and cookies are enabled.
As URL, enter the IP address or hostname of any cluster node running the
Hawk Web service. Alternatively, enter the address of any IPaddr(2)
resource that the cluster operator may have configured:
https://HOSTNAME_OR_IP_ADDRESS:7630/
If a certificate warning appears when you try to access the URL for the first time, a self-signed certificate is in use. Self-signed certificates are not considered trustworthy by default.
Ask your cluster operator for the certificate details to verify the certificate.
To proceed anyway, you can add an exception in the browser to bypass the warning.
For information on how to replace the self-signed certificate with a certificate signed by an official Certificate Authority, refer to Replacing the Self-Signed Certificate.
On the Hawk login screen, enter the user name and password of the hacluster user (or of any other user that is a member of the haclient group), then log in.
The Cluster Status screen appears, displaying the status of your
cluster nodes and resources. The information shown is similar to the output of
crm status in the crm shell.
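If the browser cannot connect, it can help to check from a shell whether the Hawk Web service answers at all. The sketch below assumes that curl is installed. The function names are made up for illustration. The -k option skips certificate verification, which is only acceptable for a quick reachability test while the self-signed certificate is in use.

```shell
# Build the Hawk URL for a node: HTTPS on port 7630.
hawk_url() {
    printf 'https://%s:7630/\n' "$1"
}

# Return 0 if a Web service answers on that URL within five seconds.
hawk_reachable() {
    curl -k -s -o /dev/null --max-time 5 "$(hawk_url "$1")"
}
```

Call hawk_reachable with the host name or IP address of a cluster node. A non-zero exit status indicates that nothing answered on port 7630.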
After logging in, Hawk displays the Cluster Status screen. It shows a summary with the most important global cluster parameters and the status of your cluster nodes and resources. The following color code is used for status display of nodes and resources:
Green: OK. For example, the resource is running or the node is online.
Red: Bad, unclean. For example, the resource has failed or the node was not shut down cleanly.
Yellow: In transition. For example, the node is currently being shut down or a resource
is currently being started or stopped. If you click a pending resource to view its details,
Hawk also displays the state to which the resource is currently changing
(Starting, Stopping, Moving,
Promoting, or Demoting).
The transitional state for resources is only shown if the operation property
record-pending is set to true. If you have set up your
cluster with the ha-cluster-init script, this property is turned on
globally by default. To enable it manually, either add and enable the property in Hawk's operation defaults or use the following command:
root # crm configure op_defaults record-pending=true
Gray: Not running, but the cluster expects it to be running. For
example, nodes that the administrator has stopped or put into
standby mode. Nodes that are offline are also
displayed in gray (if they have been shut down cleanly).
In addition to the color code, Hawk also displays self-explanatory icons for the state of nodes, resources, tickets and for error messages in all views of the Cluster Status screen.
If a resource has failed, an error message with the details is shown in red at the top of the screen. To analyze the causes of the failure, click the error message. This automatically takes you to Hawk's History Explorer and triggers the collection of data for a time span of 20 minutes (10 minutes before and 10 minutes after the failure occurred). For more details, refer to Procedure 5.27, “Viewing Transitions with the History Explorer”.
The Cluster Status screen refreshes itself in near real-time. Choose between the following views, which you can access with the three icons in the upper right corner:
Shows the most important global cluster parameters and the status of your cluster nodes and resources at the same time. If your setup includes GEO clusters (multi-site clusters), the summary view also shows tickets. To view details about all elements belonging to a certain category (tickets, nodes, or resources), click the category title, which is marked as a link. Otherwise click the individual elements for details.
Presents an expandable view of the most important global cluster parameters and the status of your cluster nodes and resources. If your setup includes GEO clusters (multi-site clusters), the tree view also shows tickets. Click the arrows to expand or collapse the elements belonging to the respective category. In contrast to the summary view, this view shows not only the IDs and status of resources but also their type (for example, primitive, clone, or group).
For groups, it is also possible to switch to a view where resources of the same type
(within a group) are presented together. Press the ab icon in the
resources category to toggle between the normal view where the resources are displayed
individually and the view where they are coalesced by type. For example, if you have 3
resources of the type ocf:pacemaker:Dummy in one group, and only one of
them is running, the view-by-type view shows an entry like
1/3 ocf:pacemaker:Dummy NODENAME
on a gray background, to indicate that only one of the three has started.
This view is especially useful for larger clusters, because it shows in a concise way which resources are currently running on which node. Inactive nodes or resources are also displayed.
The top-level row of the main screen shows the username with which you are logged in. It also allows you to log out of the Web interface, and to access the following from the wrench icon next to the username:
Simulator. For details, refer to Section 5.4.7, “Exploring Potential Failure Scenarios”.
Cluster diagram. Select this entry for a graphical representation of the nodes and the resources configured in the CIB. The diagram also shows the ordering and colocation between resources and node assignments (scores).
Cluster report (hb_report). For details, refer to Section 5.4.8, “Generating a Cluster Report”.
To perform basic operator tasks on nodes and resources (like starting or stopping resources, bringing nodes online, or viewing details), click the wrench icon next to the respective node or resource to access a context menu. For any clone, group, or master/slave child resource on any of the status screens, the context menu also offers an entry for the parent resource. Selecting it lets you start, stop, or otherwise manage the top-level clone or group to which that primitive belongs.
For more complex tasks like configuring resources, constraints, or global cluster options, use the navigation bar on the left-hand side. From there, you can access the following screens:
: See Section 5.1.2, “Main Screen: Cluster Status” for details.
: See Procedure 5.27, “Viewing Transitions with the History Explorer” for details.
: See Section 5.3.1, “Configuring Resources with the Setup Wizard” for details.
: See Section 5.2, “Configuring Global Cluster Options” for details.
: See Chapter 9, Access Control Lists for details.
: See Section 5.3, “Configuring Cluster Resources” for details.
: See Section 5.3, “Configuring Cluster Resources” for details.
By default, users logged in as root or
hacluster have full
read-write access to all cluster configuration tasks. However,
Access Control Lists can be used
to define fine-grained access permissions.
If ACLs are enabled in the CRM, the available functions in Hawk
depend on the user role and access permissions assigned to you. In
addition, the following functions in Hawk can only be executed by the
user hacluster:
Generating an hb_report.
Using the history explorer.
Viewing recent events for nodes or resources.
Global cluster options control how the cluster behaves when confronted
with certain situations. They are grouped into sets and can be viewed and
modified with cluster management tools like Hawk and the
crm shell. The predefined values can be kept in most
cases. However, to make key functions of your cluster work correctly, you
need to adjust the following parameters after basic cluster setup:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
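In the left navigation bar, select the entry for the cluster configuration to view the global cluster options and their current values. Hawk displays the most important parameters with regard to the CRM configuration, resource defaults, and operation defaults.
Depending on your cluster requirements, adjust the CRM configuration: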
Set to the appropriate value.
If you need to disable fencing for any reason, deselect stonith-enabled.
A cluster without STONITH is not supported.
To remove a property from the CRM configuration, click the minus icon next to the property. If a property is deleted, the cluster will behave as if that property had the default value. For details of the default values, refer to Section 4.2.6, “Resource Options (Meta Attributes)”.
To add a new property for the CRM configuration, choose one from the drop-down list and click the plus icon.
If you need to change resource defaults or operation defaults, proceed as follows:
To change the value of defaults that are already displayed, just edit the value in the respective input field.
To add a new resource default or operation default, choose one from the empty drop-down list, click the plus icon and enter a value. If there are default values defined, Hawk proposes them automatically.
To remove a resource or operation default, click the minus icon next to the parameter. If no values are specified for resource defaults and operation defaults, the cluster uses the default values that are documented in Section 4.2.6, “Resource Options (Meta Attributes)” and Section 4.2.8, “Resource Operations”.
Confirm your changes.
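The same settings can be made with the crm shell. The following sketch wraps the crmsh subcommands property, rsc_defaults, and op_defaults of crm configure in small functions. The function names are made up for illustration. The option names in the usage note (stonith-enabled, record-pending) are taken from this chapter.

```shell
# Set one global cluster option (a property of the CRM configuration).
set_cluster_option() {
    crm configure property "$1=$2"
}

# Set a resource default (applies to all resources).
set_rsc_default() {
    crm configure rsc_defaults "$1=$2"
}

# Set an operation default (applies to all resource operations).
set_op_default() {
    crm configure op_defaults "$1=$2"
}
```

For example, set_cluster_option stonith-enabled true keeps fencing enabled, and set_op_default record-pending true turns on the display of transitional resource states.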
As a cluster administrator, you need to create cluster resources for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.
For an overview of the resource types you can create, refer to Section 4.2.3, “Types of Resources”. Apart from the basic specification of a resource (ID, class, provider, and type), you can add or modify the following parameters during or after creation of a resource:
Instance attributes (parameters) determine which
instance of a service the resource controls. For more information,
refer to Section 4.2.7, “Instance Attributes (Parameters)”.
When creating a resource, Hawk automatically shows any required parameters. Edit them to get a valid resource configuration.
Meta attributes tell the CRM how to treat a specific
resource. For more information, refer to
Section 4.2.6, “Resource Options (Meta Attributes)”.
When creating a resource, Hawk automatically lists the important meta
attributes for that resource, for example the
target-role attribute, which defines the initial state
of a resource. By default, it is set to Stopped, so
the resource will not start immediately.
Operations are needed for resource monitoring. For
more information, refer to
Section 4.2.8, “Resource Operations”.
When creating a resource, Hawk displays the most important resource
operations (monitor, start, and
stop).
The High Availability Extension comes with a predefined set of templates for some
frequently used cluster scenarios, for example, setting up a highly available
NFS server. Find the predefined templates in the hawk-templates package. You can also define your own wizard
templates. For detailed information, refer to https://github.com/ClusterLabs/hawk/blob/master/doc/wizard.txt.
Hawk provides a wizard that guides you through all configuration steps of a selected template. Follow the instructions on the screen. If you need information about an option, click it to display a short help text in Hawk.
Which templates are available to individual Hawk users may differ. The user's permissions to access templates can be regulated via ACL. See Chapter 9, Access Control Lists for details.
In the following procedure, we will use the wizard to configure an NFS server as an example, which can be used as an NFS(v4/v3) fail-over server. The wizard relies on the cluster having been set up using the bootstrap scripts, so that key-based SSH access between nodes has been configured. You will be prompted for the following information:
The root password for the machine that you are logged in to
via Hawk. It needs to be the same as on all cluster nodes that
Hawk needs to touch in order to modify their file systems.
The ID of the base file system resource.
Details for the NFSv4 file system root.
Details for an NFSv3 export. This is a directory exported by an NFS server, which clients can integrate into their system.
A floating IP address.
The resulting Pacemaker configuration will contain the following resources:
Manages the in-kernel Linux NFS daemon that serves locally mounted file systems to clients via the NFS network protocol.
Manages the virtual NFS root export, needed for NFSv4 clients. This
resource does not hold any actual NFS-exported data, merely the empty
directory (/srv/nfs) that the other NFS export is
mounted into.
Manages the NFSv3 export.
A virtual, floating cluster IP address, allowing NFS clients to connect to the service no matter which physical node it is running on.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select the setup wizard. The wizard shows a list of available resource templates. If you click an entry, Hawk displays a short help text about the template.
Select the template for the resource you want to configure (in our
case: NFS Server) and click
.
To configure a highly available NFS server, proceed as follows:
Enter the root password for the current machine and click
. Without the root password, the
configuration wizard is not able to do the necessary configuration
changes.
In the next screen, specify the that is to be exported via NFS by entering its root ID. Click .
In the next screen, enter the details for the virtual NFSv4 file system root (needed for NFSv4 clients). Specify the following parameters and click .
Define a to be used for this cluster resource.
Enter a . Hawk proposes
0 by default, as the ID for the root file system must either be
0 or the string root.
Specify a , for example:
/srv/nfs.
Enter a for client access. For example
10.9.9.0/255.255.255.0. Keeping the value
*, which is proposed by Hawk, allows access for all
clients from everywhere.
Specify the . For the NFSv4 file
system root, Hawk proposes: rw,crossmnt.
In the next screen, enter the details for the exported NFS mount point. Specify the following parameters and click .
Define a to be used for this cluster resource.
Enter a . Hawk proposes
1 by default, as the ID for NFS exports that do
not represent an NFSv4 virtual file system root
must be set to a unique positive integer, or a UUID string (32
hexadecimal digits with arbitrary punctuation).
Specify a , for example:
/srv/nfs/example.
Enter a for client access. For example
10.9.9.0/255.255.255.0. Keeping the value
*, which is proposed by Hawk, allows access for all
clients from everywhere.
Specify the . For the NFSv3
export, Hawk proposes: rw,mountpoint.
In the next screen, configure a virtual IP address used to access the NFS mounts. Specify the following parameters.
Define a to be used for this cluster resource.
Enter an IP address in dotted quad notation.
Optionally, enter a . If not specified, it will be determined automatically.
For LVS Direct Routing configuration, enable . Otherwise leave it disabled.
Click .
The wizard displays the configuration snippet that will be applied to the CIB.
To apply it, click .
You have successfully configured an NFS(v4/v3) fail-over server.
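The ID rules stated in the procedure above (0 or the string root for the NFSv4 file system root; a unique positive integer or a UUID string of 32 hexadecimal digits with arbitrary punctuation for other exports) can be checked before entering the values in the wizard. The following functions are a sketch with made-up names. They only validate the format and cannot check whether an ID is unique.

```shell
# ID for the NFSv4 virtual file system root: 0 or the string "root".
valid_root_fsid() {
    case "$1" in
        0|root) return 0 ;;
        *)      return 1 ;;
    esac
}

# ID for any other export: a positive integer, or 32 hexadecimal
# digits once all punctuation has been removed.
valid_export_fsid() {
    case "$1" in
        '')
            return 1 ;;
        *[!0-9]*)
            hex=$(printf '%s' "$1" | tr -d '[:punct:]')
            printf '%s' "$hex" | grep -Eq '^[0-9A-Fa-f]{32}$' ;;
        *)
            # Digits only: reject values that consist of zeros.
            [ -n "$(printf '%s' "$1" | tr -d '0')" ] ;;
    esac
}
```

Both functions return 0 for an acceptable value and a non-zero status otherwise, so they can be used directly in an if statement.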
To create the most basic type of a resource, proceed as follows:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of resources. It lists any resources that are already defined.
Select the category and click the plus icon.
Specify the resource:
Enter a unique .
From the list, select the resource agent class you want to use for the resource: , , , or . For more information, see Section 4.2.2, “Supported Resource Agent Classes”.
If you selected ocf as class, specify the provider of your OCF resource agent. The OCF specification allows multiple vendors to supply the same resource agent.
From the list, select the resource agent you want to use (for example, or ). A short description for this resource agent is displayed.
The selection you get in the list depends on the class (and for OCF resources also on the provider) you have chosen.
Hawk automatically shows any required parameters for the resource plus an empty drop-down list that you can use to specify an additional parameter.
To define parameters (instance attributes) for the resource:
Enter values for each required parameter. A short help text is displayed as soon as you click the input field next to a parameter.
To completely remove a parameter, click the minus icon next to the parameter.
To add another parameter, click the empty drop-down list, select a parameter and enter a value for it.
Hawk automatically shows the most important resource operations and proposes default values. If you do not modify any settings here, Hawk will add the proposed operations and their default values as soon as you confirm your changes.
For details on how to modify, add or remove operations, refer to Procedure 5.15, “Adding or Modifying Monitor Operations”.
Hawk automatically lists the most important meta attributes for the
resource, for example target-role.
To modify or add meta attributes:
To set a (different) value for an attribute, select one from the drop-down list next to the attribute or edit the value in the input field.
To completely remove a meta attribute, click the minus icon next to it.
To add another meta attribute, click the empty drop-down list and select an attribute. The default value for the attribute is displayed. If needed, change it as described above.
Confirm your changes to finish the configuration. A message at the top of the screen shows whether the resource was created successfully.
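For comparison, the crm shell expresses the same parts of a primitive (ID, class, provider, type, parameters, operations, and meta attributes) in a single command. The sketch below is an illustration, not output generated by Hawk. The function name, the choice of the ocf:heartbeat:IPaddr2 agent, and the monitor interval and timeout are assumptions.

```shell
# Create an IP address primitive:
#   $1 = resource ID, $2 = IP address
# params -> instance attributes, op -> operations, meta -> meta attributes.
add_ip_primitive() {
    crm configure primitive "$1" ocf:heartbeat:IPaddr2 \
        params ip="$2" \
        op monitor interval=10s timeout=20s \
        meta target-role=Stopped
}
```

With target-role=Stopped, the resource is created but does not start immediately, which matches the default that Hawk proposes.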
A cluster without STONITH is not supported.
By default, the global cluster option stonith-enabled
is set to true: If no STONITH resources have been
defined, the cluster will refuse to start any resources. Configure one
or more STONITH resources to complete the STONITH setup. While they
are configured similarly to other resources, the behavior of STONITH
resources is different in some respects. For details refer to
Section 8.3, “STONITH Configuration”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of resources and lists all defined resources.
Select the category and click the plus icon.
Specify the resource:
Enter a unique .
From the list, select the resource agent class stonith.
From the list, select the STONITH plug-in for controlling your STONITH device. A short description for this plug-in is displayed.
Hawk automatically shows the required parameters for the resource. Enter values for each parameter.
Hawk displays the most important resource operations and proposes default values. If you do not modify any settings here, Hawk will add the proposed operations and their default values as soon as you confirm.
Adopt the default settings if there is no reason to change them.
Confirm your changes to create the STONITH resource.
To complete your fencing configuration, add constraints, use clones or both. For more details, refer to Chapter 8, Fencing and STONITH.
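In the crm shell, a STONITH resource is a primitive of the class stonith. The following sketch shows the general shape only. The function name is made up, the monitor interval is an arbitrary example value, and the plug-in name and its parameters depend entirely on your STONITH device.

```shell
# Create a STONITH primitive:
#   $1 = resource ID, $2 = STONITH plug-in, remaining arguments =
#   plug-in parameters in the form name=value (at least one).
add_stonith() {
    rsc_id=$1
    plugin=$2
    shift 2
    if [ "$#" -lt 1 ]; then
        echo "add_stonith: at least one parameter is required" >&2
        return 1
    fi
    crm configure primitive "$rsc_id" "stonith:$plugin" \
        params "$@" \
        op monitor interval=3600s
}
```

Call it as add_stonith ID PLUGIN NAME=VALUE..., replacing the placeholders with the values for your device.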
If you want to create lots of resources with similar configurations, defining a resource template is the easiest way. Once defined, it can be referenced in primitives or in certain types of constraints. For detailed information about function and use of resource templates, refer to Section 4.4.3, “Resource Templates and Constraints”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of resources plus a category for resource templates.
Select the category and click the plus icon.
Enter a .
Specify the resource template as you would specify a primitive. Follow Procedure 5.5: Adding Primitive Resources, starting with Step 4.b.
Confirm your changes to finish the configuration. A message at the top of the screen shows whether the resource template was created successfully.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
To reference the newly created resource template in a primitive, follow these steps:
In the left navigation bar, select . The screen shows categories for all types of resources. It lists all defined resources.
Select the category and click the plus icon.
Enter a unique .
Activate and, from the drop-down list, select the template to reference.
If needed, specify further parameters, operations, or meta attributes as described in Procedure 5.5, “Adding Primitive Resources”.
To reference the newly created resource template in colocational or order constraints, proceed as described in Procedure 5.10, “Adding or Modifying Colocational or Order Constraints”.
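In the crm shell, the two procedures above correspond to the rsc_template command and to a primitive that references the template with a leading @. The function names and all IDs in this sketch are made up for illustration.

```shell
# Define a resource template:
#   $1 = template ID, $2 = resource agent (class:provider:type),
#   remaining arguments = params, op, or meta sections as for a primitive.
add_template() {
    tmpl_id=$1
    agent=$2
    shift 2
    crm configure rsc_template "$tmpl_id" "$agent" "$@"
}

# Create a primitive that inherits its definition from a template:
#   $1 = resource ID, $2 = template ID
add_from_template() {
    crm configure primitive "$1" "@$2"
}
```

For example, after add_template web-tmpl ocf:heartbeat:apache op monitor interval=30s, the call add_from_template web1 web-tmpl creates a primitive that uses the template's agent and monitor operation.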
After you have configured all resources, specify how the cluster should handle them correctly. Resource constraints let you specify on which cluster nodes resources can run, in which order resources will be loaded, and what other resources a specific resource depends on.
For an overview of available types of constraints, refer to Section 4.4.1, “Types of Constraints”. When defining constraints, you also need to specify scores. For more information on scores and their implications in the cluster, see Section 4.4.2, “Scores and Infinity”.
Learn how to create the different types of constraints in the following procedures.
For location constraints, specify a constraint ID, resource, score and node:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of constraints. It lists all defined constraints.
To add a new constraint, click the plus icon in the respective category.
To modify an existing constraint, click the wrench icon next to the constraint and select .
Enter a unique constraint ID. When modifying existing constraints, the ID is already defined.
Select the resource for which to define the constraint. The list shows the IDs of all resources that have been configured for the cluster.
Set the score for the constraint. Positive values
indicate the resource can run on the node you
specify in the next step. Negative values mean it should not run on
that node. Setting the score to INFINITY forces the
resource to run on the node. Setting it to
-INFINITY means the resource must not run on the
node.
Select the node for the constraint.
Confirm your changes to finish the configuration. A message at the top of the screen shows whether the constraint was created successfully.
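A location constraint consists of the same four elements in the crm shell: constraint ID, resource, score, and node. The following function is a sketch with a made-up name. The IDs and node names in the usage note are examples.

```shell
# Add a location constraint:
#   $1 = constraint ID, $2 = resource ID, $3 = score, $4 = node name
add_location() {
    crm configure location "$1" "$2" "$3:" "$4"
}
```

For example, add_location loc-web web-server 100 alice expresses a preference for the node alice, whereas a score of -inf would forbid the resource from running there.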
For both types of constraints specify a constraint ID and a score, then add resources to a dependency chain:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of constraints and lists all defined constraints.
To add a new colocation or order constraint, click the plus icon in the respective category.
To modify an existing constraint, click the wrench icon next to the constraint and select .
Enter a unique constraint ID. When modifying existing constraints, the ID is already defined.
Define a score.
For colocation constraints, the score determines the location
relationship between the resources. Positive values indicate the
resources should run on the same node. Negative values indicate the
resources should not run on the same node. Setting the score to
INFINITY forces the resources to run on the same
node. Setting it to -INFINITY means the resources
must not run on the same node. The score will be combined with other
factors to decide where to put the resource.
For order constraints, the constraint is mandatory if the score is
greater than zero, otherwise it is only a suggestion. The default
value is INFINITY.
For order constraints, you can usually keep the option enabled. This specifies that resources are stopped in reverse order.
To define the resources for the constraint, follow these steps:
Select a resource from the list . The list shows the IDs of all resources and all resource templates configured for the cluster.
To add the selected resource, click the plus icon next to the list. A new list appears beneath. Select the next resource from the list. As both colocation and order constraints define a dependency between resources, you need at least two resources.
Select one of the remaining resources from the list . Click the plus icon to add the resource.
Now you have two resources in a dependency chain.
If you have defined an order constraint, the topmost resource will start first, then the second etc. Usually the resources will be stopped in reverse order.
However, if you have defined a colocation constraint, the arrow icons between the resources reflect their dependency, but not their start order. As the topmost resource depends on the next resource and so on, the cluster will first decide where to put the last resource, then place the depending ones based on that decision. If the constraint cannot be satisfied, the cluster may decide not to allow the dependent resource to run at all.
Add as many resources as needed for your colocation or order constraint.
If you want to swap the order of two resources, click the double arrow at the right-hand side of the resources to swap them in the dependency chain.
If needed, specify further parameters for each resource, like the role
(Master, Slave,
Started, or Stopped).
Confirm your changes to finish the configuration. A message at the top of the screen shows whether the constraint was created successfully.
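The two-resource dependency chain described above looks as follows in the crm shell. This is a sketch with made-up function names. In a colocation, the first resource depends on the second, so the cluster places the second one first. For the order constraint, the sketch uses the keyword Mandatory, which assumes a crmsh version that accepts kind keywords in place of a numeric score.

```shell
# Colocation: place $3 together with $4.
#   $1 = constraint ID, $2 = score, $3 = dependent resource, $4 = resource
add_colocation() {
    crm configure colocation "$1" "$2:" "$3" "$4"
}

# Order: start $2 first, then $3 (stopped in reverse order by default).
#   $1 = constraint ID, $2 = first resource, $3 = second resource
add_order() {
    crm configure order "$1" Mandatory: "$2" "$3"
}
```

For example, add_colocation col-web inf web-server vip keeps the example resource web-server on the node where vip runs, and add_order ord-web vip web-server starts vip before web-server.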
As an alternative format for defining constraints, you can use resource sets. They have the same ordering semantics as groups.
As of SUSE Linux Enterprise High Availability Extension 12, it is now also possible to use resource sets within location constraints (whereas they could formerly only be used within colocation and ordering constraints).
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
To use a resource set within a location constraint:
Proceed as outlined in Procedure 5.9, “Adding or Modifying Location Constraints” from Step 1 to Step 5.
Instead of only selecting a single resource, you can select multiple resources by pressing Ctrl or Shift in addition to the mouse click. This creates a resource set within the location constraint.
Continue by entering a score and by selecting a node for the constraint.
To remove a resource from the location constraint, press Ctrl and click the resource again to deselect it. If you are editing an existing location constraint, click to confirm your choice.
To use a resource set within a colocation or order constraint:
Proceed as described in Procedure 5.10, “Adding or Modifying Colocational or Order Constraints”.
When you have added the resources to the dependency chain, you can put them into a resource set by clicking the chain icon at the right hand side. A resource set is visualized by a frame around the resources belonging to a set.
You can also add multiple resources to a resource set or create multiple resource sets.
To extract a resource from a resource set, click the scissors icon above the respective resource.
The resource will be removed from the set and put back into the dependency chain at its original place.
Confirm your changes to finish the constraint configuration.
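In the crm shell, listing three or more resources in one order or colocation constraint is the usual way to obtain a resource set. The sketch below relies on that behavior as an assumption about crmsh, and the function name is made up. As with a group, the resources start in the listed order.

```shell
# Order constraint over a chain of resources:
#   $1 = constraint ID, remaining arguments = resource IDs in start order.
# Assumption: crmsh stores three or more resources as a resource set.
add_order_set() {
    set_id=$1
    shift
    if [ "$#" -lt 3 ]; then
        echo "add_order_set: list at least three resources" >&2
        return 1
    fi
    crm configure order "$set_id" Mandatory: "$@"
}
```

To see how the constraint was stored, display the configuration afterward with crm configure show.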
For more information on configuring constraints and detailed background information about the basic concepts of ordering and colocation, refer to the documentation available at http://www.clusterlabs.org/doc/:
Pacemaker Explained (Pacemaker 1.1 for Corosync 2.x and crmsh), chapter Resource Constraints
Colocation Explained
Ordering Explained
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of constraints and lists all defined constraints.
Click the wrench icon next to a constraint and select .
A resource will be automatically restarted if it fails. If that cannot
be achieved on the current node, or it fails N times on the current
node, it will try to fail over to another node. You can define a number
of failures for resources (a migration-threshold),
after which they will migrate to a new node. If you have more than two
nodes in your cluster, the node to which a particular resource fails
over is chosen by the High Availability software.
You can specify a specific node to which a resource will fail over by proceeding as follows:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
Configure a location constraint for the resource as described in Procedure 5.9, “Adding or Modifying Location Constraints”.
Add the migration-threshold meta attribute to the
resource as described in
Procedure 5.5: Adding Primitive Resources,
Step 7
and enter a value for the migration-threshold. The
value should be positive and less than INFINITY.
If you want to automatically expire the failcount for a resource, add
the failure-timeout meta attribute to the resource
as described in
Procedure 5.5: Adding Primitive Resources,
Step 7
and enter a value for the failure-timeout.
If you want to specify additional failover nodes with preferences for a resource, create additional location constraints.
The process flow regarding migration thresholds and failcounts is demonstrated in Example 4.6, “Migration Threshold—Process Flow”.
Instead of letting the failcount for a resource expire automatically, you can also clean up failcounts for a resource manually at any time. Refer to Section 5.4.2, “Cleaning Up Resources” for details.
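The two meta attributes can also be set from the command line. The following sketch uses the meta subcommand of crm resource. The function name and the values in the usage note are examples only.

```shell
# Set the failover behavior of a resource:
#   $1 = resource ID
#   $2 = migration-threshold (number of failures before migrating)
#   $3 = failure-timeout (time after which the failcount expires)
set_failover_policy() {
    crm resource meta "$1" set migration-threshold "$2" &&
    crm resource meta "$1" set failure-timeout "$3"
}
```

For example, set_failover_policy web-server 3 120s migrates the example resource after three failures and lets the failcount expire after two minutes. To reset the failcount manually instead, run crm resource cleanup web-server.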
A resource may fail back to its original node when that node is back online and in the cluster. To prevent this or to specify a different node for the resource to fail back to, change the stickiness value of the resource. You can either specify the resource stickiness when creating it or afterwards.
For the implications of different resource stickiness values, refer to Section 4.4.5, “Failback Nodes”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
Add the resource-stickiness meta attribute to the
resource as described in
Procedure 5.5: Adding Primitive Resources,
Step 7.
Specify a value between -INFINITY and
INFINITY for the resource-stickiness.
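On the command line, the stickiness can be set as a meta attribute with crmsh. The sketch below assumes a hypothetical resource named web-server and an arbitrary value of 100; it prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# set_stickiness RESOURCE VALUE
# VALUE is a score between -INFINITY and INFINITY.
set_stickiness() {
    run crm resource meta "$1" set resource-stickiness "$2"
}

set_stickiness web-server 100   # hypothetical resource name and value
```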
Not all resources are equal. Some, such as Xen guests, require that the node hosting them meets their capacity requirements. If resources are placed so that their combined needs exceed the provided capacity, the performance of the resources diminishes or they fail.
To take this into account, the High Availability Extension allows you to specify the following parameters:
The capacity a certain node provides.
The capacity a certain resource requires.
An overall strategy for placement of resources.
Utilization attributes are used to configure both the resource's requirements and the capacity a node provides. The High Availability Extension now also provides means to detect and configure both node capacity and resource requirements automatically. For more details and a configuration example, refer to Section 4.4.6, “Placing Resources Based on Their Load Impact”.
To display a node's capacity values (defined via utilization attributes) as well as the capacity currently consumed by resources running on the node, switch to the screen in Hawk and select the node you are interested in. Click the wrench icon next to the node and select .
After you have configured the capacities your nodes provide and the capacities your resources require, you need to set the placement strategy in the global cluster options, otherwise the capacity configurations have no effect. Several strategies are available to schedule the load: for example, you can concentrate it on as few nodes as possible, or balance it evenly over all available nodes. For more information, refer to Section 4.4.6, “Placing Resources Based on Their Load Impact”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select to view the global cluster options and their current values.
From the drop-down list, choose
placement-strategy.
Depending on your requirements, set it to the appropriate value.
Click the plus icon to add the new cluster property including its value.
Confirm your changes.
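As a command line counterpart, the placement strategy can be set as a cluster property with crmsh. The following sketch assumes the four strategy names known from Pacemaker (default, utilization, minimal, balanced) and rejects anything else; it prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# set_placement_strategy STRATEGY
set_placement_strategy() {
    case $1 in
        default|utilization|minimal|balanced)
            run crm configure property placement-strategy="$1" ;;
        *)
            echo "unknown placement strategy: $1" >&2
            return 1 ;;
    esac
}

# Example: balance the load evenly over all available nodes.
set_placement_strategy balanced
```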
The High Availability Extension detects not only node failures, but also the failure of an
individual resource on a node. If you want to ensure that a
resource is running, configure resource monitoring for it. For resource
monitoring, specify a timeout and/or start delay value, and an interval.
The interval tells the CRM how often it should check the resource
status. You can also set particular parameters, such as
Timeout for start or
stop operations.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of resources and lists all defined resources.
Select the resource to modify, click the wrench icon next to it and
select . The resource definition is
displayed. Hawk automatically shows the most important resource
operations (monitor, start,
stop) and proposes default values.
To change the values for an operation:
Click the pen icon next to the operation.
In the dialog that opens, specify the following values:
Enter a timeout value in seconds. After the
specified timeout period, the operation will be treated as
failed. The Policy Engine (PE) will decide what to do or
execute what you specified in the field
of the monitor operation.
For monitoring operations, define the monitoring
interval in seconds.
If needed, use the empty drop-down list at the bottom of the dialog to add more parameters, like (what to do if this action fails?) or (what conditions need to be fulfilled before this action occurs?).
Confirm your changes to close the dialog and to return to the screen.
To completely remove an operation, click the minus icon next to it.
To add another operation, click the empty drop-down list and select an operation. A default value for the operation is displayed. If needed, change it by clicking the pen icon.
Click to finish the configuration. A message at the top of the screen shows if the resource was successfully updated or not.
For the processes which take place if the resource monitor detects a failure, refer to Section 4.3, “Resource Monitoring”.
To view resource failures, switch to the screen in Hawk and select the resource you are interested in. Click the wrench icon next to the resource and select .
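With crmsh, operations are usually defined together with the primitive; for an existing resource they can be changed with crm configure edit. The sketch below is only an illustration: the resource name, the agent ocf:heartbeat:apache, and the interval and timeout values are assumptions. It prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# add_monitored_primitive ID AGENT INTERVAL TIMEOUT
# Creates a primitive whose monitor operation runs every INTERVAL and is
# treated as failed after TIMEOUT; on failure the resource is restarted.
add_monitored_primitive() {
    run crm configure primitive "$1" "$2" \
        op monitor interval="$3" timeout="$4" on-fail=restart
}

# Hypothetical example values.
add_monitored_primitive web-server ocf:heartbeat:apache 10s 20s
```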
Some cluster resources depend on other components or resources and require that each component or resource starts in a specific order and runs on the same server. To simplify this configuration, the High Availability Extension supports the concept of groups.
For an example of a resource group and more information about groups and their properties, refer to Section 4.2.5.1, “Groups”.
Groups must contain at least one resource, otherwise the configuration is not valid. In Hawk, primitives cannot be created or modified while creating a group. Before adding a group, create primitives and configure them as desired. For details, refer to Procedure 5.5, “Adding Primitive Resources”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of resources and lists all defined resources.
Select the category and click the plus icon.
Enter a unique .
To define the group members, select one or multiple entries in the list of and click the < icon to add them to the list. Any new group members are added to the bottom of the list. To define the order of the group members, you currently need to add and remove them in the order you desire.
If needed, modify or add meta attributes as described in Procedure 5.5, “Adding Primitive Resources”, Step 7.
Click to finish the configuration. A message at the top of the screen shows if the group was successfully created.
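A group can also be sketched with crmsh. The example assumes two already configured primitives with the hypothetical names virtual-ip and web-server; members start in the order in which they are listed. The helper prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# add_group GROUP PRIMITIVE...
# A group needs at least one member, so fewer than two arguments is an error.
add_group() {
    if [ "$#" -lt 2 ]; then
        echo "usage: add_group GROUP PRIMITIVE..." >&2
        return 1
    fi
    run crm configure group "$@"
}

add_group g-web virtual-ip web-server   # hypothetical names
```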
If you want certain resources to run simultaneously on multiple nodes in your cluster, configure these resources as clones. For example, cloning makes sense for resources like STONITH and cluster file systems like OCFS2. You can clone any resource, provided cloning is supported by the resource's Resource Agent. Clone resources may be configured differently depending on which nodes they are running on.
For an overview of the available types of resource clones, refer to Section 4.2.5.2, “Clones”.
Clones can contain either a primitive or a group as sub-resource. In Hawk, sub-resources cannot be created or modified while creating a clone. Before adding a clone, create sub-resources and configure them as desired. For details, refer to Procedure 5.5, “Adding Primitive Resources” or Procedure 5.16, “Adding a Resource Group”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . The screen shows categories for all types of resources and lists all defined resources.
Select the category and click the plus icon.
Enter a unique .
From the list, select the primitive or group to use as a sub-resource for the clone.
If needed, modify or add meta attributes as described in Procedure 5.5, “Adding Primitive Resources”, Step 7.
Click to finish the configuration. A message at the top of the screen shows if the clone was successfully created.
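The crmsh counterpart is a single configure command. In this sketch, the clone ID cl-fs and the sub-resource name shared-fs are hypothetical, and the sub-resource is assumed to exist already. The helper prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# add_clone CLONE_ID SUB_RESOURCE
# SUB_RESOURCE is an existing primitive or group.
add_clone() {
    run crm configure clone "$1" "$2"
}

add_clone cl-fs shared-fs   # hypothetical names
```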
In addition to configuring your cluster resources, Hawk allows you to manage existing resources from the screen. For a general overview of the screen, its different views and the color code used for status information, refer to Section 5.1.2, “Main Screen: Cluster Status”.
Basic resource operations can be executed from any cluster status view. Both and let you access the individual resources directly. However, in the you need to click the links in the resources category first to display the resource details. The detailed view also shows any attributes set for that resource. For primitive resources (regular primitives, children of groups, clones, or master/slave resources), the following information will be shown additionally:
the resource's failcount
the last failure timestamp (if the failcount is > 0)
operation history and timings (call id, operation, last run timestamp, execution time, queue time, return code and last change timestamp)
Before you start a cluster resource, make sure it is set up correctly. For example, if you want to use an Apache server as a cluster resource, set up the Apache server first and complete the Apache configuration before starting the respective resource in your cluster.
When managing a resource via the High Availability Extension, the same resource must not be started or stopped otherwise (outside of the cluster, for example manually or on boot or reboot). The High Availability Extension software is responsible for all service start or stop actions.
However, if you want to check if the service is configured properly, start it manually, but make sure that it is stopped again before High Availability takes over.
For interventions in resources that are currently managed by the cluster, set the
resource to maintenance mode first as
described in Procedure 5.23, “Applying Maintenance Mode to Resources”.
When creating a resource with Hawk, you can set its initial state with
the target-role meta attribute. If you set its value
to stopped, the resource does not start automatically
after being created.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In one of the individual resource views, click the wrench icon next to the resource and select . To continue, confirm the message that appears. As soon as the resource has started, Hawk changes the resource's color to green and shows on which node it is running.
A resource will be automatically restarted if it fails, but each failure increases the resource's failcount.
If a migration-threshold has been set for the
resource, the node will no longer run the resource when the number of
failures reaches the migration threshold.
A resource's failcount can either be reset automatically (by setting a
failure-timeout option for the resource) or you can
reset it manually as described below.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In one of the individual resource views, click the wrench icon next to the failed resource and select . To continue, confirm the message that appears.
This executes the commands
crm_resource -C and
crm_failcount -D for the
specified resource on the specified node.
For more information, see the man pages of
crm_resource and crm_failcount.
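A comparable cleanup can be sketched with the crmsh subcommands cleanup and failcount, used here instead of the low-level tools named above. The resource name web-server and the node name alice are hypothetical. The helper prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# cleanup_resource RESOURCE NODE
cleanup_resource() {
    # Clear the resource's status and failure history on NODE.
    run crm resource cleanup "$1" "$2"
    # Reset the failcount of the resource on NODE.
    run crm resource failcount "$1" delete "$2"
}

cleanup_resource web-server alice   # hypothetical names
```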
If you need to remove a resource from the cluster, follow the procedure below to avoid configuration errors:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
Clean up the resource on all nodes as described in Procedure 5.19, “Cleaning Up A Resource”.
In one of the individual resource views, click the wrench icon next to the resource and select . To continue, confirm the message that appears.
If the resource is stopped, click the wrench icon next to it and select .
As mentioned in Section 5.3.6, “Specifying Resource Failover Nodes”, the cluster will fail over (migrate) resources automatically in case of software or hardware failures—according to certain parameters you can define (for example, migration threshold or resource stickiness). Apart from that, you can also manually migrate a resource to another node in the cluster (or decide to just move it away from the current node and leave the decision where to put it to the cluster).
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In one of the individual resource views, click the wrench icon next to the resource and select .
In the new window, select the node to which to move the resource.
This creates a location constraint with an INFINITY
score for the destination node.
Alternatively, select to move the resource .
This creates a location constraint with a -INFINITY
score for the current node.
Click to confirm the migration.
To allow a resource to move back again, proceed as follows:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In one of the individual resource views, click the wrench icon next to the resource and select . To continue, confirm the message that appears.
This uses the crm_resource -U command. The resource can move back to
its original location or it may stay where it is (depending on
resource stickiness).
For more information, see the crm_resource man page
or Pacemaker Explained
(Pacemaker 1.1 for Corosync 2.x and crmsh), available from http://www.clusterlabs.org/doc/.
Refer to section Resource Migration.
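Manual migration and its reversal can be sketched with crmsh as follows. The resource name web-server and the node name bob are hypothetical. Without a target node, the cluster decides where to place the resource. The helper prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# migrate_resource RESOURCE [NODE]
migrate_resource() {
    if [ -n "${2:-}" ]; then
        # Move to NODE (location constraint with INFINITY score for NODE).
        run crm resource migrate "$1" "$2"
    else
        # Move away from the current node (-INFINITY score for it).
        run crm resource migrate "$1"
    fi
}

# unmigrate_resource RESOURCE
# Removes the migration constraint so the resource may move back.
unmigrate_resource() {
    run crm resource unmigrate "$1"
}

migrate_resource web-server bob   # hypothetical names
unmigrate_resource web-server
```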
Every now and then, you will need to perform testing or maintenance tasks on individual cluster components or the whole cluster—be it changing the cluster configuration, updating software packages for individual nodes, or upgrading the cluster to a higher product version.
With regard to that, High Availability Extension provides maintenance options on
several levels:
If you need to execute any testing or maintenance tasks while services are running under cluster control, make sure to follow this outline:
Before you start, set the individual resource, node or the whole cluster to maintenance mode. This helps to avoid unwanted side effects like resources not starting in an orderly fashion, the risk of unsynchronized CIBs across the cluster nodes or data loss.
Execute your maintenance task or tests.
After you have finished, remove the maintenance mode to start normal cluster operation.
For more details on what happens to the resources and the cluster while being in maintenance mode, see Section 4.7, “Maintenance Mode”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select . Select the resource you want to put in maintenance mode or unmanaged mode, click the wrench icon next to the resource and select .
Open the category.
From the empty drop-down list, select the attribute and click the plus icon to add it.
Activate the checkbox next to maintenance to set the maintenance
attribute to yes.
Confirm your changes.
After you have finished the maintenance task for that resource, deactivate the
checkbox next to the maintenance attribute for that resource.
From this point on, the resource will be managed by the High Availability Extension software again.
Sometimes it is necessary to put single nodes into maintenance mode. If your cluster consists of more than 3 nodes, you can easily set one node to maintenance mode, while the other nodes continue their normal operation.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In one of the individual nodes' views, click the wrench icon next to the node and select .
This will add the following instance attribute to the node:
maintenance="true". The resources previously
running on the maintenance-mode node will become
unmanaged. No new resources will be allocated to
the node until it leaves the maintenance mode.
To deactivate the maintenance mode, click the wrench icon next to the node and select .
For setting or unsetting the maintenance mode for the whole cluster, proceed as follows:
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In the , select the attribute from the empty drop-down box and click the plus icon to add it.
To set maintenance-mode=true, activate the checkbox next to
maintenance-mode and confirm your changes.
After you have finished the maintenance task for the whole cluster, deactivate the
checkbox next to the maintenance-mode attribute.
From this point on, High Availability Extension will take over cluster management again.
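The three maintenance levels described above (resource, node, whole cluster) can be sketched with crmsh in one helper. The function name maintenance and the example names web-server and alice are hypothetical. The helper prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# maintenance resource|node|cluster on|off [NAME]
maintenance() {
    level=$1
    state=$2
    name=${3:-}
    case "$level:$state" in
        resource:on)  run crm resource meta "$name" set maintenance true ;;
        resource:off) run crm resource meta "$name" set maintenance false ;;
        node:on)      run crm node maintenance "$name" ;;
        node:off)     run crm node ready "$name" ;;
        cluster:on)   run crm configure property maintenance-mode=true ;;
        cluster:off)  run crm configure property maintenance-mode=false ;;
        *)
            echo "usage: maintenance resource|node|cluster on|off [NAME]" >&2
            return 1 ;;
    esac
}

# Outline from above: enable, do the work, disable again.
maintenance resource on web-server
# ... perform the maintenance task or test here ...
maintenance resource off web-server
```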
Hawk provides the following possibilities to view past events on the cluster (on different levels and in varying detail).
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
In the or , click the wrench icon next to the resource or node you are interested in and select .
The dialog that opens shows the events of the last hour.
The provides transition information
for a time frame that you can define. It also lists its previous runs
and allows you to delete reports that you no longer
need. The history explorer uses the information provided by
hb_report. You can also upload
an hb_report archive that has been created offline
(on a different cluster) and view the respective transitions with the
. See Procedure 5.28, “Using the Offline”.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
By default, the period to explore is set to the last 24 hours. To modify this, set another and .
Click to start collecting transition data.
The following information is displayed:
The time line of all past transitions in the cluster.
In some cases, an information icon is displayed between the and columns. Hover the mouse pointer over the icon to display one of the following messages:
Input created by different Pacemaker version. In
that case, the transition graphs are only approximate since they have
been generated by a different PE version.
Pacemaker version not present in PE Input. This
happens during cluster start, before a DC is elected.
The pe-input* file for each transition and the
node on which it was generated. For each transition, the cluster saves a
copy of the state which is provided to the policy engine as input. The
path to this archive is logged. The pe-input* files
are only generated on the Designated Coordinator (DC), but as the DC can
change, there may be pe-input* files from several
nodes. The files show what the Policy Engine (PE)
planned to do.
Opens a pop-up window with snippets of logging data that belong to
that particular transition. Different amounts of details are available:
Clicking displays the output of
crm history transition peinput
(including the resource agents' log messages), whereas also includes details from the
pengine, crmd, and
lrmd and is equivalent to
crm history transition log peinput.
A graph and an XML representation of each transition. If you choose
to show the , the PE is reinvoked (using the
pe-input* files), and generates a graphical
visualization of the transition. Alternatively, you can view the XML
representation of the graph.
If two or more pe-inputs are listed, a link will appear to the right of each pair of pe-inputs. Clicking it displays the difference of configuration and status.
If you have Hawk running on any machine, you can also use the
“offline”, that means to view and analyze the transitions of
clusters that you are currently not connected to. All you need is a TAR
archive with an hb_report generated on a
SUSE Linux Enterprise High Availability Extension cluster. To upload and analyze it with the
proceed as follows:
Start a Web browser and log in to the Hawk Web interface as described in Section 5.1.1, “Starting Hawk and Logging In”.
In the left navigation bar, select .
It shows an entry .
Click and select the
hb_report archive to upload from your file
system.
Click to start the analysis of the archive and to display the information listed in History Explorer Results.
Hawk provides a that allows you to explore failure scenarios before they happen. After switching to the simulator mode, you can change the status of nodes, add or edit resources and constraints, change the cluster configuration, or execute multiple resource operations to see how the cluster would behave should these events occur. As long as the simulator mode is activated, a control dialog will be displayed in the bottom right hand corner of the screen. The simulator will collect the changes from all screens and will add them to its internal queue of events. The simulation run with the queued events will not be executed unless it is manually triggered in the control dialog. After the simulation run, you can view and analyze the details of what would have happened (log snippets, transition graph, and CIB states).
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
Activate the simulator mode by clicking the wrench icon in the top-level row (next to the username), and by selecting .
Hawk's background changes color to indicate the simulator is active. A simulator control dialog is displayed in the bottom right hand corner of the screen. Its title indicates that no simulator run has occurred yet.
Fill the simulator's event queue:
To simulate status change of a node: Click in the simulator control dialog. Select the you want to manipulate and select its target . Confirm your changes to add them to the queue of events listed in the controller dialog.
To simulate a resource operation: Click in the simulator control dialog. Select the to manipulate and the to simulate. If necessary, define an . Select the on which to run the operation and the targeted . Confirm your changes to add them to the queue of events listed in the controller dialog.
Repeat the previous steps for any other node status changes or resource operations you wish to simulate.
To inject other changes that you wish to simulate:
Switch to one or more of the following Hawk screens: , , , , or .
Clicking the tab will deactivate simulator mode.
Add or modify parameters on the screens as desired.
The simulator will collect the changes from all screens and will add them to its internal queue of events.
To return to the simulator control dialog, switch to the screen or click the wrench icon in the top-level row and click again.
If you want to remove an event listed in , select the respective entry and click the minus icon beneath the list.
Start the simulation run by clicking in the simulator control dialog. The screen displays the simulated events. For example, if you marked a node as unclean, it will now be shown offline, and all its resources will be stopped. The simulator control dialog changes to .
To view more detailed information about the simulation run:
Click the link in the simulator dialog to see log snippets of what occurred.
Click the link to show the transition graph.
Click to display the initial CIB state. To see what the CIB would look like after the transition, click .
To start from scratch with a new simulation, use the button.
To exit the simulation mode, close the simulator control dialog. The screen switches back to its normal color and displays the current cluster state.
For analysis and diagnosis of problems occurring on the cluster, Hawk can generate a cluster report that collects information from all nodes in the cluster.
Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.
Click the wrench icon next to the username in the top-level row, and select .
By default, the period to examine is the last hour. To modify this, set another and .
Click .
After the report has been created, download the
*.tar.bz2 file by clicking the respective link.
For more information about the log files that tools like
hb_report and crm_report cover, refer to
“How can I create a report with an analysis of all my cluster nodes?”.
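A similar report can be created on the command line with hb_report. The sketch below assumes a time window from 1:00 to 3:00 and a hypothetical destination /tmp/hb_report-example. The helper prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# create_report FROM TO DEST
# Collects cluster data between FROM and TO into an archive named after DEST.
create_report() {
    run hb_report -f "$1" -t "$2" "$3"
}

# Hypothetical time window and destination.
create_report 1:00 3:00 /tmp/hb_report-example
```

A long time window can produce a large archive, so check the free disk space on the node first.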
You can use Hawk as a single point of administration for monitoring multiple clusters. Hawk's allows you to view a summary of multiple clusters, with each summary listing the number of nodes, resources, tickets (if you use GEO clusters), and their state. The summary also shows if any failures have appeared in the respective cluster.
The cluster information displayed in the is stored in a persistent cookie. This means you need to decide which Hawk instance you want to view the on, and always use that one. The machine you are running Hawk on does not even have to be part of any cluster for that purpose—it can be a separate, unrelated system.
All clusters to be monitored from Hawk's must be running SUSE Linux Enterprise High Availability Extension 12. It is not possible to monitor clusters that are running earlier versions of SUSE Linux Enterprise High Availability Extension.
If you did not replace the self-signed certificate for Hawk on every cluster node with your own certificate (or a certificate signed by an official Certificate Authority), you must log in to Hawk on every node in every cluster at least once. Verify the certificate (and add an exception in the browser to bypass the warning).
If you are using Mozilla Firefox, you must change its preferences to . Otherwise cookies from monitored clusters will not be set, thus preventing login to the clusters you are trying to monitor.
Start the Hawk Web service on a machine you want to use for monitoring multiple clusters.
Start a Web browser and as URL enter the IP address or hostname of the machine that runs Hawk:
https://IPaddress:7630/
On the Hawk login screen, click the link in the right upper corner.
The dialog appears.
Enter a custom name with which to identify the cluster in the dashboard.
Enter the of one of the cluster nodes and confirm your changes.
The opens and shows a summary of the cluster you just added.
To add more clusters to the dashboard, click the plus icon and enter the details for the next cluster.
To remove a cluster from the dashboard, click the x
icon next to the cluster's summary.
To view more details about a cluster, click anywhere in the cluster's box on the dashboard.
This opens a new browser window or new browser tab. If you are not currently logged in to the cluster, this takes you to the Hawk login screen. After having logged in, Hawk shows the of that cluster in the summary view. From here, you can administer the cluster with Hawk as usual.
As the stays open in a separate browser window or tab, you can easily switch between the dashboard and the administration of individual clusters in Hawk.
Any status changes for nodes or resources are reflected almost immediately within the .
For more details on Hawk features that relate to geographically dispersed clusters (GEO clusters), see the Quick Start GEO Clustering for SUSE Linux Enterprise High Availability Extension.
Find the Hawk log files in /srv/www/hawk/log.
Check these files in case you cannot access Hawk.
If you have trouble starting or stopping a resource with Hawk, check the Pacemaker
log messages. Where Pacemaker logs to is specified in the logging section of
/etc/corosync/corosync.conf.
If you cannot log in to Hawk with a new user that is a member of the
haclient group (or if you
experience delays until Hawk accepts logins from this user), stop
the nscd daemon with
systemctl stop nscd.service
and try again.
To avoid the warning about the self-signed certificate on first Hawk startup, replace the automatically created certificate with your own certificate or a certificate that was signed by an official Certificate Authority (CA).
The certificate is stored in
/etc/lighttpd/certs/hawk-combined.pem and
contains both key and certificate.
Change the permissions to make the file only accessible by root:
chown root:root /etc/lighttpd/certs/hawk-combined.pem
chmod 600 /etc/lighttpd/certs/hawk-combined.pem
After you have created or received your new key and certificate, combine them by executing the following command:
cat keyfile certificatefile > /etc/lighttpd/certs/hawk-combined.pem
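The combine step and the permission change can be wrapped in one small helper. This is a sketch only: the function name combine_pem is hypothetical, and the demonstration uses placeholder files in a temporary directory rather than a real key and certificate.

```shell
#!/bin/sh
# combine_pem KEYFILE CERTFILE TARGET
# Writes key and certificate into TARGET, readable by the owner only.
combine_pem() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "cannot read key or certificate file" >&2
        return 1
    fi
    # Restrictive umask so a newly created TARGET is never world-readable.
    ( umask 077 && cat "$1" "$2" > "$3" ) && chmod 600 "$3"
}

# Demonstration with placeholder content. On a cluster node, TARGET would be
# /etc/lighttpd/certs/hawk-combined.pem and the command would run as root.
demo_dir=$(mktemp -d)
printf 'PLACEHOLDER KEY\n'         > "$demo_dir/keyfile"
printf 'PLACEHOLDER CERTIFICATE\n' > "$demo_dir/certificatefile"
combine_pem "$demo_dir/keyfile" "$demo_dir/certificatefile" \
    "$demo_dir/hawk-combined.pem"
ls -l "$demo_dir/hawk-combined.pem"
rm -rf "$demo_dir"
```

When run as root against the real path, follow up with the chown command shown above so that the file belongs to root.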
Depending on the period of time you defined in the or and the events that
took place in the cluster during this time, Hawk might collect an
extensive amount of information stored in log files in the
/tmp directory. This might consume the remaining
free disk space on your node. In case Hawk should not respond after
using the or
, check the hard disk of your cluster node
and remove the respective log files.
If adding clusters to Hawk's dashboard fails, check the prerequisites listed in Procedure 5.31, “Monitoring Multiple Clusters with Hawk”.
The dashboard only polls one node in each cluster for status. If the node being polled goes down, the dashboard will cycle to poll another node. In that case, Hawk briefly displays a warning message about that node being inaccessible. The message will disappear after Hawk has found another node to contact.