When managing shared storage on a cluster, every node must be informed about changes made to the storage subsystem. The Logical Volume Manager 2 (LVM2), which is widely used to manage local storage, has been extended to support transparent management of volume groups across the whole cluster. Clustered volume groups can be managed using the same commands as local storage.
Clustered LVM is coordinated with different tools:
Distributed Lock Manager (DLM): Coordinates disk access for cLVM.
Logical Volume Manager 2 (LVM2): Enables flexible distribution of one file system over several disks. LVM provides a virtual pool of disk space.
Clustered Logical Volume Manager (cLVM): Coordinates access to the LVM2 metadata so every node knows about changes. cLVM does not coordinate access to the shared data itself; to enable cLVM to do so, you must configure OCFS2 or other cluster-aware applications on top of the cLVM-managed storage.
Depending on your scenario it is possible to create a RAID 1 device with cLVM with the following layers:
LVM. This is a very flexible solution if you want to increase or decrease your file system size, add more physical storage, or create snapshots of your file systems. This method is described in Section 16.2.3, “Scenario: cLVM With iSCSI on SANs”.
DRBD. This solution only provides RAID 0 (striping) and RAID 1 (mirroring). This method is described in Section 16.2.4, “Scenario: cLVM With DRBD”.
Although MD devices (Linux Software RAID or mdadm) provide all RAID levels, they do not support clusters yet. Therefore they are not covered in the above list.
Make sure you have fulfilled the following prerequisites:
A shared storage device is available, such as one provided by a Fibre Channel, FCoE, SCSI, or iSCSI SAN, or by DRBD.
In case of DRBD, both nodes must be primary (as described in the following procedure).
Check if the locking type of LVM2 is cluster-aware. The keyword
locking_type in
/etc/lvm/lvm.conf must contain the value 3 (should
be the default). Copy the configuration to all nodes, if necessary.
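The locking type check described above can be scripted. The following is a minimal sketch, not an official tool: the function names lvm_locking_type and is_cluster_locking are hypothetical, and the parsing assumes a plain `locking_type = N` assignment on its own line.

```shell
#!/bin/sh
# Print the locking_type value found in an lvm.conf-style file.
# Lines where the keyword is commented out with '#' are ignored.
# Assumption of this sketch: if the keyword appears more than once,
# the last uncommented assignment is reported.
lvm_locking_type() {
    conf="${1:-/etc/lvm/lvm.conf}"
    sed -n 's/^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1
}

# Succeed only if the file requests cluster-wide locking (value 3).
is_cluster_locking() {
    [ "$(lvm_locking_type "$1")" = "3" ]
}
```

Run on each node, for example as `is_cluster_locking /etc/lvm/lvm.conf || echo "locking_type is not 3"`, this shows which nodes still need the updated configuration file.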
First create your cluster resources as described in Section 16.2.2, “Creating the Cluster Resources” and then your LVM volumes. Otherwise it is impossible to remove the volumes later.
To track mirror log information in a cluster, the
cmirrord daemon is
used. Cluster mirrors are not possible without this daemon running.
We assume that /dev/sda and
/dev/sdb are the shared storage devices. Replace
these with your own device name(s), if necessary. Proceed as follows:
Create a cluster with at least two nodes.
Configure your cluster to run dlm,
clvmd, and STONITH:
root # crm configure
crm(live)configure# primitive clvmd ocf:lvm2:clvmd \
  op stop interval="0" timeout="100" \
  op start interval="0" timeout="90" \
  op monitor interval="20" timeout="20"
crm(live)configure# primitive dlm ocf:pacemaker:controld \
  op start interval="0" timeout="90" \
  op stop interval="0" timeout="100" \
  op monitor interval="60" timeout="60"
crm(live)configure# primitive sbd_stonith stonith:external/sbd
crm(live)configure# group base-group dlm clvmd
crm(live)configure# clone base-clone base-group \
  meta interleave="true"
Leave crmsh with exit and commit your
changes.
Create a clustered volume group (VG):
root # pvcreate /dev/sda /dev/sdb
root # vgcreate -cy vg /dev/sda /dev/sdb
Create a mirrored-log logical volume (LV) in your cluster:
root # lvcreate -nlv -m1 -l10%VG vg --mirrorlog mirrored
Use lvs to show the progress. If the percentage
number has reached 100%, the mirrored disk is successfully synced.
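Watching lvs by hand can be replaced by a small polling loop. The sketch below is illustrative only: is_synced and wait_for_sync are hypothetical helper names, and the wrapper assumes that your LVM2 version offers the copy_percent report field.

```shell
#!/bin/sh
# Succeed if a copy percentage string as printed by lvs is at least 100.
# Accepts leading blanks and a decimal comma (as printed in some locales).
is_synced() {
    printf '%s\n' "$1" | tr ',' '.' | LC_ALL=C awk '{ exit !($1 + 0 >= 100) }'
}

# Hypothetical wrapper: poll lvs every five seconds until the mirror
# is fully synchronized. Assumes the copy_percent field is available.
wait_for_sync() {
    lv="$1"    # for example vg/lv
    until is_synced "$(lvs --noheadings -o copy_percent "$lv")"; do
        sleep 5
    done
}
```

Calling `wait_for_sync vg/lv` as root returns once the mirrored LV created above reports 100%.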
To test the clustered volume /dev/vg/lv, use the
following steps:
Read or write to /dev/vg/lv.
Deactivate your LV with lvchange
-an.
Activate your LV with lvchange
-ay.
Use lvconvert to convert a mirrored log to a disk
log.
Create a mirrored-log LV in another cluster VG. This is a different volume group from the previous one.
The current cLVM can only handle one physical volume (PV) per mirror side. If one mirror side is actually made up of several PVs that need to be concatenated or striped, lvcreate does not understand this. For this reason, lvcreate and the cmirrord metadata need to understand the “grouping” of PVs into one side, effectively supporting RAID10.
In order to support RAID10 for
cmirrord, use the
following procedure (assuming that /dev/sda and
/dev/sdb are the shared storage devices):
Create a volume group (VG):
root # pvcreate /dev/sda /dev/sdb
root # vgcreate vg /dev/sda /dev/sdb
Open the file /etc/lvm/lvm.conf and go to the
section allocation. Set the following line and save
the file:
mirror_legs_require_separate_pvs = 1
Add your tags to your PVs:
root # pvchange --addtag @a /dev/sda
root # pvchange --addtag @b /dev/sdb
A tag is an unordered keyword or term assigned to the metadata of a storage object. Tagging allows you to classify collections of LVM storage objects in ways that you find useful by attaching an unordered list of tags to their metadata.
List your tags:
root # pvs -o pv_name,vg_name,pv_tags /dev/sd{a,b}
You should receive this output:
PV         VG      PV Tags
/dev/sda   vgtest  a
/dev/sdb   vgtest  b
If you need further information regarding LVM, refer to our Storage Administration Guide at https://www.suse.com/documentation/sles11/stor_admin/data/lvm.html.
Preparing the cluster for use of cLVM includes the following basic steps:
Both cLVM and OCFS2 need a DLM resource that runs on all nodes in the cluster and therefore is usually configured as a clone. If you have a setup that includes both OCFS2 and cLVM, configuring one DLM resource for both OCFS2 and cLVM is enough.
Start a shell and log in as root.
Run crm configure.
Check the current configuration of the cluster resources with
show.
If you have already configured a DLM resource (and a corresponding base group and base clone), continue with Procedure 16.2, “Creating LVM and cLVM Resources”.
Otherwise, configure a DLM resource and a corresponding base group and base clone as described in Procedure 13.2, “Configuring a DLM Resource”.
Leave the crm live configuration with exit.
Start a shell and log in as root.
Run crm configure.
Configure a cLVM resource as follows:
crm(live)configure# primitive clvm ocf:lvm2:clvmd \
  params daemon_timeout="30"
Configure an LVM resource for the volume group as follows:
crm(live)configure# primitive vg1 ocf:heartbeat:LVM \
  params volgrpname="cluster-vg" \
  op monitor interval="60" timeout="60"
If you want the volume group to be activated exclusively on one node, configure the LVM resource as described below and omit Step 6:
crm(live)configure# primitive vg1 ocf:heartbeat:LVM \
  params volgrpname="cluster-vg" exclusive="yes" \
  op monitor interval="60" timeout="60"
In this case, cLVM will protect all logical volumes within the VG from being activated on multiple nodes, as an additional measure of protection for non-clustered applications.
To ensure that the cLVM and LVM resources are activated cluster-wide, add both primitives to the base group you have created in Procedure 13.2, “Configuring a DLM Resource”:
Enter
crm(live)configure# edit base-group
In the vi editor that opens, modify the group as follows and save your changes:
crm(live)configure# group base-group dlm clvm vg1 ocfs2-1
If your setup does not include OCFS2, omit the
ocfs2-1 primitive from the base group.
Review your changes with show.
If everything is correct, submit your changes with
commit and leave the crm live configuration with
exit.
The following scenario uses two SAN boxes which export their iSCSI targets to several clients. The general idea is displayed in Figure 16.1, “Setup of iSCSI with cLVM”.
The following procedures will destroy any data on your disks!
Configure only one SAN box first. Each SAN box has to export its own iSCSI target. Proceed as follows:
Run YaST and click › to start the iSCSI Server module.
If you want to start the iSCSI target whenever your computer is booted, choose , otherwise choose .
If you have a firewall running, enable .
Switch to the tab. If you need authentication enable incoming or outgoing authentication or both. In this example, we select .
Add a new iSCSI target:
Switch to the tab.
Click .
Enter a target name. The name has to be formatted like this:
iqn.DATE.DOMAIN
For more information about the format, refer to RFC 3720, Section 3.2.6.3.1, Type "iqn." (iSCSI Qualified Name) (http://www.ietf.org/rfc/rfc3720.txt).
If you want a more descriptive name, you can change it as long as your identifier is unique for your different targets.
Click .
Enter the device name in and use a .
Click twice.
Confirm the warning box with .
Open the configuration file /etc/iscsi/iscsid.conf and change the parameter node.startup to automatic.
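Before entering a target name in the procedure above, it can help to check that it follows the iqn.DATE.DOMAIN pattern. The following sketch uses a hypothetical function name, is_valid_iqn, and a deliberately simplified pattern: it assumes a lowercase name with a yyyy-mm date, a reversed domain name, and an optional identifier after a colon. It is looser than the full grammar in RFC 3720.

```shell
#!/bin/sh
# Simplified check of an iSCSI Qualified Name:
#   iqn.<yyyy-mm>.<reversed domain>[:<identifier>]
# Assumption of this sketch: lowercase letters, digits, dots and hyphens
# in the domain part; anything non-empty after the optional colon.
is_valid_iqn() {
    printf '%s\n' "$1" | grep -Eq '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:.+)?$'
}
```

For example, `is_valid_iqn iqn.2010-03.de.jupiter:san1` succeeds, while a name without the iqn. prefix or with month 13 is rejected.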
Now set up your iSCSI initiators as follows:
Run YaST and click › .
If you want to start the iSCSI initiator whenever your computer is booted, choose , otherwise set .
Change to the tab and click the button.
Add the IP address and the port of your iSCSI target (see Procedure 16.3, “Configuring iSCSI Targets (SAN)”). Normally, you can leave the port as it is and use the default value.
If you use authentication, insert the incoming and outgoing username and password, otherwise activate .
Select . The found connections are displayed in the list.
Proceed with .
Open a shell, log in as root.
Test if the iSCSI initiator has been started successfully:
root # iscsiadm -m discovery -t st -p 192.168.3.100
192.168.3.100:3260,1 iqn.2010-03.de.jupiter:san1
Establish a session:
root # iscsiadm -m node -l
Logging in to [iface: default, target: iqn.2010-03.de.jupiter:san2, portal: 192.168.3.100,3260]
Logging in to [iface: default, target: iqn.2010-03.de.venus:san1, portal: 192.168.3.101,3260]
Login to [iface: default, target: iqn.2010-03.de.jupiter:san2, portal: 192.168.3.100,3260]: successful
Login to [iface: default, target: iqn.2010-03.de.venus:san1, portal: 192.168.3.101,3260]: successful
See the device names with lsscsi:
...
[4:0:0:2] disk IET ... 0 /dev/sdd
[5:0:0:1] disk IET ... 0 /dev/sde
Look for entries with IET in their third column. In
this case, the devices are /dev/sdd and
/dev/sde.
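Picking out the IET entries from the lsscsi output can be automated with a short filter. This is a minimal sketch; the function name iscsi_devices is hypothetical, and it assumes the column layout shown above, with the vendor in the third column and the device node in the last one.

```shell
#!/bin/sh
# Read lsscsi output on standard input and print the device node
# (last column) of every line whose third column equals the vendor
# string given as the first argument (default: IET).
iscsi_devices() {
    awk -v vendor="${1:-IET}" '$3 == vendor { print $NF }'
}
```

Used as `lsscsi | iscsi_devices`, it prints one device name per line, which is convenient as input for the pvcreate step that follows.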
Open a root shell on one of the nodes on which you have run the iSCSI initiator as described in Procedure 16.4, “Configuring iSCSI Initiators”.
Prepare the physical volume for LVM with the command
pvcreate on the disks /dev/sdd
and /dev/sde:
root # pvcreate /dev/sdd
root # pvcreate /dev/sde
Create the cluster-aware volume group on both disks:
root # vgcreate --clustered y clustervg /dev/sdd /dev/sde
Create logical volumes as needed:
root # lvcreate --name clusterlv --size 500M clustervg
Check the physical volume with pvdisplay:
--- Physical volume ---
PV Name /dev/sdd
VG Name clustervg
PV Size 509,88 MB / not usable 1,88 MB
Allocatable yes
PE Size (KByte) 4096
Total PE 127
Free PE 127
Allocated PE 0
PV UUID 52okH4-nv3z-2AUL-GhAN-8DAZ-GMtU-Xrn9Kh
--- Physical volume ---
PV Name /dev/sde
VG Name clustervg
PV Size 509,84 MB / not usable 1,84 MB
Allocatable yes
PE Size (KByte) 4096
Total PE 127
Free PE 127
Allocated PE 0
PV UUID Ouj3Xm-AI58-lxB1-mWm2-xn51-agM2-0UuHFC
Check the volume group with vgdisplay:
--- Volume group ---
VG Name clustervg
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 1
VG Access read/write
VG Status resizable
Clustered yes
Shared no
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 2
Act PV 2
VG Size 1016,00 MB
PE Size 4,00 MB
Total PE 254
Alloc PE / Size 0 / 0
Free PE / Size 254 / 1016,00 MB
VG UUID UCyWw8-2jqV-enuT-KH4d-NXQI-JhH3-J24anD
After you have created the volumes and started your resources, you should have a new device named /dev/dm-0. It is recommended to use a clustered file system on top of your LVM resource, for example OCFS2. For more information, see Chapter 13, OCFS2.
The following scenarios can be used if you have data centers located in different parts of your city, country, or continent.
Create a primary/primary DRBD resource:
First, set up a DRBD device as primary/secondary as described in
Procedure 15.1, “Manually Configuring DRBD”. Make sure the disk state is
up-to-date on both nodes. Check this with
cat /proc/drbd or with
systemctl status drbd.service.
Add the following options to your configuration file (usually
something like /etc/drbd.d/r0.res):
resource r0 {
startup {
become-primary-on both;
}
net {
allow-two-primaries;
}
...
}
Copy the changed configuration file to the other node, for example:
root # scp /etc/drbd.d/r0.res venus:/etc/drbd.d/
Run the following commands on both nodes:
root # drbdadm disconnect r0
root # drbdadm connect r0
root # drbdadm primary r0
Check the status of your nodes:
root # cat /proc/drbd
...
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
Include the clvmd resource as a clone in the pacemaker configuration,
and make it depend on the DLM clone resource. See
Procedure 16.1, “Creating a DLM Resource” for detailed instructions.
Before proceeding, confirm that these resources have started
successfully on your cluster. You may use crm_mon
or the Web interface to check the running services.
Prepare the physical volume for LVM with the command
pvcreate. For example, on the device
/dev/drbd_r0 the command would look like this:
root # pvcreate /dev/drbd_r0
Create a cluster-aware volume group:
root # vgcreate --clustered y myclusterfs /dev/drbd_r0
Create logical volumes as needed. You will probably want to adjust the size of the logical volume. For example, create a 4 GB logical volume with the following command:
root # lvcreate --name testlv -L 4G myclusterfs
The logical volumes within the VG are now available for file system mounts or raw usage. Ensure that services using them have proper dependencies so that they are collocated with the VG and ordered after its activation.
After finishing these configuration steps, the LVM2 configuration can be done just like on any standalone workstation.
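The DRBD status check from the procedure above (cat /proc/drbd) can be turned into a scripted precondition before creating the volume group. The sketch below is an illustration under assumptions: the function name drbd_dual_primary_ok is hypothetical, and the parsing expects device lines in the format shown above, with cs:, ro: and ds: fields on one line.

```shell
#!/bin/sh
# Succeed only if the status file contains at least one DRBD device line
# and every such line reports Connected, Primary/Primary and
# UpToDate/UpToDate. Defaults to /proc/drbd when no file is given.
drbd_dual_primary_ok() {
    status="${1:-/proc/drbd}"
    awk '
        /^[[:space:]]*[0-9]+:[[:space:]]*cs:/ {
            seen = 1
            if ($0 !~ /cs:Connected/ || $0 !~ /ro:Primary\/Primary/ || $0 !~ /ds:UpToDate\/UpToDate/)
                bad = 1
        }
        END { exit !(seen && !bad) }
    ' "$status"
}
```

A typical use would be `drbd_dual_primary_ok && pvcreate /dev/drbd_r0`, so that the physical volume is only prepared once both nodes are primary and in sync.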
When several devices seemingly share the same physical volume signature (as can be the case for multipath devices or DRBD), it is recommended to explicitly configure the devices which LVM2 scans for PVs.
For example, if the command vgcreate uses the physical device instead of the mirrored block device, DRBD will be confused, which may result in a split brain condition for DRBD.
To deactivate a single device for LVM2, do the following:
Edit the file /etc/lvm/lvm.conf and search for the
line starting with filter.
The patterns there are handled as regular expressions. A leading “a” accepts devices that match the pattern for scanning; a leading “r” rejects devices that match the pattern.
To remove a device named /dev/sdb1, add the
following expression to the filter rule:
"r|^/dev/sdb1$|"
The complete filter line will look like the following:
filter = [ "r|^/dev/sdb1$|", "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|", "a/.*/" ]
A filter line that accepts DRBD and MPIO devices but rejects all other devices would look like this:
filter = [ "a|/dev/drbd.*|", "a|/dev/.*/by-id/dm-uuid-mpath-.*|", "r/.*/" ]
Write the configuration file and copy it to all cluster nodes.
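To reason about a filter line before copying it to all nodes, the evaluation order can be modeled in a few lines of shell. This is a simplified sketch, not LVM2's implementation: the function name lvm_filter_decision is hypothetical, grep -E stands in for LVM2's own regular expression matcher, and only a single device path is checked. It models the rule that the first matching pattern decides and that a device matching no pattern is accepted.

```shell
#!/bin/sh
# Usage: lvm_filter_decision DEVICE PATTERN...
# Each PATTERN has the form 'a|regex|' or 'r/regex/': the first character
# is the action (accept or reject), the second is the delimiter.
# Prints "accept" or "reject" for DEVICE.
lvm_filter_decision() {
    dev="$1"; shift
    for pat in "$@"; do
        action=${pat%"${pat#?}"}    # first character: a or r
        body=${pat#??}              # drop action and opening delimiter
        regex=${body%?}             # drop closing delimiter
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            case "$action" in
                a) echo accept ;;
                r) echo reject ;;
            esac
            return 0
        fi
    done
    echo accept    # no pattern matched
}
```

For example, `lvm_filter_decision /dev/sdb1 'r|^/dev/sdb1$|' 'a/.*/'` prints reject, while the same patterns applied to /dev/sda print accept. Because the first match wins, the catch-all pattern has to stay last.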
Thorough information is available from the Pacemaker mailing list; see http://www.clusterlabs.org/wiki/Help:Contents.
The official cLVM FAQ can be found at http://sources.redhat.com/cluster/wiki/FAQ/CLVM.