The distributed replicated block device (DRBD*) allows you to create a mirror of two block devices that are located at two different sites across an IP network. When used with Corosync, DRBD supports distributed high-availability Linux clusters. This chapter shows you how to install and set up DRBD.
DRBD replicates data on the primary device to the secondary device in a way that ensures that both copies of the data remain identical. Think of it as a networked RAID 1. It mirrors data in real-time, so its replication occurs continuously. Applications do not need to know that in fact their data is stored on different disks.
The data traffic between mirrors is not encrypted. For secure data exchange, you should deploy a Virtual Private Network (VPN) solution for the connection.
DRBD is a Linux Kernel module and sits between the I/O scheduler at the
lower end and the file system at the upper end, see
Figure 15.1, “Position of DRBD within Linux”. To communicate with DRBD,
users use the high-level command drbdadm. For maximum
flexibility DRBD comes with the low-level tool
drbdsetup.
DRBD allows you to use any block device supported by Linux, usually:
partition or complete hard disk
software RAID
Logical Volume Manager (LVM)
Enterprise Volume Management System (EVMS)
By default, DRBD uses the TCP ports 7788 and higher
for communication between DRBD nodes. Make sure that your firewall does
not prevent communication on the used ports.
You must set up the DRBD devices before creating file systems on them.
Everything pertaining to user data should be done solely via the
/dev/drbdN device and
not on the raw device, as DRBD uses the last part of the raw device for
metadata. Using the raw device will cause inconsistent data.
With udev integration, you will also get symlinks in the form
/dev/drbd/by-res/RESOURCES
which are easier to use and provide safety against misremembering the
devices' minor number.
For example, if the raw device is 1024 MB in size, the DRBD device has
only 1023 MB available for data, with about 70 KB hidden and reserved for the
metadata. Any attempt to access the remaining kilobytes via
/dev/drbdN
fails because it is not available for user data.
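The arithmetic behind this example can be sketched in shell. The sketch assumes the internal metadata estimate published in the DRBD User's Guide (metadata sectors = ceil(device sectors / 2^18) * 8 + 72, with 512-byte sectors); the formula is not stated in this chapter and the function name drbd_md_sectors is purely illustrative.

```shell
# Estimate the size of DRBD internal metadata for a backing device.
# Assumption: estimate from the DRBD User's Guide, in 512-byte sectors:
#   metadata_sectors = ceil(device_sectors / 2^18) * 8 + 72
drbd_md_sectors() {
  local device_sectors=$1
  local chunk=$((1 << 18))
  echo $(( (device_sectors + chunk - 1) / chunk * 8 + 72 ))
}

# A 1024 MB raw device has 1024 * 1024 * 1024 / 512 = 2097152 sectors.
md_sectors=$(drbd_md_sectors 2097152)
echo "metadata: ${md_sectors} sectors, $(( md_sectors * 512 / 1024 )) KB"
```

For the 1024 MB device this prints 136 sectors, or 68 KB, which is in line with the roughly 70 KB reserved at the end of the raw device.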
Install the High Availability Extension Add-On product on both SUSE Linux Enterprise Server machines in your networked cluster as described in Part I, “Installation and Setup”. Installing High Availability Extension also installs the DRBD program files.
If you do not need the complete cluster stack but just want to use DRBD,
install the package drbd.
To simplify the work with drbdadm (one part of the
drbd package), use the Bash completion support.
If you want to enable it in your current shell session, insert the
following command:
root # source /etc/bash_completion.d/drbdadm.sh
To use it permanently for root, create or extend the file
/root/.bashrc and insert the previous line.
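If you script this step, it helps to append the line only when it is missing, so that repeated runs do not duplicate it. The following is a minimal sketch; the helper name add_line_once is hypothetical and not part of the drbd package.

```shell
# Append LINE to FILE only if FILE does not already contain exactly that line.
# The file is created if it does not exist yet.
add_line_once() {
  local line=$1 file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

As root, you could then call add_line_once 'source /etc/bash_completion.d/drbdadm.sh' /root/.bashrc and open a new shell to pick up the completion.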
The following procedure uses the server names jupiter and venus, and the
cluster resource name r0. It sets up jupiter as the primary node and
/dev/sda1 for storage. Make sure to modify the
instructions to use your own nodes and filenames.
Before you start configuring DRBD, make sure the block devices in your
Linux nodes are ready and partitioned (if needed). The following
procedure assumes you have two nodes, jupiter and venus, and that they should use the
TCP port 7788. Make sure this port is open in your
firewall.
To set up DRBD manually, proceed as follows:
Put your cluster in maintenance mode, if the cluster is already using DRBD:
root # crm configure edit node alice attributes maintenance="true"
If you skip this step when your cluster already uses DRBD, a syntax error in the live configuration will lead to a service shutdown.
Log in as user root.
Change DRBD's configuration files:
Open the file /etc/drbd.conf and insert the
following lines, if they do not exist yet:
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
Beginning with DRBD 8.3 the configuration file is split into separate
files, located under the directory /etc/drbd.d/.
Open the file /etc/drbd.d/global_common.conf. It
already contains some predefined values. Go to the
startup section and insert these lines:
startup {
    # wfc-timeout degr-wfc-timeout outdated-wfc-timeout
    # wait-after-sb;
    wfc-timeout 100;
    degr-wfc-timeout 120;
}
These options are used to reduce the timeouts when booting; see http://www.drbd.org/users-guide-emb/re-drbdconf.html for more details.
Create the file /etc/drbd.d/r0.res, change the
lines according to your situation, and save it:
resource r0 {                      # (1)
    device /dev/drbd0;             # (2)
    disk /dev/sda1;                # (3)
    meta-disk internal;            # (4)
    on jupiter {                   # (5)
        address 192.168.1.10:7788; # (6)
    }
    on venus {                     # (5)
        address 192.168.1.11:7788; # (6)
    }
    syncer {
        rate 7M;                   # (7)
    }
}
(1) Name that allows some association to the service that needs them. For example,
(2) The device name for DRBD and its minor number. In the example above, the minor number 0 is used for DRBD. The udev integration scripts will give you a symlink
(3) The raw device that is replicated between nodes. Note that in this example the devices are the same on both nodes. If you need different devices, move the
(4) The meta-disk parameter usually contains the value
(5) The
(6) The IP address and port number of the respective node. Each resource needs an individual port, usually starting with
(7) The synchronization rate. Set it to one third of the lower of the disk and network bandwidth. It only limits the resynchronization, not the replication.
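The rule of thumb for the synchronization rate can be turned into a small calculation. This is only a sketch: the function name drbd_sync_rate and the bandwidth figures are hypothetical, and you need to measure the real throughput of your disks and network yourself.

```shell
# Suggest a resync rate (in MB/s) as one third of the lower of the
# disk and network bandwidth, both given in MB/s (integer division).
drbd_sync_rate() {
  local disk_mbps=$1 net_mbps=$2
  local lower=$(( disk_mbps < net_mbps ? disk_mbps : net_mbps ))
  echo $(( lower / 3 ))
}

# Hypothetical figures: 180 MB/s disk, 110 MB/s network.
echo "rate $(drbd_sync_rate 180 110)M;"
```

With these example figures the network is the slower component, so the sketch prints rate 36M; as a starting value for the syncer section.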
Check the syntax of your configuration file(s). If the following command returns an error, verify your files:
root # drbdadm dump all
If you have configured Csync2 (which should be the default), the DRBD configuration files are already included in the list of files which need to be synchronized. To synchronize them, use:
root # csync2 -xv /etc/drbd.d/
If you do not have Csync2 (or do not want to use it), copy the DRBD configuration files manually to the other node:
root # scp /etc/drbd.conf venus:/etc/
root # scp /etc/drbd.d/* venus:/etc/drbd.d/
Initialize the meta data on both systems by entering the following on each node:
root # drbdadm create-md r0
root # systemctl start drbd.service
If your disk already contains a file system that you do not need anymore, destroy the file system structure with the following command and repeat this step:
root # dd if=/dev/zero of=/dev/sda1 count=16 bs=1M
Watch the DRBD status by entering the following on each node:
root # systemctl status drbd.service
You should get something like this:
[... version string omitted ...]
m:res  cs         ro                   ds                         p  mounted  fstype
0:r0   Connected  Secondary/Secondary  Inconsistent/Inconsistent  C
Start the resync process on your intended primary node (jupiter in this case):
root # drbdadm -- --overwrite-data-of-peer primary r0
Check the status again with systemctl status drbd.service. After
resynchronization, you get:
...
m:res  cs         ro                 ds                 p  mounted  fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C
The status in the ds column (disk status) must be
UpToDate on both nodes.
Create your file system on top of your DRBD device, for example:
root # mkfs.ext3 /dev/drbd/by-res/r0/0
Mount the file system and use it:
root # mount /dev/drbd0 /mnt/
Reset the cluster's maintenance mode flag:
root # crm configure edit node alice attributes maintenance="false"
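The disk state check used in this procedure can also be scripted. The following sketch assumes that the status lines reach the function in the bare form shown above (for example 0:r0 Connected Primary/Secondary UpToDate/UpToDate C); if your status output carries log prefixes, strip them first. The function name drbd_disk_state is hypothetical.

```shell
# Print the disk state (ds column) of a resource from status output on stdin.
# Assumption: the first field is MINOR:RESOURCE and the fourth field is ds.
drbd_disk_state() {
  local resource=$1
  awk -v res="$resource" '
    { n = split($1, id, ":") }
    n == 2 && id[2] == res { print $4 }
  '
}

sample='m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C'

if [ "$(printf '%s\n' "$sample" | drbd_disk_state r0)" = "UpToDate/UpToDate" ]; then
  echo "r0 is fully synchronized"
else
  echo "r0 is still synchronizing"
fi
```

Only create the file system once the function reports UpToDate/UpToDate for the resource.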
Alternatively, to use YaST to configure DRBD, proceed as follows:
Start YaST and select the configuration module › . If you already have a DRBD configuration, YaST warns
you. YaST will change your configuration and will save your old DRBD
configuration files as *.YaSTsave.
Leave the booting flag in › as it is
(by default it is off);
do not change it, because Pacemaker manages this service.
The actual configuration of the resource is done in (see Figure 15.2, “Resource Configuration”).
Press to create a new resource. The following parameters have to be set twice:
The name of the resource (mandatory)
The hostname of the relevant node
The IP address and port number (default 7788) for the respective node
The block device path that is used to access the replicated data.
The device that is replicated between both nodes.
The is either set to the value
A real device may also be used for multiple DRBD resources. For
example, if your is
All of these options are explained in the examples in the
/usr/share/doc/packages/drbd/drbd.conf file
and in the man page of drbd.conf(5).
If you have configured Csync2 (which should be the default), the DRBD configuration files are already included in the list of files which need to be synchronized. To synchronize them, use:
root # csync2 -xv /etc/drbd.d/
If you do not have Csync2 (or do not want to use it), copy the DRBD configuration files manually to the other node (here, another node with the name venus):
root # scp /etc/drbd.conf venus:/etc/
root # scp /etc/drbd.d/* venus:/etc/drbd.d/
Initialize and start the DRBD service on both systems by entering the following on each node:
root # drbdadm create-md r0
root # systemctl start drbd.service
Configure alice as the primary node by entering
the following on alice:
root # drbdsetup /dev/drbd0 primary --overwrite-data-of-peer
Check the DRBD service status by entering the following on each node:
root # systemctl status drbd.service
Before proceeding, wait until the block devices on both nodes are fully
synchronized. Repeat the systemctl status drbd.service command to
follow the synchronization progress.
After the block devices on both nodes are fully synchronized, format
the DRBD device on the primary with your preferred file system. Any
Linux file system can be used.
It is recommended to use the
/dev/drbd/by-res/RESOURCE
name.
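Instead of repeating the status command by hand while you wait for the synchronization, you can poll in a loop. The following generic helper is a sketch; the name wait_until and its argument order are hypothetical.

```shell
# Run a check command repeatedly until it succeeds or the attempts run out.
# usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_until 360 10 sh -c 'systemctl status drbd.service | grep -q UpToDate/UpToDate' checks every 10 seconds for up to an hour; adjust the pattern if your status output differs.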
If the install and configuration procedures worked as expected, you are ready to run a basic test of the DRBD functionality. This test also helps with understanding how the software works.
Test the DRBD service on jupiter.
Open a terminal console, then log in as
root.
Create a mount point on jupiter, such as
/srv/r0:
root # mkdir -p /srv/r0
Mount the DRBD device:
root # mount -o rw /dev/drbd0 /srv/r0
Create a file from the primary node:
root # touch /srv/r0/from_jupiter
Unmount the disk on jupiter:
root # umount /srv/r0
Downgrade the DRBD service on jupiter by typing the following command on jupiter:
root # drbdadm secondary r0
Test the DRBD service on venus.
Open a terminal console, then log in as
root on venus.
On venus, promote the DRBD service to primary:
root # drbdadm primary r0
On venus, check to see if venus is primary:
root # systemctl status drbd.service
On venus, create a mount point such as
/srv/r0mount:
root # mkdir /srv/r0mount
On venus, mount the DRBD device:
root # mount -o rw /dev/drbd0 /srv/r0mount
Verify that the file you created on jupiter exists:
root # ls /srv/r0mount
The /srv/r0mount/from_jupiter file should be
listed.
If the service is working on both nodes, the DRBD setup is complete.
Set up jupiter as the primary again.
Unmount the disk on venus by typing the following command on venus:
root # umount /srv/r0mount
Downgrade the DRBD service on venus by typing the following command on venus:
root # drbdadm secondary r0
On jupiter, promote the DRBD service to primary:
root # drbdadm primary r0
On jupiter, check to see if jupiter is primary:
root # systemctl status drbd.service
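The manual switchover used in this test always follows the same order: unmount and demote on the current primary, then promote and mount on the new primary. The following sketch only prints the commands so that you can review them before running them on the respective nodes; the function name drbd_switchover_plan is hypothetical.

```shell
# Print (not run) the commands for a manual switchover of one resource.
# usage: drbd_switchover_plan RESOURCE DEVICE OLD_MOUNTPOINT NEW_MOUNTPOINT
drbd_switchover_plan() {
  local res=$1 dev=$2 old_mnt=$3 new_mnt=$4
  echo "# on the current primary:"
  echo "umount ${old_mnt}"
  echo "drbdadm secondary ${res}"
  echo "# on the new primary:"
  echo "drbdadm primary ${res}"
  echo "mount -o rw ${dev} ${new_mnt}"
}

# Switch from jupiter (mounted at /srv/r0) to venus (mount point /srv/r0mount).
drbd_switchover_plan r0 /dev/drbd0 /srv/r0 /srv/r0mount
```

Printing the plan instead of executing it keeps the sketch safe to run on any machine; the old primary must be demoted before the new one is promoted.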
To get the service to automatically start and fail over if the server has a problem, you can set up DRBD as a high availability service with Pacemaker/Corosync. For information about installing and configuring for SUSE Linux Enterprise 12 see Part II, “Configuration and Administration”.
There are several ways to tune DRBD:
Use an external disk for your metadata. This might help slightly, at the cost of ease of maintenance.
Create a udev rule to change the read-ahead of the DRBD device. Save
the following line in the file
/etc/udev/rules.d/82-dm-ra.rules and change the
read_ahead_kb value to your workload:
ACTION=="add", KERNEL=="dm-*", ATTR{bdi/read_ahead_kb}="4100"
This line only works if you use LVM.
Tune your network connection by changing the receive and send
buffer settings via sysctl.
Change the max-buffers,
max-epoch-size or both in the DRBD
configuration.
Increase the al-extents value, depending
on your IO patterns.
If you have a hardware RAID controller with a BBU (Battery Backup Unit),
you might benefit from setting no-disk-flushes,
no-disk-barrier and/or no-md-flushes.
Enable read-balancing depending on your workload. See http://blogs.linbit.com/p/256/read-balancing/ for more details.
The DRBD setup involves many different components and problems may arise from different sources. The following sections cover several common scenarios and recommend various solutions.
If the initial DRBD setup does not work as expected, there is probably something wrong with your configuration.
To get information about the configuration:
Open a terminal console, then log in as root.
Test the configuration file by running drbdadm with
the -d option. Enter the following command:
root # drbdadm -d adjust r0
In a dry run of the adjust option,
drbdadm compares the actual configuration of the
DRBD resource with your DRBD configuration file, but it does not
execute the calls. Review the output to make sure you know the source
and cause of any errors.
If there are errors in the /etc/drbd.d/* and
drbd.conf files, correct them before continuing.
If the partitions and settings are correct, run
drbdadm again without the -d
option.
root # drbdadm adjust r0
This applies the configuration file to the DRBD resource.
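The two steps can be combined so that the real adjust only runs after a successful dry run. This is a minimal sketch: the function name drbd_safe_adjust and the DRBDADM override variable are hypothetical, and only the drbdadm -d adjust and drbdadm adjust calls come from the procedure above.

```shell
# Dry-run "adjust" first; apply it only if the dry run succeeds.
# DRBDADM can be overridden (for example for testing); it defaults to drbdadm.
drbd_safe_adjust() {
  local res=$1
  local drbdadm=${DRBDADM:-drbdadm}
  if ! "$drbdadm" -d adjust "$res"; then
    echo "dry run failed for ${res}; fix the configuration first" >&2
    return 1
  fi
  "$drbdadm" adjust "$res"
}
```

Call it as drbd_safe_adjust r0. The dry run still prints the calls it would make, so review that output as described above rather than relying on the exit status alone.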
For DRBD, hostnames are case sensitive (Node0
would be a different host than node0),
and compared to the hostname as stored in the Kernel (see the
uname -n output).
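A quick way to catch a hostname mismatch is to compare the uname -n output with the on sections of the resource file. The sketch below assumes the layout used in this chapter, with on, the hostname, and the opening brace separated by spaces; the function name resource_has_host is hypothetical.

```shell
# Succeed if HOST appears as an "on HOST {" section in a resource file.
# The comparison is exact and case sensitive.
resource_has_host() {
  local host=$1 file=$2
  awk -v h="$host" '$1 == "on" && $2 == h { found = 1 } END { exit !found }' "$file"
}
```

For example, resource_has_host "$(uname -n)" /etc/drbd.d/r0.res || echo "this host is not listed in r0.res" reports a node whose kernel hostname does not match the configuration.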
If you have several network devices and want to use a dedicated network
device, the hostname will likely not resolve to the used IP address. In
this case, use the parameter disable-ip-verification.
If your system is unable to connect to the peer, this might be a problem
with your local firewall. By default, DRBD uses the TCP port
7788 to access the other node. Make sure that this
port is accessible on both nodes.
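To check from one node whether the DRBD port on the peer is reachable, you can try to open a TCP connection. The sketch below uses the /dev/tcp redirection built into Bash together with the timeout command; the function name tcp_port_open is hypothetical, and it assumes a Bash build with network redirections enabled.

```shell
# Succeed if a TCP connection to HOST:PORT can be opened within 3 seconds.
# Inside the child shell, $0 is the host and $1 is the port.
tcp_port_open() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, run tcp_port_open venus 7788 || echo "cannot reach venus:7788" on jupiter. A failure can mean a firewall is blocking the port, but also that DRBD is not running on the peer, so check both.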
If DRBD does not know which of the real devices holds the latest data, it enters a split brain condition. The respective DRBD subsystems then come up as secondary and do not connect to each other, and the following message can be found in the logging data:
Split-Brain detected, dropping connection!
To resolve this situation, enter the following on the node which has data to be discarded:
root # drbdadm secondary r0
root # drbdadm -- --discard-my-data connect r0
On the node which has the latest data enter the following:
root # drbdadm connect r0
That resolves the issue by overwriting one node's data with the peer's data, therefore getting a consistent view on both nodes.
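Before starting the recovery, you may want to confirm that the log really contains the split brain message. The following sketch reads log text from standard input; the function name split_brain_detected is hypothetical, and which log source to pipe in (for example the kernel log via dmesg) depends on your logging setup.

```shell
# Succeed if the log text on stdin contains DRBD's split brain message.
split_brain_detected() {
  grep -q 'Split-Brain detected'
}
```

For example, dmesg | split_brain_detected && echo "split brain reported on $(uname -n)" prints a notice only on nodes whose log contains the message.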
The following open source resources are available for DRBD:
The project home page http://www.drbd.org.
See Highly Available NFS Storage with DRBD and Pacemaker.
http://clusterlabs.org/wiki/DRBD_HowTo_1.0 by the Linux Pacemaker Cluster Stack Project.
The following man pages for DRBD are available in the distribution:
drbd(8), drbddisk(8), drbdsetup(8),
drbdadm(8), drbd.conf(5).
Find a commented example configuration for DRBD at
/usr/share/doc/packages/drbd/drbd.conf
Furthermore, for easier storage administration across your cluster, see the recent announcement about the DRBD-Manager at http://blogs.linbit.com/p/666/drbd-manager.