Applies to SUSE Linux Enterprise High Availability Extension 12

13 OCFS2

Oracle Cluster File System 2 (OCFS2) is a general-purpose journaling file system that has been fully integrated since the Linux 2.6 Kernel. OCFS2 allows you to store application binary files, data files, and databases on devices on shared storage. All nodes in a cluster have concurrent read and write access to the file system. A user-space control daemon, managed via a clone resource, provides the integration with the HA stack, in particular with Corosync and the Distributed Lock Manager (DLM).

13.1 Features and Benefits

OCFS2 can be used, for example, for the following storage solutions:

  • General applications and workloads.

  • Xen image store in a cluster. Xen virtual machines and virtual servers can be stored on OCFS2 volumes that are mounted by cluster servers. This provides quick and easy portability of Xen virtual machines between servers.

  • LAMP (Linux, Apache, MySQL, and PHP | Perl | Python) stacks.

As a high-performance, symmetric and parallel cluster file system, OCFS2 supports the following functions:

  • An application's files are available to all nodes in the cluster. Users simply install it once on an OCFS2 volume in the cluster.

  • All nodes can concurrently read and write directly to storage via the standard file system interface, enabling easy management of applications that run across the cluster.

  • File access is coordinated through DLM. DLM control is good for most cases, but an application's design might limit scalability if it contends with the DLM to coordinate file access.

  • Storage backup functionality is available on all back-end storage. An image of the shared application files can be easily created, which can help provide effective disaster recovery.

OCFS2 also provides the following capabilities:

  • Metadata caching.

  • Metadata journaling.

  • Cross-node file data consistency.

  • Support for multiple block sizes up to 4 KB and cluster sizes up to 1 MB, for a maximum volume size of 4 PB (petabytes).

  • Support for up to 32 cluster nodes.

  • Asynchronous and direct I/O support for database files for improved database performance.
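The 4 PB figure above can be checked with simple arithmetic. The following sketch assumes that the limit results from at most 2^32 clusters per volume multiplied by the cluster size; this derivation is an assumption and is not stated in this guide, and the function name is made up for illustration.

```shell
# Hypothetical helper: largest volume size in TB for a cluster size given in KB,
# assuming (this guide does not say so) a limit of 2^32 clusters per volume.
max_volume_tb() {
    cluster_kb=$1
    # clusters * KB per cluster gives KB; shifting right by 30 converts KB to TB
    echo $(( ((1 << 32) * cluster_kb) >> 30 ))
}

max_volume_tb 1024   # 1 MB clusters: prints 4096 (TB), which is 4 PB
```

Under the same assumption, smaller cluster sizes lower the limit proportionally.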

13.2 OCFS2 Packages and Management Utilities

The OCFS2 Kernel module (ocfs2) is installed automatically in the High Availability Extension on SUSE® Linux Enterprise Server 12. To use OCFS2, make sure the following packages are installed on each node in the cluster: ocfs2-tools and the matching ocfs2-kmp-* packages for your Kernel.

The ocfs2-tools package provides the following utilities for management of OCFS2 volumes. For syntax information, see their man pages.

Table 13.1: OCFS2 Utilities

OCFS2 Utility

Description

debugfs.ocfs2

Examines the state of the OCFS2 file system for the purpose of debugging.

fsck.ocfs2

Checks the file system for errors and optionally repairs errors.

mkfs.ocfs2

Creates an OCFS2 file system on a device, usually a partition on a shared physical or logical disk.

mounted.ocfs2

Detects and lists all OCFS2 volumes on a clustered system. Detects and lists all nodes on the system that have mounted an OCFS2 device or lists all OCFS2 devices.

tunefs.ocfs2

Changes OCFS2 file system parameters, including the volume label, number of node slots, journal size for all node slots, and volume size.
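As a rough illustration of how some of these utilities are typically invoked, consider the sketch below. The device /dev/sdb1 is only a placeholder, and the run helper and the ocfs2_inspect function are hypothetical names introduced here. By default the helper only prints the commands; set DRY_RUN=0 to execute them. Refer to the man pages for the authoritative option lists.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper; the device argument is a placeholder.
ocfs2_inspect() {
    dev=$1
    run mounted.ocfs2 -d               # quick detect: list OCFS2 devices
    run mounted.ocfs2 -f               # full detect: list nodes that have mounted each device
    run fsck.ocfs2 -n "$dev"           # check only, answering "no" to all repair questions
    run debugfs.ocfs2 -R stats "$dev"  # print superblock information
}

ocfs2_inspect /dev/sdb1
```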

13.3 Configuring OCFS2 Services and a STONITH Resource

Before you can create OCFS2 volumes, you must configure the following resources as services in the cluster: a DLM resource and a STONITH resource. OCFS2 uses the cluster membership services from Pacemaker, which run in user space. Therefore, DLM needs to be configured as a clone resource that is present on each node in the cluster.

The following procedure uses the crm shell to configure the cluster resources. Alternatively, you can also use Hawk to configure the resources as described in Section 13.6, “Configuring OCFS2 Resources With Hawk”.

Note: DLM Resource for Both cLVM and OCFS2

Both cLVM and OCFS2 need a DLM resource that runs on all nodes in the cluster and therefore usually is configured as a clone. If you have a setup that includes both OCFS2 and cLVM, configuring one DLM resource for both OCFS2 and cLVM is enough.

Procedure 13.1: Configuring a STONITH Resource
Note: STONITH Device Needed

You need to configure a fencing device. Without a STONITH mechanism (like external/sbd) in place, the configuration will fail.

  1. Start a shell and log in as root or equivalent.

  2. Create an SBD partition as described in Section 17.1.3.1, “Creating the SBD Partition”.

  3. Run crm configure.

  4. Configure external/sbd as the fencing device, with /dev/sdb2 being a dedicated partition on the shared storage for heartbeating and fencing:

    crm(live)configure# primitive sbd_stonith stonith:external/sbd \
          meta target-role="Started"
  5. Review your changes with show.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.
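After committing, you may want to check that the SBD partition and the STONITH resource look as expected. The following is a minimal sketch, assuming the /dev/sdb2 partition and the sbd_stonith resource ID used above; the run helper and the check_sbd_stonith function are hypothetical names. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical check; pass the SBD partition as the first argument.
check_sbd_stonith() {
    sbd_dev=$1
    run sbd -d "$sbd_dev" dump          # show the SBD header written to the partition
    run sbd -d "$sbd_dev" list          # list node slots and pending messages
    run crm configure show sbd_stonith  # display the committed primitive
    run crm status                      # overall cluster and resource state
}

check_sbd_stonith /dev/sdb2
```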

Procedure 13.2: Configuring a DLM Resource

The configuration consists of a base group that includes several primitives and a base clone. Both base group and base clone can be used in various scenarios afterwards (for both OCFS2 and cLVM, for example). You only need to extend the base group with the respective primitives as needed. As the base group has internal colocation and ordering, this simplifies the overall setup because you do not need to specify several individual groups, clones, and their dependencies.

Follow the steps below for one node in the cluster:

  1. Start a shell and log in as root or equivalent.

  2. Run crm configure.

  3. Enter the following to create the primitive resource for DLM:

    crm(live)configure# primitive dlm ocf:pacemaker:controld \
          op monitor interval="60" timeout="60"
  4. Create a base-group for the DLM resource. As further cloned primitives are created, they will be added to this group.

    crm(live)configure# group base-group dlm
  5. Clone the base-group so that it runs on all nodes.

    crm(live)configure# clone base-clone base-group \
          meta interleave=true target-role=Started
  6. Review your changes with show.

  7. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.
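The same resources can presumably also be created without entering the interactive crm shell, by passing each configure subcommand on the command line. The sketch below mirrors the steps of this procedure; the run helper and the configure_dlm function are hypothetical names. Note that, unlike the interactive session, each invocation is expected to commit on its own, so there is no separate review step. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical non-interactive variant of the DLM procedure.
configure_dlm() {
    run crm configure primitive dlm ocf:pacemaker:controld \
        op monitor interval=60 timeout=60
    run crm configure group base-group dlm
    run crm configure clone base-clone base-group \
        meta interleave=true target-role=Started
    run crm configure show    # review the resulting configuration
}

configure_dlm
```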

13.4 Creating OCFS2 Volumes

After you have configured a DLM cluster resource as described in Section 13.3, “Configuring OCFS2 Services and a STONITH Resource”, configure your system to use OCFS2 and create OCFS2 volumes.

Note: OCFS2 Volumes for Application and Data Files

We recommend that you generally store application files and data files on different OCFS2 volumes. If your application volumes and data volumes have different requirements for mounting, it is mandatory to store them on different volumes.

Before you begin, prepare the block devices you plan to use for your OCFS2 volumes. Leave the devices as free space.

Then create and format the OCFS2 volume with the mkfs.ocfs2 utility as described in Procedure 13.3, “Creating and Formatting an OCFS2 Volume”. The most important parameters for the command are listed in Table 13.2, “Important OCFS2 Parameters”. For more information and the command syntax, refer to the mkfs.ocfs2 man page.

Table 13.2: Important OCFS2 Parameters

OCFS2 Parameter

Description and Recommendation

Volume Label (-L)

A descriptive name for the volume to make it uniquely identifiable when it is mounted on different nodes. Use the tunefs.ocfs2 utility to modify the label as needed.

Cluster Size (-C)

Cluster size is the smallest unit of space allocated to a file to hold the data. For the available options and recommendations, refer to the mkfs.ocfs2 man page.

Number of Node Slots (-N)

The maximum number of nodes that can concurrently mount a volume. For each of the nodes, OCFS2 creates separate system files, such as the journals. Nodes that access the volume can be a combination of little-endian architectures (such as x86_64) and big-endian architectures (such as s390x).

Node-specific files are referred to as local files. A node slot number is appended to the local file. For example: journal:0000 belongs to whatever node is assigned to slot number 0.

Set each volume's maximum number of node slots when you create it, according to how many nodes you expect to concurrently mount the volume. Use the tunefs.ocfs2 utility to increase the number of node slots as needed. Note that the value cannot be decreased.

In case the -N parameter is not specified, the number of slots is decided based on the size of the file system.

Block Size (-b)

The smallest unit of space addressable by the file system. Specify the block size when you create the volume. For the available options and recommendations, refer to the mkfs.ocfs2 man page.

Specific Features On/Off (--fs-features)

A comma-separated list of feature flags can be provided, and mkfs.ocfs2 will try to create the file system with those features set according to the list. To turn a feature on, include it in the list. To turn a feature off, prepend no to the name.

For an overview of all available flags, refer to the mkfs.ocfs2 man page.

Pre-Defined Features (--fs-feature-level)

Allows you to choose from a set of pre-determined file system features. For the available options, refer to the mkfs.ocfs2 man page.

If you do not specify any specific features when creating and formatting the volume with mkfs.ocfs2, the following features are enabled by default: backup-super, sparse, inline-data, unwritten, metaecc, indexed-dirs, and xattr.
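To show how the parameters from the table combine on one command line, the sketch below assembles a mkfs.ocfs2 call. The helper name, the label, the block and cluster sizes, and the feature list are illustrative assumptions, not recommendations. The function only prints the command and never formats a device.

```shell
# Hypothetical helper that only prints a mkfs.ocfs2 command line.
# Arguments: device, volume label, number of node slots, feature list.
mkfs_ocfs2_cmd() {
    dev=$1; label=$2; slots=$3; features=$4
    # -b 4K and -C 128K are example values; see the mkfs.ocfs2 man page
    printf 'mkfs.ocfs2 -L %s -N %s -b 4K -C 128K --fs-features=%s %s\n' \
        "$label" "$slots" "$features" "$dev"
}

# Turn xattr on and inline-data off (a feature is disabled by prepending "no").
mkfs_ocfs2_cmd /dev/sdb1 shared_data 16 xattr,noinline-data
```

Review the printed command before running it as root, because formatting destroys any existing data on the device.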

Procedure 13.3: Creating and Formatting an OCFS2 Volume

Execute the following steps only on one of the cluster nodes.

  1. Open a terminal window and log in as root.

  2. Check if the cluster is online with the command crm status.

  3. Create and format the volume using the mkfs.ocfs2 utility. For information about the syntax for this command, refer to the mkfs.ocfs2 man page.

    For example, to create a new OCFS2 file system on /dev/sdb1 that supports up to 32 cluster nodes, enter the following command:

    root #  mkfs.ocfs2 -N 32 /dev/sdb1

13.5 Mounting OCFS2 Volumes

You can either mount an OCFS2 volume manually or with the cluster manager, as described in Procedure 13.5, “Mounting an OCFS2 Volume with the Cluster Resource Manager”.

Procedure 13.4: Manually Mounting an OCFS2 Volume
  1. Open a terminal window and log in as root.

  2. Check if the cluster is online with the command crm status.

  3. Mount the volume from the command line, using the mount command.

Warning: Manually Mounted OCFS2 Devices

If you mount the OCFS2 file system manually for testing purposes, make sure to unmount it again before starting to use it by means of cluster resources.
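A manual test mount followed by the unmount could look like the sketch below. The device /dev/sdb1 and the mount point /mnt/shared are taken from the examples in this chapter; the run helper and the manual_mount_test function are hypothetical names. By default the commands are only printed; set DRY_RUN=0 to execute them as root.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical test: mount the volume, then unmount it again so that
# the cluster resource can take over the mount later.
manual_mount_test() {
    dev=$1; dir=$2
    run mkdir -p "$dir"
    run mount -t ocfs2 "$dev" "$dir"
    run umount "$dir"
}

manual_mount_test /dev/sdb1 /mnt/shared
```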

Procedure 13.5: Mounting an OCFS2 Volume with the Cluster Resource Manager

To mount an OCFS2 volume with the High Availability software, configure an ocfs2 file system resource in the cluster. The following procedure uses the crm shell to configure the cluster resources. Alternatively, you can also use Hawk to configure the resources as described in Section 13.6, “Configuring OCFS2 Resources With Hawk”.

  1. Start a shell and log in as root or equivalent.

  2. Run crm configure.

  3. Configure Pacemaker to mount the OCFS2 file system on every node in the cluster:

    crm(live)configure# primitive ocfs2-1 ocf:heartbeat:Filesystem \
          params device="/dev/sdb1" directory="/mnt/shared" fstype="ocfs2" options="acl" \
          op monitor interval="20" timeout="40"
  4. Add the ocfs2-1 primitive to the base-group you created in Procedure 13.2, “Configuring a DLM Resource”, so that the group contains both the dlm and the ocfs2-1 primitives:

    crm(live)configure# group base-group dlm ocfs2-1
         

    Due to the base group's internal colocation and ordering, Pacemaker will only start the ocfs2-1 resource on nodes that also have a dlm resource already running.

  5. Review your changes with show.

  6. If everything is correct, submit your changes with commit and leave the crm live configuration with exit.
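To confirm that the file system resource came up, you could run a few checks on each node. This is only a sketch: it assumes the base-clone resource ID from this chapter, and the run helper and the verify_ocfs2_mount function are hypothetical names. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical verification of the cluster-managed mount.
verify_ocfs2_mount() {
    run crm status                      # cluster overview, including base-clone
    run crm resource status base-clone  # state of the cloned group
    run mount -t ocfs2                  # list mounted OCFS2 file systems on this node
}

verify_ocfs2_mount
```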

13.6 Configuring OCFS2 Resources With Hawk

Instead of configuring the DLM and the file system resource for OCFS2 manually with the crm shell, you can also use the OCFS2 template in Hawk's Setup Wizard.

Important: Differences Between Manual Configuration and Hawk

The OCFS2 template in the Setup Wizard does not include the configuration of a STONITH resource. If you use the wizard, you still need to create an SBD partition on the shared storage and configure a STONITH resource as described in Procedure 13.1, “Configuring a STONITH Resource”.

Using the OCFS2 template in the Hawk Setup Wizard also leads to a slightly different resource configuration than the manual configuration described in Procedure 13.2, “Configuring a DLM Resource” and Procedure 13.5, “Mounting an OCFS2 Volume with the Cluster Resource Manager”.

Procedure 13.6: Configuring OCFS2 Resources with Hawk's Setup Wizard
  1. Start a Web browser and log in to the cluster as described in Section 5.1.1, “Starting Hawk and Logging In”.

  2. In the left navigation bar, select Setup Wizard.

  3. Select the OCFS2 Filesystem template and click Next.

    Hawk proposes values for the following parameters:

    • Resource ID

    • Block Device

    • File System Type

    If you need information about an option, click it to display a short help text in Hawk.

  4. Complete the information by entering the path to the Block Device for your file system and by entering additional Mount Options, if necessary.

    Figure 13.1: Hawk Setup Wizard—OCFS2 Template
  5. Click Next.

    The wizard displays the configuration snippet that will be applied to the CIB.

  6. To apply it, click Next.

    A message on the screen shows whether the action has been successful. If the configuration is as intended, leave the wizard.

13.7 Using Quotas on OCFS2 File Systems

To use quotas on an OCFS2 file system, create and mount the file system with the appropriate quota features or mount options, respectively: usrquota (quota for individual users) or grpquota (quota for groups). These features can also be enabled later on an unmounted file system using tunefs.ocfs2.

When a file system has the appropriate quota feature enabled, it tracks in its metadata how much space and how many files each user (or group) uses. Since OCFS2 treats quota information as file system-internal metadata, you do not need to run the quotacheck(8) program. All functionality is built into fsck.ocfs2 and the file system driver itself.

To enable enforcement of limits imposed on each user or group, run quotaon(8) as you would for any other file system.

For performance reasons, each cluster node performs quota accounting locally and synchronizes this information with a common central storage every 10 seconds. This interval can be tuned with the tunefs.ocfs2 options usrquota-sync-interval and grpquota-sync-interval. As a result, quota information may not be exact at all times, and users or groups can slightly exceed their quota limit when operating on several cluster nodes in parallel.
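Enabling quotas on an existing volume might look like the following sketch. It assumes the /dev/sdb1 device and /mnt/shared mount point used elsewhere in this chapter, and it mounts the volume manually for illustration; in a cluster-managed setup the mount options would instead be set on the file system resource. The run helper and the enable_ocfs2_quotas function are hypothetical names. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical sequence; the volume must be unmounted on all nodes first.
enable_ocfs2_quotas() {
    dev=$1; dir=$2
    run tunefs.ocfs2 --fs-features=usrquota,grpquota "$dev"  # enable the quota features
    run mount -t ocfs2 -o usrquota,grpquota "$dev" "$dir"    # mount with the quota options
    run quotaon -ug "$dir"                                   # enforce user and group limits
}

enable_ocfs2_quotas /dev/sdb1 /mnt/shared
```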

13.8 For More Information

For more information about OCFS2, see the following links:

http://oss.oracle.com/projects/ocfs2/

OCFS2 project home page at Oracle.

http://oss.oracle.com/projects/ocfs2/documentation

The project's documentation home page.
