SUSE Linux Enterprise Server ships with a number of different file systems from which to choose, including Btrfs, Ext4, Ext3, Ext2, ReiserFS, and XFS. Each file system has its own advantages and disadvantages. For a side-by-side feature comparison of the major file systems in SUSE Linux Enterprise Server, see http://www.suse.com/products/server/technical-information/#FileSystem (File System Support and Sizes).
Professional high-performance setups might require highly available storage systems. To meet the requirements of high-performance clustering scenarios, SUSE Linux Enterprise Server includes OCFS2 (Oracle Cluster File System 2) and the Distributed Replicated Block Device (DRBD) in the High Availability Extension add-on. These advanced storage systems are not covered in this guide. For information, see the SUSE Linux Enterprise High Availability Extension Administration Guide at http://www.suse.com/doc.
With SUSE Linux Enterprise 12, Btrfs is the default file system for the operating system and XFS is the default for all other use cases. SUSE also continues to support the Ext family of file systems, ReiserFS and OCFS2. By default, the Btrfs file system will be set up with subvolumes. Snapshots will be automatically enabled for the root file system using the snapper infrastructure. For more information about snapper, refer to Chapter 4, System Recovery and Snapshot Management with Snapper, Administration Guide.
A data structure that is internal to the file system. It ensures that all of the on-disk data is properly organized and accessible. Essentially, it is “data about the data.” Almost every file system has its own structure of metadata, which is one reason why the file systems show different performance characteristics. It is extremely important to maintain metadata intact, because otherwise all data on the file system could become inaccessible.
A data structure on a file system that contains a variety of information about a file, including size, number of links, pointers to the disk blocks where the file contents are actually stored, and date and time of creation, modification, and access.
In the context of a file system, a journal is an on-disk structure containing a type of log in which the file system stores what it is about to change in the file system’s metadata. Journaling greatly reduces the recovery time of a file system because it has no need for the lengthy search process that checks the entire file system at system start-up. Instead, only the journal is replayed.
SUSE Linux Enterprise Server offers a variety of file systems from which to choose. This section contains an overview of how these file systems work and which advantages they offer.
It is very important to remember that no file system best suits all kinds of applications. Each file system has its particular strengths and weaknesses, which must be taken into account. In addition, even the most sophisticated file system cannot replace a reasonable backup strategy.
The terms data integrity and data consistency, when used in this section, do not refer to the consistency of the user space data (the data your application writes to its files). Whether this data is consistent must be controlled by the application itself.
SUSE Linux Enterprise Server 12 is set up using Btrfs and snapshot support for the root partition by default. See FIXME for details. Data partitions (such as /home residing on a separate partition) are formatted with XFS by default.
Unless stated otherwise in this section, all the steps required to set up or change partitions and file systems can be performed by using the YaST Partitioner (which is also strongly recommended). For information, see Chapter 15, Advanced Disk Setup, Deployment Guide.
Btrfs is a copy-on-write (COW) file system developed by Chris Mason. It is based on COW-friendly B-trees developed by Ohad Rodeh. Btrfs is a logging-style file system. Instead of journaling the block changes, it writes them in a new location, then links the change in. Until the last write, the new changes are not committed.
Btrfs provides fault tolerance, repair, and easy management features, such as the following:
Writable snapshots that allow you to easily roll back your system if needed after applying updates, or to back up files.
Subvolume support: Btrfs creates a default subvolume in its assigned pool of space. It allows you to create additional subvolumes that act as individual file systems within the same pool of space. The number of subvolumes is limited only by the space allocated to the pool.
The online check and repair functionality scrub is
available as part of the Btrfs command line tools. It verifies the
integrity of data and metadata, assuming the tree structure is
fine. You can run scrub periodically on a mounted file system; it runs
as a background process during normal operation.
Different RAID levels for metadata and user data.
Different checksums for metadata and user data to improve error detection.
Integration with Linux Logical Volume Manager (LVM) storage objects.
Integration with the YaST Partitioner and AutoYaST on SUSE Linux Enterprise Server. This also includes creating a Btrfs file system on Multiple Devices (MD) and Device Mapper (DM) storage configurations.
Offline migration from existing Ext2, Ext3, and Ext4 file systems.
Bootloader support for /boot, allowing you to boot from a Btrfs partition.
Multiple device support. This feature is currently not supported on SUSE Linux Enterprise Server.
Transparent compression, which can be set up with Btrfs commands. Compression and encryption functionality for Btrfs is still under development and currently not supported on SUSE Linux Enterprise Server.
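The scrub feature listed above can be illustrated with a short sketch. It assumes a Btrfs file system mounted at / and root privileges for actual execution; with DRY_RUN=1 (the default here) the commands are only printed rather than run:

```shell
# Minimal sketch: run a btrfs scrub on the file system mounted at /.
# Requires root when actually executed; DRY_RUN=1 only prints the plan.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"          # print the command instead of executing it
    else
        "$@"
    fi
}

run btrfs scrub start /      # starts the scrub as a background process
run btrfs scrub status /     # reports progress and any checksum errors
```

In a real deployment, `btrfs scrub start` would typically be triggered periodically, for example from a cron job or systemd timer.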
By default, SUSE Linux Enterprise Server is set up using Btrfs and snapshots for the root partition. Snapshots allow you to easily roll back your system if needed after applying updates, or to back up files. Snapshots can easily be managed with the SUSE Snapper infrastructure as explained in Chapter 4, System Recovery and Snapshot Management with Snapper, Administration Guide. For general information about the SUSE Snapper project, see the Snapper Portal wiki at OpenSUSE.org (http://snapper.io).
When using a snapshot to roll back the system, you must ensure that data such as users' home directories, Web and FTP server contents, or log files do not get lost or overwritten during a rollback. This is achieved by using Btrfs subvolumes on the root file system. Subvolumes can be excluded from snapshots. The default root file system setup on SUSE Linux Enterprise Server as proposed by YaST during the installation contains the following subvolumes. They are excluded from snapshots for the reasons given below.
/boot/grub2/i386-pc, /boot/grub2/x86_64-efi, /boot/grub2/powerpc-ieee1275, /boot/grub2/s390x-emu
A rollback of the boot loader configuration is not supported. The directories listed above are architecture-specific. The first two directories are present on x86_64 machines, the latter two on IBM POWER and on IBM System z, respectively.
/home
If /home does not reside on a separate partition,
it is excluded to avoid data loss on rollbacks.
/opt, /var/opt
Third-party products and add-ons usually get installed to
/opt. It is excluded to avoid uninstalling these
applications on rollbacks.
/srv
Contains data for Web and FTP servers. It is excluded to avoid data loss on rollbacks.
/tmp, /var/tmp,
/var/crash
All directories containing temporary files are excluded from snapshots.
/usr/local
This directory is used when manually installing software. It is excluded to avoid uninstalling these installations on rollbacks.
/var/lib/named
Contains zone data for the DNS server. Excluded from snapshots to ensure a name server can operate after a rollback.
/var/lib/mailman, /var/spool
Directories containing mail queues or mail are excluded to avoid a loss of mail after a rollback.
/var/lib/pgsql
Contains PostgreSQL data.
/var/log
Log file location. Excluded from snapshots to allow log file analysis after the rollback of a broken system.
Rollbacks are only supported by SUSE if you do not remove any of the preconfigured subvolumes. You may, however, add further subvolumes using the YaST Partitioner.
A system rollback from a snapshot on SUSE Linux Enterprise Server is performed by booting from the snapshot first. This allows you to check the snapshot while running before doing the rollback. Being able to boot from snapshots is achieved by mounting the subvolumes (which would normally not be necessary).
In addition to the subvolumes listed in Default Subvolume Setup for the Root Partition, a subvolume named @ exists. This is the default subvolume that will be mounted as the root partition (/). The other subvolumes will be mounted into this volume.
When booting from a snapshot, the snapshot is used instead of the @ subvolume. The parts of the file system included in the snapshot are mounted read-only as /. The other subvolumes are mounted writable into the snapshot. This state is temporary by default: the previous
configuration will be restored with the next reboot. To make it
permanent, execute the snapper rollback
command. This will make the snapshot that is currently booted the new
default subvolume, which will be used after a reboot.
You can migrate data volumes from existing Ext (Ext2, Ext3, or Ext4) or ReiserFS to the Btrfs file system. The conversion process occurs offline and in place on the device. The file system needs at least 15% of available free space on the device.
To convert the file system to Btrfs, take the file system offline, then enter:
sudo btrfs-convert <device>
To roll back the migration to the original file system, take the file system offline, then enter:
sudo btrfs-convert -r <device>
When rolling back to the original file system, all data added after the conversion to Btrfs will be lost. That is, only the original data is converted back to the previous file system.
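The conversion steps above can be sketched as a small guard script. The device path is a placeholder, and the privileged commands are left commented out; the only assumption is that the source file system is one of the supported Ext variants or ReiserFS:

```shell
# Sketch of an offline in-place conversion to Btrfs.
# /dev/sdX1 is a placeholder; replace it with the real data partition.
DEV=${1:-/dev/sdX1}

# The conversion must happen offline: refuse to touch a mounted device.
is_mounted() {
    grep -qs "^$1 " /proc/mounts
}

if is_mounted "$DEV"; then
    echo "error: $DEV is mounted; unmount it first" >&2
    exit 1
fi

# Recommended before converting: check the source file system, e.g.
#   e2fsck -f "$DEV"
# Convert in place (needs at least 15% free space on the device):
#   btrfs-convert "$DEV"
# Roll back to the original file system if needed:
#   btrfs-convert -r "$DEV"
echo "$DEV is offline; ready for btrfs-convert"
```

The mount check is a safety net only; in practice you would unmount the file system explicitly before running the conversion.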
Btrfs is integrated in the YaST Partitioner and AutoYaST. It is available during the installation to allow you to set up a solution for the root file system. You can use the YaST Partitioner after the installation to view and manage Btrfs volumes.
Btrfs administration tools are provided in the
btrfsprogs package. For information about using
Btrfs commands, see the man 8 btrfs, man 8
btrfsck, and man 8 mkfs.btrfs commands. For
information about Btrfs features, see the Btrfs
wiki at http://btrfs.wiki.kernel.org.
The Btrfs root file system subvolumes /var/log,
/var/crash and /var/cache can
use all of the available disk space during normal operation, and cause a
system malfunction. To help avoid this situation, SUSE Linux Enterprise Server now
offers Btrfs quota support for subvolumes. If you set up the root file
system by using the respective YaST proposal, it is prepared
accordingly: quota groups (qgroup) for all subvolumes
are already set up. To set a quota for a subvolume in the root file
system, proceed as follows:
Enable quota support:
sudo btrfs quota enable /
Get a list of subvolumes:
sudo btrfs subvolume list /
Quotas can only be set for existing subvolumes.
Set a quota for one of the subvolumes listed in the previous step. A subvolume can be identified either by its path (for example /var/tmp) or by 0/<subvolume ID> (for example 0/272). The following example sets a quota of 5 GB for /var/tmp.
sudo btrfs qgroup limit 5G /var/tmp
The size can either be specified in bytes (5000000000), kilobytes (5000000K), megabytes (5000M), or gigabytes (5G). The resulting values in bytes slightly differ, since 1024 Bytes = 1 KiB, 1024 KiB = 1 MiB, etc.
To list the existing quotas, use the following command. The column
max_rfer shows the quota in bytes.
sudo btrfs qgroup show -r /
To remove an existing quota, set the quota size to 0:
sudo btrfs qgroup limit 0 /var/tmp
To disable quota support for a partition and all its subvolumes, use
btrfs quota disable:
sudo btrfs quota disable /
See man 8 btrfs-qgroup and man 8
btrfs-quota for more details. The
UseCases page on the Btrfs wiki (https://btrfs.wiki.kernel.org/index.php/UseCases) also provides
more information.
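Note that the size units accepted by btrfs qgroup limit are binary, so a limit of 5G differs slightly from five decimal gigabytes. A small sketch of the arithmetic:

```shell
# Binary vs. decimal interpretation of a "5G" quota.
gib=$((1024 * 1024 * 1024))      # 1 GiB in bytes
limit_5G=$((5 * gib))            # what "btrfs qgroup limit 5G" amounts to
limit_decimal=5000000000         # a limit specified directly in bytes

echo "5G as bytes:      $limit_5G"       # 5368709120
echo "5000000000 bytes: $limit_decimal"
echo "difference:       $((limit_5G - limit_decimal))"   # 368709120
```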
Btrfs supports data deduplication by replacing identical blocks in the
file system with logical links to a single copy of the block in a common
storage location. SUSE Linux Enterprise Server provides the tool
duperemove for scanning the file system for identical
blocks. When used on a Btrfs file system, it can also be used to
deduplicate these blocks. duperemove is not installed by default. To
make it available, install the package duperemove.
As of SUSE Linux Enterprise Server 12, duperemove is not suited for deduplicating the entire file system. It is intended to deduplicate a set of 10 to 50 large files that possibly have many blocks in common, such as virtual machine images.
duperemove can either operate on a list of files or
recursively scan a directory:
sudo duperemove [options] file1 file2 file3
sudo duperemove -r [options] directory
It operates in two modes: read-only and de-duping. When run in read-only mode (that is, without the -d switch), it scans the given files or directories for duplicated blocks and prints them out. This works on any file system.
Running duperemove in de-duping mode is only supported
on Btrfs file systems. After having scanned the given files or
directories, the duplicated blocks will be submitted for deduplication.
For more information see man 8 duperemove.
SGI started XFS development in the early 1990s, originally intending it as the file system for its IRIX OS. The idea behind XFS was to create a high-performance 64-bit journaling file system to meet extreme computing challenges. XFS is very good at manipulating large files and performs well on high-end hardware. XFS is the default file system for data partitions in SUSE Linux Enterprise Server.
A quick review of XFS’s key features explains why it might prove to be a strong competitor for other journaling file systems in high-end computing.
At the creation time of an XFS file system, the block device underlying the file system is divided into eight or more linear regions of equal size. Those are referred to as allocation groups. Each allocation group manages its own inodes and free disk space. Practically, allocation groups can be seen as file systems in a file system. Because allocation groups are rather independent of each other, more than one of them can be addressed by the kernel simultaneously. This feature is the key to XFS’s great scalability. Naturally, the concept of independent allocation groups suits the needs of multiprocessor systems.
Free space and inodes are handled by B+ trees inside the allocation groups. The use of B+ trees greatly contributes to XFS’s performance and scalability. XFS uses delayed allocation, which handles allocation by breaking the process into two pieces. A pending transaction is stored in RAM and the appropriate amount of space is reserved. XFS still does not decide where exactly (in file system blocks) the data should be stored. This decision is delayed until the last possible moment. Some short-lived temporary data might never make its way to disk, because it is obsolete by the time XFS decides where actually to save it. In this way, XFS increases write performance and reduces file system fragmentation. Because delayed allocation results in less frequent write events than in other file systems, it is likely that data loss after a crash during a write is more severe.
Before writing the data to the file system, XFS reserves (preallocates) the free space needed for a file. Thus, file system fragmentation is greatly reduced. Performance is increased because the contents of a file are not distributed all over the file system.
Starting with version 12, SUSE Linux Enterprise Server supports the new “on-disk format” (v5) of the XFS file system. XFS file systems created by YaST will use this new format. The main advantages of this format are automatic checksums of all XFS metadata, file type support, and support for a larger number of access control lists for a file.
Note that this format is not supported by SUSE Linux Enterprise kernels older than version 3.12, by xfsprogs older than version 3.2.0, or by GRUB 2 versions released before SUSE Linux Enterprise 12. This is problematic if the file system is also to be used from systems that do not meet these prerequisites.
If you require interoperability of the XFS file system with older SUSE
systems or other Linux distributions, format the file system manually
using the mkfs.xfs command. This will create an XFS
file system in the old format (unless you use the -m
crc=1 option).
The origins of Ext2 go back to the early days of Linux history. Its predecessor, the Extended File System, was implemented in April 1992 and integrated in Linux 0.96c. The Extended File System underwent a number of modifications and, as Ext2, became the most popular Linux file system for years. With the creation of journaling file systems and their short recovery times, Ext2 became less important.
A brief summary of Ext2’s strengths might help understand why it was—and in some areas still is—the favorite Linux file system of many Linux users.
Being quite an “old-timer,” Ext2 underwent many
improvements and was heavily tested. This might be the reason why
people often refer to it as rock-solid. After a system outage when the
file system could not be cleanly unmounted, e2fsck starts to analyze
the file system data. Metadata is brought into a consistent state and
pending files or data blocks are written to a designated directory
(called lost+found). In contrast to journaling
file systems, e2fsck analyzes the entire file system and not only the
recently modified bits of metadata. This takes significantly longer
than checking the log data of a journaling file system. Depending on
file system size, this procedure can take half an hour or more.
Therefore, it is not desirable to choose Ext2 for any server that
needs high availability. However, because Ext2 does not maintain a
journal and uses significantly less memory, it is sometimes faster
than other file systems.
Because Ext3 is based on the Ext2 code and shares its on-disk format as well as its metadata format, upgrades from Ext2 to Ext3 are very easy.
Ext3 was designed by Stephen Tweedie. Unlike all other next-generation file systems, Ext3 does not follow a completely new design principle. It is based on Ext2. These two file systems are very closely related to each other. An Ext3 file system can be easily built on top of an Ext2 file system. The most important difference between Ext2 and Ext3 is that Ext3 supports journaling. In summary, Ext3 has three major advantages to offer:
The code for Ext2 is the strong foundation on which Ext3 could become a highly acclaimed next-generation file system. Its reliability and solidity are elegantly combined in Ext3 with the advantages of a journaling file system. Unlike transitions to other journaling file systems, such as ReiserFS or XFS, which can be quite tedious (making backups of the entire file system and re-creating it from scratch), a transition to Ext3 is a matter of minutes. It is also very safe, because re-creating an entire file system from scratch might not work flawlessly. Considering the number of existing Ext2 systems that await an upgrade to a journaling file system, you can easily see why Ext3 might be of some importance to many system administrators. Downgrading from Ext3 to Ext2 is as easy as the upgrade. Perform a clean unmount of the Ext3 file system and remount it as an Ext2 file system.
Some other journaling file systems follow the
“metadata-only” journaling approach. This means your
metadata is always kept in a consistent state, but this cannot be
automatically guaranteed for the file system data itself. Ext3 is
designed to take care of both metadata and data. The degree of
“care” can be customized. Enabling Ext3 in the
data=journal mode offers maximum security (data
integrity), but can slow down the system because both metadata and
data are journaled. A relatively new approach is to use the
data=ordered mode, which ensures both data and
metadata integrity, but uses journaling only for metadata. The file
system driver collects all data blocks that correspond to one metadata
update. These data blocks are written to disk before the metadata is
updated. As a result, consistency is achieved for metadata and data
without sacrificing performance. A third option to use is
data=writeback, which allows data to be written to
the main file system after its metadata has been committed to the
journal. This option is often considered the best in performance. It
can, however, allow old data to reappear in files after crash and
recovery while internal file system integrity is maintained. Ext3 uses
the data=ordered option as the default.
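As a sketch, the journaling mode can be selected per file system with a mount option in /etc/fstab; the device and mount point below are hypothetical:

```
# Hypothetical /etc/fstab entry mounting an Ext3 volume with full data
# journaling instead of the data=ordered default:
/dev/sdb1  /srv/data  ext3  data=journal  1 2
```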
To convert an Ext2 file system to Ext3:
Create an Ext3 journal by running tune2fs -j as
the root user.
This creates an Ext3 journal with the default parameters.
To specify how large the journal should be and on which device it should reside, run tune2fs -J instead, together with the desired journal options size= and device=. More information about the tune2fs program is available in the tune2fs man page.
Edit the file /etc/fstab as the
root user to change the file system type
specified for the corresponding partition from
ext2 to ext3, then save the
changes.
This ensures that the Ext3 file system is recognized as such. The change takes effect after the next reboot.
To boot a root file system that is set up as an Ext3 partition, add the modules ext3 and jbd to the initrd. Do so by adding the following line to /etc/dracut.conf.d/01-dist.conf:
force_drivers+="ext3 jbd"
and then running the dracut -f command.
Reboot the system.
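The /etc/fstab edit in the steps above amounts to changing a single field. With a hypothetical device and mount point, the entry changes like this:

```
# before the conversion:
#   /dev/sda2  /data  ext2  defaults  1 2
# after creating the journal with tune2fs -j:
/dev/sda2  /data  ext3  defaults  1 2
```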
An inode stores information about the file and its block location in the file system. To allow space in the inode for extended attributes and ACLs, the default inode size for Ext3 was increased from 128 bytes on SLES 10 to 256 bytes on SLES 11. As compared to SLES 10, when you make a new Ext3 file system on SLES 11, the default amount of space preallocated for the same number of inodes is doubled, and the usable space for files in the file system is reduced by that amount. Thus, you must use larger partitions to accommodate the same number of inodes and files as was possible for an Ext3 file system on SLES 10.
When you create a new Ext3 file system, the space in the inode table is preallocated for the total number of inodes that can be created. The bytes-per-inode ratio and the size of the file system determine how many inodes are possible. When the file system is made, an inode is created for every bytes-per-inode bytes of space:
number of inodes = total size of the file system divided by the number of bytes per inode
The number of inodes controls the number of files you can have in the file system: one inode for each file. To address the increased inode size and reduced usable space available, the default for the bytes-per-inode ratio was increased from 8192 bytes on SLES 10 to 16384 bytes on SLES 11. The doubled ratio means that the number of files that can be created is one-half of the number of files possible for an Ext3 file system on SLES 10.
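The formula above can be checked with shell arithmetic. The 100 GiB partition size below is an arbitrary example:

```shell
# Inode count for a hypothetical 100 GiB Ext3 file system under the
# SLES 11 default ratio (16384) and the SLES 10 default (8192).
fs_size=$((100 * 1024 * 1024 * 1024))    # 100 GiB in bytes

inodes_sles11=$((fs_size / 16384))
inodes_sles10=$((fs_size / 8192))

echo "inodes at 16384 bytes-per-inode: $inodes_sles11"   # 6553600
echo "inodes at  8192 bytes-per-inode: $inodes_sles10"   # 13107200
```

Doubling the bytes-per-inode ratio halves the number of possible files, which is exactly the SLES 10 to SLES 11 change described above.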
After the inodes are allocated, you cannot change the settings for the inode size or bytes-per-inode ratio. No new inodes are possible without re-creating the file system with different settings, or unless the file system gets extended. When you exceed the maximum number of inodes, no new files can be created on the file system until some files are deleted.
When you make a new Ext3 file system, you can specify the inode size and bytes-per-inode ratio to control inode space usage and the number of files possible on the file system. If the block size, inode size, and bytes-per-inode ratio values are not specified, the default values in the /etc/mke2fs.conf file are applied. For information, see the mke2fs.conf(5) man page.
Use the following guidelines:
Inode size: The default inode size is 256 bytes. Specify a value in bytes that is a power of 2, equal to 128 bytes or larger, and up to the block size, such as 128, 256, 512, and so on. Use 128 bytes only if you do not use extended attributes or ACLs on your Ext3 file systems.
Bytes-per-inode ratio: The default bytes-per-inode ratio is 16384 bytes. Valid bytes-per-inode ratio values must be a power of 2 equal to 1024 or greater in bytes, such as 1024, 2048, 4096, 8192, 16384, 32768, and so on. This value should not be smaller than the block size of the file system, because the block size is the smallest chunk of space used to store data. The default block size for the Ext3 file system is 4 KB.
In addition, you should consider the number of files and the size of files you need to store. For example, if your file system will have many small files, you can specify a smaller bytes-per-inode ratio, which increases the number of inodes. If your file system will have very large files, you can specify a larger bytes-per-inode ratio, which reduces the number of possible inodes.
Generally, it is better to have too many inodes than to run out of them. If you have too few inodes and very small files, you could reach the maximum number of files on a disk that is practically empty. If you have too many inodes and very large files, you might have free space reported but be unable to use it because you cannot create new files in space reserved for inodes.
If you do not use extended attributes or ACLs on your Ext3 file systems, you can restore the SLES 10 behavior specifying 128 bytes as the inode size and 8192 bytes as the bytes-per-inode ratio when you make the file system. Use any of the following methods to set the inode size and bytes-per-inode ratio:
Modifying the default settings for all new Ext3 file systems:
In a text editor, modify the defaults section of
the /etc/mke2fs.conf file to set the
inode_size and inode_ratio to
the desired default values. The values apply to all new Ext3 file
systems. For example:
blocksize = 4096
inode_size = 128
inode_ratio = 8192
At the command line:
Pass the inode size (-I 128) and the
bytes-per-inode ratio (-i 8192) to the
mkfs.ext3(8) command or the
mke2fs(8) command when you create a new Ext3
file system. For example, use either of the following commands:
sudo mkfs.ext3 -b 4096 -i 8192 -I 128 /dev/sda2
sudo mke2fs -t ext3 -b 4096 -i 8192 -I 128 /dev/sda2
During installation with YaST: Pass the inode size and bytes-per-inode ratio values when you create a new Ext3 file system during the installation. In the YaST Partitioner, format the partition with Ext3 and open the file system options, then select the desired block size, bytes-per-inode ratio, and inode size from the corresponding drop-down lists.
For example, select a block size of 4096, a bytes-per-inode ratio of 8192, and an inode size of 128, then confirm your settings.
During installation with AutoYaST:
In an autoyast profile, you can use the fs_options
tag to set the opt_bytes_per_inode
ratio value of 8192 for -i and the
opt_inode_density value of 128 for -I:
<partitioning config:type="list">
  <drive>
    <device>/dev/sda</device>
    <initialize config:type="boolean">true</initialize>
    <partitions config:type="list">
      <partition>
        <filesystem config:type="symbol">ext3</filesystem>
        <format config:type="boolean">true</format>
        <fs_options>
          <opt_bytes_per_inode>
            <option_str>-i</option_str>
            <option_value>8192</option_value>
          </opt_bytes_per_inode>
          <opt_inode_density>
            <option_str>-I</option_str>
            <option_value>128</option_value>
          </opt_inode_density>
        </fs_options>
        <mount>/</mount>
        <partition_id config:type="integer">131</partition_id>
        <partition_type>primary</partition_type>
        <size>25G</size>
      </partition>
    </partitions>
  </drive>
</partitioning>
For information, see http://www.suse.com/support/kb/doc.php?id=7009075 (SLES11 ext3 partitions can only store 50% of the files that can be stored on SLES10 [Technical Information Document 7009075]).
In 2006, Ext4 started as a fork from Ext3. It eliminates some storage limitations of Ext3 by supporting volumes with a size of up to 1 exbibyte, files with a size of up to 16 tebibytes, and an unlimited number of subdirectories. It also introduces a number of performance enhancements, such as delayed block allocation and a much faster file system checking routine. Ext4 is also more reliable by supporting journal checksums and by providing timestamps measured in nanoseconds. Ext4 is fully backward compatible with Ext2 and Ext3—both file systems can be mounted as Ext4.
Officially one of the key features of the 2.4 kernel release, ReiserFS has been available as a kernel patch for 2.2.x SUSE kernels since version 6.4. ReiserFS was designed by Hans Reiser and the Namesys development team. It has proven itself to be a powerful alternative to Ext2. Its key assets are better disk space utilization, better disk access performance, faster crash recovery, and reliability through data journaling.
Existing ReiserFS partitions are supported for the lifetime of SUSE Linux Enterprise Server 12 specifically for migration purposes. Support for creating new ReiserFS file systems has been removed starting with SUSE Linux Enterprise Server 12.
In ReiserFS, all data is organized in a structure called a B*-balanced tree. The tree structure contributes to better disk space utilization because small files can be stored directly in the B* tree leaf nodes instead of being stored elsewhere and maintaining a pointer to the actual disk location. In addition to that, storage is not allocated in chunks of 1 or 4 KB, but in portions of the exact size needed. Another benefit lies in the dynamic allocation of inodes. This keeps the file system more flexible than traditional file systems, like Ext2, where the inode density must be specified at file system creation time.
For small files, file data and “stat_data” (inode) information are often stored next to each other. They can be read with a single disk I/O operation, meaning that only one access to disk is required to retrieve all the information needed.
Using a journal to keep track of recent metadata changes makes a file system check a matter of seconds, even for huge file systems.
ReiserFS also supports data journaling and ordered data modes similar
to the concepts outlined in
Section 1.2.4, “Ext3”.
The default mode is data=ordered, which ensures
both data and metadata integrity, but uses journaling only for
metadata.
Table 1.1, “File System Types in Linux” summarizes some other file systems supported by Linux. They are supported mainly to ensure compatibility and interchange of data with different kinds of media or foreign operating systems.
| File System Type | Description |
|---|---|
| cramfs | Compressed ROM file system: A compressed read-only file system for ROMs. |
| hpfs | High Performance File System: The IBM OS/2 standard file system. Only supported in read-only mode. |
| iso9660 | Standard file system on CD-ROMs. |
| minix | This file system originated from academic projects on operating systems and was the first file system used in Linux. Today, it is used as a file system for floppy disks. |
| msdos | fat, the file system originally used by DOS, is today used by various operating systems. |
| ncpfs | File system for mounting Novell volumes over networks. |
| nfs | Network File System: Here, data can be stored on any machine in a network and access might be granted via a network. |
| ntfs | Windows NT file system; read-only. |
| smbfs | Server Message Block is used by products such as Windows to enable file access over a network. |
| sysv | Used on SCO UNIX, Xenix, and Coherent (commercial UNIX systems for PCs). |
| ufs | Used by BSD, SunOS, and NextStep. Only supported in read-only mode. |
| umsdos | UNIX on MS-DOS: Applied on top of a standard fat file system, achieves UNIX functionality (permissions, links, long file names) by creating special files. |
| vfat | Virtual FAT: Extension of the fat file system (supports long file names). |
Originally, Linux supported a maximum file size of 2 GiB (2^31 bytes). Unless a file system comes with large file support, the maximum file size on a 32-bit system is 2 GiB.
Currently, all of our standard file systems have LFS (large file support), which gives a maximum file size of 2^63 bytes in theory. Table 1.2, “Maximum Sizes of Files and File Systems (On-Disk Format, 4 KiB Block Size)” offers an overview of the current on-disk format limitations of Linux files and file systems. The numbers in the table assume that the file systems are using 4 KiB block size, which is a common standard. When using different block sizes, the results are different. The maximum file sizes in Table 1.2, “Maximum Sizes of Files and File Systems (On-Disk Format, 4 KiB Block Size)” can be larger than the file system's actual size when using sparse blocks.
In this document: 1024 Bytes = 1 KiB; 1024 KiB = 1 MiB; 1024 MiB = 1 GiB; 1024 GiB = 1 TiB; 1024 TiB = 1 PiB; 1024 PiB = 1 EiB (see also NIST: Prefixes for Binary Multiples, http://physics.nist.gov/cuu/Units/binary.html).
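The point about sparse files can be demonstrated directly: a sparse file's apparent size may far exceed the blocks actually allocated, so a "maximum file size" larger than the file system is not a contradiction. A minimal sketch (the file name and size are arbitrary examples):

```shell
# Create a 1 TiB sparse file; only metadata is written, no data blocks.
truncate -s 1T sparse.img

# Compare the apparent size with the space actually allocated on disk.
ls -lh sparse.img    # apparent size: about 1.0T
du -h sparse.img     # allocated blocks: (close to) zero

rm sparse.img
```

This works on any file system with sparse-file support; on file systems without it, the `truncate` call would allocate real blocks.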
| File System (4 KiB Block Size) | Maximum File System Size | Maximum File Size |
|---|---|---|
| Btrfs | 16 EiB | 16 EiB |
| Ext3 | 16 TiB | 2 TiB |
| Ext4 | 1 EiB | 16 TiB |
| OCFS2 (a cluster-aware file system available in the High Availability Extension) | 16 TiB | 1 EiB |
| ReiserFS v3.6 | 16 TiB | 1 EiB |
| XFS | 8 EiB | 8 EiB |
| NFSv2 (client side) | 8 EiB | 2 GiB |
| NFSv3 (client side) | 8 EiB | 8 EiB |
Table 1.2, “Maximum Sizes of Files and File Systems (On-Disk Format, 4 KiB Block Size)” describes the limitations regarding the on-disk format. The Linux kernel imposes its own limits on the size of files and file systems handled by it. These are as follows:
On 32-bit systems, files cannot exceed 2 TiB (2^41 bytes).
File systems can be up to 2^73 bytes in size. However, this limit is still out of reach for the currently available hardware.
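To check the file-size limit that the C library exposes for a particular file system, `getconf` can be queried with the POSIX `FILESIZEBITS` variable (the path below is an example):

```shell
# FILESIZEBITS reports how many bits are used for file offsets on the
# file system containing the given path; 64 indicates large file support.
getconf FILESIZEBITS /
```

On current Linux systems with any of the standard file systems, this reports 64.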
Table 1.3, “Storage Limitations” summarizes the kernel limits for storage associated with SUSE Linux Enterprise Server.
| Storage Feature | Limitation |
|---|---|
| Maximum number of LUNs supported | 16384 LUNs per target. |
| Maximum number of paths per single LUN | No limit per se. Each path is treated as a normal LUN. The actual limit is given by the number of LUNs per target and the number of targets per HBA (16777215 for a Fibre Channel HBA). |
| Maximum number of HBAs | Unlimited. The actual limit is determined by the number of PCI slots in the system. |
| Maximum number of paths with device-mapper-multipath (in total) per operating system | Approximately 1024. The actual number depends on the length of the device number strings. It is a compile-time variable within multipath-tools, which can be raised if this limit poses a problem. |
| Maximum size per block device | Up to 8 EiB. |
This section describes some known issues and possible solutions for file systems.
The root (/) partition using the Btrfs file system stops accepting data. You receive the error “No space left on device”.
See the following sections for information about possible causes and prevention of this issue.
If Snapper is running for the Btrfs file system, the “No space left on device” problem is typically caused by having too much data stored as snapshots on your system.
You can remove some snapshots from Snapper; however, the snapshots are not deleted immediately and might not free up as much space as you need.
To delete snapshots from Snapper:
Open a terminal console.
At the command prompt, enter `btrfs filesystem show`, for example:

```
tux > sudo btrfs filesystem show
Label: none  uuid: 40123456-cb2c-4678-8b3d-d014d1c78c78
        Total devices 1 FS bytes used 20.00GB
        devid    1 size 20.00GB used 20.00GB path /dev/sda3
```

Enter

```
sudo btrfs fi balance start mountpoint -dusage=5
```
This command attempts to relocate data in empty or near-empty data chunks, allowing the space to be reclaimed and reassigned to metadata. This can take a while (many hours for 1 TB) although the system is otherwise usable during this time.
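A common refinement of this step is to repeat the balance with increasing `-dusage` thresholds, so the emptiest (and cheapest to relocate) chunks are reclaimed first. The sketch below only prints the commands, since running them requires root and a real Btrfs mount; the mount point and thresholds are example values:

```shell
# Print balance commands with increasing usage filters; to execute them
# on a real Btrfs system, pipe the output to "sudo sh".
MOUNTPOINT=/   # example mount point
for pct in 5 10 25 50; do
    echo "btrfs fi balance start $MOUNTPOINT -dusage=$pct"
done
```

Stopping after the first threshold that frees enough space keeps the run time short.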
List the snapshots in Snapper. Enter

```
sudo snapper -c root list
```

Delete one or more snapshots from Snapper. Enter

```
sudo snapper -c root delete snapshot_number(s)
```
Ensure that you delete the oldest snapshots first. The older a snapshot is, the more disk space it occupies.
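Selecting the oldest snapshot numbers can be scripted. The sketch below parses a hypothetical sample of `snapper -c root list` output (embedded as a here-document); the column layout, with the snapshot number as the second `|`-separated field, is an assumption that should be verified against your snapper version:

```shell
# Hypothetical sample of "snapper -c root list" output.
list_output() {
cat <<'EOF'
Type   | # | Pre # | Date                     | Cleanup | Description
-------+---+-------+--------------------------+---------+------------
single | 0 |       |                          |         | current
single | 1 |       | Mon Jan 11 10:00:00 2016 | number  | oldest
single | 2 |       | Tue Jan 12 10:00:00 2016 | number  | newer
EOF
}

# Take the snapshot number from the second column, skip the two header
# lines and snapshot 0 (the running system), and keep the two oldest.
oldest=$(list_output | awk -F'|' 'NR > 2 { n = $2 + 0; if (n > 0) print n }' | head -n 2)
echo "$oldest"
# These numbers could then be passed to: snapper -c root delete $oldest
```

With real data, `list_output` would be replaced by the actual `sudo snapper -c root list` call.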
To help prevent this problem, you can change the Snapper cleanup algorithms. See Section “Cleanup-algorithms”, Chapter 4, System Recovery and Snapshot Management with Snapper, Administration Guide for details. The configuration values controlling snapshot cleanup are `EMPTY_*`, `NUMBER_*`, and `TIMELINE_*`.
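These values live in the per-configuration file, for example `/etc/snapper/configs/root`. The excerpt below is illustrative; the variable names are snapper's, but the values are example settings, not recommendations:

```shell
# Illustrative excerpt of /etc/snapper/configs/root.
NUMBER_CLEANUP="yes"          # clean up numbered snapshots
NUMBER_LIMIT="10"             # keep at most 10 numbered snapshots
TIMELINE_CLEANUP="yes"        # clean up timeline snapshots
TIMELINE_LIMIT_DAILY="7"      # keep 7 daily snapshots
EMPTY_PRE_POST_CLEANUP="yes"  # drop pre/post pairs with no difference
```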
If you use Snapper with Btrfs on the file system disk, it is advisable to reserve twice the amount of disk space than the standard storage proposal. The YaST Partitioner automatically proposes twice the standard disk space in the Btrfs storage proposal for the root file system.
If the system disk is filling up with data, you can try deleting files from /var/log, /var/crash, and /var/cache.
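To see which of these directories is actually consuming the space, a quick check (the paths are the usual suspects named above; adjust them to your system):

```shell
# Show the size of the common space consumers, largest first.
# -x keeps du on one file system, so other mounts and Btrfs
# subvolume boundaries are not crossed.
du -xsh /var/log /var/cache /var/crash 2>/dev/null | sort -rh
```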
The Btrfs root file system subvolumes /var/log, /var/crash, and /var/cache can use all of the available disk space during normal operation, and cause a system malfunction. To help avoid this situation, SUSE Linux Enterprise Server offers Btrfs quota support for subvolumes. See Section 1.2.1.5, “Btrfs Quota Support for Subvolumes” for details.
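The quota mechanism referenced above boils down to a few btrfs-progs commands. The sketch below prints them rather than executing them, since they need root and a Btrfs mount; the 5G limit and the /var/log path are examples:

```shell
# Print the commands that cap the /var/log subvolume at 5 GiB.
# To execute them on a real Btrfs system, pipe the output to "sudo sh".
cat <<'EOF'
btrfs quota enable /
btrfs qgroup limit 5G /var/log
btrfs qgroup show -r /
EOF
```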
Each of the file system projects described above maintains its own home page on which to find mailing list information, further documentation, and FAQs:
The Btrfs Wiki on Kernel.org: https://btrfs.wiki.kernel.org/
E2fsprogs: Ext2/3/4 File System Utilities: http://e2fsprogs.sourceforge.net/
Introducing Ext3: http://www.ibm.com/developerworks/linux/library/l-fs7/
XFS: A High-Performance Journaling Filesystem: http://oss.sgi.com/projects/xfs/
The OCFS2 Project: http://oss.oracle.com/projects/ocfs2/
A comprehensive multi-part tutorial about Linux file systems can be found at IBM developerWorks in the Advanced File System Implementor’s Guide (https://www.ibm.com/developerworks/linux/library/l-fs/).
An in-depth comparison of file systems (not only Linux file systems) is available from the Wikipedia project in Comparison of File Systems (http://en.wikipedia.org/wiki/Comparison_of_file_systems#Comparison).