qemu-system-ARCH allows for various storage caching strategies to be
specified when configuring a KVM guest. Each guest disk interface can
have one of the following cache modes specified:
writethrough, writeback,
none, directsync, or
unsafe. If no cache mode is specified,
qemu-system-ARCH uses an appropriate default cache mode. These cache
modes influence how host-based storage is accessed, as follows:
Read/write data may be cached in the host page cache.
The guest's storage controller is informed whether a write cache is present, allowing for the use of a flush command.
Synchronous write mode may be used, in which write requests are reported complete only when committed to the storage device.
Flush commands (generated by the guest storage controller) may be ignored for performance reasons.
In the event of a disorderly disconnection between the guest and its storage, the cache mode in use will affect whether data loss occurs. The cache mode can also affect disk performance significantly. Additionally, some cache modes are incompatible with live migration, depending on a number of factors. There are no simple rules about what combination of cache mode, disk image format, image placement, or storage sub-system is best. The user should plan each guest's configuration carefully and experiment with various configurations to determine which one performs best.
In qemu-system-ARCH versions older than v1.2 (for example, in
SLES11 SP2), not specifying a cache mode meant that
writethrough was used as the default. Since
that version, the various qemu-system-ARCH guest storage interfaces
have been fixed to handle writeback or
writethrough semantics more correctly, which allowed
the default caching mode to be switched to
writeback. The guest drivers for
ide, scsi, and
virtio can each disable the writeback cache,
causing the caching mode in use to revert to
writethrough. However, typical guest storage drivers
keep the default writeback caching mode.
The cache=writethrough mode causes qemu-system-ARCH to interact with the disk image file
or block device with O_DSYNC semantics, where writes are reported as
completed only when the data has been committed to the storage device.
The host page cache is used in what can be termed a writethrough
caching mode. The guest's virtual storage adapter is informed that
there is no writeback cache, so the guest would not need to send down
flush commands to manage data integrity. The storage behaves as if
there is a writethrough cache.
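The following is a minimal Python sketch of what O_DSYNC semantics mean for a single write, assuming a Unix host where os.O_DSYNC and os.pwrite are available. The function name write_through and the image path are hypothetical, and qemu-system-ARCH itself is not involved; the sketch only shows the host-side open flag this mode relies on.

```python
import os
import tempfile


def write_through(path: str, offset: int, data: bytes) -> int:
    """Write data so that the call returns only once it is on the device.

    Sketch of cache=writethrough host behaviour: opening with O_DSYNC makes
    each write complete only after the data (and the metadata needed to read
    it back) has been committed to the storage device.
    """
    flags = os.O_RDWR | os.O_CREAT | os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        return os.pwrite(fd, data, offset)  # returns the number of bytes written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # hypothetical image file
        print(write_through(image, 0, b"guest block"))  # 11
```

Because every write waits for the device, the guest never needs to flush, which is why this mode suits guests that do not send flushes; the cost is that each write typically takes longer than with writeback.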
The cache=writeback mode causes qemu-system-ARCH to interact with the disk image file
or block device with neither O_DSYNC nor O_DIRECT semantics, so the
host page cache is used and writes are reported to the guest as
completed when placed in the host page cache, and the normal page
cache management will handle commitment to the storage device.
Additionally, the guest's virtual storage adapter is informed of the
writeback cache, so the guest would be expected to send down flush
commands as needed to manage data integrity. This is analogous to a RAID
controller with a RAM cache.
The cache=none mode causes qemu-system-ARCH to interact with the disk image
file or block device with O_DIRECT semantics, so the host page cache is
bypassed and I/O happens directly between the qemu-system-ARCH userspace
buffers and the storage device. Because the actual storage device may
report a write as completed when placed in its write queue only, the
guest's virtual storage adapter is informed that there is a writeback
cache, so the guest would be expected to send down flush commands as
needed to manage data integrity. Performance-wise, it is equivalent to
direct access to your host's disk.
The cache=unsafe mode is similar to the cache=writeback mode
discussed above. The key aspect of this “unsafe” mode is
that all flush commands from the guests are ignored. Using this mode
implies that the user has accepted the trade-off of performance over
risk of data loss in the event of a host failure. It is useful, for example,
during guest installation, but not for production workloads.
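Writeback and unsafe open the image the same way; they differ only in what happens to a guest flush. The Python sketch below models that difference, assuming a Linux host (os.fdatasync is not available on every platform). The class PageCachedDisk and its counters are hypothetical; fdatasync() is used for the honoured flush because that is how qemu-system-ARCH implements the guest flush within the host.

```python
import os
import tempfile


class PageCachedDisk:
    """Hypothetical model of an image opened without O_DSYNC or O_DIRECT.

    honor_flush=True  models cache=writeback: guest flushes reach the device.
    honor_flush=False models cache=unsafe: guest flushes are dropped.
    """

    def __init__(self, path: str, honor_flush: bool = True) -> None:
        # Plain open: writes complete as soon as the host page cache holds them.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        self.honor_flush = honor_flush
        self.flushes_forwarded = 0
        self.flushes_ignored = 0

    def write(self, offset: int, data: bytes) -> int:
        # Reported complete to the guest once the data is in the page cache.
        return os.pwrite(self.fd, data, offset)

    def flush(self) -> None:
        """Handle a flush command issued by the guest's storage controller."""
        if self.honor_flush:
            os.fdatasync(self.fd)  # commit cached data to the storage device
            self.flushes_forwarded += 1
        else:
            self.flushes_ignored += 1  # data may still exist only in RAM

    def close(self) -> None:
        # This section states that cached data is flushed when the guest
        # terminates, so both modes sync once before closing.
        os.fdatasync(self.fd)
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writeback = PageCachedDisk(os.path.join(tmp, "wb.img"))
        unsafe = PageCachedDisk(os.path.join(tmp, "unsafe.img"), honor_flush=False)
        for disk in (writeback, unsafe):
            disk.write(0, b"journal commit")
            disk.flush()
            disk.close()
        print(writeback.flushes_forwarded, unsafe.flushes_ignored)  # 1 1
```

Between a dropped flush and the final sync, the written data exists only in the host page cache, which is exactly the window in which a host failure loses data.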
The cache=directsync mode causes qemu-system-ARCH to interact with the disk image file
or block device with both O_DSYNC and O_DIRECT semantics: writes
are reported as completed only when the data has been committed to the
storage device, and the host page cache is bypassed. Like
cache=writethrough, it is helpful for guests that do not send flushes
when needed. It was the last cache mode added, completing the possible
combinations of caching and direct access semantics.
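The five modes can be summarized as a lookup table. This Python sketch transcribes the descriptions above into data; the CacheModeTraits field names are hypothetical and do not correspond to any qemu-system-ARCH data structure. O_DIRECT falls back to 0 on platforms that do not define it.

```python
import os
from dataclasses import dataclass

# O_DIRECT exists only on some platforms (notably Linux); use 0 elsewhere so
# the table can still be built and inspected.
O_DIRECT = getattr(os, "O_DIRECT", 0)


@dataclass(frozen=True)
class CacheModeTraits:
    host_page_cache: bool      # is the host page cache used?
    guest_write_cache: bool    # is the guest told that a write cache exists?
    ignores_guest_flush: bool  # are guest flush commands dropped?
    open_flags: int            # extra flags used when opening the image


CACHE_MODES = {
    "writethrough": CacheModeTraits(True, False, False, os.O_DSYNC),
    "writeback": CacheModeTraits(True, True, False, 0),
    "none": CacheModeTraits(False, True, False, O_DIRECT),
    "directsync": CacheModeTraits(False, False, False, os.O_DSYNC | O_DIRECT),
    "unsafe": CacheModeTraits(True, True, True, 0),
}

if __name__ == "__main__":
    for name, traits in CACHE_MODES.items():
        print(f"{name:12} page_cache={traits.host_page_cache!s:5} "
              f"guest_write_cache={traits.guest_write_cache!s:5} "
              f"ignores_flush={traits.ignores_guest_flush}")
```

Reading the table by column shows the pattern described above: directsync is the only combination of O_DSYNC and O_DIRECT, and unsafe is the only mode that drops guest flushes.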
The cache=none, cache=writethrough, and cache=directsync modes are
the safest, and they are considered equally safe, provided that
the guest operating system is “modern and well behaved”,
which means that it uses flushes as needed. If you have a suspect
guest, use writethrough or
directsync. Note that some file systems are not
compatible with cache=none or
cache=directsync because they do not support O_DIRECT,
on which these cache modes rely.
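Whether a given file system accepts O_DIRECT can be checked before choosing cache=none or cache=directsync. A minimal Python sketch, assuming a Linux host: the hypothetical helper supports_o_direct creates a temporary file in the target directory and tries to reopen it with O_DIRECT; open(2) fails with EINVAL when the file system does not support that flag.

```python
import errno
import os
import tempfile


def supports_o_direct(directory: str) -> bool:
    """Return True if files in `directory` can be opened with O_DIRECT.

    Hypothetical helper: creates a throwaway file, reopens it with O_DIRECT,
    and treats EINVAL as "this file system does not support O_DIRECT".
    """
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        return False  # the platform does not define O_DIRECT at all
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        probe = os.open(path, os.O_RDONLY | o_direct)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise  # other errors (permissions, missing directory) are real failures
    else:
        os.close(probe)
        return True
    finally:
        os.unlink(path)  # always remove the probe file


if __name__ == "__main__":
    print(supports_o_direct(tempfile.gettempdir()))
```

Point it at the directory that will hold the disk images; a False result means those two cache modes cannot be used for images stored there.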
The cache=writeback mode informs the guest of the presence of a write cache, and relies on the guest to send flush commands as needed to maintain data integrity within its disk image. This is a common storage design that modern file systems fully account for. However, because there is a window of time between a write being reported as completed and that write being committed to the storage device, this mode exposes the guest to data loss in the unlikely event of a host failure.
The cache=unsafe mode is similar to writeback caching, except that guest flush commands are ignored, which nullifies the data integrity control of these flush commands and results in a higher risk of data loss due to host failure. The name “unsafe” should serve as a warning that there is a much higher potential for data loss due to a host failure than with the other modes. Note that the cached data is flushed when the guest terminates.
The choice to make full use of the page cache, or to write through it, or
to bypass it altogether can have dramatic performance implications. Other
factors that influence disk performance include the capabilities of the
actual storage system, what disk image format is used, the potential size
of the page cache and the IO scheduler used. Additionally, not flushing
the write cache increases performance, but with risk, as noted above. As
a general rule, high-end systems typically perform best with
cache = none, because of the reduced data copying that
occurs. The potential benefit of having multiple guests share the common
host page cache, the ratio of reads to writes, and the use of
aio = native (see below) should also be considered.
The caching of storage data and meta-data restricts the configurations
that support live migration. Currently, only raw,
qcow2 and qed image formats can be
used for live migration. If a clustered file system is used, all cache
modes support live migration. Otherwise, the only cache mode that
supports live migration on read/write shared storage is cache =
none.
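The rules above can be encoded as a small predicate. This is a sketch of the stated rules only, with a hypothetical function name; it does not query qemu-system-ARCH or inspect real storage.

```python
MIGRATABLE_FORMATS = frozenset({"raw", "qcow2", "qed"})


def qemu_live_migration_ok(image_format: str, cache_mode: str,
                           clustered_fs: bool) -> bool:
    """Apply the live-migration rules stated above (hypothetical helper)."""
    if image_format not in MIGRATABLE_FORMATS:
        return False
    if clustered_fs:
        return True  # every cache mode works on a clustered file system
    # Read/write shared storage without a clustered file system:
    return cache_mode == "none"


if __name__ == "__main__":
    print(qemu_live_migration_ok("qcow2", "writeback", clustered_fs=False))  # False
    print(qemu_live_migration_ok("qcow2", "writeback", clustered_fs=True))   # True
```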
The libvirt management layer includes checks for
migration compatibility based on a number of factors. If the guest
storage is hosted on a clustered file system, is read-only, or is marked
shareable, then the cache mode is ignored when determining if migration
can be allowed. Otherwise, libvirt will not allow
migration unless the cache mode is set to none.
However, this restriction can be overridden with the
“unsafe” option to the migration APIs, which is also
supported by virsh, as for example in:
virsh migrate --live --unsafe
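The libvirt check described above can be sketched the same way. GuestDisk and migration_allowed are hypothetical names and this is not libvirt's implementation; the readonly and shareable fields stand for the disk properties mentioned in the text, and the unsafe flag stands for the “unsafe” migration option.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GuestDisk:
    """Hypothetical summary of one disk in a guest's configuration."""
    cache: str
    on_clustered_fs: bool = False
    readonly: bool = False
    shareable: bool = False


def migration_allowed(disks: Iterable[GuestDisk], unsafe: bool = False) -> bool:
    """Mirror the check described above (sketch, not libvirt code)."""
    if unsafe:
        return True  # the "unsafe" option overrides the cache-mode check
    for disk in disks:
        if disk.on_clustered_fs or disk.readonly or disk.shareable:
            continue  # the cache mode is ignored for these disks
        if disk.cache != "none":
            return False
    return True


if __name__ == "__main__":
    disks = [GuestDisk("writeback"), GuestDisk("writeback", readonly=True)]
    print(migration_allowed(disks))               # False
    print(migration_allowed(disks, unsafe=True))  # True
```

A single writable, non-shared disk with a cache mode other than none is enough to block migration, which is why the check loops over every disk.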
cache = none is required for the IO mode setting
aio = native. If another cache mode is used, then the
IO mode will silently be switched back to the default aio =
threads. qemu-system-ARCH implements the guest flush within
the host by using fdatasync().
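The interaction between the cache and IO modes can be sketched as follows, assuming libvirt domain XML, where a disk's driver element carries cache and io attributes. The helper names are hypothetical, and writing the effective IO mode instead of the requested one is a design choice of this sketch, so that the generated XML never claims aio=native where it would not apply.

```python
import xml.etree.ElementTree as ET


def effective_io_mode(cache: str, requested_io: str) -> str:
    """Return the IO mode that actually applies for a cache/aio combination."""
    if requested_io == "native" and cache != "none":
        return "threads"  # aio=native requires cache=none; fall back to the default
    return requested_io


def disk_driver_element(image_format: str, cache: str, io: str) -> str:
    """Serialize a libvirt-style <driver> element for a disk (sketch)."""
    driver = ET.Element("driver", name="qemu", type=image_format,
                        cache=cache, io=effective_io_mode(cache, io))
    return ET.tostring(driver, encoding="unicode")


if __name__ == "__main__":
    print(disk_driver_element("raw", "none", "native"))
    print(disk_driver_element("qcow2", "writeback", "native"))
```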