.. SPDX-License-Identifier: GPL-2.0

================
Digest_cache LSM
================

Introduction
============

Integrity detection and protection have long been desirable features, both
to reach a large user base and to mitigate the risk of software flaws and
attacks.

However, while solutions exist, they struggle to reach a large user base,
because they impose greater constraints on performance, flexibility and
configurability than desired, which only security-conscious people are
willing to accept.

This is where the new digest_cache LSM comes into play: it offers
additional support for new and existing integrity solutions, making them
faster and easier to deploy.


Motivation
==========

The digest_cache LSM helps to address two important shortcomings of the
Integrity Measurement Architecture (IMA): predictability of the Platform
Configuration Registers (PCRs), and the provisioning of reference values to
compare the calculated file digest against.

Remote attestation, according to Trusted Computing Group (TCG)
specifications, is done by replicating the PCR extend operation in
software with the digests in the event log (in this case the IMA
measurement list), and by comparing the obtained value with the PCR value
signed by the TPM with the quote operation.

Due to how the extend operation is performed, if measurements are done in
a different order, the final PCR value will be different. That means that
if measurements are done in parallel, there is no way to predict what the
final PCR value will be, making it impossible to seal data to a PCR value.
If the PCR value were predictable, a system could, for example, prove its
integrity by unsealing and using its private key, without sending the full
list of measurements every time.
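
The order dependence follows directly from the extend operation, which
replaces the PCR value with the hash of the old value concatenated with the
new digest. A minimal user-space sketch in Python, assuming a SHA-256 PCR
bank and made-up file contents (the helper names are illustrative and not
part of any kernel or TPM API):

.. code-block:: python

 import hashlib

 def pcr_extend(pcr: bytes, digest: bytes) -> bytes:
     """TPM-style extend: new PCR = SHA-256(old PCR || digest)."""
     return hashlib.sha256(pcr + digest).digest()

 def replay(digests):
     """Replay an event log, starting from a PCR of all zeroes."""
     pcr = bytes(32)
     for digest in digests:
         pcr = pcr_extend(pcr, digest)
     return pcr

 # Hypothetical measurements of two files.
 a = hashlib.sha256(b"contents of file A").digest()
 b = hashlib.sha256(b"contents of file B").digest()

 # Same measurements, different order: different final PCR value.
 print(replay([a, b]).hex())
 print(replay([b, a]).hex())

A remote verifier performs the same replay on the received measurement
list, which is why it can cope with any order, while sealing cannot.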

Provisioning reference values for file digests is also a difficult task.
The solution so far was to add file signatures to RPM packages, and
possibly to DEB packages, so that IMA can verify them. While this
undoubtedly works, it also requires Linux distribution vendors to support
the feature by rebuilding all their packages, and possibly extending their
PKI to perform the additional signatures. It could also require extra work
from developers to deal with the additional data.

On the other hand, since packages often carry the file digests themselves,
adding file signatures is not actually needed. If the kernel were able to
extract the file digests by itself, none of the tasks mentioned above would
be needed from the Linux distribution vendors either. All current and past
Linux distributions could then be easily retrofitted to enable IMA
appraisal with the file digests from the packages.

Narrowing down the scope of a package parser to only extract specific
information makes it small enough to accurately verify that it cannot harm
the kernel. In fact, the parsers included with the digest_cache LSM have
been verified with the formal verification tool Frama-C, albeit with a
limited buffer size (the verification time grows considerably with bigger
buffer sizes). The parsers with the Frama-C assertions are available here:

https://github.com/robertosassu/rpm-formal/

Frama-C asserts that the parsers don't read beyond their assigned buffer
for any byte combination.

An additional mitigation against corrupted digest lists consists of
verifying the signature of the package first, before attempting to extract
the file digests.


Solution
========

The digest_cache LSM can help IMA to extend a PCR in a deterministic way.
If IMA knows that a file comes from a Linux distribution, it can measure
files in a different way: measure the list of digests coming from the
distribution (e.g. RPM package headers), and subsequently measure a file if
it is not found in that list.

If the system executes known files, it does not matter in which order they
are executed, because the PCR is not extended. That however means that the
lists of digests must be measured in a deterministic way. The digest_cache
LSM has a prefetching mechanism to make this happen, which consists of
sequentially reading digest lists in a directory until it finds the
requested one.
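
The effect on the measurement list can be illustrated with a toy user-space
model in Python. It is only a sketch under simplifying assumptions: the
class, the file contents and the ``rpm-coreutils`` digest list name are
made up, and the real logic lives in IMA and in the digest_cache LSM:

.. code-block:: python

 import hashlib

 class MeasurementList:
     """Toy model of an IMA measurement list backed by digest lists."""

     def __init__(self, digest_lists):
         # digest list name -> set of reference file digests
         self.digest_lists = digest_lists
         self.entries = []

     def _measure(self, entry):
         # Each item is measured at most once.
         if entry not in self.entries:
             self.entries.append(entry)

     def file_access(self, path, content, list_name):
         # The digest list is measured when it is first used ...
         self._measure(("digest_list", list_name))
         # ... and the file only if its digest is not in that list.
         digest = hashlib.sha256(content).hexdigest()
         if digest not in self.digest_lists[list_name]:
             self._measure(("file", path))

 # Hypothetical package shipping two files.
 files = {"/usr/bin/ls": b"ls binary", "/usr/bin/cat": b"cat binary"}
 lists = {"rpm-coreutils":
          {hashlib.sha256(c).hexdigest() for c in files.values()}}

 def run(order):
     log = MeasurementList(lists)
     for path in order:
         log.file_access(path, files[path], "rpm-coreutils")
     return log.entries

 # Known files: the access order does not change the measurement list.
 print(run(["/usr/bin/ls", "/usr/bin/cat"]))
 print(run(["/usr/bin/cat", "/usr/bin/ls"]))

 # An unknown (e.g. modified) file is still measured individually.
 log = MeasurementList(lists)
 log.file_access("/root/cat", b"modified cat binary", "rpm-coreutils")
 print(log.entries)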

The resulting IMA measurement list however has a disadvantage: it does not
tell remote verifiers whether, or when, files whose digest is in the
measured digest lists have been accessed. Also, the IMA measurement list
would change after a software update.

The digest_cache LSM can also help IMA for appraisal. Currently, IMA has
to evaluate the signature of each file individually, and expects that the
Linux vendors include those signatures together with the files in the
packages.

With the digest_cache LSM, IMA can simply look up the file digest in the
list of digests extracted from package headers, once the signature of those
headers has been verified. The same approach can be followed by other LSMs,
such as Integrity Policy Enforcement (IPE).


Design
======

Digest cache
------------

The digest_cache LSM collects digests from various sources (called digest
lists), and stores them in kernel memory, in a set of hash tables forming a
digest cache. Extracted digests can be used as reference values for
integrity verification of file content or metadata.

A digest cache has three types of references: in the inode security blob of
the digest list the digest cache was created from (dig_owner field); in the
security blob of the inodes for which the digest cache is requested
(dig_user field); a reference returned by digest_cache_get().

References are released with digest_cache_put(), in the first two cases
when inodes are evicted from memory, in the last case when that function is
explicitly called. Obtaining a digest cache reference means that the digest
cache remains valid and cannot be freed until releasing it and until the
total number of references (stored in the digest cache) becomes zero.

When digest_cache_get() is called on an inode to compare its digest with
a reference value, the digest_cache LSM knows which digest cache to get
from the new security.digest_list xattr added to that inode, which contains
the file name of the digest list from which digests will be extracted.

All digest lists are expected to be in the same directory, defined in the
kernel config, and modifiable at run-time through securityfs. When the
digest_cache LSM reads the security.digest_list xattr, it uses its value as
last path component, appended to the default path (unless the default path
is a file). If an inode does not have that xattr, the default path is
considered as the final destination.

The default path can be either a file or a directory. If it is a file, the
digest_cache LSM always uses the same digest cache from that file to verify
all inodes (the xattr, if present, is ignored). If it is a directory, and
the inode to verify does not have the xattr, the digest_cache LSM iterates
over the digest caches created from each directory entry and looks up the
digest in each of them.
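
The path selection rules above can be summarized with a short Python
sketch. It is not the kernel implementation: the function name is
hypothetical, and ``/etc/digest_lists`` is only used as an example of a
default path:

.. code-block:: python

 import posixpath

 def digest_list_path(default_path, default_is_dir, xattr_value=None):
     """Return the path a digest cache would be created from."""
     if not default_is_dir:
         # Default path is a file: always use it, the xattr is ignored.
         return default_path
     if xattr_value is not None:
         # Directory and xattr: the xattr is the last path component.
         return posixpath.join(default_path, xattr_value)
     # Directory without xattr: the directory itself is the destination,
     # and lookups iterate over the digest lists it contains.
     return default_path

 # Inode with security.digest_list set to "rpm-bash".
 print(digest_list_path("/etc/digest_lists", True, "rpm-bash"))
 # Inode without the xattr.
 print(digest_list_path("/etc/digest_lists", True))
 # Default path pointing to a single digest list.
 print(digest_list_path("/etc/digest_lists/tlv-list", False, "rpm-bash"))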

Digest caches are created on demand, only when digest_cache_get() is
called. The first time a digest cache is requested, the digest_cache LSM
creates it and sets its reference in the dig_owner and dig_user fields of
the respective inode security blobs. On the next requests, the previously
set reference is returned, after incrementing the reference count.
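
The life cycle of the three reference types can be modelled in a few lines
of Python. This is a simplified sketch, not the kernel code: ``get_cache()``
and ``put_cache()`` only mimic the reference counting of digest_cache_get()
and digest_cache_put(), locking is omitted, and the classes are
hypothetical stand-ins for the digest cache and the inode security blob:

.. code-block:: python

 class DigestCache:
     def __init__(self, path):
         self.path = path
         self.ref_count = 0
         self.freed = False

 class InodeBlob:
     """Stand-in for the inode security blob."""
     def __init__(self):
         self.dig_owner = None  # set on the digest list inode
         self.dig_user = None   # set on the inode being verified

 def ref(cache):
     cache.ref_count += 1
     return cache

 def put_cache(cache):
     cache.ref_count -= 1
     if cache.ref_count == 0:
         cache.freed = True  # the digest cache can now be freed

 def get_cache(file_blob, list_blob, list_path):
     if file_blob.dig_user is None:
         if list_blob.dig_owner is None:
             # First request: create the digest cache on demand.
             list_blob.dig_owner = ref(DigestCache(list_path))
         file_blob.dig_user = ref(list_blob.dig_owner)
     # The caller gets its own reference.
     return ref(file_blob.dig_user)

 list_blob, file_a, file_b = InodeBlob(), InodeBlob(), InodeBlob()
 c1 = get_cache(file_a, list_blob, "/etc/digest_lists/rpm-bash")
 c2 = get_cache(file_b, list_blob, "/etc/digest_lists/rpm-bash")
 print(c1 is c2, c1.ref_count)  # one shared cache, five references

 # Callers release their references explicitly ...
 put_cache(c1)
 put_cache(c2)
 # ... the remaining ones are released when the inodes are evicted.
 for holder in (file_a.dig_user, file_b.dig_user, list_blob.dig_owner):
     put_cache(holder)
 print(c1.freed)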

Since there might be multiple digest_cache_get() calls for the same inode,
or for different inodes pointing to the same digest list, dig_owner_mutex
and dig_user_mutex have been introduced to protect the check and assignment
of the digest cache reference in the inode security blob.

Contenders that didn't get the lock also have to wait until the digest
cache is fully instantiated (when the bit INIT_IN_PROGRESS is cleared).
The dig_owner_mutex lock cannot be used for waiting on the instantiation,
to avoid lock inversion with the inode lock for directories.


Verification data
-----------------

The digest_cache LSM can support other LSMs in their decisions of granting
access to file content and metadata.

However, the information alone about whether a digest was found in a digest
cache might not be sufficient, because for example those LSMs wouldn't know
whether the digest cache itself was created from authentic data.

The digest_cache_verif_set() function lets the same LSMs (or a chosen
integrity provider) evaluate the digest list being read during the creation
of the digest cache, by implementing the kernel_post_read_file LSM hook,
and lets them attach their verification data to that digest cache.

Space is reserved in the file descriptor security blob for the digest cache
pointer. The digest_cache_to_file_sec() function sets that pointer before
calling kernel_read_file() in digest_cache_populate(), and
digest_cache_from_file_sec() retrieves the pointer back from the file
descriptor passed by LSMs with digest_cache_verif_set().

Multiple providers are supported, in the event there are multiple
integrity LSMs active. Each provider should also provide a unique verifier
ID as an argument to digest_cache_verif_set(), so that verification data
can be distinguished.

A caller of digest_cache_get() can retrieve back the verification data by
calling digest_cache_verif_get() and passing a digest cache pointer and the
desired verifier ID.

Since directory digest caches are not populated themselves, LSMs have to do
a lookup first to get the digest cache containing the digest, call
digest_cache_from_found_t() to convert the returned digest_cache_found_t
type to a digest cache pointer, and pass that to digest_cache_verif_get().


Directories
-----------

In environments where xattrs are not available (e.g. in the initial ram
disk), the digest_cache LSM cannot precisely determine which digest list in
a directory contains the desired reference digest. However, it is still
desirable, although slower, to search for the digest in all digest lists of
that directory.

This is done in two steps. First, when a digest cache is being created,
digest_cache_create() invokes digest_cache_dir_create(), to generate the
list of current directory entries. Entries are placed in the list in
ascending order by the <seq num> if prepended to the file name, or at the
end of the list if not.
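
A possible way to express this ordering is shown in the Python sketch
below. It assumes, purely for illustration, that the sequence number is a
decimal prefix separated from the rest of the file name by a dash (e.g.
``2-tlv-list``); the function names are hypothetical:

.. code-block:: python

 def seq_num(name):
     """Return the numeric prefix of a digest list name, or None."""
     prefix, sep, _ = name.partition("-")
     if sep and prefix.isascii() and prefix.isdigit():
         return int(prefix)
     return None

 def order_entries(names):
     """Numbered entries first, in ascending order; the rest at the end."""
     numbered = [n for n in names if seq_num(n) is not None]
     unnumbered = [n for n in names if seq_num(n) is None]
     return sorted(numbered, key=seq_num) + unnumbered

 print(order_entries(["rpm-bash", "10-rpm-glibc", "2-tlv-list", "tlv-gpg"]))

Note that the comparison is numeric, so ``2-tlv-list`` precedes
``10-rpm-glibc`` even though a plain string sort would order them the
other way round.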

The resulting digest cache has the IS_DIR bit set, to distinguish it from
the digest caches created from regular files.

Second, when a digest is searched in a directory digest cache,
digest_cache_lookup() invokes digest_cache_dir_lookup_digest() to
iteratively search that digest in each directory entry generated by
digest_cache_dir_create().

That list is stable, even if files are added to or deleted from that
directory. In that case, the digest_cache LSM will invalidate the digest
cache, forcing next callers of digest_cache_get() to get a new directory
digest cache with the updated list of directory entries.

If the current directory entry does not have a digest cache reference,
digest_cache_dir_lookup_digest() invokes digest_cache_create() to create a
new digest cache for that entry. In either case,
digest_cache_dir_lookup_digest() then calls digest_cache_htable_lookup()
with the new or existing digest cache to search the digest.

The iteration stops when the digest is found. In that case,
digest_cache_dir_lookup_digest() returns the digest cache reference of the
current directory entry as the digest_cache_found_t type, so that callers
of digest_cache_lookup() don't mistakenly try to call digest_cache_put()
with that reference.

This new reference type will be used to retrieve information about the
digest cache containing the digest, which is not known in advance until the
digest search is performed.

The order of the list of directory entries influences the speed of the
digest search. A search terminates faster if fewer digest caches have to be
created. One way to optimize it could be to order the list of digest lists
in the same order in which they are requested at boot.
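
The iterative search, and the reason why the order of the entries matters,
can be sketched in Python as follows. The entry representation, the digest
values and the ``create_cache()`` callback are hypothetical; in the kernel
the corresponding work is done by digest_cache_dir_lookup_digest() and
digest_cache_create():

.. code-block:: python

 def dir_lookup(entries, digest, create_cache):
     """Search a digest in each directory entry, in list order."""
     for entry in entries:
         if entry["cache"] is None:
             # No digest cache yet for this entry: create it on demand.
             entry["cache"] = create_cache(entry["name"])
         if digest in entry["cache"]:
             # Stop at the first digest cache containing the digest.
             return entry["name"]
     return None

 # Hypothetical digest lists and their digests.
 lists = {"0-rpm-bash": {"d1"}, "1-rpm-glibc": {"d2"}, "2-rpm-zlib": {"d3"}}
 created = []

 def create_cache(name):
     created.append(name)
     return lists[name]

 entries = [{"name": name, "cache": None} for name in lists]

 print(dir_lookup(entries, "d2", create_cache), created)
 # The second search reuses the digest caches created by the first one.
 print(dir_lookup(entries, "d1", create_cache), created)

Searching for a digest stored in the last entry, or for an unknown digest,
forces the creation of a digest cache for every entry in the list.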

Finally, digest_cache_dir_free() releases the digest cache references
stored in the list of directory entries, and frees the list itself.


Prefetching
-----------

A desirable goal when doing integrity measurements is that they are always
done in the same order across boots, so that the resulting PCR value
becomes predictable and suitable for sealing policies. However, due to
parallel execution of system services at boot, a deterministic order of
measurements is difficult to achieve.

The digest_cache LSM is not exempted from this issue. Under the assumption
that only the digest list is measured, and file measurements are omitted if
their digest is found in that digest list, a PCR can be predictable only if
all files belong to the same digest list. Otherwise, it will still be
unpredictable, since files accessed in a non-deterministic order will cause
digest lists to be measured in a non-deterministic order too.

The prefetching mechanism overcomes this issue by searching a digest list
file name in digest_list_dir_lookup_filename() among the entries of the
linked list built by digest_cache_dir_create(). If the file name does not
match, it reads the digest list to trigger its measurement. Otherwise, it
also creates a digest cache and returns that to the caller.
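
The following Python sketch models why this makes the measurement order
deterministic. It is a simplification with hypothetical names: ``measured``
stands for the IMA measurement list, and each digest list is assumed to be
measured only the first time it is read:

.. code-block:: python

 def prefetch(entries, requested, measured):
     """Read digest lists in directory order up to the requested one."""
     for name in entries:
         if name not in measured:
             # Reading a digest list triggers its measurement.
             measured.append(name)
         if name == requested:
             # A digest cache would be created from this digest list.
             return name
     return None

 def boot(entries, requests, use_prefetch):
     measured = []
     for requested in requests:
         if use_prefetch:
             prefetch(entries, requested, measured)
         elif requested not in measured:
             measured.append(requested)
     return measured

 entries = ["0-rpm-bash", "1-rpm-glibc", "2-rpm-zlib"]
 boot_1 = ["2-rpm-zlib", "0-rpm-bash"]  # request order in one boot
 boot_2 = ["0-rpm-bash", "2-rpm-zlib"]  # request order in another boot

 # Without prefetching, the measurement order follows the request order.
 print(boot(entries, boot_1, False), boot(entries, boot_2, False))
 # With prefetching, it always follows the directory order.
 print(boot(entries, boot_1, True), boot(entries, boot_2, True))

The price is that digest lists preceding the requested one are read and
measured even if they are never needed, which is why ordering the digest
lists by their appearance in the measurement list keeps the overhead low.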

Prefetching needs to be explicitly enabled by setting the new
security.dig_prefetch xattr to 1 in the directory containing the digest
lists. The newly introduced function digest_cache_prefetch_requested()
checks first if the DIR_PREFETCH bit is set in dig_owner, otherwise it
reads the xattr. digest_cache_create() sets DIR_PREFETCH in dig_owner, if
prefetching is enabled, before declaring the digest cache as initialized.


Tracking changes
----------------

The digest_cache LSM registers five LSM hooks (file_open, path_truncate,
file_release, inode_unlink and inode_rename) to monitor digest list and
directory modifications.

If an action affects a digest list or the parent directory, these hooks
call digest_cache_reset() to set the RESET bit on the digest cache. This
will cause subsequent calls to digest_cache_get() and digest_cache_create()
to respectively put and clear dig_user and dig_owner, and request a new
digest cache.

That does not affect other users of the old digest cache, since that one
remains valid as long as the reference count is greater than zero. However,
they can explicitly call the new function digest_cache_was_reset(), to
check if the RESET bit was set on the digest cache reference they hold.
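
The reset semantics can be illustrated with a small Python model. The
classes and functions below are hypothetical and only mimic the behavior
described above: a modification marks the current digest cache, holders of
the old reference can detect it, and the next request obtains a fresh
digest cache:

.. code-block:: python

 class DigestCache:
     def __init__(self, generation):
         self.generation = generation
         self.reset = False  # models the RESET bit

 class DigestListInode:
     def __init__(self):
         self.dig_owner = None
         self.generation = 0

 def get_cache(inode):
     cache = inode.dig_owner
     if cache is None or cache.reset:
         # Missing or reset digest cache: request a new one.
         inode.generation += 1
         cache = inode.dig_owner = DigestCache(inode.generation)
     return cache

 def on_modification(inode):
     # Called when the digest list is written, truncated, removed, ...
     if inode.dig_owner is not None:
         inode.dig_owner.reset = True

 inode = DigestListInode()
 old = get_cache(inode)
 on_modification(inode)
 new = get_cache(inode)

 # The old reference stays usable, but its holder can see the reset.
 print(old.reset, new.reset, old is new)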

Recreating a file digest cache means reading the digest list again and
extracting the digests. Recreating a directory digest cache, instead, does
not mean recreating the digest cache for directory entries, since those
digest caches are likely already stored in the inode security blob. It
would happen however for new files.

File digest cache reset is done on file_open, when a digest list is opened
for write; on path_truncate, when a digest list is truncated (there is no
inode_truncate, and file_truncate does not catch operations through the
truncate() system call); on inode_unlink, when a digest list is removed;
and on inode_rename, when a digest list is renamed.

Directory digest cache reset is done on file_release, when a digest list is
written in the digest list directory, on inode_unlink, when a digest list
is deleted from that directory, and finally on inode_rename, when a digest
list is moved to/from that directory.

With the exception of file_release, which will always be executed (cannot
be denied), the other LSM hooks are not optimal, since the digest_cache LSM
does not know whether or not the operation will be allowed also by other
LSMs. If the operation is denied, the digest_cache LSM would do an
unnecessary reset.


Data structures and API
=======================

Data structures
---------------

These are the data structures defined and used internally by the
digest_cache LSM.

.. kernel-doc:: security/digest_cache/internal.h


Public API
----------

This API is meant to be used by users of the digest_cache LSM.

.. kernel-doc:: include/linux/digest_cache.h
		:identifiers: digest_cache_found_t
		              digest_cache_from_found_t

.. kernel-doc:: security/digest_cache/main.c
		:identifiers: digest_cache_get digest_cache_put

.. kernel-doc:: security/digest_cache/htable.c
		:identifiers: digest_cache_lookup

.. kernel-doc:: security/digest_cache/verif.c
		:identifiers: digest_cache_verif_set digest_cache_verif_get

.. kernel-doc:: security/digest_cache/reset.c
		:identifiers: digest_cache_was_reset


Parser API
----------

This API is meant to be used by digest list parsers.

.. kernel-doc:: security/digest_cache/htable.c
		:identifiers: digest_cache_htable_init
		              digest_cache_htable_add
			      digest_cache_htable_lookup


Digest List Formats
===================

tlv
---

The Type-Length-Value (TLV) format was chosen for its extensibility.
Additional fields can be added without breaking compatibility with old
versions of the parser.

The layout of a tlv digest list is the following::

 [header: DIGEST_LIST_FILE, num fields, total len]
 [field: DIGEST_LIST_ALGO, length, value]
 [field: DIGEST_LIST_ENTRY#1, length, value (below)]
  |- [header: DIGEST_LIST_ENTRY_DATA, num fields, total len]
  |- [DIGEST_LIST_ENTRY_DIGEST#1, length, file digest]
  |- [DIGEST_LIST_ENTRY_PATH#1, length, file path]
 [field: DIGEST_LIST_ENTRY#N, length, value (below)]
  |- [header: DIGEST_LIST_ENTRY_DATA, num fields, total len]
  |- [DIGEST_LIST_ENTRY_DIGEST#N, length, file digest]
  |- [DIGEST_LIST_ENTRY_PATH#N, length, file path]

DIGEST_LIST_ALGO is a field to specify the algorithm of the file digest.
DIGEST_LIST_ENTRY is a nested TLV structure with the following fields:
DIGEST_LIST_ENTRY_DIGEST contains the file digest; DIGEST_LIST_ENTRY_PATH
contains the file path.
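
To make the nesting more concrete, the following Python sketch builds and
parses a digest list with one entry. The encoding details are assumptions
made for illustration only: all integers are taken to be 64-bit big-endian,
the numeric identifiers are arbitrary, and the algorithm is stored as a
string; the authoritative definitions are in the kernel headers of the TLV
parser:

.. code-block:: python

 import hashlib
 import struct

 # Hypothetical numeric identifiers (not taken from the kernel headers).
 DIGEST_LIST_FILE = 0
 DIGEST_LIST_ALGO, DIGEST_LIST_ENTRY = 0, 1
 DIGEST_LIST_ENTRY_DATA = 1
 DIGEST_LIST_ENTRY_DIGEST, DIGEST_LIST_ENTRY_PATH = 0, 1

 def tlv(data_type, fields):
     """Encode [header: type, num fields, total len] + [field, len, value]."""
     body = b"".join(struct.pack(">QQ", field, len(value)) + value
                     for field, value in fields)
     return struct.pack(">QQQ", data_type, len(fields), len(body)) + body

 def parse(buf):
     """Decode one TLV blob, never reading beyond the buffer."""
     data_type, num_fields, total_len = struct.unpack_from(">QQQ", buf, 0)
     offset = 24
     if total_len != len(buf) - offset:
         raise ValueError("total length does not match the buffer")
     fields = []
     for _ in range(num_fields):
         field, length = struct.unpack_from(">QQ", buf, offset)
         offset += 16
         if length > len(buf) - offset:
             raise ValueError("field exceeds the buffer")
         fields.append((field, buf[offset:offset + length]))
         offset += length
     return data_type, fields

 digest = hashlib.sha256(b"hypothetical file content").digest()
 entry = tlv(DIGEST_LIST_ENTRY_DATA,
             [(DIGEST_LIST_ENTRY_DIGEST, digest),
              (DIGEST_LIST_ENTRY_PATH, b"/usr/bin/cat")])
 digest_list = tlv(DIGEST_LIST_FILE,
                   [(DIGEST_LIST_ALGO, b"sha256"),
                    (DIGEST_LIST_ENTRY, entry)])

 _, fields = parse(digest_list)
 # The value of DIGEST_LIST_ENTRY is itself a TLV blob.
 print(parse(fields[1][1]))

Since every field carries its own length, a parser can skip fields it does
not know, which is what allows new fields to be added later.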


rpm
---

The rpm digest list is basically a subset of the RPM package header.
Its format is::

 [RPM magic number]
 [RPMTAG_IMMUTABLE]

RPMTAG_IMMUTABLE is a section of the full RPM header containing the part
of the header that was signed, and whose signature is stored in the
RPMTAG_RSAHEADER section.


Appended Signature
------------------

Digest lists can have a module-style appended signature, that can be used
for appraisal with IMA. The signature type can be PKCS#7, as for kernel
modules, or a different type.


History
=======

The original name of this work was IMA Digest Lists, which was considered
somewhat too invasive. The code was moved to a separate component named
DIGLIM (DIGest Lists Integrity Module), with the purpose of moving the
complexity away from IMA, and also adding the possibility of using it with
other kernel components (e.g. Integrity Policy Enforcement, or IPE).

The design changed significantly, so DIGLIM was renamed to digest_cache
LSM, as the name better reflects what the new component does.

Since it was originally proposed, in 2017, this work has grown a lot thanks
to various comments and suggestions. It has been an integral part of the
openEuler distribution since the end of 2020.

The most important difference between the old and the current version is
moving from a centralized repository of file digests to a per-package
repository.
This significantly reduces the memory pressure, since digest lists are
loaded into kernel memory only when they are actually needed. Also, file
digests are automatically unloaded from kernel memory at the same time
inodes are evicted from memory during reclamation.


Performance
===========

System specification
--------------------

The tests have been performed on a Fedora 38 virtual machine with 4 cores
(AMD EPYC-Rome, no hyperthreading), 4 GB of RAM, and either no TPM, TPM
passthrough, or an emulated TPM. The QEMU process has been pinned to 4 real
CPU cores and its priority was set to -20.


Benchmark tool
--------------

The digest_cache LSM has been tested with an ad-hoc benchmark tool that
creates 20000 files with a random size up to 100 bytes and randomly adds
their digest to one of 303 digest lists. The number of digest lists has
been derived from the ratio of digests to packages (124174/1883, about 66)
found in the testing virtual machine (hence, 20000/66 = 303). IMA
signatures have been generated with ECDSA NIST P-384.
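
The derivation of the number of digest lists can be reproduced with a few
lines of Python (the variable names are only illustrative):

.. code-block:: python

 digests, packages, files = 124174, 1883, 20000

 # Average number of digests per package, rounded to the nearest integer.
 ratio = round(digests / packages)
 # Number of digest lists needed to hold the digests of the test files.
 digest_lists = files // ratio

 print(ratio, digest_lists)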

The benchmark tool then creates a list of 20000 files to be accessed,
randomly chosen (there can be duplicates). This is necessary to make the
results reproducible across reboots (by always replaying the same
operations). The benchmark reads (sequentially and in parallel) the files
from the list 2 times, flushing the kernel caches before each read.

Each test has been performed 5 times, and the average value is taken.


Purpose of the benchmark
------------------------

The purpose of the benchmark is to show the performance difference of IMA
between its current behavior and its behavior when using the digest_cache
LSM.


IMA measurement policy: no cache
--------------------------------

.. code-block:: bash

 measure func=FILE_CHECK fowner=2001 pcr=12


IMA measurement policy: cache
-----------------------------

.. code-block:: bash

 measure func=DIGEST_LIST_CHECK pcr=12
 measure func=FILE_CHECK fowner=2001 digest_cache=content pcr=12


IMA Measurement Results
-----------------------

Sequential
~~~~~~~~~~

This test was performed reading files sequentially, and waiting for the
current read to terminate before beginning a new one.

::

                      +-------+------------------------+-----------+
                      | meas. | time no/p/vTPM (sec.)  | slab (KB) |
 +--------------------+-------+------------------------+-----------+
 | no cache           | 12313 | 33.65 / 102.51 / 47.13 |   84170   |
 +--------------------+-------+------------------------+-----------+
 | cache, no prefetch |   304 | 34.04 / 33.32 / 33.09  |   81159   |
 +--------------------+-------+------------------------+-----------+
 | cache, prefetch    |   304 | 34.02 / 33.31 / 33.15  |   81122   |
 +--------------------+-------+------------------------+-----------+

The table shows that 12313 measurements (boot_aggregate + files) have been
made without the digest cache, and 304 with the digest cache
(boot_aggregate + digest lists). Consequently, the memory occupation
without the cache is higher due to the higher number of measurements.

Not surprisingly, for the same reason, the test time is also significantly
higher without the digest cache when the physical or virtual TPM is used.

In terms of pure performance (the first number in the third column, i.e.
without a TPM), there is no significant difference between using and not
using the digest cache.

Prefetching does not add overhead, also because digest lists were ordered
according to their appearance in the IMA measurement list (which minimizes
the number of digest lists to prefetch).


Parallel
~~~~~~~~

This test was performed reading files in parallel, not waiting for the
current read to terminate.

::

                      +-------+-----------------------+-----------+
                      | meas. | time no/p/vTPM (sec.) | slab (KB) |
 +--------------------+-------+-----------------------+-----------+
 | no cache           | 12313 | 14.08 / 79.09 / 22.70 |   85138   |
 +--------------------+-------+-----------------------+-----------+
 | cache, no prefetch |   304 | 14.44 / 15.11 / 14.96 |   85777   |
 +--------------------+-------+-----------------------+-----------+
 | cache, prefetch    |   304 | 14.30 / 15.41 / 14.40 |   83294   |
 +--------------------+-------+-----------------------+-----------+

Also in this case, the physical TPM causes the biggest delay especially
without digest cache, where a higher number of measurements need to be
extended in the TPM.

The digest_cache LSM does not introduce a noticeable overhead in any
scenario.


IMA appraisal policy: no cache
------------------------------

.. code-block:: bash

 appraise func=FILE_CHECK fowner=2001


IMA appraisal policy: cache
---------------------------

.. code-block:: bash

 appraise func=DIGEST_LIST_CHECK
 appraise func=FILE_CHECK fowner=2001 digest_cache=content


IMA Appraisal Results
---------------------

Sequential
~~~~~~~~~~

This test was performed reading files sequentially, and waiting for the
current read to terminate before beginning a new one.

::

                              +-------------+-------------+-----------+
                              |    files    | time (sec.) | slab (KB) |
 +----------------------------+-------------+-------------+-----------+
 | appraise (ECDSA sig)       |    12312    |    96.74    |   78827   |
 +----------------------------+-------------+-------------+-----------+
 | appraise (cache)           | 12312 + 303 |    33.09    |   80854   |
 +----------------------------+-------------+-------------+-----------+
 | appraise (cache, prefetch) | 12312 + 303 |    33.42    |   81050   |
 +----------------------------+-------------+-------------+-----------+

This test shows a large performance difference between verifying the
signature of 12312 files and just verifying the signature of 303 digest
lists and looking up the digest of the files being read.

There are some differences in terms of memory occupation, which is
expected because the digest caches loaded in memory have to be taken into
account, while they don't exist with standard appraisal.


Parallel
~~~~~~~~

This test was performed reading files in parallel, not waiting for the
current read to terminate.

::

                              +-------------+-------------+-----------+
                              |    files    | time (sec.) | slab (KB) |
 +----------------------------+-------------+-------------+-----------+
 | appraise (ECDSA sig)       |    12312    |    27.68    |   80596   |
 +----------------------------+-------------+-------------+-----------+
 | appraise (cache)           | 12313 + 303 |    14.96    |   80778   |
 +----------------------------+-------------+-------------+-----------+
 | appraise (cache, prefetch) | 12313 + 303 |    14.78    |   83354   |
 +----------------------------+-------------+-------------+-----------+

The difference is less marked when performing the read in parallel. Also,
more memory seems to be occupied in the prefetch case.
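
For a quick comparison, the speedup of the digest cache (without
prefetching) over per-file signature verification can be computed from the
times reported in the two appraisal tables, for example with the following
Python snippet (the helper name is illustrative):

.. code-block:: python

 # Times in seconds, copied from the appraisal tables above.
 sequential = {"sig": 96.74, "cache": 33.09}
 parallel = {"sig": 27.68, "cache": 14.96}

 def speedup(times):
     """Ratio between signature-based and cache-based appraisal time."""
     return times["sig"] / times["cache"]

 for name, times in (("sequential", sequential), ("parallel", parallel)):
     print(f"{name}: {speedup(times):.2f}x")

The ratio is close to 3 for sequential reads and below 2 for parallel
reads, consistent with the difference being less marked in the latter
case.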


How to Test
===========

Additional patches need to be applied to the kernel.

The patch to introduce the file_release LSM hook:

https://lore.kernel.org/linux-integrity/20240115181809.885385-14-roberto.sassu@huaweicloud.com/

The patch set to use the PGP keys from the Linux distributions for
verifying the RPM header signatures:

https://lore.kernel.org/linux-integrity/20230720153247.3755856-1-roberto.sassu@huaweicloud.com/

The same URL contains two GNUPG patches to be applied to the user space
program.

The patch set to use the digest_cache LSM from IMA:

https://github.com/robertosassu/linux/commits/digest_cache-lsm-v3-ima/

First, it is necessary to install the kernel headers in usr/ in the kernel
source directory:

.. code-block:: bash

 $ make headers_install

Afterwards, it is necessary to copy the new kernel headers (tlv_parser.h,
uasym_parser.h, tlv_digest_list.h) from usr/include/linux in the kernel
source directory to /usr/include/linux.

Then, gpg must be rebuilt with the additional patches to convert the PGP
keys of the Linux distribution to the new user asymmetric key format:

.. code-block:: bash

 $ gpg --conv-kernel <path of PGP key> >> certs/uasym_keys.bin

This embeds the converted keys in the kernel image.

Finally, the following kernel options must be enabled:

.. code-block:: bash

 CONFIG_SECURITY_DIGEST_CACHE=y
 CONFIG_UASYM_KEYS_SIGS=y
 CONFIG_UASYM_PRELOAD_PUBLIC_KEYS=y

and the kernel must be rebuilt with the patches applied. After reboot, it
is necessary to build and install the digest list tools downloadable from:

https://github.com/linux-integrity/digest-cache-tools

and to execute (as root):

.. code-block:: bash

 # manage_digest_lists -o gen -d /etc/digest_lists -i rpmdb -f rpm

The new gpg must also be installed in the system, as it will be used to
convert the PGP signatures of the RPM headers to the user asymmetric key
format.

It is recommended to create an additional digest list with the following
files, by creating a file named ``list`` with the content:

.. code-block:: bash

 /usr/bin/manage_digest_lists
 /usr/lib64/libgen-tlv-list.so
 /usr/lib64/libgen-rpm-list.so
 /usr/lib64/libparse-rpm-list.so
 /usr/lib64/libparse-tlv-list.so

Then, to create the digest list, it is sufficient to execute:

.. code-block:: bash

 # manage_digest_lists -i list -L -d /etc/digest_lists -o gen -f tlv

Also, a digest list must be created for the modified gpg binary:

.. code-block:: bash

 # manage_digest_lists -i /usr/bin/gpg -d /etc/digest_lists -o gen -f tlv

If appraisal is enabled and in enforcing mode, it is necessary to sign the
new digest lists, with the sign-file tool in the scripts/ directory of the
kernel sources:

.. code-block:: bash

 # scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.pem /etc/digest_lists/tlv-list
 # scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.pem /etc/digest_lists/tlv-gpg

The final step is to add security.digest_list to each file with:

.. code-block:: bash

 # manage_digest_lists -i /etc/digest_lists -o add-xattr

After that, it is possible to test the digest_cache LSM with the following
policy written to /etc/ima/ima-policy:

.. code-block:: bash

 measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=12
 dont_measure fsmagic=0x01021994
 measure func=BPRM_CHECK digest_cache=content pcr=12
 measure func=MMAP_CHECK digest_cache=content pcr=12

Tmpfs is excluded for now, until memfd is properly handled. The reason why
the DIGEST_LIST_CHECK rule is before the dont_measure is that otherwise
digest lists in the initial ram disk won't be processed.

Before loading the policy, it is possible to enable dynamic debug to see
which operations are done by the digest_cache LSM:

.. code-block:: bash

 # echo "file security/digest_cache/* +p" > /sys/kernel/debug/dynamic_debug/control

Alternatively, the same string can be set as the value of the dyndbg=
option in the kernel command line.

A preliminary test, before booting the system with the new policy, is to
supply the policy to IMA in the current system with:

.. code-block:: bash

 # cat /etc/ima/ima-policy > /sys/kernel/security/ima/policy

After executing some commands, it can be seen if the digest_cache LSM is
working by checking the IMA measurement list. If there are only digest
lists, it means that everything is working properly, and the system can be
rebooted. The instructions have been tested on a Fedora 38 OS.

After boot, it is possible to check the content of the measurement list:

.. code-block:: bash

 # cat /sys/kernel/security/ima/ascii_runtime_measurements


At this point, it is possible to enable the prefetching mechanism to make
the PCR predictable. The virtual machine must be configured with a TPM
(Emulated).

To enable the prefetching mechanism, it is necessary to set
security.dig_prefetch to '1' for the /etc/digest_lists directory:

.. code-block:: bash

 # setfattr -n security.dig_prefetch -v "1" /etc/digest_lists

The final step is to reorder digest lists to be in the same order in which
they appear in the IMA measurement list.

This can be done by executing the command:

.. code-block:: bash

 # manage_digest_lists -i /sys/kernel/security/ima/ascii_runtime_measurements -d /etc/digest_lists -o add-seqnum

Since we renamed the digest lists, we need to update security.digest_list
too:

.. code-block:: bash

 # manage_digest_lists -i /etc/digest_lists -o add-xattr

By rebooting several times, and just logging in (to execute the same
commands during each boot), it is possible to compare the PCR 12, and see
that it is always the same. That of course works only if the TPM is reset
at each boot (e.g. if the virtual machine has a virtual TPM) or if the code
is tested in the host environment.

.. code-block:: bash

 # cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/MSFT0101:00/tpm/tpm0/pcr-sha256/12

The last step is to test IMA appraisal. This can be done by adding the
following lines to /etc/ima/ima-policy:

.. code-block:: bash

 appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
 dont_appraise fsmagic=0x01021994
 appraise func=BPRM_CHECK digest_cache=content
 appraise func=MMAP_CHECK digest_cache=content

The following test is to ensure that IMA prevents the execution of unknown
files:

.. code-block:: bash

 # cp -a /bin/cat .
 # ./cat

That will work, but executing a modified binary will not:

.. code-block:: bash

 # echo 1 >> cat
 # ./cat
 -bash: ./cat: Permission denied

Execution will be denied, and a new entry in the measurement list will
appear (it would probably be acceptable not to add that entry, as access to
the file was denied):

.. code-block:: bash

 12 50b5a68bea0776a84eef6725f17ce474756e51c0 ima-ng sha256:15e1efee080fe54f5d7404af7e913de01671e745ce55215d89f3d6521d3884f0 /root/cat

Finally, it is possible to test the shrinking of the digest cache, by
forcing the kernel to evict inodes from memory:

.. code-block:: bash

 # echo 3 > /proc/sys/vm/drop_caches

If dynamic debug was enabled, the kernel log should have messages like:

.. code-block:: bash

 [  313.032536] DIGEST CACHE: Removed digest sha256:102900208eef27b766380135906d431dba87edaa7ec6aa72e6ebd3dd67f3a97b from digest list /etc/digest_lists/rpm-libseccomp-2.5.3-4.fc38.x86_64

Optionally, it is possible to test IMA measurement/appraisal from the very
beginning of the boot process, for now by including all digest lists and the
IMA policy in the initial ram disk. In the future, there will be a dracut
patch for ``dracut_install`` to select only the necessary digest lists.

This can be simply done by executing:

.. code-block:: bash

 # dracut -f -I " /etc/ima/ima-policy " -i /etc/digest_lists/ /etc/digest_lists/ --nostrip --kver <your kernel version>

The --nostrip option is particularly important. If debugging symbols are
stripped from a binary, its digest no longer matches the one from the
package, causing access to be denied.

The final test is to try the default IMA measurement and appraisal
policies, so that there is no gap between when the system starts and when
the integrity evaluation is effective. The default policies actually will
be used only until systemd is able to load the custom policy to
measure/appraise binaries and shared libraries. It should be good enough
for the system to boot.

The default IMA measurement and appraisal policies can be loaded at boot by
adding the following to the kernel command line:

.. code-block:: bash

 ima_policy="tcb|appraise_tcb|digest_cache_measure|digest_cache_appraise"

The ima-modsig template can be selected by adding to the kernel command
line:

.. code-block:: bash

 ima_template=ima-modsig
