Linux, SSDs and disk encryption

Last week I updated the OS on my SSD-only laptop to Debian Jessie, and I thought that would be a good opportunity to finally set up full-disk encryption (FDE) on that machine as well. Against my usual policy I had not done so initially, fearing the performance drawbacks of not being able to use a technology called TRIM. Since cryptsetup 1.4 and kernel 3.1, however, TRIM can be used on encrypted devices as well, which makes a fast SSD setup with full-disk encryption via cryptsetup/LUKS possible.

Although there are a lot of articles and posts on SSDs out there, I could not really find a good place summarising all the tweaks I used, so I decided to add one of my own. Note that this guide is mainly intended for single-user machines with plenty of RAM (my system, for example, has 4GB). Some of the things mentioned might not be so suitable in other cases, but feel free to try.

The first two sections are about the internal structure of an SSD, the TRIM command and how it can partially undermine full-disk encryption. The remaining part of the article gives hints and ideas on how to set up your own system. Finally I provide some references for further reading.

Solid-state drives and the TRIM command

Unlike conventional hard drives, solid-state drives (SSDs) contain no "disk" of any kind, nor any mechanical system driving one. They are made entirely of flash memory blocks. This makes them a lot quieter, a lot faster on random read/write access and a lot more energy efficient. There is a drawback, however: each sector has a limited write endurance, meaning that after it has been written to a given number of times it can't be used any more. Consequently the available drive space of an SSD shrinks as it gets older. Nowadays the write count needed for this to happen is quite large. With the correct configuration and some countermeasures an SSD can easily last for tens of years.

An obvious aim is to get all sectors to wear out equally fast, the so-called wear levelling. To achieve this the SSD needs to know on a hardware level which sectors are empty. This is where the TRIM command comes in. It is a way for the filesystem to tell the SSD which sectors contain no file data. So when a file is deleted on the filesystem level, a TRIM is issued and the SSD knows that the respective sectors can be added to the pool of empty sectors for use in a later write operation. TRIM also affects other SSD operations like garbage collection. Without proper garbage collection write operations tend to take much longer (write amplification) and hence performance decreases. In summary TRIM increases both the SSD's lifetime and its performance.
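
Whether a given drive accepts TRIM at all can be read from sysfs. The following is a minimal sketch, assuming the kernel exposes queue/discard_max_bytes for the device (a value of 0 means discards are not supported). The function name supports_trim and the optional second argument for an alternative sysfs root are made up for illustration.

```shell
#!/bin/sh
# supports_trim DEVICE [SYSFS_ROOT]
# Succeeds if the kernel reports discard (TRIM) support for DEVICE, e.g. "sda".
# SYSFS_ROOT defaults to /sys and is only a parameter to allow dry runs.
supports_trim() {
    dev=$1
    root=${2:-/sys}
    file="$root/block/$dev/queue/discard_max_bytes"
    [ -r "$file" ] || return 2          # attribute missing: cannot tell
    [ "$(cat "$file")" -gt 0 ]          # 0 means discards are not supported
}

# Example (hypothetical device name):
#   supports_trim sda && echo "sda accepts TRIM"
```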

TRIM and disk encryption

Dm-crypt, the cryptographic system underlying cryptsetup, adds an additional layer to our picture. Let /dev/sda1 be the partition on the SSD which contains the encrypted data. When we open this partition using dm-crypt, a virtual disk is created that contains the decrypted version of the data of /dev/sda1. Let us refer to this virtual disk as /dev/dm-0. Usually dm-crypt takes all requests of the filesystem received on /dev/dm-0 and forwards them to /dev/sda1, just de/encrypting data transferred between them. So if the filesystem issues a TRIM to /dev/dm-0 to indicate that some sectors have been emptied (i.e. a file deleted), we would expect dm-crypt to forward this request to /dev/sda1. Cryptsetup versions from 1.4 onwards are capable of doing this, but it is disabled by default and there is a good reason for that.

One of the features of disk-encryption is that the data written to the disk looks totally random. So in absence of the appropriate key all internal structure is hidden. Following the standard procedure we would first fill a partition with random data and only set up the encryption afterwards. The result is that an attacker can't even distinguish a sector that is filled with encrypted data from a sector that is unused, i.e. still filled with the random data from the set-up. This status is called plausible deniability: We could always deny the existence of any encrypted data on the partition and argue that the whole partition is just random noise and no one without the key can disprove this.

Now what happens if we force dm-crypt to pass the TRIM command on to the SSD? All sectors which are marked as empty on /dev/dm-0 via the filesystem issuing a TRIM will also be marked as empty on /dev/sda1 via dm-crypt forwarding the TRIM. This means that they become clearly distinguishable from filled sectors (which hold encrypted or random data) and plausible deniability is lost. In extreme cases one might even be able to guess the file system used on the plaintext device /dev/dm-0!

In summary TRIM for encrypted devices greatly enhances performance, but we lose plausible deniability. I personally consider this to be ok for my root filesystem, but I chose to keep it disabled for /home. If performance gets worse I might change my mind later. In any case just be aware of what you are doing.

During installation — Partitioning and full-disk encryption

For efficient wear levelling as described above we need a sufficiently large pool of empty sectors. Most SSDs therefore already have a so-called spare area of sectors, which are invisible to the operating system and hence always empty. Furthermore we can simply leave a part of the SSD unpartitioned to artificially increase the spare area. Most articles I read recommend about 10% of unpartitioned space. Especially when not enabling TRIM for all encrypted partitions you might want to increase this value a little.

Now the filesystem choice. One might first think that a good idea would be a filesystem without a journal in order to avoid the overhead of metadata writes when files are modified or deleted. As Theodore Ts’o points out in his blog [5], this is not quite the case. After his analysis he recommends using ext4 with journal and using some mount options I'll go into in the section about the /etc/fstab. His analysis further shows that especially write-intensive operations (in his case the make clean) still place a much higher write load if journaling is used. We will therefore try to avoid all unnecessary writes by mounting various folders as ramdisks.

If you have enough RAM available (as I said I have 4GB), I would consider omitting a SWAP partition. If you still want SWAP you should disable hibernation for all users and also set the swappiness to zero after the installation. This will tell the kernel that the SWAP should only be touched if there is no physical memory left. For the remaining article I assume you have chosen to not create a SWAP partition.
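
If you do keep a SWAP partition, the swappiness mentioned above is the kernel parameter vm.swappiness. A sketch of making the value persistent could look as follows. The file name 99-swappiness.conf is an arbitrary choice, and the helper only writes the file, so the setting still has to be loaded with sysctl or by rebooting.

```shell
#!/bin/sh
# write_swappiness_conf [FILE]
# Writes a sysctl fragment that sets vm.swappiness to 0.
# The default file name is an arbitrary, hypothetical choice.
write_swappiness_conf() {
    conf=${1:-/etc/sysctl.d/99-swappiness.conf}
    printf 'vm.swappiness = 0\n' > "$conf"
}

# As root:
#   write_swappiness_conf
#   sysctl -p /etc/sysctl.d/99-swappiness.conf   # load it without rebooting
#   cat /proc/sys/vm/swappiness                  # shows the value in effect
```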

Keeping the above in mind, use the installer of your distribution to do your favourite full-disk encryption setup! It should automatically overwrite the disk with random data before creating the encrypted partitions (at least the Debian installer does this). I don't like the extra overhead of a Logical Volume Manager, so I don't use it. Instead I just create 3 separate partitions: one for the root filesystem (about 20GiB), one for /boot (about 100MiB) and a /home partition. On my 120GB SSD this yields:

Size        Mount point   File system        Purpose
20GiB       /             ext4 inside LUKS   Root filesystem (encrypted)
75GiB       /home         ext4 inside LUKS   Home filesystem (encrypted)
100MiB      /boot         ext4               Boot partition
ca. 17GiB                                    unpartitioned

With this setup you will need to provide two separate passwords for the encrypted /home and root partitions (which can of course be the same). In the initial setup you will need to type both of these to start up your system, but we will change this in the next section. The only password you will need after this is the one for the root partition. Keep that in mind when choosing your passwords.

After installation — Configuring dm-crypt for TRIM and using LUKS key derivation

For both of these things we need to edit /etc/crypttab (as root). Your installer should have created entries for the encrypted partitions automatically and the file should look similar to

#/dev/mapper name   Device UUID            req_devs    cryptdevice options
sda1_crypt          UUID=000000000-...00   none        luks
sda2_crypt          UUID=000000000-...00   none        luks

To instruct dm-crypt to pass the TRIM on, you simply need to add discard to the block of cryptdevice options:

#/dev/mapper name   Device UUID            req_devs    cryptdevice options
sda1_crypt          UUID=000000000-...00   none        luks,discard
sda2_crypt          UUID=000000000-...00   none        luks,discard
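
The same edit can be sketched as a small filter, assuming the four-column layout shown above. The function name add_discard is made up for illustration, and the function only prints the modified table so that it can be reviewed before the real file is touched.

```shell
#!/bin/sh
# add_discard FILE
# Prints FILE (crypttab format) with ",discard" appended to the options
# column of every entry that does not list it yet. Comments and blank lines
# are passed through unchanged. Column alignment of edited lines is lost,
# because awk rebuilds them with single spaces.
add_discard() {
    awk '
        /^[ \t]*(#|$)/ { print; next }             # comment or empty line
        NF >= 4 && $4 !~ /(^|,)discard(,|$)/ { $4 = $4 ",discard" }
        { print }
    ' "$1"
}

# Review the result before replacing the real file, e.g.:
#   add_discard /etc/crypttab
```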

As promised we now use key derivation to reduce the number of passwords we need to provide at start-up to only one. This is done using the pre-installed script /lib/cryptsetup/scripts/decrypt_derived. It takes an unlocked LUKS partition and uses its (static) LUKS header to derive a password from it. If this password is added as a key to other LUKS partitions we only need to unlock the first partition and we can then use the script to unlock the others automatically. Since the root partition needs to be unlocked first in either case, it makes sense to derive the other keys from this partition. In my case /dev/mapper/sda1_crypt is the mapper for the root partition and I run the commands

# create temporary ramdisk
mkdir /mnt/ram && mount -t ramfs -o size=1m ramfs /mnt/ram
chmod 700 /mnt/ram   # directory: root only, including the search bit

# derive a key from root partition and save
# to temporary storage
/lib/cryptsetup/scripts/decrypt_derived sda1_crypt > /mnt/ram/tmp.key 

# Add key to other encrypted partitions
# Needs a valid Key for this partition
cryptsetup luksAddKey /dev/sda2 /mnt/ram/tmp.key
# Add to more if required ...

# Remove the key and clean up
rm /mnt/ram/tmp.key
umount /mnt/ram
rmdir /mnt/ram

to add the key derived off the unlocked root partition to the other encrypted LUKS partitions. Note that it is a good idea to keep the old, non-derived keys of the devices as well in case the LUKS header of the root partition gets damaged.

Finally we need to tell the system how to unlock itself at start-up, which is again done in the /etc/crypttab file:

#/dev/mapper name   Device UUID            req_devs    cryptdevice opts
sda1_crypt          UUID=000000000-...00   none        luks,discard
sda2_crypt          UUID=000000000-...00   sda1_crypt  luks,discard,keyscript=/lib/cryptsetup/scripts/decrypt_derived

By putting the entry sda1_crypt in the req_devs column we indicate that the sda1_crypt partition should be used to derive the key for sda2_crypt using the keyscript.

Now update the initial ramdisk: Run as root

update-initramfs -u -k all

and then reboot the machine to check if unlocking the devices works as intended. Afterwards you can also verify if dm-crypt indeed allows TRIM on the devices by running

dmsetup table /dev/mapper/sda1_crypt
dmsetup table /dev/mapper/sda2_crypt
#...

on the encrypted devices. The output should contain an allow_discards flag for each of them.
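
This check can be scripted. The sketch below assumes the usual one-line table format that dmsetup prints for a single crypt mapping (start, length, the word crypt, then the target parameters). The function name has_allow_discards is made up for illustration.

```shell
#!/bin/sh
# has_allow_discards
# Reads the output of "dmsetup table NAME" on stdin and succeeds if the
# mapping is a crypt target that carries the allow_discards flag.
has_allow_discards() {
    awk '
        $3 == "crypt" {
            for (i = 4; i <= NF; i++)
                if ($i == "allow_discards") found = 1
        }
        END { exit !found }
    '
}

# As root, for each mapper name from /etc/crypttab:
#   dmsetup table sda1_crypt | has_allow_discards && echo "TRIM is passed on"
```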

After installation — Editing /etc/fstab

For each partition that is located on an SSD drive you should at least add the mount options noatime and nodiratime (see [5]). These options will suppress the usual bookkeeping of access times for files and directories, respectively. Usually accessing a file implies a write as well since the current date and time will be written to the so-called inode table of the filesystem. For most applications this is not required and can therefore be safely disabled using the above options.

On top of that we can provide the discard option selectively to those partitions where we want TRIM to be used, keeping in mind the consequences this has for plausible deniability (see above). In my case the /etc/fstab then looks similar to

#file system             point     type    options                     dump  pass
#My root partition:
/dev/mapper/sda1_crypt   /         ext4    noatime,nodiratime,discard,errors=remount-ro   0       1
#
#Boot partition:
UUID=00000000-...00      /boot     ext4    noatime,nodiratime,discard  0       2
#
#Home partition (no TRIM)
/dev/mapper/sda2_crypt   /home     ext4    noatime,nodiratime          0       2
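
After the next reboot it is worth confirming that the options are really in effect. A minimal sketch, assuming the standard /proc/mounts layout (mount point in the second field, options in the fourth); the function name mount_has_option is made up for illustration.

```shell
#!/bin/sh
# mount_has_option MOUNTPOINT OPTION [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is mounted with OPTION according to MOUNTS_FILE
# (default: /proc/mounts). If a mount point is listed several times, the
# last entry decides.
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, o, ",")
            hit = 0
            for (i = 1; i <= n; i++)
                if (o[i] == opt) hit = 1
        }
        END { exit !hit }
    ' "${3:-/proc/mounts}"
}

# For example:
#   mount_has_option / discard     && echo "/ uses TRIM"
#   mount_has_option /home noatime && echo "/home skips access time updates"
```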

After installation — Setting up ramdisks using tmpfs

Next to the mount options, a good paradigm is to use ramdisks for temporary files to avoid wasting writes on files we don't want to keep anyway. Looking at the Filesystem Hierarchy Standard we note that temporary files are stored at /tmp and (in practice to a lesser extent) at /var/tmp. Moving the files in these locations to RAM can easily be achieved by mounting the corresponding paths with the filesystem type tmpfs. This creates a dynamic ramdisk, meaning that the disk only uses as much RAM as it requires to store the files. A maximum size can also be specified via the mount option size=XXX.

If you can spare a little more RAM you should furthermore consider the use of ramdisks for locations in the file system tree where frequent writes take place, e.g. /var/log and /var/spool. You might be aware that /var/log contains the system's logfiles, meaning that by applying these settings you will lose all logs on system reboot. For servers this would surely be a problem, but in the case of laptops I don't want to lose this option of saving a lot of writes. And honestly: How often do you really need to check logs on a laptop system that are days old?

/var/spool is the location where files are saved that wait for a task to finish, e.g. printer queues or queues for cron or at. So for example printer queues or at jobs won't be kept across reboots any more if /var/spool is a ramdisk — again this is something I don't really need on a laptop.

A further problem with having /var/log and /var/spool in RAM is that the respective folder structures will also be discarded. When the system starts up, however, many services and init scripts will assume the presence of certain folders with certain permissions in /var/log. The simple solution is to back up the folder structure on shutdown and to restore it when the system starts up. In principle this is what the attached script varFolders does. On top of that it also offers an option to keep all log files (e.g. for debugging purposes) and a check to avoid updating the backup if nothing has changed.
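
The core idea, archiving directories and symbolic links but no regular files, can be sketched in a few lines. This is a simplified illustration and not the varFolders script itself. It assumes GNU find and GNU tar, and the function names are made up.

```shell
#!/bin/sh
# save_structure SRC_PARENT ARCHIVE DIR...
# Stores only directories and symbolic links (no regular files) of the
# given DIRs, relative to SRC_PARENT, in the tar archive ARCHIVE.
# ARCHIVE should be an absolute path, since the function changes directory.
save_structure() {
    parent=$1
    archive=$2
    shift 2
    ( cd "$parent" &&
      find "$@" \( -type d -o -type l \) -print0 |
      tar --null --no-recursion -cf "$archive" -T - )
}

# restore_structure DEST_PARENT ARCHIVE
# Recreates the saved directories and links below DEST_PARENT.
restore_structure() {
    tar -xf "$2" -C "$1"
}

# Example:
#   save_structure /var /var/local/structure.tar log spool   # on shutdown
#   restore_structure /var /var/local/structure.tar          # on start-up
```

Ownership and permissions of the directories travel inside the archive, which is why services find their folders as they expect them after the restore.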

For the setup of the discussed ramdisks, I first added the following lines to the /etc/fstab:

#file systems   point       type    options                dump    pass
none            /tmp        tmpfs   noatime,size=1024m     0       0
none            /var/tmp    tmpfs   noatime,size=256m      0       0
none            /var/spool  tmpfs   noatime,size=512m      0       0
none            /var/log    tmpfs   noatime,size=512m      0       0

Remember that tmpfs is a dynamic ramdisk, so this does not mean you are actually filling roughly 2.2GiB of RAM — especially when emptying the logs on reboot you need hardly more than a few tens of MB for /var/tmp, /var/spool and /var/log together.
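
To see the upper bound that such entries add up to, the size= options can be summed with a short helper. This sketch assumes that all sizes are given in MiB with an "m" suffix, as in the lines above; the function name tmpfs_limit_mib is made up for illustration.

```shell
#!/bin/sh
# tmpfs_limit_mib [FSTAB]
# Prints the sum of all size= limits (MiB, written with an "m" suffix) of
# the tmpfs entries in FSTAB (default: /etc/fstab). Entries with other
# units or without a size option are ignored.
tmpfs_limit_mib() {
    awk '
        /^[ \t]*#/ { next }                        # skip comments
        $3 == "tmpfs" {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] ~ /^size=[0-9]+[mM]$/) {
                    v = o[i]
                    gsub(/[^0-9]/, "", v)          # keep only the digits
                    total += v
                }
        }
        END { print total + 0 }
    ' "${1:-/etc/fstab}"
}
```

For the four entries above this prints 2304, i.e. 2.25GiB as the limit. The amount of RAM actually in use can be inspected at any time with df -h -t tmpfs.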

Then I executed the following as root to make use of varFolders:

# download and install varFolders:
wget -O /etc/init.d/varFolders https://gist.githubusercontent.com/mfherbst/c38d5acb0b6c48e90054c53d805f73e7/raw/35218ca60bd1012bc1128b8cec8d7aee71ba359c/varFolders
# wget does not set the executable bit, which init scripts need
chmod 755 /etc/init.d/varFolders
update-rc.d varFolders defaults

# create first image:
/etc/init.d/varFolders stop

# Move directories and create empty ones
for DIR in /var/tmp /var/log /var/spool; do
    mv $DIR $DIR.1 && \
    mkdir $DIR && \
    chmod --reference=$DIR.1 $DIR && \
    chown --reference=$DIR.1 $DIR && \
    mount --bind $DIR.1 $DIR && \
    echo successfully prepared $DIR
done

# If all directories are successfully prepared
# ie we get the three echos "successfully prepared $DIR"
# then proceed -- if not find the error and correct it!

reboot

After the reboot check if all services come up properly and if /var/log is populated with folders as expected. If that is the case the temporary folders /var/tmp.1, /var/log.1 and /var/spool.1 can be removed.

After installation — Changing the disk scheduler

The disk scheduler controls how the kernel schedules access to storage drives. For rotating disks data access is obviously a slow process, and it needs to be distributed fairly between the many processes that might want to read from or write to different parts of the disk. The default scheduler, called cfq, is optimised for this rotating-disk scenario, but not for SSDs. Performance can be greatly improved if noop or deadline is used instead. Noop is the simplest one: it basically handles requests in the order they are submitted. Deadline additionally gives each request an expiry time and serves requests that have waited too long first. Which one to use pretty much depends on the underlying hardware and the use case, but for a normal laptop and normal use it should not make much of a difference.

Changing the scheduler can be done in two ways. First, one can tell the kernel to use an alternative scheduler for all disks via a boot option. This is done by editing /etc/default/grub and adding elevator=deadline to the variable GRUB_CMDLINE_LINUX_DEFAULT, e.g.

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"

Afterwards run update-grub2 as root. Whilst this is nice and simple, it implies that all drives are scheduled in that way and hence it is only really applicable for an SSD-only system.

The second option dynamically assigns the disk scheduler depending on the type of disk. This can be achieved using udev rules. For example create the file /etc/udev/rules.d/60-schedulers.rules

# set deadline scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"

# set cfq scheduler for rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="cfq"

to use cfq for rotating disks and deadline for SSDs.
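
Which scheduler is in effect can be read from sysfs, where the kernel lists the available schedulers and marks the active one with square brackets, e.g. "noop [deadline] cfq". A minimal sketch follows; the function name active_scheduler and the optional sysfs root argument are made up for illustration.

```shell
#!/bin/sh
# active_scheduler DEVICE [SYSFS_ROOT]
# Prints the scheduler currently in use for DEVICE, e.g. "sda", by
# extracting the bracketed entry from queue/scheduler.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "${2:-/sys}/block/$1/queue/scheduler"
}

# Example (hypothetical device name):
#   active_scheduler sda
```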

After installation — Enabling device-level write cache

Turning on the write cache of a drive causes the drive to use some internal RAM to cache data until it is written to the disk properly. This improves performance for both SSDs and traditional hard drives. A disadvantage is that an abrupt power loss can lead to data loss when this feature is enabled. Since data is usually only kept in the disk cache for extremely short times this is hardly a problem for normal use cases.

First check whether write caching is already enabled by running (as root)

hdparm -W /dev/sda

If this command fails, your drive either has no write cache feature or hdparm can't access it. In either case you can't change the setting and should proceed to the next section. If it returns something along the lines of

/dev/sda:
 write-caching =  1 (on)

write caching is already enabled. Otherwise (i.e. for write-caching = 0) you can enable it by issuing

hdparm -W1 /dev/sda

Note that this setting is not preserved across reboots, so in order to make this change permanent, you have to add the line issued above to the file /etc/rc.local before the final exit 0.
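
Inserting the line by hand is easy enough, but for completeness here is a sketch of doing it in a script. It assumes that the file ends with a line reading exactly "exit 0"; the function name add_before_final_exit is made up for illustration.

```shell
#!/bin/sh
# add_before_final_exit FILE LINE
# Inserts LINE in front of the last line of FILE that reads exactly
# "exit 0". If LINE is already present nothing is changed; if there is no
# "exit 0" line, LINE is appended at the end.
add_before_final_exit() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" && return 0       # already present
    tmp=$(mktemp) || return 1
    awk -v new="$line" '
        { lines[NR] = $0 }
        $0 == "exit 0" { last = NR }
        END {
            for (i = 1; i <= NR; i++) {
                if (i == last) print new
                print lines[i]
            }
            if (!last) print new
        }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"     # cat keeps the file mode
    status=$?
    rm -f "$tmp"
    return $status
}

# As root:
#   add_before_final_exit /etc/rc.local 'hdparm -W1 /dev/sda'
```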

After installation — User-level tweaks

On a user level it is mostly the browser cache that predictably causes a lot of writes. Now that /tmp is located in RAM, it is a good idea to make the browser keep its cache there. For Firefox you do the following:

  1. Navigate to about:config and confirm that you know what you are doing
  2. Right-click > New > String
  3. Call the new entry "browser.cache.disk.parent_directory"
  4. Give it the value "/tmp"
  5. Restart the browser to have it write the cache to /tmp
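
The steps above can also be scripted through a user.js file in the Firefox profile directory, which Firefox reads at start-up. This is a sketch under that assumption; the function name set_cache_dir is made up, and the profile directory in the example is a placeholder that differs on every system.

```shell
#!/bin/sh
# set_cache_dir PROFILE_DIR [CACHE_PARENT]
# Appends the cache preference to user.js in the given Firefox profile
# directory unless the preference is already mentioned there.
# CACHE_PARENT defaults to /tmp.
set_cache_dir() {
    userjs="$1/user.js"
    parent=${2:-/tmp}
    pref='browser.cache.disk.parent_directory'
    grep -qsF "\"$pref\"" "$userjs" && return 0    # already configured
    printf 'user_pref("%s", "%s");\n' "$pref" "$parent" >> "$userjs"
}

# Example (placeholder profile name):
#   set_cache_dir ~/.mozilla/firefox/xxxxxxxx.default
```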

Something similar should be possible in other browsers like chromium as well.

Last but not least: Use /tmp frequently, e.g.:

  • Unpack tarballs to /tmp if you do not intend to keep the files for long.
  • Place temporary downloads in /tmp (if they are small enough).
  • Compile in /tmp if you are only interested in the final executable.

References

Attachments

#!/bin/bash
### BEGIN INIT INFO
# Provides: varFolders
# Required-Start:    $local_fs
# Required-Stop:     $local_fs
# Default-Start:     1 2 3 4 5
# Default-Stop:      0 6
# Short-Description: Backup and restore var folder structures
### END INIT INFO
#
# Script Version 1.0 (14/09/2013)
#
##################################################################################
# Licence:
#
# (C) 2013 Michael F. Herbst <info@michael-herbst.com>
#
# This script is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# It is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# A copy of the GNU General Public License can be found
# at <http://www.gnu.org/licenses/>.
#
##################################################################################
# Usage:
#
# run with "stop" to create a tar archive representing the  structure of a sub-
# directory of var. This means that no actual files, only symbolic links and
# folders are kept. Run with "start" to restore this structure back to the original
# location.
#
# The idea is that this allows write-intensive subfolders of var (eg. log, spool .. )
# to be mounted as ramdisks using tmpfs in a SSD-only system to reduce the amounts
# of writes the SSD suffers.
#
# DIRS are the subfolders of var which are subject to this treatment. With
# KEEPFILES the user can control if only the structure or everything should be
# stored. (Eg. if something is debugged and files should be kept across power cycles)
#
# The script also checks for presence of /etc/var_folders_keep_files. If this file
# is present the script will automatically set KEEPFILES="y".
#
# If KEEPFILES="n" the archive will only be updated if the names of the subfolders /
# links have changed or if subfolders / links have been created / deleted. Other
# changes (eg permission changes, owner changes, ...) won't trigger an update
# automatically. You can however force an update at the next invocation of "stop"
# by calling varFolders with the option "force-update".
#
##################################################################################
# Settings:

KEEPFILES="n"           #Keep the files or only the folder structure?
DIRS="log spool"        #dirs under /var to extract folder structure/files from
ARFILE="var_folders.tar"    #Archive to keep stuff in (stored under /var/local)

##################################################################################

. /lib/lsb/init-functions
set -e
PATH=/sbin:/bin:/usr/sbin:/usr/bin
PROG=varFolders

initialise(){
    #set DIRS, DIR1 and KEEPFILES

    local DTMP=
    for DIR in $DIRS; do
        if [ "$DIR" == "local" ]; then
            echo "Can't store the directory structure of /var/local!"
            exit 1
        fi
        [ -d "/var/$DIR" ] || continue  #Skip directories that do not exist
        [ -z "$DIR1" ] && DIR1="$DIR"   #The first valid dir
        DTMP="$DTMP $DIR"
    done
    DIRS="$DTMP"
    if [ -z "$DIRS" ]; then
        echo "No valid directory in DIRS"
        exit 1
    fi

    #We go to /var
    cd /var
    mkdir -p /var/local

    [ -e /etc/var_folders_keep_files ] && KEEPFILES="y"
    true
}

update_archive() {
    if [ "$KEEPFILES" == "y" ]; then
        update_archive_keepfiles
        return $?
    fi
    update_archive_normal
    return $?
}

update_archive_keepfiles() {
    #Create tar with all files
    tar cf /var/local/"$ARFILE" $DIRS
    chmod 640 /var/local/"$ARFILE"
    find $DIRS -type d -o -type l > /var/local/"$ARFILE".list
}

update_archive_normal() {
    TMP=`mktemp`
    find $DIRS -type d -o -type l > "$TMP"

    SUMNEW=`cat "$TMP" | sha512sum`
    SUMOLD=
    [ -f /var/local/"$ARFILE".list ] && SUMOLD=`cat /var/local/"$ARFILE".list | sha512sum`

    if [[ "$SUMOLD" == "$SUMNEW" ]]; then
        rm "$TMP"
        return
    fi

    mv "$TMP" /var/local/"$ARFILE".list
    tar cf /var/local/"$ARFILE"  --no-recursion `cat /var/local/"$ARFILE".list`
    chmod 640 /var/local/"$ARFILE"
}

force_update() {
    #removing the list file will force the update if stop is called
    rm -f "/var/local/${ARFILE}.list"
}

extract_archive() {
    [ ! -f /var/local/"$ARFILE" ] && return 1

    #Archive has already been extracted:
    [ -f /var/${DIR1}/archive_extracted ] && return 0

    #Extract:
    tar xf /var/local/"$ARFILE"
    touch /var/${DIR1}/archive_extracted
}

##################################################################################

case $1 in
    start)
        log_begin_msg "Starting $PROG"
        if ! initialise; then
            log_end_msg 1
            exit 1
        fi
        if ! extract_archive; then
            log_end_msg 1
            exit 1
        fi
        log_end_msg 0
    ;;
    stop)
        log_begin_msg "Stopping $PROG"
        if ! initialise; then
            log_end_msg 1
            exit 1
        fi
        if ! update_archive; then
            log_end_msg 1
            exit 1
        fi
        log_end_msg 0
    ;;
    force-update)
        initialise || exit 1
        force_update
        exit 0
    ;;
    restart|force-reload|status)
        exit 0
    ;;
    *)
        echo "Usage: $0 {start|stop|restart|force-reload|force-update|status}" >&2
        exit 1
    ;;
esac

exit 0