Linux Server Partitioning
Setting up a Linux server involves making some decisions that will last as long as the server does, especially partitioning. There are things you can do at partitioning time that make for a safer and more recoverable server. The following partitioning suggestions assume a server is meant to be a solid system that is slow to change except for the most essential updates.
Dan 18:51, 5 June 2007 (CST)
If you install Linux/Unix in one big partition, everything will work well much of the time, especially if your system is a laptop. But for servers, one big partition is a very bad idea. It is a pity that some automated installation systems use one big partition by default!
What risks can clever partitioning protect you from?
Risk of corruption is minimised if you have the minimum number of files open by the minimum number of processes. It is easy to have many more files open on a Linux system than you really need. The partitioning suggestions below help reduce the number of open files and processes.
Programs running wild are always a possibility. For example, a program might continually write to /tmp or a directory under /var or /home and so fill up the entire disk. This might be triggered by an attacker, but it is more likely just an unfortunate bug or administrator error: printing systems, buggy web languages and incorrectly configured databases can all do this. As can mistaking GB for MB in the wrong context! Sensible partitioning can limit the damage.
Risk of having to reinstall the OS, whether to move to a virtualised server environment or to upgrade the server hardware or software, is reduced. Partitioning based on function with the minimum level of access required means you can quite probably pick up an ordinary server and move it to a Xen partition, or another server, or even move the data and logs to a completely different hardware architecture.
Risk of fragile backups is reduced by partitioning. You can point your data, log, program or other backup script at an entire partition instead of having to specify directories with a list of exceptions. The exceptions are always a potential fragile point.
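As a minimal sketch of the whole-partition backup idea, assuming GNU tar, the hypothetical helper backup_partition below archives one mount point without any exclusion list. The --one-file-system option stops tar from descending into other filesystems mounted underneath the source.

```shell
#!/bin/sh
# backup_partition SRC DEST: archive everything on the filesystem mounted at SRC.
# Hypothetical helper for illustration; assumes GNU tar.
backup_partition() {
    src="$1"
    dest="$2"
    # --one-file-system keeps tar on SRC's own filesystem,
    # so no list of excluded directories is needed.
    tar --one-file-system -czf "$dest" -C "$src" .
}
```

It might be called as backup_partition /data /mnt/backup/data.tar.gz, where both paths are placeholders; the destination should live on a different filesystem from the source.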
Classify Your Data
Server hard disks contain six broad types of data. Classifying by access type:
- programs (read, execute)
This covers everything in the /sbin and /bin trees. In normal running these files will never be written to. It is rare for a server to be configured to automatically apply updates unless it is part of a very tightly controlled corporate network, so usually an administrator will be involved anyway and can arrange for a limited period of time when system programs are writable. (In fact all partition types can potentially be written to during an operating system upgrade, so the administrator needs to think about that.)
- swap (read, write)
Swap is of course used for virtual memory management, but it can also be used for software suspend (a feature worth considering in a server; install extra hardware without stopping the database server!) Over time, swap space also contains snippets of data including things like plain-text versions of passwords. So not only is encrypting swap a good idea (required for software suspend) but the entire swap partition can be a vital part of smooth system recovery and reconstructing an incident later on.
- logs (append, mv, read)
Nearly all activity to do with logs is appending to existing files. Very occasionally a new file is created or a file renamed during a log rotation. Log monitoring software and sometimes humans read the logs.
- user data (read, write)
This will be the largest part of your disk space.
- temporary data (read, write)
This should be a good size, as it is often required for system administration purposes.
- boot programs and associated data (read and execute once)
Aside from system upgrades, these programs don't even need to be available during normal running of the system. You can leave them unmounted.
If you are unsure about what kinds of files go where or want more detail, have a look at the Linux Filesystem Hierarchy. Beware too of local differences. For example, if you are supporting a commercial CAD installation, chances are you have gigabytes of programs in /opt and these certainly don't need to be written to during a system upgrade.
Notice how little need there is for execute permission. We therefore want to deny execute access on as many partitions as possible using the noexec mount option. Until mid-2006 there were very simple ways to avoid the noexec restriction (see the note at the end of this article), but it still provided protection against most automated and/or uninformed human attackers. noexec always was a good idea, and it is getting more bulletproof as time goes on.
There is no provision here for compiling, running and installing programs outside standard system upgrade facilities, for example under /usr/local as many people do. If you need to do this you will want execute permission, and possibly on /tmp as well.
Try to separate data with different kinds of access by putting them in different partitions. Then you can treat them in different ways to minimise risk and complexity.
You might not implement all of the following, but consider some at least.
|Mount point||Mount options||Comments|
|/||rw for boot, then ro||You do this in the boot script in /etc/init.d/ for your distribution|
|/var||read/write, noexec||/var doesn't matter in an emergency, unlike /var/log. symlink /tmp to /var/tmp|
|/var/log||read/write, noexec||try to make all applications use syslog not write to /var/log directly|
|/home||read/write, noexec||Instead, think about /data as explained in the following table|
This is a very simple system that can save you a lot of trouble compared to having everything in one or two big partitions. If you have a busy server or one disk is slow you can mount with noatime. The only impact of this is less information for forensic-type activity.
A lot of distributions selectively delete /tmp on reboot. This is potentially a denial of service (rm can take a long time for a very deep tree), and recreating the filesystem with mkfs may be more appropriate.
|Mount point||Mount options||Comments|
|/||read/write for boot, then read-only||must be mounted in order to boot, but only a limited number of directories under it are really needed|
|/boot||doesn't matter||umount after boot. Ability to umount is the only modern reason to have a /boot partition.|
|/usr||read only||programs under /usr/bin and /usr/sbin must stay executable, so noexec cannot be used here|
|/var||read/write, noexec||/var doesn't matter in an emergency, unlike /var/log|
|/var/log||read/write, noexec||try to make all applications use syslog not write to /var/log directly.|
|/tmp||read/write, noexec||You might just symlink this to /var/tmp|
|/data||read/write, noexec||Use symlinks from / and keep /home and other variable data here apart from /var|
You may need to remount /boot, /sbin, /usr, /var and /var/log during software updates. With apt it is possible to do this automatically, with a pre-install and post-install command set up in /etc/apt/apt.conf.
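A hedged sketch of such a hook follows. DPkg::Pre-Invoke and DPkg::Post-Invoke are the apt.conf lists of shell commands run before and after apt calls dpkg. The choice of /usr as the only partition to remount is an assumption for illustration; add one command per read-only partition you actually use.

```
// /etc/apt/apt.conf fragment: a sketch, assuming /usr is normally mounted read-only
DPkg::Pre-Invoke  { "mount -o remount,rw /usr"; };
// The remount back to read-only can fail while replaced files are still open,
// so "|| true" keeps a busy filesystem from being reported as an apt failure.
DPkg::Post-Invoke { "mount -o remount,ro /usr || true"; };
```

If the read-only remount does fail, the partition simply stays writable until the processes holding old files are restarted and the remount is repeated by hand.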
Wherever noexec appears above you should also add nosuid in the /etc/fstab line.
There are other important mount options (such as noatime), but they are beyond the scope of this article.
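Put together, the fuller scheme might look like the following /etc/fstab fragment. This is only a sketch: the device names, the ext3 filesystem type and the dump/pass numbers are all assumptions, and /usr is given ro without noexec because the programs stored there have to remain executable.

```
# /etc/fstab fragment: device names and filesystem type are hypothetical
# device     mount point  type  options            dump pass
/dev/sda1    /boot        ext3  defaults           0    2
# / starts read/write and is remounted read-only later by the boot script
/dev/sda2    /            ext3  defaults           0    1
/dev/sda3    /usr         ext3  ro                 0    2
/dev/sda5    /var         ext3  rw,noexec,nosuid   0    2
/dev/sda6    /var/log     ext3  rw,noexec,nosuid   0    2
/dev/sda7    /tmp         ext3  rw,noexec,nosuid   0    2
/dev/sda8    /data        ext3  rw,noexec,nosuid   0    2
/dev/sda9    none         swap  sw                 0    0
```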
There are two benefits to logging through syslog rather than directly to log files:
- You can send all log output to a remote log server, and in the event of a catastrophe syslog will often work after local storage has frozen, perhaps giving you information as to what went wrong and why.
- If /var/log vanishes due to some local catastrophe, applications are unlikely to crash! Since it is only syslog that talks to /var/log, and syslog will keep going anyway, you are probably avoiding corrupt log files.
The way to remount a partition in a different state is to say mount -o remount,NEWFLAGS /fs where NEWFLAGS is ro, rw or other mount flags, separated by commas.
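A small sketch of using remount around a maintenance command: the hypothetical helper with_rw below remounts a filesystem read/write, runs a command, and then returns the filesystem to read-only. The MOUNT variable is an assumption added purely so the helper can be dry-run with MOUNT=echo.

```shell
#!/bin/sh
# MOUNT can be overridden (for example MOUNT=echo) to dry-run the helper.
MOUNT="${MOUNT:-mount}"

# with_rw FS COMMAND [ARGS...]: run COMMAND while FS is mounted read/write.
with_rw() {
    fs="$1"
    shift
    "$MOUNT" -o remount,rw "$fs" || return 1
    "$@"
    status=$?
    # Going back to read-only fails if files are still open for writing.
    "$MOUNT" -o remount,ro "$fs" ||
        echo "warning: could not remount $fs read-only" >&2
    return "$status"
}
```

Run as root, something like with_rw /usr make install would then leave /usr read-only again afterwards, whether or not the command succeeded.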
For log files, consider the ext2/ext3-specific 'append-only' attribute, set with 'chattr +a /var/log/somefile'. If you run 'logrotate', its 'prerotate' and 'postrotate' clauses can clear and reset this bit around each rotation. The attribute stops existing log content from being rewritten or truncated, which will also break any analysis package that modifies log files in place, but that might be fine if you have a central syslog server to point your analysis tools at. This is a simple technique to help fend off cracker activity, which often wants to rewrite logs.
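A sketch of a matching logrotate stanza, assuming the default 'create' behaviour so that a fresh log file already exists by the time postrotate runs. The file name follows the example above, and the weekly schedule and rotation count are placeholders.

```
# logrotate stanza: schedule and rotation count are placeholders
/var/log/somefile {
    weekly
    rotate 4
    create
    prerotate
        # drop append-only so the file can be renamed aside
        chattr -a /var/log/somefile
    endscript
    postrotate
        # protect the freshly created log file again
        chattr +a /var/log/somefile
    endscript
}
```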
Along similar lines you may wish to consider immutability, chattr +i somefile, which means that even root cannot modify the file.
This is not to do with partitioning, but while you're editing /etc/fstab you should mount /proc nosuid,noexec as well (there was a security bug targeting this in mid-2006).
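To confirm that mount options such as these actually took effect, a check along the following lines may help. It is a sketch only: has_mount_option and audit_noexec are hypothetical names, and the table it reads is assumed to have the same layout as /proc/mounts.

```shell
#!/bin/sh
# has_mount_option TABLE MOUNTPOINT OPTION
# Succeeds if MOUNTPOINT appears in TABLE (same layout as /proc/mounts)
# with OPTION among its comma-separated mount options.
has_mount_option() {
    awk -v mp="$2" -v opt="$3" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# audit_noexec: report mount points that lack noexec in the live table.
# The list of mount points is an assumption; adjust it to your own layout.
audit_noexec() {
    for mp in /var /var/log /tmp /data /proc; do
        has_mount_option /proc/mounts "$mp" noexec || echo "missing noexec: $mp"
    done
}
```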
When Something Bad Happens
Let's say your server is very slow.
Check space available on all partitions with df -h. Has /tmp or /var or /var/log filled up? You might not care in principle about /tmp, but if zapping /tmp causes an application to fail which then corrupts a file in /data, you do care. So be careful when cleaning up, but at least this situation should be completely recoverable without anything drastic (like rebooting.)
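When there are many partitions, the df output can be filtered down to the ones that are nearly full. The helper below is a minimal sketch with a hypothetical name, full_filesystems; it reads the POSIX-format output of df -P on standard input, and the threshold is whatever percentage you choose.

```shell
#!/bin/sh
# full_filesystems THRESHOLD: read `df -P` output on stdin and print
# "mountpoint use%" for each filesystem at or above THRESHOLD percent.
# Assumes mount points contain no spaces.
full_filesystems() {
    awk -v t="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # "97%" becomes "97"
        if (use + 0 >= t + 0) print $6, $5
    }'
}
```

Typical use would be df -P | full_filesystems 90.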
In a more serious situation maybe:
- there are badly hung processes
- a hardware driver is only partly responding
- nearly everything has died
... but you still have one working session at the physical root console or an ssh session that is still responsive.
- Your goal is to safeguard /data (ie /data/home etc, or /home if you use the simpler partitioning approach.)
- sync. If the machine crashes entirely then at least you've done this much.
- umount /data . If this works, your data is safe. You can now work on the rest of the system, or even reboot in the rare event this is required.
- If umount refuses, use lsof to see what processes have files open for write access on /data/xxx (or /home). Try to shut down these processes, and failing that, kill them. If you can't kill them, they are almost certainly waiting on local or network I/O. Look at /proc/PID/wchan for whereabouts the process is in the kernel. You might try disabling a particular I/O device.
- Try umount, and umount -f.
- Etc. From here on you need advanced server resuscitation techniques, which is another article.
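The steps above can be sketched as a short script. Both function names are hypothetical, rescue_data is defined but deliberately not invoked, and the sketch stops short of killing anything: it only shows who is holding files open for writing, then retries the unmount.

```shell
#!/bin/sh
# writers: filter `lsof` output on stdin, keeping the header line and any
# entry whose FD column shows a descriptor open for write (w) or read/write (u).
writers() {
    awk 'NR == 1 || $4 ~ /^[0-9]+[wu]/'
}

# rescue_data [MOUNTPOINT]: hypothetical wrapper around the steps above.
# Must be run as root; MOUNTPOINT defaults to /data.
rescue_data() {
    mp="${1:-/data}"
    sync                              # flush dirty buffers first
    umount "$mp" && return 0          # if this works, the data is safe
    lsof "$mp" | writers              # who still has files open for writing?
    # For a stuck process, see where it is waiting in the kernel:
    #   cat /proc/PID/wchan
    umount "$mp" || umount -f "$mp"   # retry, then force
}
```

Shutting down or killing the listed processes is left as a manual decision between the lsof output and the final umount attempts.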
You now have minimised the chances of an application corrupting data through getting upset when something vanishes underneath it. If it is a malicious attacker, there is now a lot less damage they can do, probably.
Note on noexec: current Linux kernels and system libraries are much more careful to honour noexec, but the workaround used to be "/lib/ld-linux.so.2 /readonly-path/file-to-execute".
The Nix Partitioning Guide has some background comments.