Ext4 recovery

From Granizada

Dan Shearer

version 0.2

Last edited March 2010


Linux ext4 Filesystem Recovery

ext4 Not Ready As of March 2010 (For Most People)

For most people, ext4 is the wrong choice for data that matters. Some people have little choice and must use ext4; you know who you are, and you should already know the risks.

ext4 was released as a stable filesystem in December 2008 with Linux kernel 2.6.28, and has many incremental improvements over ext3. It was declared stable by the Linux kernel team (stable means not adding new features, not what you might think it means) and is shipped as the default filesystem in Fedora 11 (kernel 2.6.29) and Ubuntu 9.10 (kernel 2.6.31). ext4 has had regular reports of spontaneous corruption.

Many people have suggested this is unusually bad and shocking. In fact it is not; it is merely that core filesystem changes are so rare that the collective memory of the last changeover tends to vanish in between (the ext3 changeover was a decade ago). This page makes some suggestions about what to do if you have corruption troubles. The only bad, though not really surprising, thing is how quickly Fedora and Ubuntu decided to make ext4 the default.

What is ext4?

ext4 is not a completely new filesystem. It adds features to ext3 filesystems, just as ext3 added the journalling feature to ext2 filesystems. The upgrade path from ext3 to ext4 is similar to that from ext2 to ext3: you run tune2fs and it adds the new features. This incremental approach is one of the ways that the Linux filesystem developers reduce risk for all users.
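As a rough illustration of that tune2fs upgrade path, here is a bash sketch. The feature list (extents, uninit_bg, dir_index) and the follow-up forced e2fsck are the steps commonly documented on the ext4 wiki for converting ext3, but treat them as an assumption and check tune2fs(8) for your e2fsprogs version. The function name and the TUNE2FS/E2FSCK override variables are hypothetical, there only so the logic can be exercised without a real disk.

```shell
# Sketch: convert an UNMOUNTED ext3 filesystem to ext4 in place (bash).
# Feature list as commonly documented for ext3 -> ext4 (assumption: check
# tune2fs(8) for your version). TUNE2FS and E2FSCK are hypothetical
# override hooks used only for testing; by default the real tools run.
upgrade_ext3_to_ext4() {
    local dev=$1 status=0
    # Add the ext4 features to the existing ext3 layout. Enabling extents is
    # one-way: the ext3 driver will refuse to mount the result.
    "${TUNE2FS:-tune2fs}" -O extents,uninit_bg,dir_index "$dev" || return 1
    # Forced full check after the conversion; -D also optimises directories.
    "${E2FSCK:-e2fsck}" -fD "$dev" || status=$?
    # e2fsck exit status: 0 = no errors, 1 = errors corrected; 4+ = trouble.
    [ "$status" -le 1 ]
}

# Usage (device name is a placeholder):
#   upgrade_ext3_to_ext4 /dev/YOUR_DEVICE
```

Existing files keep their old block-mapped layout; only files written afterwards use extents.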

Filesystems are very tricky software engineering, and there aren't many examples around of long-term filesystem evolution. If you're interested in an alternative approach, look at the development of UFS2 on the BSDs, which is more conservative than the approach of most Linux distributions but meant that BSD users lacked essential features for years after Linux had them.

ext4 seems likely to be a very sound replacement for ext3 for some years to come. Right now (March 2010) my opinion is that ext4 has an extremely low failure rate, but I still don't advise it for any important data unless you desperately need the new features ext4 introduces. If you're willing to take a punt on a desktop system then by all means install your Linux distribution with the default settings, but keep your data somewhere else on the network.

Another Reason to Avoid ext4

The defaults in ext4 and in distributions don't seem to be the best available. ext4 comes preconfigured to be safe but not fast, yet there are tuning options that can make it just as safe and faster. That's what you get for using a stable but new filesystem.

The technical detail is that ext4 enables write barriers by default, which is what lets the disk's write cache stay switched on safely. However, it is possible to switch off both write caching and barriers and get a faster result. Presumably this discrepancy will go away as ext4 becomes more optimised; for the moment, unless you want to experiment with mount parameters, don't use ext4 where you need good performance under a workload of many small writes.

Kinds of Problems

There are many kinds of corrupted filesystem, including those caused by faulty hardware. The kind of corruption I'm talking about is an ordinary ext4 filesystem suddenly failing. Since 2.6.28 this sort of behaviour has been reported many times, and ext4 bugs have been squashed to fix it. Problems often seem to arise around corrupted group descriptors. An example is discussed on the ext4 list here: http://markmail.org/message/jsd6kwkh5ecq7xag . I have seen two other ext4 corruption problems, both complaining about group descriptors.

In all three cases I have seen, the corruption was also accompanied by a corrupted superblock.

Caveat

ext4 recovery is very like ext3 or ext2 recovery, however:

The version of e2fsprogs matters

e2fsprogs contains the ext* recovery tools you need.

The version matters because ext4 is so new that the e2fsprogs on your favourite recovery CD may be too old. A reasonable bootable CD to use is the one from http://www.sysresccd.org/ but, according to the e2fsprogs release notes, there are reasons why you should have a 2010 release (1.41.10 or later). To get this you need the latest unstable Ubuntu, Debian, Fedora or Gentoo, or you must build it from source yourself. The first distribution to have this version available on a released bootable CD will be Ubuntu, in the 10.04 Lucid LTS release.

Specifically, as of March 2010, if you use any released version of Knoppix or Debian to attempt recovery you will not have much joy.
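Before starting, it is worth checking which e2fsprogs the rescue system actually has. The bash sketch below reads the version from the first line of `e2fsck -V` output and compares it with 1.41.10. It assumes GNU coreutils (`sort -V` for version ordering); the function names are hypothetical.

```shell
# Sketch: is the installed e2fsprogs at least 1.41.10? (bash, GNU sort -V)

# Print the version field from the first line of `e2fsck -V` output,
# which looks like "e2fsck 1.41.10 (<release date>)".
e2fsprogs_version() {
    awk 'NR == 1 { print $2; exit }'
}

# Succeed if version $1 >= version $2: the smaller of the two sorts first.
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (e2fsck -V prints to stderr, hence 2>&1):
#   have=$(e2fsck -V 2>&1 | e2fsprogs_version)
#   version_at_least "$have" 1.41.10 || echo "e2fsprogs $have is too old" >&2
```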

Recovery Technique

First step: stop. Transfer the contents of the corrupted device somewhere else using a block-level transfer tool; dd works fine (including over ssh). Check that the copy you made appears to be identically corrupted :-) Then make that copy read-only, or immutable, or whatever else will reduce the risk of your accidentally writing to it.
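The copy, verify and protect steps might look like the bash sketch below. It assumes the damaged filesystem is not mounted and that the destination file lives on a different, healthy disk; the function name and the remote hostname in the ssh comment are hypothetical. Note that plain file permissions do not stop root, so when running as root `chattr +i` (immutable) is the stronger option where the destination filesystem supports it.

```shell
# Sketch: image a damaged, UNMOUNTED filesystem to a file, check the copy
# matches byte for byte, then write-protect it. (bash, GNU coreutils)
image_and_protect() {
    local src=$1 dest=$2
    # Block-level copy; bs only affects speed, not the result.
    dd if="$src" of="$dest" bs=1M 2>/dev/null || return 1
    # The copy should be identical to the source, corruption included.
    cmp -s "$src" "$dest" || { echo "copy differs from $src" >&2; return 1; }
    # Read-only for everyone. Root ignores this, so as root also consider
    # `chattr +i "$dest"` before experimenting.
    chmod 0444 "$dest"
}

# Usage:
#   image_and_protect /dev/YOUR_DEVICE /mnt/healthy/YOUR_DEVICE.img
# Over ssh, pulling from the broken machine (hostname hypothetical):
#   ssh root@broken-host 'dd if=/dev/YOUR_DEVICE bs=1M' | dd of=YOUR_DEVICE.img bs=1M
```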

Get a list of backup superblocks:

 dumpe2fs /dev/YOUR_DEVICE | grep Backup
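To feed those numbers to other commands, the matching lines can be reduced to bare block numbers. The bash sketch below assumes dumpe2fs's usual group listing, with lines such as `Backup superblock at 32768, Group descriptors at 32769-32769`; the function name is hypothetical. On a filesystem with 4 KiB blocks the first backup is normally at block 32768. If the primary superblock is too damaged for dumpe2fs to read at all, `mke2fs -n` (a dry run that writes nothing) reports where the backups would be, provided you give it the same options used when the filesystem was created.

```shell
# Sketch: extract backup superblock numbers from dumpe2fs output on stdin.
# Assumes lines like "  Backup superblock at 32768, Group descriptors at ...".
backup_superblocks() {
    grep 'Backup superblock at' |
        sed 's/.*Backup superblock at \([0-9]*\).*/\1/'
}

# Usage:
#   dumpe2fs /dev/YOUR_DEVICE 2>/dev/null | backup_superblocks
```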

You can then specify one of the backup superblocks to fsck like this:

 fsck.ext4 -b YOUR_BACKUP_BLOCK_NUMBER /dev/YOUR_DEVICE

For a spontaneous group descriptor corruption in ext4, that should do the trick.
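If the first backup you try is also damaged, one cautious approach is to dry-run e2fsck against each candidate before letting it write anything. In this bash sketch `-n` opens the filesystem read-only and answers "no" to every question, so nothing is changed. A status below 8 is taken to mean e2fsck could at least open the filesystem with that backup (with `-n`, status 4, errors left uncorrected, is expected on a damaged filesystem); 8 means an operational error such as an unreadable superblock. The function name and the FSCK override variable are hypothetical.

```shell
# Sketch: print the first backup superblock that e2fsck can open, using a
# read-only dry run (-n). FSCK is a hypothetical override hook for testing.
first_openable_backup() {
    local dev=$1 sb status
    shift
    for sb in "$@"; do
        status=0
        "${FSCK:-fsck.ext4}" -n -b "$sb" "$dev" >/dev/null 2>&1 || status=$?
        # 0 or 4 (errors left unfixed) means it opened; 8+ means it could not.
        if [ "$status" -lt 8 ]; then
            printf '%s\n' "$sb"
            return 0
        fi
    done
    return 1
}

# Usage, then run the real repair with the block it prints:
#   sb=$(first_openable_backup /dev/YOUR_DEVICE 32768 98304 163840)
#   fsck.ext4 -b "$sb" /dev/YOUR_DEVICE
```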

IF NOTHING ELSE WORKS, for example because you have a genuine cause of corruption such as a hard crash, you can try the following:

 mkfs.ext4 -S /dev/YOUR_DEVICE

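This rewrites the superblocks and group descriptors (but not the inode tables or bitmaps), assuming the block size is guessed correctly (the default is correct for most systems) and that any other non-default options originally given to mkfs are repeated. Run e2fsck on the filesystem immediately afterwards. If you need to use this, please read the man page on -S first. Don't blame me!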
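Because -S is so drastic, it is safest to run it on a second, writable copy of the image, keeping the write-protected copy untouched so you can start again. The bash sketch below refuses to act on anything that is not a regular file, passes `-b` only when you supply a block size, and runs a forced `e2fsck -f -y` straight afterwards. The `-F` flag is there on the assumption that mke2fs may otherwise ask for confirmation on an image file. The function name and the MKFS/FSCK override variables are hypothetical.

```shell
# Sketch: last-resort superblock/group descriptor rewrite on an IMAGE COPY,
# followed by a full e2fsck. MKFS and FSCK are hypothetical test hooks.
rewrite_superblocks_on_copy() {
    local image=$1 blocksize=${2:-} status=0
    # Refuse to touch anything that is not a plain file (e.g. the real device).
    if [ ! -f "$image" ]; then
        echo "refusing: $image is not a regular file; work on a copy" >&2
        return 1
    fi
    # Pass -b only if a block size was given; otherwise mke2fs uses its default.
    if [ -n "$blocksize" ]; then set -- -b "$blocksize"; else set --; fi
    "${MKFS:-mkfs.ext4}" -F -S "$@" "$image" || return 1
    # Check and repair immediately, answering yes to every fix.
    "${FSCK:-e2fsck}" -f -y "$image" || status=$?
    # Below 4 means no errors were left uncorrected.
    [ "$status" -lt 4 ]
}

# Usage:
#   cp YOUR_DEVICE.img scratch.img && chmod u+w scratch.img
#   rewrite_superblocks_on_copy scratch.img 4096
```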
