Btrfs and the 2k file problem – tests and my experience so far

Tuesday, August 3rd, 2010

I first heard about btrfs a few years ago, along with claims about how great it would be: some people call it the future mainstream Linux filesystem, with ext4 being just a temporary solution until btrfs becomes ready for prime time. I also heard that it would still take quite some time before it became stable. In the meantime it seems to have become quite stable and has gathered its bunch of early adopters, who seem to be quite happy with it. It was also supposed to have its “disk format might change” tag removed in version 2.6.34 of the Linux kernel, which was apparently postponed for some reason.

In light of all this I decided to give it a try, but first I wanted to make sure I knew what I was getting into. Some basic research on the net quickly turns up the btrfs 2k problem. I was intrigued by this because I know that all filesystems have a problem (bigger or smaller) with small files, particularly with files smaller than their block size. People complained about how 80% or so of the space would be wasted on btrfs with 2k files, but never took the time to perform the same test on other filesystems for comparison, so I decided to do my own tests.

One word of warning though: I performed these tests for my own use and decided to share them in case anyone finds them useful – not the other way around. So these tests are by no means scientific, accurate or anything of the sort.

The Tests

I have been using xfs for several years and have been quite satisfied with it. So I obviously chose to run the tests on xfs, and also on ext4, which has become (or is about to become) the new de facto Linux default.

In addition to the 2k test (which I suspected to be the worst case scenario for btrfs) I decided to also perform a similar test with additional file sizes:

– 1k – it’s even smaller than 2k, so we can see whether 2k is a worst case for btrfs (and the others) or whether the problem gets worse as the files get even smaller

– 4k – filesystems tend to use 4k as the default block size nowadays, so this should be the maximum space efficiency scenario

– 6k – to see whether the problem is related to the block size difference (i.e. does the 2k penalty simply come on top of the 4k case, which is supposed to give maximum efficiency) or whether it lies somewhere else entirely.

I decided to leave all the defaults in place – what is a default good for, if not to give us the best compromise?

The first step was to create a 1GB file and a loop device for it:
dd if=/dev/zero of=/tst bs=1M count=1k
losetup /dev/loop0 /tst

Then I created a script that would create the filesystem and then the files:
mkfs.btrfs /dev/loop0
mount /dev/loop0 /mnt/tmp
for i in $(seq 1000000); do dd if=/dev/zero of=/mnt/tmp/file_$i bs=4k count=1; done

I stopped the script manually when it started giving “out of space” errors. After that I checked the occupied space:
du -sh --apparent-size /mnt/tmp

I used the “du” approach for convenience; the “--apparent-size” parameter reports the total size of the files rather than the disk space they occupy. So the difference between the size of the filesystem (1GB) and whatever “du” reports is the wasted space.

Then I would unmount the system:
umount /mnt/tmp

And finally I prepared the script for the next run, modifying the parameters accordingly.
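
For reference, here is a rough sketch of one full run scripted end to end – it is just the procedure above glued together, not a polished tool; adjust FS and SIZE per run, and note that this loop stops at the first failed write, which is not exactly how I handled btrfs (see the peculiarities section below):
#!/bin/bash
# One test run: make the filesystem, fill it with small files, report the result.
FS=btrfs      # btrfs | xfs | ext4
SIZE=2k       # 1k | 2k | 4k | 6k
mkfs.$FS /dev/loop0           # some mkfs tools need a force flag when re-running
mount /dev/loop0 /mnt/tmp
for i in $(seq 1000000); do
    dd if=/dev/zero of=/mnt/tmp/file_$i bs=$SIZE count=1 2>/dev/null || break
done
du -sh --apparent-size /mnt/tmp
umount /mnt/tmp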

Test Results

The raw test results are these:

        1k    2k    4k    6k
btrfs  282   200   590   519
xfs    235   464   922   714
ext4    66   130   258   386

All values are total file sizes (as reported by “du -sh --apparent-size”) expressed in MB.
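
The loop file is 1024 MB, so the wasted space is simply 1024 minus the value in the table. A quick awk one-liner (shown here only for the 2k column) turns that into percentages – for btrfs at 2k you get roughly 80% wasted, which is exactly the figure people complain about:
awk -v total=1024 '{ printf "%s %s: %.0f%% wasted\n", $1, $2, (total - $3) / total * 100 }' <<EOF
btrfs 2k 200
xfs 2k 464
ext4 2k 130
EOF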

I don’t know about you, but I was pretty surprised by these results. If you just look at the numbers you could say that xfs is “za best” but still sucks for small files, btrfs is somewhere in between, and ext4 is really hopeless.

But I’m not one of those people, so looking more carefully I remember what I said about using the defaults. I suspect ext4 has a problem with the maximum number of inodes – the results clearly point that way to me. So I expect ext4 to perform pretty well if tuned properly – unfortunately the people reporting the 2k file problem don’t seem to like tuning (at least not for btrfs), and since I don’t really consider running ext4 I didn’t venture into tuning land, as I expected it to take quite a long time. Even more unfortunate is the fact that real filesystems hold a mix of files of all sizes, which might prove difficult to tune for. Anyway, I expected more from the defaults.
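
For what it’s worth, the kind of ext4 tuning I have in mind is raising the inode count at mkfs time – something along these lines (note that I did not actually re-run the tests with these options):
mkfs.ext4 -b 1024 -i 1024 /dev/loop0    # 1k blocks and one inode per 1k of space
mkfs.ext4 -T small /dev/loop0           # or let mke2fs pick its small-file profile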

The winner still seems to be xfs, which is the only one delivering the expected results (or close to them) even with the defaults. You can clearly see that its space problem comes from the block size, but it still copes well.

Although btrfs has better results than ext4, I don’t know how it might be tuned to do better. I tried using “max_inline=0” (as suggested by Chris Mason) with the 4k size and unfortunately got the same result. I didn’t have time to test the other file sizes, but even if it helped with the 2k problem I doubt it would improve on the 4k result.
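
For completeness, the option is simply passed at mount time, so the only change to the test procedure is the mount line:
mount -o max_inline=0 /dev/loop0 /mnt/tmp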

Personally, in some situations I would trade space for speed or features in the 2k (or smaller) scenario. What worries me more is the very poor 4k result, since btrfs uses 4k blocks by default as far as I know.

Some Test Peculiarities

During the tests I noticed that btrfs reported sporadic “out of space” errors before giving up entirely. I suspect it tries to optimize and squeeze in more files before it gives up, but unfortunately this causes some writes to fail in the meantime. This could be the problem reported here.
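
If you want to automate such a run anyway, one way around the sporadic errors (a sketch, not what I actually did – I stopped the script by hand) is to give up only after a number of consecutive failures:
# keep writing until 50 writes in a row have failed
fail=0; i=0
while [ $fail -lt 50 ]; do
    i=$((i + 1))
    if dd if=/dev/zero of=/mnt/tmp/file_$i bs=2k count=1 2>/dev/null; then
        fail=0
    else
        fail=$((fail + 1))
    fi
done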

Xfs seems to have its own optimizations, because “df” reports one occupied-space value at some point and a smaller one later, after more files have been written. No writes fail in the meantime, though.
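
You can watch this happening while the test loop runs, simply by keeping an eye on “df” in another terminal:
watch -n 1 df -h /mnt/tmp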

Real World Impressions

I decided to give btrfs a try despite these results. I first installed it on my laptop and then on the root and var partitions of my desktop. The laptop has a rather old 5400 RPM hdd and the desktop has 2 x 7200 RPM fairly new hdds in software raid0. The laptop’s sequential throughput is somewhere around 45 MB/s, while the desktop can reach up to 90 MB/s per hdd. I use Arch Linux with the yaourt package manager, and running a package search (yaourt -S pkgname) took about 4 seconds on the laptop with btrfs and 18 seconds on the desktop raid0 with xfs. I was very impressed by this. Of course I ran the command right after a reboot, to be sure those files were not in the cache. A “du -s” on some large folder is also much faster on btrfs, so I suspect that btrfs truly shines at metadata handling. Then again, these scenarios have always been the weak point of xfs compared to other filesystems.
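
If you want to repeat this kind of comparison without rebooting every time, dropping the kernel caches first (as root) should give a similar cold-cache situation – that is not how I measured above, I simply rebooted:
sync
echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
time du -s /usr/share                # any large directory works as a metadata test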

If you would like to tune your btrfs you had better think twice. I tried using a bigger nodesize and sectorsize: everything works fine until an umount or a sync, which then hang forever. This seems to be the problem described here.
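
For the record, the kind of invocation I mean is something like this (the 16k values here are only an illustration of “bigger than the 4k defaults”, not necessarily what I used) – I would stay away from it until that bug is fixed:
mkfs.btrfs -n 16384 -s 16384 /dev/loop0    # bigger nodesize and sectorsize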

The wasted space in the real world doesn’t seem to be that bad. I didn’t run accurate measurements, but the values reported by “df”, “btrfs-show” and friends seem pretty close to the ones I used to have on xfs. On the other hand, random access throughput seems to have dropped a little compared to xfs. Again, I have not tested this – it’s just an impression.

Another thing you should know before trying btrfs with raid0 is that you do need an initrd which basically runs “btrfs device scan”. Otherwise the kernel will not be able to find the btrfs raid setup and you will end up with a kernel panic. This is true even if btrfs is compiled into the kernel, but it’s not true if you run btrfs on a single device.
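
Conceptually, all the initrd has to do before the root filesystem is mounted is something like this (the exact device and mount point names depend on your setup and initramfs tooling – on Arch that means an appropriate mkinitcpio hook):
btrfs device scan            # lets the kernel discover all member devices of the raid
mount /dev/sda1 /new_root    # only after the scan can the raid0 root be mounted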

A Big Thank You

When testing and reporting problems, people seem to forget that there are literally years of work in a project such as btrfs, and yet btrfs is given away for free (as in “no charge”). So I would like to say a big “thank you” to all the developers who have worked on this project. I do believe the future is bright for btrfs.