10 April 2013
I recently wanted to test the speed of a disk attached to a Linux server I was using, as it seemed somewhat slower than I expected. I had never had a need to test disk speeds before, and I thought it would be an easy thing to do. It turns out a straight speed test is fairly easy, once you figure out what you actually want to test.
One thing I knew, but had never investigated in detail, is that there are several ways to write a file to disk.
On Linux, if you mount an ext3 filesystem with no options, the OS will use any free memory to buffer file contents. That makes it faster to access a file that was previously read. It also makes it faster to write to a file - the OS can copy the data into memory and then lazily stream it to disk after the write call has completed.
This means that after a write to a file has completed, the data may not yet have made it to disk and could be lost if the machine suddenly lost power.
You can see this effect in action if you run the free command on Linux:
$ free -m
             total       used       free     shared    buffers     cached
Mem:         24082      23654        428          0        373       7430
-/+ buffers/cache:      15849       8232
Swap:         2047        553       1494
This machine has 24GB of RAM, and at first glance it would appear that almost all of it is used. However, according to the cached column, 7430MB of memory is being used to cache filesystem data. As other processes on the machine need more memory, the OS frees this cache automatically, ensuring the machine doesn't run out of memory.
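As an aside, if you want to start a test with a cold cache, the kernel lets you drop the page cache explicitly via the standard drop_caches interface (this needs root; it is safe, but reads will be slower until the cache warms up again):
$ sync                                          # write out dirty pages first
$ echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes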
The Linux kernel provides a system call, fsync(), that forces the contents of a file cached in memory to be written to the underlying disk. That means that to ensure the data is safely on disk and not partially written, a write to a file needs to be followed by an fsync() call.
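The blunt shell-level equivalent is the sync command, which flushes dirty buffers for every filesystem rather than a single file. For example (the file names here are just placeholders):
$ cp bigfile /data/bigfile   # returns as soon as the data is in the page cache
$ sync                       # blocks until all dirty data has been written to disk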
It is possible to open files in different modes, such as direct, which forces writes to bypass the cache entirely, or dsync, which still uses the cache but does not let each write return until the data has reached the disk.
Now that we know there are different ways to write a file, we need a tool that lets us write data to a file and compare the speed of the different approaches.
This is where the dd command comes in. It allows data to be written to a file in various modes and ways, reporting the transfer speed when it completes.
$ dd if=/dev/zero of=writetest bs=8k count=131072
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 1.10978 seconds, 968 MB/s
968MB/s - that is pretty fast! Well, that is because this test wrote the file into memory, and the contents were not actually on disk when the command completed. Therefore this test didn't really test the speed of the disks in the machine at all.
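You can watch the page cache absorb the file by checking free around the run (a rough illustration - other activity on the machine will also move these numbers):
$ free -m | grep Mem:   # the last column is the cached figure
$ dd if=/dev/zero of=writetest bs=8k count=131072
$ free -m | grep Mem:   # cached has grown by roughly the 1GB just written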
Before moving on to a more realistic test, here is what the fields passed to dd mean:
if=/dev/zero - the input file; /dev/zero supplies an endless stream of zero bytes
of=writetest - the output file to write to
bs=8k - the block size, so each write transfers 8KB
count=131072 - the number of blocks to write, making 131072 x 8KB = 1GB in total
The last test proved nothing about the speed of the disk the file was supposedly written to, but earlier I mentioned the need to use a call to fsync() to force the file onto the disk. Luckily dd gives us a way to do this with the conv option:
$ dd if=/dev/zero of=writetest bs=8k count=131072 conv=fsync
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 18.1336 seconds, 59.2 MB/s
Now I have a more realistic disk speed of 59.2MB/s, which is likely close to the top sequential speed this disk can write at.
The conv=fsync parameter changed the behavior of dd so that it calls fsync() once after writing all the data, and does not report completion until everything has been forced to disk.
It is also possible to call dd with conv=fdatasync, which syncs the file's data but not its metadata. That could be slightly faster for small files, but it is about the same when writing a 1GB file in this test:
$ dd if=/dev/zero of=writetest bs=8k count=131072 conv=fdatasync
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 18.0217 seconds, 59.6 MB/s
In the test above, I made about 131,000 small writes to a file, but the OS buffered these writes in memory until the fsync() call flushed all the data to disk in large chunks. In some applications, you need to make many small writes and know that each of them is securely stored on disk. This is why my test used a block size of 8KB: my disks were being used to store Oracle database files, and Oracle works on a block size of 8KB. So while my disk can write at almost 60MB/s, how does it do with 8KB writes that must be synchronized to disk after each write? The dd command gives us a way to test this too:
$ dd if=/dev/zero of=writetest bs=8k count=131072 oflag=sync
The oflag=sync option tells dd to open the output file with the sync flag, which means that every write to the file must be synced to disk. The OS still buffers each write in memory, but it must flush it to disk before the write call returns. In this case, that means about 131,000 separate writes will be issued to the disk, which is going to be slow:
66324+0 records in
66324+0 records out
543326208 bytes (543 MB) copied, 66.9737 seconds, 8.1 MB/s
With small synced writes, my top speed dropped to 8.1MB/s - much slower than the top speed of the disk. At 8KB per write, that works out to roughly 1,000 synced writes per second.
Another option is to turn on direct IO, which skips the OS buffering of the writes entirely and sends them straight to disk:
$ dd if=/dev/zero of=writetest bs=8k count=131072 oflag=direct
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 34.6159 seconds, 31.0 MB/s
Now my write speed is a more impressive 31.0MB/s, so it looks like direct IO is much faster than the sync option. From what I've read, direct IO should store your data just as safely as sync mode, but I am not 100% certain on that one.
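For completeness, dd also accepts oflag=dsync, which corresponds to the dsync open mode mentioned earlier. I haven't timed it here, but it is invoked the same way:
$ dd if=/dev/zero of=writetest bs=8k count=131072 oflag=dsync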
It is pretty simple to test the speed of writing a large file with the dd command, but to actually test the speed of a storage device you need to know about the subtle options that control how the file is written.
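To recap, these are the variants used above, with the speeds I measured on this machine (your numbers will vary with your hardware):
$ dd if=/dev/zero of=writetest bs=8k count=131072                  # buffered only: 968 MB/s (memory, not disk)
$ dd if=/dev/zero of=writetest bs=8k count=131072 conv=fsync       # one fsync() at the end: 59.2 MB/s
$ dd if=/dev/zero of=writetest bs=8k count=131072 conv=fdatasync   # one fdatasync() at the end: 59.6 MB/s
$ dd if=/dev/zero of=writetest bs=8k count=131072 oflag=sync       # sync after every 8KB write: 8.1 MB/s
$ dd if=/dev/zero of=writetest bs=8k count=131072 oflag=direct     # bypass the cache: 31.0 MB/s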