
The 4 KiB Sector Performance Issue

by Michael Prokop on Mar 29, 2010

If you are using Western Digital disks with the string "EARS" in the model name, you may already have run into poor performance with them. The most likely cause is that Western Digital ships its new consumer disks with Advanced Format Technology. Traditionally, disks store user data in physical sectors of 512 bytes. Western Digital's Advanced Format Technology instead uses data sectors of 4096 bytes (4 KiB) each. Aligning data on the disk is essential to get the most out of the hardware. With wrong alignment, a single 4 KiB write can span two physical sectors, which multiplies the number of reads and writes the disk has to perform and degrades performance. It is only a matter of time until other vendors ship disks with sectors larger than 512 bytes as well, so you should be aware of this issue.
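A minimal Python sketch of the arithmetic behind this counts how many physical sectors a single write touches. It assumes 4096-byte physical sectors and byte offsets measured from the start of the disk. The function name is illustrative and not taken from any tool.

```python
LOGICAL_SECTOR = 512    # sector size the disk reports to the operating system
PHYSICAL_SECTOR = 4096  # real sector size on an Advanced Format disk


def physical_sectors_touched(offset: int, length: int,
                             physical: int = PHYSICAL_SECTOR) -> int:
    """Number of physical sectors covered by a write of `length` bytes at byte `offset`."""
    if length <= 0:
        return 0
    first = offset // physical                  # physical sector holding the first byte
    last = (offset + length - 1) // physical    # physical sector holding the last byte
    return last - first + 1


# One 4 KiB filesystem block, written aligned and shifted by one logical sector.
print(physical_sectors_touched(0, 4096))               # 1
print(physical_sectors_touched(LOGICAL_SECTOR, 4096))  # 2
```

The misaligned write touches twice as many physical sectors, and neither of them is covered completely.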

To quote Linux kernel programmer Theodore Ts'o:

It turns out this is much more difficult than you might first think - most of Linux's storage stack is not set up well to worry about alignment of partitions and logical volumes. [...] This kind of alignment is important if you are using any kind of hardware or software RAID, for example, especially RAID 5, because if writes are done on stripe boundaries, it can avoid a read-modify-write overhead.
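The point about stripe boundaries can be sketched in Python for a hypothetical RAID 5 array. The sketch assumes a simple layout in which each stripe holds one data chunk per disk minus one parity chunk. The chunk size and disk count below are example values, and the functions are illustrative rather than taken from any RAID implementation. A write that covers whole stripes lets the parity be computed from the new data alone. Any other write requires existing data or parity to be read first.

```python
def stripe_width(chunk_bytes: int, total_disks: int) -> int:
    """Data bytes in one full RAID 5 stripe: one chunk per disk, minus one parity chunk."""
    if total_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return chunk_bytes * (total_disks - 1)


def needs_read_modify_write(offset: int, length: int,
                            chunk_bytes: int, total_disks: int) -> bool:
    """True if the write (byte offset and length within the array) covers at least
    one stripe only partially, so old data or parity must be read before the
    new parity can be computed."""
    width = stripe_width(chunk_bytes, total_disks)
    return offset % width != 0 or length % width != 0


# Example values: 64 KiB chunks on a 4-disk RAID 5 (3 data chunks per stripe).
CHUNK = 64 * 1024
DISKS = 4
print(stripe_width(CHUNK, DISKS))                           # 196608 (192 KiB)
print(needs_read_modify_write(0, 196608, CHUNK, DISKS))     # False: one full stripe
print(needs_read_modify_write(4096, 196608, CHUNK, DISKS))  # True: straddles two stripes
```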

The 512-byte sector size is an assumption built into the hardware layer (such as controllers) as well as into software (drivers, partitioning tools, ...). To avoid problems and stay backward compatible, the Western Digital drives lie about their actual physical sector size. Instead of reporting 4 KiB physical sectors to the upper layers, the firmware emulates 512-byte sectors internally. This causes the performance issue mentioned above as soon as the upper layers are not aligned to 4 KiB boundaries. As a consequence, misaligned and partial writes add read-modify-write overhead on these disks with 512-byte logical and 4 KiB physical sectors. The firmware has to read the whole physical sector, merge in the new data and write the sector back.
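The read-modify-write cost can be made concrete with a small Python sketch. It assumes 512-byte logical and 4096-byte physical sectors and a write addressed in logical sectors (LBAs). The function is hypothetical. It only models which physical sectors the firmware can overwrite outright and which ones it has to read, modify and write back.

```python
LOGICAL = 512    # emulated sector size seen by the host
PHYSICAL = 4096  # real sector size on the platter


def classify_write(first_lba: int, count: int,
                   logical: int = LOGICAL, physical: int = PHYSICAL):
    """Split the physical sectors hit by a write of `count` logical sectors
    starting at `first_lba` into fully overwritten sectors and partially
    overwritten sectors (the latter need a read-modify-write cycle)."""
    if count <= 0:
        return [], []
    start = first_lba * logical
    end = start + count * logical  # exclusive end, in bytes
    full, partial = [], []
    for p in range(start // physical, (end - 1) // physical + 1):
        p_start, p_end = p * physical, (p + 1) * physical
        if start <= p_start and end >= p_end:
            full.append(p)      # whole physical sector replaced: plain write
        else:
            partial.append(p)   # old contents must be read and merged first
    return full, partial


print(classify_write(64, 8))  # ([8], [])      aligned 4 KiB write
print(classify_write(63, 8))  # ([], [7, 8])   misaligned: two read-modify-write cycles
print(classify_write(64, 1))  # ([], [8])      single 512-byte write
```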

Windows versions since Vista create the first partition starting at sector 2048 (1 MiB), so alignment on 4 KiB disks is fine. Older Windows versions and older versions of well-known Linux partitioning tools, however, create the first partition starting at sector 63 by default. That's where the performance issue shows up. One 4096-byte physical sector holds 8 logical 512-byte sectors, and 63 is not divisible by 8. A partition starting at sector 63 therefore begins at byte 32256, one logical sector short of the next 4 KiB boundary. Windows users can align the data on affected disks using Western Digital's tool. Users of other operating systems should check and verify the partition table layout as part of their integration tests and adjust it if necessary.
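On Linux, such a check could look like the following Python sketch. It assumes that the kernel exposes each partition's start in sysfs as /sys/class/block/<name>/start, in 512-byte units. It deliberately hard-codes a 4096-byte physical sector size, because drives that emulate 512-byte sectors may not report their real physical size. The helper names are illustrative.

```python
import os

LOGICAL = 512    # sysfs reports partition starts in 512-byte units
PHYSICAL = 4096  # assumed real sector size; the drive itself may claim 512


def is_aligned(start_sector: int) -> bool:
    """True if a partition starting at this 512-byte sector begins on a 4 KiB boundary."""
    return (start_sector * LOGICAL) % PHYSICAL == 0


def partition_starts(sysfs_dir: str = "/sys/class/block"):
    """Yield (name, start_sector) for every entry under sysfs_dir that has a 'start' file."""
    if not os.path.isdir(sysfs_dir):
        return
    for name in sorted(os.listdir(sysfs_dir)):
        start_file = os.path.join(sysfs_dir, name, "start")
        if not os.path.isfile(start_file):
            continue  # entries without a 'start' attribute (e.g. whole disks) are skipped
        try:
            with open(start_file) as f:
                yield name, int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected content: skip this entry


if __name__ == "__main__":
    print(is_aligned(63))    # False: 63 * 512 = 32256 bytes, not a multiple of 4096
    print(is_aligned(2048))  # True: 2048 * 512 = 1 MiB
    for name, start in partition_starts():
        status = "aligned" if is_aligned(start) else "MISALIGNED"
        print(f"{name}: starts at sector {start}, {status}")
```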

In any case, to get the most out of your hardware, make sure your data is correctly aligned on every layer involved. That covers the partition table, Software RAID, Logical Volume Management and the filesystem.
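Alignment errors propagate upwards. If the partition is misaligned, every layer stacked on top of it is misaligned too, even when each layer's own offset is a multiple of 4 KiB. The following Python sketch shows this effect. The layer names and offsets are hypothetical example values, not defaults of any particular RAID or LVM tool.

```python
PHYSICAL = 4096  # physical sector size to align against


def check_stack(layers, physical=PHYSICAL):
    """Walk a storage stack from the disk upwards.

    `layers` is a list of (name, offset_bytes) pairs, where each offset is the
    start of that layer's data relative to the layer below it. Returns a list of
    (name, absolute_offset, aligned) tuples."""
    report = []
    absolute = 0
    for name, offset in layers:
        absolute += offset  # offsets accumulate through the stack
        report.append((name, absolute, absolute % physical == 0))
    return report


# Hypothetical stack; every offset above the partition is a multiple of 4 KiB.
stack = [
    ("partition", 63 * 512),           # legacy start at sector 63
    ("software RAID data", 128 * 512), # example metadata/data offset
    ("LVM extents", 1024 * 1024),      # example physical extent start
]
for name, absolute, aligned in check_stack(stack):
    print(f"{name:20} starts at byte {absolute:>9}  {'ok' if aligned else 'misaligned'}")
```

Moving the partition start to sector 2048 (1 MiB) makes all three layers aligned in this example.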

Further details on this issue are available, including the ATA 4 KiB sector issues page on the Linux ATA Wiki. Red Hat engineer Karel Zak summarizes the behaviour of partitioning and filesystem utilities in a mail to the Linux kernel mailing list. His Red Hat colleague Mike Snitzer wrote a document titled I/O Limits: block sizes, alignment and I/O hints. Martin K. Petersen, a Linux engineer at Oracle, wrote a nice paper titled Linux & Advanced Storage Interfaces, which is also worth reading.

sector 63 can be divided by 8 by Robert MacLean

I am no expert in this space but as I understand it sector 63 divided by 8 is 8, because sectors are 0 indexed. In other words sector 63 is the 64th sector and 64 divided by 8 is 8.

The performance issue, as I understand it from the Linux ATA Wiki, comes from the fact that the disk stores multiple logical sectors in one physical sector and you are relying on the hard disk firmware to handle the mapping.

A modest suggestion by Rich Harkins

A (VERY) quick and dirty hack might be to trick the device driver for SDDs to simply add one sector to the partitioning scheme. Lots not to like but it might make things work better until a good solution to the stack could be found.

Re: sector 63 can be divided by 8 by Tracy Nelson

It doesn't look like it's the mapping per se, it sounds like it's more an issue with the fact that writes are scheduled on a per (logical)-sector basis. So if you write a logical sector, you wind up writing the whole physical sector, which takes 8x longer, since it's 8x bigger. It can actually take even longer than that; if the physical sectors aren't in the cache, the disk has to read the physical sector(s), modify them, then write them back out. I imagine the disk controller cache is now effectively smaller too, if they're caching on a physical sector basis.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.