As you might know, if disk partitions containing Oracle datafiles are not aligned with the underlying storage system, some I/Os suffer overhead because they are effectively split into two I/Os.

If you want more info, google for “EMC disk alignment” and you’ll find plenty of information explaining the issue. One example is http://www.vmware.com/pdf/esx3_partition_align.pdf for VMware ESX version 3.x.

Update 28-03-2013: I wrote a follow-up to this post describing the same procedure for Linux version 6 (Red Hat / CentOS / OEL). For those versions, you might want to jump straight to the new post, as this one is getting a bit outdated 😉

In short: if you create partitions on Intel-based operating systems, by default the first partition starts at sector 63 (63 x 512 bytes = 32256 bytes), which is 15 x 512-byte blocks (7680 bytes) past the last 8K boundary. That does not match typical SAN storage systems that use 4K or 8K disk chunks. A write to a block that crosses a chunk boundary causes 2 writes (plus some partial reads) in the disk backend (and in the remote copy if you use remote storage mirroring), and sometimes causes an extra cache slot to be allocated. Changing to the right alignment can improve performance by 5 to 15%, depending on workload and other configuration settings.
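To check this arithmetic yourself, here is a minimal shell sketch. The function name check_alignment is hypothetical (not a standard tool); it assumes 512-byte sectors, takes the storage chunk size in bytes, and reports whether a partition's start falls on a chunk boundary:

```shell
#!/bin/sh
# Hypothetical helper: report whether a partition start (in 512-byte
# sectors) falls on a multiple of the storage chunk size (in bytes).
check_alignment() {
    start_sector=$1
    chunk_bytes=$2
    offset_bytes=$((start_sector * 512))
    if [ $((offset_bytes % chunk_bytes)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

# Classic default start: sector 63 = 32256 bytes, not a multiple of 4K or 8K.
check_alignment 63 4096      # misaligned
check_alignment 63 8192      # misaligned
# A start at sector 128 (65536 bytes) is a multiple of both.
check_alignment 128 8192     # aligned
```

With the classic start at sector 63 the remainder is never zero for 4K or 8K chunks, which is exactly why some I/Os end up crossing a chunk boundary.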

Some recent Linux distributions already align partitions by default. If yours does, verify that it actually does so (see the end of this article); you probably don’t have to change anything.

Now, the way most documentation explains how to fix this in Linux is, in my opinion, too complex: you have to run “fdisk” interactively, go into expert mode, move the beginning of the partition’s data, etc. Not nice if you have to configure a few hundred Oracle ASM disks at once.

There is an easier way.

Here goes…

(This assumes you have a completely empty disk and want to create exactly one aligned partition on it, e.g. for Oracle ASM.)

  • Check whether your Linux system has the “sfdisk” command. I bet most Linux systems have it installed by default.
  • Make sure you know the Linux device name of the disk (such as /dev/sdk)
  • Enter the command:
echo "128,," | sfdisk -uS /dev/sdk

Note that the command will fail if the disk is already in use, e.g. if it holds a partition that is mounted or opened by ASM (so it’s reasonably safe). This is what the output looks like on my system:

Checking that no-one is using this disk right now ...
OK
Disk /dev/sdk: 1044 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = sectors of 512 bytes, counting from 0
Device Boot    Start       End   #sectors  Id  System
/dev/sdk1             0         -          0   0  Empty
/dev/sdk2             0         -          0   0  Empty
/dev/sdk3             0         -          0   0  Empty
/dev/sdk4             0         -          0   0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0
Device Boot    Start       End   #sectors  Id  System
/dev/sdk1           128  16771859   16771732  83  Linux
/dev/sdk2             0         -          0   0  Empty
/dev/sdk3             0         -          0   0  Empty
/dev/sdk4             0         -          0   0  Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

Explanation:

sfdisk reads the commands it has to perform from “stdin”. To avoid entering everything manually, we use “echo” to feed the commands directly into sfdisk. The sfdisk man page describes the input format:

sfdisk reads lines of the form <start> <size> <id> <bootable> <c,h,s> <c,h,s>

With the -uS option we tell sfdisk to express positions and sizes in sectors (of 512 bytes each) instead of cylinders or other units.

Fields may also be separated by commas, so “128,,” means: start at sector 128, leave the size empty and leave the id empty. Because we want to use the full size of the disk, we leave the size field empty and let sfdisk figure it out. The id defaults to 83 (Linux). If you want something else, check the man page. We also omit the bootable flag and the cylinder/head/sector parameters (they are optional).
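To make the field mapping concrete, here is a small shell sketch. sfdisk_line is a hypothetical helper, not part of sfdisk; it only prints the comma-separated <start>,<size>,<id> line and leaves empty fields for sfdisk to fill with its defaults:

```shell
#!/bin/sh
# Hypothetical helper: print one sfdisk input line from its first three
# fields. Empty fields are left for sfdisk to default
# (size = rest of the disk, id = Linux).
sfdisk_line() {
    start=$1
    size=${2:-}
    id=${3:-}
    printf '%s,%s,%s\n' "$start" "$size" "$id"
}

sfdisk_line 128            # the line used in this article
sfdisk_line 2048           # 1 MB alignment
sfdisk_line 128 "" 83      # explicit Linux id (should match the default)

# Feeding it to sfdisk would look like this (not executed here):
#   sfdisk_line 128 | sfdisk -uS /dev/sdk
```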

The partition will start at exactly a 64KB offset (128 sectors x 512 bytes = 65536 bytes, or 8 chunks of 8K), which fits nicely with both EMC CLARiiON and EMC Symmetrix.

Sometimes you might want another alignment value. A common one is one megabyte (2048 sectors). The command would then be:

echo "2048,," | sfdisk -uS /dev/sdk
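The start value is simply the desired alignment divided by the 512-byte sector size. A minimal sketch of that conversion; align_kb_to_sectors is a hypothetical name, and the sfdisk command for the placeholder disk /dev/sdk is only printed, not run:

```shell
#!/bin/sh
# Hypothetical helper: convert a desired alignment in KB into the start
# sector to pass to "sfdisk -uS" (sectors are 512 bytes).
align_kb_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

align_kb_to_sectors 64      # 128  (64 KB = 8 x 8K chunks)
align_kb_to_sectors 1024    # 2048 (1 MB)

# Print (do not run) the matching sfdisk command for 1 MB alignment.
start=$(align_kb_to_sectors 1024)
echo "echo \"$start,,\" | sfdisk -uS /dev/sdk"
```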

To verify disk alignment:

sfdisk -uS -l <disk>

Example:

Here is the partition overview of my small Oracle RAC cluster.

[root@oradb1 ~]# listasm
#dev     scsi lun ASMVol    SizeMB
/dev/sda    0   0 -            101
/dev/sdb    1   0 -              9
/dev/sdc    1   1 ASM1        8189
/dev/sdd    1   2 ASM2        8189
/dev/sde    1   3 -           1019
/dev/sdf    1   4 -           1019

sda is the boot disk, sdb contains Oracle binaries, sdc/sdd are ASM volumes and sde/sdf are cluster resources / voting disks.
Let’s look at the boot volume.

[root@oradb1 ~]# sfdisk -uS -l /dev/sda

Disk /dev/sda: 2088 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sda1   *        63    208844     208782  83  Linux
/dev/sda2        208845  33543719   33334875  8e  Linux LVM
/dev/sda3             0         -          0   0  Empty
/dev/sda4             0         -          0   0  Empty

You can see that sda1 is misaligned at 63 sectors. I don’t really care, as the boot (OS) disk in Linux will not cause much I/O anyway. The LVM volume is also misaligned, at 208845 sectors, but I only keep OS stuff in there, so I don’t care either.
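Eyeballing these numbers gets tedious with many disks. The following sketch (report_alignment is a hypothetical helper) reads sfdisk -uS -l output on stdin and flags every partition whose start sector is not a multiple of a given alignment, 128 sectors by default. It assumes the column layout shown above, where the optional boot flag “*” shifts the start sector one column to the right; empty slots (start 0) are skipped:

```shell
#!/bin/sh
# Hypothetical helper: check partition start sectors in "sfdisk -uS -l"
# output against an alignment given in sectors (default 128 = 64 KB).
report_alignment() {
    align=${1:-128}
    awk -v align="$align" '
        $1 ~ /^\/dev\// {
            # The boot flag "*" shifts the start sector into field 3.
            start = ($2 == "*") ? $3 + 0 : $2 + 0
            if (start == 0) next    # empty partition slot
            status = (start % align == 0) ? "aligned" : "MISALIGNED"
            printf "%s start=%d %s\n", $1, start, status
        }'
}

# On a live system:  sfdisk -uS -l /dev/sdc | report_alignment 128
# Here it is fed the boot-disk listing shown above:
report_alignment 128 <<'EOF'
   Device Boot    Start       End   #sectors  Id  System
/dev/sda1   *        63    208844     208782  83  Linux
/dev/sda2        208845  33543719   33334875  8e  Linux LVM
/dev/sda3             0         -          0   0  Empty
/dev/sda4             0         -          0   0  Empty
EOF
```

For that listing it reports both sda1 (start 63) and sda2 (start 208845) as MISALIGNED, matching the conclusion above.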
Now let’s check the ASM disks.

[root@oradb1 ~]# sfdisk -uS -l /dev/sdc

Disk /dev/sdc: 1044 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdc1           128  16771859   16771732  83  Linux
/dev/sdc2             0         -          0   0  Empty
/dev/sdc3             0         -          0   0  Empty
/dev/sdc4             0         -          0   0  Empty

Nicely aligned at 64K (128 sectors)!

Let’s take a look at another Linux server that I installed with Ubuntu Server 10.10 recently.

root@silverstone:~# sfdisk -uS -l /dev/sda

Disk /dev/sda: 48641 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sda1   *      2048    499711     497664  83  Linux
/dev/sda2        501758 781422591  780920834   5  Extended
/dev/sda3             0         -          0   0  Empty
/dev/sda4             0         -          0   0  Empty
/dev/sda5        501760 781422591  780920832  8e  Linux LVM

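You can see that on this system even the boot volume is aligned at 1 megabyte (2048 sectors). sda2 is only the extended-partition container; the logical LVM partition sda5 inside it starts at sector 501760, which is also a multiple of 2048. So some modern Linux distributions relieve you of the burden of doing this yourself.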

Let’s see what happens if I accidentally try to overwrite an existing partition.

[root@oradb1 ~]# echo "128,," | sfdisk -uS /dev/sdc
Checking that no-one is using this disk right now ...
BLKRRPART: Device or resource busy

This disk is currently in use - repartitioning is probably a bad idea.
Umount all file systems, and swapoff all swap partitions on this disk.
Use the --no-reread flag to suppress this check.
Use the --force flag to overrule all checks.

For those geeks who still think this is not enough, here is the real proof.

 dd if=/dev/sdc bs=512 count=130 | xxd -c 32

000ffc0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ................................
000ffe0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ................................
0010000: 0182 0101 0000 0000 0000 0080 bc9c b1ab 0000 0000 0000 0000 0000 0000 0000 0000  ................................
0010020: 4f52 434c 4449 534b 4153 4d31 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ORCLDISKASM1....................
0010040: 0000 100a 0000 0103 4441 5441 5f30 3030 3000 0000 0000 0000 0000 0000 0000 0000  ........DATA_0000...............
0010060: 0000 0000 0000 0000 4441 5441 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ........DATA....................
0010080: 0000 0000 0000 0000 4441 5441 5f30 3030 3000 0000 0000 0000 0000 0000 0000 0000  ........DATA_0000...............

You can see that the ASM volume (note the ORCLDISKASM1 label) starts at offset 0x10000, which equals 65536 bytes, or 128 sectors of 512 bytes.
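The conversion is easy to reproduce with shell arithmetic, which accepts hexadecimal constants. A minimal sketch; the commented dd line is an assumption about how you might inspect sector 128 directly on a live system and is not run here:

```shell
#!/bin/sh
# Offset of the ASM header as seen in the xxd dump above (hexadecimal).
offset_hex=0x10000

# POSIX shell arithmetic understands hex constants: this is $((0x10000)).
offset_bytes=$(($offset_hex))
offset_sectors=$((offset_bytes / 512))

echo "bytes:   $offset_bytes"      # 65536
echo "sectors: $offset_sectors"    # 128, the start sector given to sfdisk

# On a live system (not run here) you could dump that sector directly:
#   dd if=/dev/sdc bs=512 skip=128 count=1 2>/dev/null | xxd | head -n 3
```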

Hope this makes your life a bit easier! Needless to say, you can put these commands in a simple script to make it even easier 🙂
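Here is one possible shape for such a script, as a sketch only. The name align_disks.sh and the DRY_RUN and ALIGN_SECTORS variables are hypothetical; the check for existing partitions assumes the kernel exposes them as /sys/block/<disk>/<disk>N entries (as it does for typical sdX disks). By default it only prints what it would do:

```shell
#!/bin/sh
# Hypothetical helper (align_disks.sh): create one aligned partition on each
# disk named on the command line. Dry run by default; DRY_RUN=0 to write.
ALIGN_SECTORS=${ALIGN_SECTORS:-128}   # 128 sectors = 64 KB; 2048 = 1 MB
DRY_RUN=${DRY_RUN:-1}

# Succeeds if the kernel already knows a partition on this disk
# (e.g. /sys/block/sdk/sdk1 exists).
has_partitions() {
    name=$(basename "$1")
    for p in /sys/block/"$name"/"$name"?*; do
        [ -e "$p" ] && return 0
    done
    return 1
}

partition_disk() {
    disk=$1
    if [ ! -b "$disk" ]; then
        echo "skip $disk: not a block device"
        return 1
    fi
    if has_partitions "$disk"; then
        echo "skip $disk: already has partitions"
        return 1
    fi
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: echo \"$ALIGN_SECTORS,,\" | sfdisk -uS $disk"
    else
        echo "$ALIGN_SECTORS,," | sfdisk -uS "$disk"
    fi
}

# Example:  DRY_RUN=0 ./align_disks.sh /dev/sdk /dev/sdl
for disk in "$@"; do
    partition_disk "$disk"
done
```

Starting in dry-run mode is a deliberate choice: with a few hundred disks a typo in a device name is easy to make, so check the printed commands before setting DRY_RUN=0.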

Update 1

My colleague Erik Zandboer has an excellent explanation of the alignment problem on his blog. Search for the keyword “alignment” on his site: http://www.vmdamentals.com/?tag=alignment

Also, I found that the “sfdisk” command behaves oddly in CentOS 6.0 (probably also in Red Hat version 6). You might have to use the “--force” option to make it work in those Linux distributions. The drawback is that this option does not prevent overwriting existing partitions. Be careful! (Or write a script to prevent mistakes.)

How to set disk alignment in Linux