As you might know, if disk partitions containing Oracle datafiles are not aligned with the underlying storage system, some I/Os suffer extra overhead because they are effectively translated into two I/Os.
If you want more info, google for “EMC disk alignment” and you’ll find plenty of information explaining the issue. One example is http://www.vmware.com/pdf/esx3_partition_align.pdf for VMware ESX version 3.x.
Update 28-03-2013: I wrote a follow-up for this post describing the same thing for Linux (Red Hat / CentOS / OEL) version 6. For that, you might want to jump straight to the new post as this one is getting a bit outdated 😉
In short: if you create partitions on Intel-based operating systems, then by default the first partition starts at an offset of 63 x 512-byte blocks (32256 bytes), which does not match typical SAN storage systems that use 4K or 8K disk chunks. A write to a block crossing a chunk boundary causes 2 writes (plus some partial reads) in the disk backend (and in the remote copy if you use remote storage mirroring) and sometimes causes an extra cache slot to be allocated. The performance improvement from changing to the right alignment can be between 5 and 15%, depending on workloads and other configuration settings.
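To make the arithmetic concrete, here is a minimal shell sketch that checks whether a partition start falls on a chunk boundary. The helper name is_aligned is made up for illustration, and it assumes 512-byte sectors; a start sector of 63 is the traditional DOS default.

```shell
# Hypothetical helper: report whether a partition start (in 512-byte sectors)
# falls on a boundary of the given chunk size (in bytes).
is_aligned() {
    start_sector=$1
    chunk_bytes=$2
    offset_bytes=$((start_sector * 512))
    if [ $((offset_bytes % chunk_bytes)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

is_aligned 63 4096     # 63 * 512 = 32256 bytes, not a multiple of 4096
is_aligned 128 8192    # 128 * 512 = 65536 bytes = 8 * 8192
```

The first call prints "misaligned" and the second prints "aligned".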
Recent Linux distributions sometimes align partitions by default. If yours does, verify that it actually does so (see the end of this article) and you probably don’t have to change anything.
Now, the way most documentation explains how to resolve this in Linux is, in my opinion, too complex: you need to manually start “fdisk”, go into expert mode, change the starting block, and so on. Not nice if you have to configure a few hundred Oracle ASM disks at once.
There is an easier way.
(This assumes you have a completely empty disk and want to create exactly one aligned partition, e.g. for Oracle ASM.)
- Check if your Linux system has the command “sfdisk”. I bet most Linux systems have it installed by default.
- Make sure you know the Linux device name of the disk (such as /dev/sdk)
- Enter the command:
echo "128,," | sfdisk -uS /dev/sdk
Note that the command will fail if the disk is in use, for example because an existing partition on it is mounted or otherwise open (so it’s reasonably safe). This is what the output looks like on my system:
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdk: 1044 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdk1             0         -          0   0  Empty
/dev/sdk2             0         -          0   0  Empty
/dev/sdk3             0         -          0   0  Empty
/dev/sdk4             0         -          0   0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdk1           128  16771859   16771732  83  Linux
/dev/sdk2             0         -          0   0  Empty
/dev/sdk3             0         -          0   0  Empty
/dev/sdk4             0         -          0   0  Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
sfdisk reads the commands it has to perform from “stdin”. To avoid entering everything by hand, we use “echo” to feed the commands directly into sfdisk. The man page of sfdisk tells us how it accepts commands:
sfdisk reads lines of the form <start> <size> <id> <bootable> <c,h,s> <c,h,s>
With the -uS option we tell sfdisk to work in units of sectors (of 512 bytes each) instead of cylinders or anything else.
As we want to use the full size of the disk, we leave the size field empty and let sfdisk figure it out. The id will be the default (Linux partition). If you want something else, read the man page and you’ll find it. We also ignore the bootable and disk cylinders/heads/sectors parameters (they are optional).
The partition will be aligned exactly at a 64KB offset (128 sectors of 512 bytes, or 8 chunks of 8K, which fits nicely with either EMC CLARiiON or EMC Symmetrix).
Sometimes you might want another alignment value. Common is one megabyte (2048 sectors). The command would then be:
echo "2048,," | sfdisk -uS /dev/sdk
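If you prefer to think in kilobytes rather than sectors, a small conversion helper saves some mental arithmetic. This is only a sketch: the name kib_to_sectors is hypothetical and it assumes the 512-byte sectors that sfdisk -uS works with. It only prints values and does not touch any disk.

```shell
# Hypothetical helper: convert a desired alignment in KiB to a start sector,
# assuming 512-byte sectors (the unit selected by sfdisk -uS).
kib_to_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

kib_to_sectors 64      # prints 128
kib_to_sectors 1024    # prints 2048

# Build the sfdisk input line for a 1 MiB alignment without running sfdisk:
echo "$(kib_to_sectors 1024),,"   # prints 2048,,
```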
To verify disk alignment:
sfdisk -uS -l <disk>
Here is the partition overview of my small Oracle RAC cluster.
[root@oradb1 ~]# listasm
#dev      scsi  lun  ASMVol  SizeMB
/dev/sda  0     0    -       101
/dev/sdb  1     0    -       9
/dev/sdc  1     1    ASM1    8189
/dev/sdd  1     2    ASM2    8189
/dev/sde  1     3    -       1019
/dev/sdf  1     4    -       1019
sda is the boot disk, sdb contains Oracle binaries, sdc/sdd are ASM volumes and sde/sdf are cluster resources / voting disks.
Let’s look at the boot volume.
[root@oradb1 ~]# sfdisk -uS -l /dev/sda

Disk /dev/sda: 2088 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sda1   *        63    208844     208782  83  Linux
/dev/sda2        208845  33543719   33334875  8e  Linux LVM
/dev/sda3             0         -          0   0  Empty
/dev/sda4             0         -          0   0  Empty
You can see that sda1 is mis-aligned at 63 sectors. I don’t really care as the boot (OS) disk in Linux will not cause much I/O anyway. The LVM volume is also misaligned at 208845 sectors. I only keep OS stuff in there so don’t care.
Now let’s check the ASM disks.
[root@oradb1 ~]# sfdisk -uS -l /dev/sdc

Disk /dev/sdc: 1044 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdc1           128  16771859   16771732  83  Linux
/dev/sdc2             0         -          0   0  Empty
/dev/sdc3             0         -          0   0  Empty
/dev/sdc4             0         -          0   0  Empty
Nicely aligned at 64K (128 sectors)!
Let’s take a look at another Linux server that I installed with Ubuntu Server 10.10 recently.
root@silverstone:~# sfdisk -uS -l /dev/sda

Disk /dev/sda: 48641 cylinders, 255 heads, 63 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start        End   #sectors  Id  System
/dev/sda1   *      2048     499711     497664  83  Linux
/dev/sda2        501758  781422591  780920834   5  Extended
/dev/sda3             0          -          0   0  Empty
/dev/sda4             0          -          0   0  Empty
/dev/sda5        501760  781422591  780920832  8e  Linux LVM
You can see that on this system, even the boot volume is aligned at 1 Megabyte (2048 sectors). So some modern Linux distros will remove the burden of doing this yourself.
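Eyeballing the Start column gets tedious with many disks, so the check can be scripted. The following is a rough sketch, not a polished tool: check_alignment is a made-up name, and it assumes the old-style "sfdisk -uS -l" output format shown in this post (newer sfdisk versions may print different columns). It reads the listing on stdin, so the example below runs on sample lines instead of a real disk.

```shell
# Hypothetical helper: read "sfdisk -uS -l" output (old-style format as shown
# in this post) on stdin and report whether each partition starts on a
# multiple of the given number of sectors (default 128 = 64 KiB).
check_alignment() {
    awk -v align="${1:-128}" '
        /^\/dev\// {
            i = ($2 == "*") ? 3 : 2        # skip the boot flag if present
            if ($(i + 1) == "-") next      # unused slot ("Empty")
            status = ($i % align == 0) ? "aligned" : "MISALIGNED"
            print $1, $i, status
        }'
}

# Sample lines taken from the listings in this post:
check_alignment 128 <<'EOF'
/dev/sda1 * 63 208844 208782 83 Linux
/dev/sdc1 128 16771859 16771732 83 Linux
/dev/sdc2 0 - 0 0 Empty
EOF
```

On a real system you would pipe the listing in, for example: sfdisk -uS -l /dev/sdc | check_alignment 128. Note that an extended partition is only a container, so its own start sector matters less than that of the logical partitions inside it.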
Let’s see what happens if I accidentally try to overwrite an existing partition.
[root@oradb1 ~]# echo "128,," | sfdisk -uS /dev/sdc
Checking that no-one is using this disk right now ...
BLKRRPART: Device or resource busy

This disk is currently in use - repartitioning is probably a bad idea.
Umount all file systems, and swapoff all swap partitions on this disk.
Use the --no-reread flag to suppress this check.
Use the --force flag to overrule all checks.
For those geeks who still think this is not enough, here is the real proof.
dd if=/dev/sdc bs=512 count=130 | xxd -c 32
000ffc0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ................................
000ffe0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ................................
0010000: 0182 0101 0000 0000 0000 0080 bc9c b1ab 0000 0000 0000 0000 0000 0000 0000 0000  ................................
0010020: 4f52 434c 4449 534b 4153 4d31 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ORCLDISKASM1....................
0010040: 0000 100a 0000 0103 4441 5441 5f30 3030 3000 0000 0000 0000 0000 0000 0000 0000  ........DATA_0000...............
0010060: 0000 0000 0000 0000 4441 5441 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ........DATA....................
0010080: 0000 0000 0000 0000 4441 5441 5f30 3030 3000 0000 0000 0000 0000 0000 0000 0000  ........DATA_0000...............
You can see that the ASM volume starts at offset 0x10000 which equals 65536.
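The same proof can be scripted instead of read from a hex dump. This is a minimal sketch under a few assumptions: the helper name has_asm_header is hypothetical, and it relies only on what the dump above shows, namely that the string ORCLDISK sits 0x20 bytes into the first sector of the ASM partition. The two arithmetic lines at the end just confirm the offset conversion.

```shell
# Hypothetical helper: succeed if the 8 bytes at offset 0x20 of the sector
# where the partition starts spell "ORCLDISK", as in the dump above.
# $1 = whole-disk device (or image file), $2 = partition start in sectors.
has_asm_header() {
    tag=$(dd if="$1" bs=1 skip=$(( $2 * 512 + 32 )) count=8 2>/dev/null | tr -d '\0')
    [ "$tag" = "ORCLDISK" ]
}

printf '%d\n' 0x10000          # prints 65536
echo $(( 0x10000 / 512 ))      # prints 128
```

As root you could then run something like: has_asm_header /dev/sdc 128 && echo "ASM header found at 64K".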
Hope this makes your life a bit easier! Needless to say that you can put the given commands in a simple script to make it even easier 🙂
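Such a script could look roughly like this. It is a sketch, not something I would run unreviewed on a production box: the function name align_disks and the DRY_RUN variable are made up for this example, and the device names /dev/sdx and /dev/sdy are placeholders. By default it only prints the commands it would run.

```shell
# Hypothetical wrapper: create one aligned partition on each listed disk.
# By default it only prints the commands; set DRY_RUN=0 to execute them.
align_disks() {
    start_sector=$1
    shift
    for disk in "$@"; do
        if [ "${DRY_RUN:-1}" = "0" ]; then
            echo "${start_sector},," | sfdisk -uS "$disk"
        else
            printf 'echo "%s,," | sfdisk -uS %s\n' "$start_sector" "$disk"
        fi
    done
}

# Dry run for two placeholder device names:
align_disks 128 /dev/sdx /dev/sdy
```

Keeping the dry run as the default means a typo in the disk list shows up on screen before anything is written.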
My colleague Erik Zandboer has an excellent explanation of the alignment problem on his blog. You can find it here and here. Or search for keyword “alignment” on his site: http://www.vmdamentals.com/?tag=alignment
Also, I found that the “sfdisk” command shows weird behavior in CentOS 6.0 (probably also in Red Hat version 6). You might have to use the “--force” option to make it work in those Linux distributions. The drawback is that this option does not prevent overwriting existing partitions. Be careful! (Or write a script to prevent mistakes.)
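One possible guard for such a script is to ask the kernel whether it already knows about partitions on the disk before forcing anything. This is a sketch under assumptions: disk_is_empty is a hypothetical name, it relies on partitions showing up as subdirectories under /sys/block/<disk>/, and it only reflects the kernel's current view, which can be stale if the partition table was changed without a re-read.

```shell
# Hypothetical guard: succeed only if the kernel lists no partitions for the
# disk. $1 = disk name without /dev/ (e.g. sdk); $2 = sysfs root, overridable
# for testing (defaults to /sys). Returns 2 if the disk is unknown.
disk_is_empty() {
    disk=$1
    sysroot=${2:-/sys}
    [ -d "$sysroot/block/$disk" ] || return 2
    for part in "$sysroot/block/$disk/$disk"*; do
        [ -e "$part" ] && return 1    # found a partition entry
    done
    return 0
}

# Intended use (not executed here): only force when the disk has no partitions.
# disk_is_empty sdk && echo "128,," | sfdisk -uS --force /dev/sdk
```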