Because of the many discussions and the confusion around partitioning, disk alignment and its sibling issue, ASM disk management, here is an explanation of how to use UDEV. As an extra, I present a tool that manages some of this stuff for you.

The questions could be summarized as follows:

  • When do we have issues with disk alignment and why?
  • What methods are available to set alignment correctly and to verify?
  • Should we use ASMlib or are there alternatives? If so, which ones and how to manage those?

I’ve written 2 blog posts on the matter of alignment so I am not going to repeat myself on the details. The only thing you need to remember is that classic “MS-DOS” disk partitioning, by default, starts the first partition on the disk at the wrong offset (wrong in terms of optimal performance). The old partitioning scheme was invented when physical spinning rust was formatted with 63 sectors of 512 bytes per disk track. Because you need some header information for the boot block and partition table, the smart guys back then thought it was a good idea to start the first block of the first data partition on track 1 (instead of track 0), which is sector 63. These days we have completely different physical disk geometries (and sometimes even different sector sizes, another interesting topic) but we still have the legacy of the old days.

If you’re not using an Intel X86_64 based operating system then chances are you have no alignment issues at all (the only exception I know is Solaris if you use “fdisk”, similar problem). If you use newer partition methods (GPT) then the issue is gone (but many BIOSes, boot methods and other tools cannot handle GPT). As MSDOS partitioning is limited to 2 TiB (http://en.wikipedia.org/wiki/Master_boot_record) it will probably be a thing of the past in a few years but for now we have to deal with it.

Wrong alignment causes some reads and writes to be split into 2 pieces, which costs extra I/O operations. I don’t have hard numbers but a long time ago I was told the overhead could be up to 20%. So we need to get rid of it.

ASM storage configuration

ASM does not use OS file systems or volume managers but has its own way of managing volumes and files. It “eats” block devices and these block devices need to be read/write for the user/group that runs the ASM instance, as well as the user/group that runs Oracle database processes (an open secret is that ASM is out-of-band and databases write directly to ASM data chunks). ASM does not care what the name or device numbers of a block device are, nor does it care whether it is a full disk, a partition, or some other type of device, as long as it behaves as a block device under Linux (and probably other UNIX flavors). It does not need partition tables at all but writes its own disk signatures to the volumes it gets.

[ Warning: Lengthy technical content, Rated T, parental advisory required ]

ASM detects volumes (typically after boot) by scanning a path called the ASM DISKSTRING. If the disk string is set to /dev/whatever then ASM will scan /dev/whatever/* for block devices that are readable and writable. If they contain valid ASM signatures it will try to access them and compose ASM disk groups from them. If they have no valid signatures then the volumes become “candidate” disks, i.e. they may be used to create new ASM volumes and/or disk groups.

Linux, by default, places detected disk volumes under /dev/ with root:disk as user/group (for example, /dev/sdq), readable and writable only by root and members of group “disk”. Oracle cannot use them like that, as it typically does not run as root. Also, if ASM scans /dev/ it will have to scan a lot of devices because /dev/ is home to not just all disks, but also all other block (and character) devices.

So we need to present ASM volumes under a path that matches the ASM diskstring and has different user/group ownership and permissions. One way of doing this is using the “mknod” command to create references (inodes) to the disks under /dev. So for example /dev/sdq and /dev/oracleasm/disks/myvol can both point to the same Linux device (for /dev/sdq that is major ID 65 and minor ID 0). They are essentially different names for the very same thing and can have different permissions (so /dev/sdq can have root:root and myvol can have oracle:asm as ownership while still being the same device). The problem is that a) it’s a very labor-intensive manual process, b) it’s error prone and c) it is not guaranteed to be consistent across reboots.

ASMlib

Oracle therefore created a tool called ASMLib that consisted of a custom kernel module and a set of command line tools. The kernel module (driven by the CLI) would scan all devices in /dev/* after boot and if it found the right signatures it would clone those (by creating additional device inodes using mknod) typically using diskstring /dev/oracleasm/disks. ASMLib has a few flaws:

  • Can be very slow during boot because it scans ALL devices it can for signatures
  • The kernel module needs to be recompiled against the right kernel every time (this can be done partially dynamic but it’s sensitive to errors and the module is not part of the validated kernel source code – making it tricky to maintain)
  • Oracle dropped support for SuSE/RHEL 6/CentOS 6 so customers upgrading from 5.x were stuck unless using Oracle Linux (but Oracle now seems to support Red Hat again)
  • Requires the disk to be partitioned first (i.e. /dev/sdq is not accepted by ASMlib, but /dev/sdq1 is)

The partition requirement probably caused partitioning of ASM disks to be the de facto standard these days. Some people claim that an unaware Linux sysadmin will be tempted to create a partition table and file system on any raw device he gets his hands on, and if this happens to be a disk already in use for ASM, it can ruin your whole day (and more). If it already has a partition, the rumor goes, the admin will think the disk is in use and not be tempted to just create a file system on the (in his eyes) empty partition 1. I have my serious doubts on this (a low-brain admin will destroy anything in his path anyway) but alas.

Alignment (again)

So if we use partitions then we need to make sure the partition is aligned at a multiple of the element size of our storage device (whatever that is). For EMC VMAX and VNX the disk “track” size is 8K (track quoted because these days it is a far cry from what the real track would look like on the real spinning platter – there are many indirection layers). So we need to align at least in multiples of 8K (16 sectors). As 8K is not accepted by the partition tools we need to go larger – plus, for other reasons it makes sense to go larger (think storage cache slot size, raid stripe width etc). EMC recommends 64K (128 sectors) or a multiple thereof.

DBAs prefer an offset of 1 MiB (2048 sectors) because it improves the chance that someone who overwrites a disk with a partition label does so without touching the real Oracle data (and ASM keeps copies of the disk header at multiple offsets on the disk, so wiping out 1 MiB from block zero is often recoverable, although still ugly).

How can you figure out if a disk is partitioned? (assuming all Linux here, my UNIX skills are pretty rusty these days):

# parted /dev/sdq unit s print
Number Start End Size Type File system Flags
1 2048s 155647s 153600s primary ext4

The 2048s means the first partition starts at sector 2048, i.e. 2048 × 512 bytes = 1024K = 1 MiB. If it shows 128 (64K), all is good as well; if it shows 63, trouble. Note that it only matters for disks that do lots of random IO (so I don’t care about the boot disk for example).
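To make the arithmetic concrete, here is a small shell sketch (not part of any tool mentioned here) that checks whether a start sector falls on a given boundary. It assumes 512-byte logical sectors, as reported by parted above; the function name is_aligned is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: does a partition start on a given alignment boundary?
# Assumption: 512-byte logical sectors (check the "Sector size" line in parted).

SECTOR_BYTES=512

# is_aligned START_SECTOR BOUNDARY_KIB
# Exit status 0 when the start offset is a whole multiple of the boundary.
is_aligned() {
    start_bytes=$(( $1 * SECTOR_BYTES ))
    boundary_bytes=$(( $2 * 1024 ))
    [ $(( start_bytes % boundary_bytes )) -eq 0 ]
}

# Check the three start sectors discussed in the text against 64 KiB.
for start in 63 128 2048; do
    if is_aligned "$start" 64; then
        echo "start sector $start: aligned to 64 KiB"
    else
        echo "start sector $start: NOT aligned to 64 KiB"
    fi
done
```

Feed it the Start value from “parted /dev/sdX unit s print” with the trailing “s” stripped. Sector 63 fails the 64K check, while 128 and 2048 pass.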

Let’s create a partition on an empty disk using classic tools. I’m using CentOS 6.5 here (most recent non-beta distro 100% compatible with Red Hat):

[root@dbhost ~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x959383a2.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-130, default 1): 1
Last cylinder, +cylinders or +size{K,M,G} (1-130, default 130):
Using default value 130
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@dbhost ~]# parted /dev/sdb unit s print
Model: VMware, VMware Virtual S (scsi)
Disk /dev/sdb: 2097152s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 63s 2088449s 2088387s primary

Who said modern Linux versions had this problem solved? 🙂

My blog post “Disk alignment reloaded” goes a bit deeper in this stuff. But the command to partition correctly is this:

# parted /dev/sdb mklabel msdos ; parted /dev/sdb mkpart primary 1m 100%

Note that ASMlib does not partition devices itself. The Linux admin has to do that and then use ASMlib commands (“oracleasm createdisk”) to hand the volumes over for ASM usage.

Linux UDEV

Is there a better way? Yes. Linux offers a facility since (at least) RHEL version 5 (probably before) called UDEV. UDEV is a mechanism that manipulates the way the kernel detects all sorts of devices and presents them to the system for further usage. It has a built-in set of rules that define things like persistent device naming, ownerships and the like. The nice thing of UDEV is that you can add or override rules. Some people have already written on how to use UDEV for Oracle (ASM), for example Frits Hoogland, and many more (searching Google finds many of them).

As said, a disk volume by default is presented as /dev/sdXX (where XX is the next available letter from a to z, continuing with aa, ab, ... when there are more than 26 of them). Owner:group is root:disk and only the root user or members of group “disk” may read/write them.

If we add a custom rule we can alter this presentation for a block device. Example:

[root@dbhost ~]# cat /etc/udev/rules.d/99-asm.rules

OWNER="grid", GROUP="asmdba", MODE="0660", ENV{DEVTYPE}=="disk", KERNEL=="sd*", ENV{ID_SERIAL}=="36000c29f8e2de32a6d10cf8e69d2816f", NAME="oracleasm/vol1"

Explanation: custom rules have to go in /etc/udev/rules.d and have the extension .rules. The number (99) determines the order of processing: rules files are read in lexical order, so 99 comes last.

You see assignments (single = sign) and comparisons (double == sign).

Comparisons select the device; assignments tell UDEV what to do with a device once it matches all comparisons.

So in this case, it gives a device owner “grid” and group “asmdba” with mode 0660 if the detected device is a disk, has a kernel name starting with “sd” (the SCSI disk driver) and has the serial as shown. It also gives the device a new path and name (/dev/oracleasm/vol1 instead of /dev/sdsomething).

The detection is not based on the SCSI target and lun numbers or something similar (these can change). SCSI devices have unique identifiers (say, world wide names) that can be used for persistent naming. If a device matches the generic type AND has the correct identifier, it will become /dev/oracleasm/vol1 with the permissions as mentioned. It will not appear any longer as /dev/sdq (or whatever it would be named normally).
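To illustrate how mechanical such a rule is, here is a minimal sketch of a shell function that prints one for a given serial and volume name. The function name udev_rule and its argument order are invented for this example; it mirrors the RHEL 6 style rule shown above and is not taken from any existing tool. Also note that, as far as I know, newer udev releases no longer allow renaming disk device nodes with NAME=, so on those a SYMLINK+= assignment would be the usual substitute.

```shell
#!/bin/sh
# Hypothetical helper: print a udev rule in the style of the example above.
# udev_rule SERIAL NAME [OWNER] [GROUP] [MODE]
# Defaults (grid, asmdba, 0660) are taken from the example rule.
udev_rule() {
    serial=$1
    name=$2
    owner=${3:-grid}
    group=${4:-asmdba}
    mode=${5:-0660}
    # Assignments first, then the comparisons that select the device.
    printf 'OWNER="%s", GROUP="%s", MODE="%s", ENV{DEVTYPE}=="disk", KERNEL=="sd*", ENV{ID_SERIAL}=="%s", NAME="%s"\n' \
        "$owner" "$group" "$mode" "$serial" "$name"
}

# Reproduces the rule from the example rules file.
udev_rule 36000c29f8e2de32a6d10cf8e69d2816f oracleasm/vol1
```

The output only goes to stdout; writing it into /etc/udev/rules.d and getting udev to reload its rules is left out on purpose.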

So we kill 3 birds with one stone:

  • We completely bypassed the partitioning problem, Oracle gets a block device that is the whole LUN and nothing but the LUN
  • We assigned the correct permissions and ownership and moved to a place where ASM only needs to scan real ASM volumes (not 100s of other thingies)
  • We completely avoid the risk of a rookie ex-Windows administrator formatting an (in his eyes) empty volume (that actually contains precious data). An admin will not look in /dev/oracleasm/ to start formatting disks there

And the best of all: we have an astonishing extra whopping Megabyte of disk space for our tablespaces, due to not needing boot sector and partition tables! Yaaay!

But first a caveat with VMware. VMware creates such SCSI identifiers for each disk but, by default, does not let the guest OS know about them. So if you list the scsi_id of a volume under VMware, it will show up empty. For VMware to present disk IDs to the guest OS, you need to set the config parameter “disk.enableUUID = true” in each VM’s config (VMX) file. That’s all (manually edit the VMX file, or in vSphere use the VM configuration GUI to add the entry).

Introducing asmdisks

We done yet? Nope. Manually maintaining the ASM rules is hard work: you need to scan the disks, copy-paste their IDs into the rules file, give them names, etc. Can be done but not nice. Browsing all the blog posts that describe in great detail how to create the UDEV rules file, I wondered why nobody seemed to have written a tool to do all the hard work for you. The saying goes that a good Unix administrator is lazy enough to automate repetitive tasks, but everyone still suggests manually messing around with copy/paste of SCSI ID strings and all that.

So I wrote a script called “asm” (in a moment of sheer inspiration) that mimics ASMlib commands but actually generates these rules for you. I now present you the 1.0 final version. It’s packaged as an RPM package called “asmdisks” and has man pages and a few other goodies inside. Dependencies are set correctly, so for example, the packages required to read SCSI IDs are installed automatically if you install the RPM with a decent package manager (YUM).

Short demo:

Install RPM package

UPDATE: check my software page for the correct yum command (path) to the asmdisks package.

[root@dbhost ~]# yum install asmdisks
…
…
Dependencies Resolved
=====================================================================================
Package Arch Version Repository Size
=====================================================================================
Installing:
asmdisks noarch 1.0-1 test 14 k
Installing for dependencies:
bc x86_64 1.06.95-1.el6 base 110 k
lsscsi x86_64 0.23-2.el6 base 38 k
parted x86_64 2.1-21.el6 base 606 k
sysstat x86_64 9.0.4-22.el6 base 230 k

Transaction Summary
=====================================================================================
Install 5 Package(s)
Total download size: 998 k
Installed size: 3.3 M
Is this ok [y/N]: y
…
Installed:
asmdisks.noarch 0:1.0-1
Dependency Installed:
bc.x86_64 0:1.06.95-1.el6 lsscsi.x86_64 0:0.23-2.el6
parted.x86_64 0:2.1-21.el6 sysstat.x86_64 0:9.0.4-22.el6

Complete!

Listing available disks and creating one for ASM

[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0] 1.00 GB available
/dev/sdc [2:0:2:0] 1.00 GB available
/dev/sdd [2:0:3:0] 4.00 GB available
/dev/sde [2:0:4:0] 4.00 GB available
/dev/sdf [2:0:5:0] 2.00 GB available
/dev/sdg [2:0:6:0] 1.00 GB available
[root@dbhost ~]# asm createdisk vol1 /dev/sdb
[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0] 1.00 GB configured as /dev/oracleasm/vol1
/dev/sdc [2:0:2:0] 1.00 GB available
/dev/sdd [2:0:3:0] 4.00 GB available
/dev/sde [2:0:4:0] 4.00 GB available
/dev/sdf [2:0:5:0] 2.00 GB available
/dev/sdg [2:0:6:0] 1.00 GB available

Note that the numbers between brackets are the SCSI host, channel, target and LUN IDs (same as in “lsscsi” output, FYI). Let’s look at /dev/sdb:

[root@dbhost ~]# ls -ald /dev/sdb
ls: cannot access /dev/sdb: No such file or directory
[root@dbhost ~]# ls -ald /dev/oracleasm/vol1
brw-rw---- 1 grid asmdba 8, 16 Jun 24 08:55 /dev/oracleasm/vol1

It’s missing from /dev (no way to mess up by admin!) and reappeared under /dev/oracleasm/vol1 with correct ownership. ASM can pick it up and make a disk group out of it.

Let’s take a look under the covers:

[root@dbhost ~]# cat /etc/asmtab
# /etc/asmtab - configuration file for asmdisks
# Definitions for IORate testing - volumes under /dev/iorate will have root:iops @ mode 0660
PATH=iorate:root:iops:0660
#
# This file keeps track of udev disk mappings for asmdisk(1)
# You should normally not have to edit this file directly
# Use asm(1) instead.
#
# On each line:
#
# label type identifier
# where
# label: diskstring/volume name (default diskstring is oracleasm and can be omitted)
# type: one of scsi, part or mapper (scsi=entire SCSI disk, part=scsi disk partition, mapper=linux disk mapper device)
# identifier: scsi_id, scsi_id:partition, mapper_name
#
# Ownerships and permissions can be specified for a diskstring:
# PATH=diskstring:owner:group:mode
# default is oracleasm:grid:asmdba:0660
#
# example:
# vol1 scsi 36000c29f825cd85b5fcc70a1aadebf0c # entire SCSI disk
# vol2 part 36000c298afa5c31b47fe76cbd1750937:1 # partition 1 of entire SCSI disk
# vol3 mapper mpathb # /dev/mapper/mpathb (multipath device)
# iorate/test1 mapper iops-vol1 # LV vol1 on VG iops, will be mapped as /dev/iorate/test1
# -----------------------------------------------
vol1 scsi 36000c29f8e2de32a6d10cf8e69d2816f

We see an entry with the volume name (vol1), the type (will explain that later), and the disk ID. The asm script detects the ID automatically and configures it. Not a single time do you need to manually detect the id or copy it all over the place.
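The format is simple enough to parse with a few lines of awk. The following is only a sketch of how such a file could be read (the function name asmtab_entries is made up, and the real asm script may well do it differently): it drops comments, skips PATH= lines and adds the default diskstring oracleasm where none is given.

```shell
#!/bin/sh
# Hypothetical sketch: read asmtab-style text on stdin and print one
# "diskstring/name type identifier" line per entry.
asmtab_entries() {
    awk '
        { sub(/#.*/, "") }        # drop comments (whole-line and trailing)
        /^PATH=/ { next }         # skip diskstring permission definitions
        NF >= 3 {
            label = $1
            if (label !~ /\//) label = "oracleasm/" label   # default diskstring
            print label, $2, $3
        }
    '
}

# Sample input modelled on the example lines in the asmtab comments.
asmtab_entries <<'EOF'
# comment line
PATH=iorate:root:iops:0660
vol1 scsi 36000c29f8e2de32a6d10cf8e69d2816f
vol2 part 36000c298afa5c31b47fe76cbd1750937:1 # partition 1 of entire SCSI disk
iorate/test1 mapper iops-vol1
EOF
```

For the sample input this prints three entries: oracleasm/vol1, oracleasm/vol2 and iorate/test1, each followed by its type and identifier.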

[root@dbhost ~]# cat /etc/udev/rules.d/99-asm.rules
SUBSYSTEM!="block", GOTO="asmudev_end"
ENV{DEVPATH}=="*/block/sda", GOTO="asmudev_end"

OWNER="grid", GROUP="asmdba", MODE="0660", ENV{DEVTYPE}=="disk", KERNEL=="sd*", ENV{ID_SERIAL}=="36000c29f8e2de32a6d10cf8e69d2816f", NAME="oracleasm/vol1"

LABEL="asmudev_end"

The rules file looks very similar to my earlier example, with a few additions: the “SUBSYSTEM” check is done only once at the top (cleans up the mess a bit) and, more importantly, /dev/sda is EXCLUDED from any manipulation. This prevents typos from making the boot volume disappear from /dev/, which makes the system unbootable (no data loss but nasty to fix; believe me, I accidentally did that once and immediately created this protection against it 😉 )

What if we want to remove the volume?

[root@dbhost ~]# asm deletedisk vol1
[root@dbhost ~]# ls -al /dev/oracleasm/vol1
ls: cannot access /dev/oracleasm/vol1: No such file or directory
[root@dbhost ~]# ls -al /dev/sdb
brw-rw---- 1 root disk 8, 16 Jun 24 09:02 /dev/sdb

‘Nuff said. Beware of doing this to disks that are in use by ASM. Makes Oracle sad.

Let’s create a few more volumes and present different ways of showing the configuration.

[root@dbhost ~]# asm createdisk myvol /dev/sdc
[root@dbhost ~]# asm createdisk yourvol /dev/sdd
[root@dbhost ~]# ls -al /dev/oracleasm/
drwxr-xr-x 2 root root 80 Jun 24 09:04 .
drwxr-xr-x 19 root root 4020 Jun 24 09:04 ..
brw-rw---- 1 grid asmdba 8, 32 Jun 24 09:04 myvol
brw-rw---- 1 grid asmdba 8, 48 Jun 24 09:04 yourvol
[root@dbhost ~]# asm list
myvol   1.00 GB [-] sdc
yourvol 4.00 GB [-] sdd
[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0]  1.00 GB available
/dev/sdc [2:0:2:0]  1.00 GB configured as /dev/oracleasm/myvol
/dev/sdd [2:0:3:0]  4.00 GB configured as /dev/oracleasm/yourvol
/dev/sde [2:0:4:0]  4.00 GB available
/dev/sdf [2:0:5:0]  2.00 GB available
/dev/sdg [2:0:6:0]  1.00 GB available

Could we rename a disk? Haven’t implemented that but you can do it yourself by editing asmtab:

[root@dbhost ~]# vi /etc/asmtab
yourvol scsi 36000c2905b5f9379248f904459f8b449

Change to:

myvol2 scsi 36000c2905b5f9379248f904459f8b449

Rescan asmtab for changes:

[root@dbhost ~]# asm scandisks
[root@dbhost ~]# ls -al /dev/oracleasm/
total 0
drwxr-xr-x 2 root root 100 Jun 24 09:06 .
drwxr-xr-x 19 root root 4020 Jun 24 09:06 ..
brw-rw---- 1 grid asmdba 8, 32 Jun 24 09:06 myvol
brw-rw---- 1 grid asmdba 8, 48 Jun 24 09:06 myvol2
brw-rw---- 1 grid asmdba 8, 48 Jun 24 09:04 yourvol

You see that yourvol is not removed and that it’s the same device as myvol2 (both are 8, 48). That’s an artifact of the UDEV mechanism (when doing asm deletedisk I force the delete in the script). After a reboot it will be gone. You may also manually rm /dev/oracleasm/yourvol (but be careful).
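If you want to spot such leftovers without rebooting, something along these lines would list the entries in a directory that are no longer among the names you expect. It is a sketch only: stale_nodes is a hypothetical helper, not an asm subcommand, and you pass it the volume names yourself.

```shell
#!/bin/sh
# Hypothetical helper: list entries in DIR whose names are not in the
# given list of expected volume names. It only prints; it removes nothing.
# stale_nodes DIR NAME...
stale_nodes() {
    dir=$1
    shift
    for node in "$dir"/*; do
        [ -e "$node" ] || continue      # empty or missing directory
        name=${node##*/}                # strip the directory part
        keep=no
        for label in "$@"; do
            if [ "$name" = "$label" ]; then
                keep=yes
            fi
        done
        if [ "$keep" = no ]; then
            echo "$node"
        fi
    done
}

# After the rename above, only myvol and myvol2 are expected.
stale_nodes /dev/oracleasm myvol myvol2
```

On the system from the transcript this would be expected to report /dev/oracleasm/yourvol; check what it prints before removing anything by hand.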

Ready for some more magic? Here goes…

Say your DBA wants to use partitions instead of full volumes because…. Well just because. Legacy thinking. We can do that if they insist:

[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0] 1.00 GB available
/dev/sdc [2:0:2:0] 1.00 GB configured as /dev/oracleasm/myvol
/dev/sdd [2:0:3:0] 4.00 GB configured as /dev/oracleasm/yourvol
/dev/sde [2:0:4:0] 4.00 GB available
/dev/sdf [2:0:5:0] 2.00 GB available
/dev/sdg [2:0:6:0] 1.00 GB available
[root@dbhost ~]# parted /dev/sde mklabel msdos
Information: You may need to update /etc/fstab.
[root@dbhost ~]# parted /dev/sde mkpart primary 1m 50%
Information: You may need to update /etc/fstab.
[root@dbhost ~]# parted /dev/sde mkpart primary 50% 100%
Information: You may need to update /etc/fstab.
[root@dbhost ~]# parted /dev/sde unit MiB print

Model: VMware, VMware Virtual S (scsi)
Disk /dev/sde: 4096MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 1.00MiB 2048MiB 2047MiB primary
2 2048MiB 4096MiB 2048MiB primary

Note that I deliberately created 2 partitions to show that it’s possible to use both on one disk as separate ASM volumes.

[root@dbhost ~]# asm createdisk sdep1 /dev/sde1
[root@dbhost ~]# asm createdisk sdep2 /dev/sde2
[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0] 1.00 GB available
/dev/sdc [2:0:2:0] 1.00 GB configured as /dev/oracleasm/myvol
/dev/sdd [2:0:3:0] 4.00 GB configured as /dev/oracleasm/yourvol
/dev/sde [2:0:4:0] 4.00 GB partitioned
/dev/sdf [2:0:5:0] 2.00 GB available
/dev/sdg [2:0:6:0] 1.00 GB available
[root@dbhost ~]# asm list
myvol   1.00 GB [-] sdc
myvol2  4.00 GB [-] sdd
sdep1   1.99 GB [-] sde1
sdep2   2.00 GB [-] sde2
yourvol 4.00 GB [-] sdd

You see /dev/sde cannot be detected as a single ASM volume so it shows as “partitioned” but when listing the ASM volumes you see them both.

This might be handy when migrating from ASMlib configurations as well.

Still not done? Nope. Watch this… Say I have a VM on my laptop or a small old server at home with a few SATA disks. I would like to have many more ASM volumes than I have virtual or physical disks in the system. Is there a way?

[root@dbhost ~]# vgcreate asmvg /dev/sdf /dev/sdg
No physical volume label read from /dev/sdf
Physical volume /dev/sdf not found
No physical volume label read from /dev/sdg
Physical volume /dev/sdg not found
Physical volume "/dev/sdf" successfully created
Physical volume "/dev/sdg" successfully created
Volume group "asmvg" successfully created
[root@dbhost ~]# lvcreate -Ay -nlvol1 -L1G asmvg
Logical volume "lvol1" created
[root@dbhost ~]# lvcreate -Ay -nlvol2 -L1G asmvg
Logical volume "lvol2" created
[root@dbhost ~]# asm createdisk lvol01 /dev/asmvg/lvol1
[root@dbhost ~]# asm createdisk lvol02 /dev/asmvg/lvol2
[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0] 1.00 GB available
/dev/sdc [2:0:2:0] 1.00 GB configured as /dev/oracleasm/myvol
/dev/sdd [2:0:3:0] 4.00 GB configured as /dev/oracleasm/yourvol
/dev/sde [2:0:4:0] 4.00 GB partitioned
/dev/sdf [2:0:5:0] 2.00 GB LVM Volume
/dev/sdg [2:0:6:0] 1.00 GB LVM Volume
[root@dbhost ~]# asm list
lvol01  1.00 GB [asmvg-lvol1] dm-7
lvol02  1.00 GB [asmvg-lvol2] dm-8
myvol   1.00 GB [-] sdc
myvol2  4.00 GB [-] sdd
sdep1   1.99 GB [-] sde1
sdep2   2.00 GB [-] sde2
yourvol 4.00 GB [-] sdd

Voila… A mix of raw disks, disk partitions and LVM logical volumes all under /dev/oracleasm to be used by ASM as you like. Note that for Oracle RAC you cannot use LVM volumes as they are not cluster aware. Other than that, no restrictions. Can ASMlib do that? 😉

I also made it work with multipath volumes (after installing device-mapper-multipath):

[root@dbhost ~]# asm disks
/dev/sda [2:0:0:0] 20.00 GB partitioned
/dev/sdb [2:0:1:0] 1.00 GB multipath (mpatha)
/dev/sdc [2:0:2:0] 1.00 GB configured as /dev/oracleasm/myvol
/dev/sdd [2:0:3:0] 4.00 GB configured as /dev/oracleasm/myvol2
/dev/sde [2:0:4:0] 4.00 GB multipath (mpathe)
/dev/sdf [2:0:5:0] 2.00 GB multipath (mpathf)
/dev/sdg [2:0:6:0] 1.00 GB multipath (mpathg)
/dev/dm-2 [mpatha] 2.00 GB available
/dev/dm-3 [mpathe] 8.00 GB partitioned
/dev/dm-4 [mpathf] 4.00 GB LVM Volume
/dev/dm-5 [mpathg] 2.00 GB LVM Volume

Haven’t tested PowerPath yet, will do as soon as I get the chance. But I don’t expect too many problems (might require a few script changes).

What if you want another diskstring? I thought of that for another reason: I was testing with IORate (a destructive IO load generator from EMC that overwrites the devices it has been configured with). IORate is very useful but also dangerous for that reason. And normally it has to be run as root because otherwise it cannot access the volumes. But what if we used “asm” for that?

[root@dbhost ~]# asm createdisk iorate/iops1 /dev/sdb
[root@dbhost ~]# asm createdisk iorate/iops2 /dev/sdc
[root@dbhost ~]# ls -al /dev/iorate/
/dev/iorate/:
total 0
drwxr-xr-x 2 root root 80 Jun 24 09:36 .
drwxr-xr-x 21 root root 4260 Jun 24 09:36 ..
brw-rw---- 1 root iops 8, 16 Jun 24 09:36 iops1
brw-rw---- 1 root iops 8, 32 Jun 24 09:36 iops2

Here we created two test volumes for IO stress testing, under /dev/iorate, with group “iops”. If we create a user “iorate” with group “iops”, this user can now run the IO tests without root permissions (and thus without risking severe data loss). You can configure extra disk strings, each with its own set of permissions.

Ever used iostat to monitor ASM disks?

[root@dbhost ~]# iostat -xk
Linux 2.6.32-431.el6.x86_64 (dbhost) 06/24/2014 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.44 0.00 1.57 1.11 0.00 96.89

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
scd0 0.00 0.00 0.17 0.00 0.69 0.00 8.00 0.00 0.89 0.89 0.02
sdb 0.05 0.00 1.86 0.00 8.49 0.00 9.13 0.00 0.14 0.14 0.03
sda 7.33 1.49 17.91 0.76 204.05 2.53 22.13 0.04 2.10 1.42 2.65
sdd 0.00 0.00 0.93 0.00 3.71 0.00 8.00 0.00 0.13 0.13 0.01
...
...
...
dm-13 0.00 0.00 0.96 0.01 3.83 0.06 7.97 0.00 1.02 0.46 0.04
dm-14 0.00 0.00 0.92 0.01 3.42 0.04 7.39 0.00 0.24 0.24 0.02

Now how do you know which one maps to what ASM volume? Maybe this helps:

[root@dbhost ~]# asmstat -xk
Linux 2.6.32-431.el6.x86_64 (dbhost) 06/24/2014 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.37 0.00 1.30 0.90 0.00 97.43

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb    0.04 0.00 1.51 0.00 6.89 0.00 9.13 0.00 0.14 0.14 0.02
sda    5.95 1.21 14.54 0.64 165.66 2.11 22.11 0.03 2.10 1.42 2.15
myvol2 0.00 0.00 0.75 0.00 3.01 0.00 8.00 0.00 0.13 0.13 0.01
myvol  0.00 0.00 0.75 0.00 3.01 0.00 8.00 0.00 0.12 0.12 0.01
sde    0.04 0.00 4.95 0.00 20.53 0.00 8.30 0.00 0.16 0.13 0.07
sdf    0.04 0.00 1.28 0.00 5.92 0.00 9.26 0.00 0.12 0.11 0.01
sdg    0.04 0.00 0.62 0.00 3.28 0.00 10.62 0.00 0.16 0.16 0.01

Note that it only works with non-multipath full disk devices (no LVM or partitioned disks yet). This is because asmstat is just a wrapper around iostat, and the translation does not seem to be straightforward for partitions and device-mapper devices. Might work on that in a future version.
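The basic idea of such a wrapper, replacing kernel device names in the first column with volume names, can be sketched in a few lines. This is an illustration under my own assumptions and not the actual asmstat code: rename_devices is a made-up function that takes kernel=volume pairs as arguments and filters iostat-style text from stdin.

```shell
#!/bin/sh
# Hypothetical sketch of the name translation a wrapper around iostat needs.
# rename_devices KERNEL=VOLUME...   (for example: sdc=myvol sdd=myvol2)
# Lines whose first field is a known kernel name get that field replaced;
# all other lines pass through untouched. Rewritten lines are re-joined
# with single spaces, so column alignment is not preserved.
rename_devices() {
    awk -v pairs="$*" '
        BEGIN {
            n = split(pairs, p, " ")
            for (i = 1; i <= n; i++) {
                if (split(p[i], kv, "=") == 2) {
                    map[kv[1]] = kv[2]
                }
            }
        }
        ($1 in map) { $1 = map[$1] }
        { print }
    '
}

# Sample input: a few shortened iostat-style lines.
printf '%s\n' 'sdb 0.04 0.00 1.51' 'sdd 0.00 0.00 0.75' 'sdc 0.00 0.00 0.75' |
    rename_devices sdc=myvol sdd=myvol2
```

In real use the pairs would have to come from somewhere (the “asm list” output shown earlier has both names on each line), and the input would be piped in from iostat itself.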

# man asm

asm(1) asmdisks asm(1)

NAME
asm - tool for managing Oracle ASM devices via udev(7)

SYNOPSIS
asm

DESCRIPTION
asm is a replacement for the oracleasm command provided via Oracle ASMlib. It attempts to provide
similar functionality using a simple script and Linux UDEV rather than tweaking the kernel with an
add-on kernel module, complex configuration and binary files.
…
…

Man pages included.

Note that “asmdisks” requires RHEL or compatible (OEL/CentOS) version 6.x or later.

UDEV works differently under 5.x so the RPM refuses to install on version 5. Haven’t tested SuSE. Mileage may vary.

For the record: the RPM package is 14 kilobytes (so it could run on a Commodore 64 😉 ) and the asm script itself has only 362 lines (written as a bash script).

Want to try it out? I created a Linux YUM repository from which you can download the RPM (and another one that I will cover later). See “Downloads” page.

Update:

Made slight changes to the repository and to the downloads page on this blog, which seems to have broken the links that were there. Sorry for the inconvenience. Please check the “Downloads” tab on the blog for more info.

Update 2014-10-24:

Revived the repository but renamed it to avoid conflict with another project I’m working on. Check the software page for more info.

Also I updated “asmdisks” to support bash-completion and non-default user/group for /dev/oracleasm/ disks (thanks Terry for pointing out this problem). I also included a better script header, readme/copyright info in the package (GPLv3) and fixed a minor bug dealing with multiple slashes in volume names.

Happy UDEVving and let me know what you think!


34 thoughts on “Fun with Linux UDEV and ASM: Using UDEV to create ASM disk volumes”

  • yum couldn’t find your package. Have you removed it? If not why can’t yum find it?

    1. Hi Terry,

      The repository was my first attempt in configuring such a thing on the web. Later I found I had to make changes to the repository due to another (conflicting) much larger project I’m working on. So I made an update to the RPM package that configures the web repository so that it will disable itself to avoid future conflicts (Can’t make an RPM package delete itself but the updated version contains no files: check with “rpm -ql outrun-release”)
      It’s safe to remove outrun-release but it does not hurt to leave it there as it holds no files anyway.

      The RPM package is still there but you have to install it directly using YUM. See http://dirty-cache.com/downloads/ or for now, you can enter (as root):

      # yum install http://milliways.xs4all.nl/public/rpms/asmdisks-1.0-1.noarch.rpm

      The path may change in the future so always check my downloads page.

      Thanks for your comment and feel free to drop me a message or comment if you have any issues.

  • I tried the package and worked fine, but one small problem 12c wants the group to be asmadmin, and tried to change in the rules file but it just went back to being root owned. Any ideas why?

    1. Hi Terry,

      The rules file gets updated if you use the asm command. 2 possible workarounds –
      Workaround 1) without changing asm script:
      – Enter a PATH option in /etc/asmtab (after the iorate entry for example)
      PATH=oracleasm:grid:asmadmin:0660
      – Prepend “oracleasm/” to all devices so that the volumes are named oracleasm/disk1, oracleasm/disk2 etc:
      oracleasm/disk1 scsi 36000c2927d96084fab336839b9b7ca68
      – rescan the disks using “asm scandisks”

      If you want to add a new device, do something like “asm createdisk oracleasm/disk3 ” and you should be good.
      Actually I think prepending “oracleasm” should not be needed so I will improve that in the next version of the tool. For now the prepend works fine but adds a bit of extra work.

      Workaround 2) change /usr/bin/asm script
      This breaks the asmdisks package. Often /usr/local/bin is first in $PATH so better is to copy /usr/bin/asm to /usr/local/bin/asm and edit it there.
      On line 219 you find:
      GROUP=${GROUP:-asmdba}

      Change it to
      GROUP=${GROUP:-asmadmin}

      and rerun “asm scandisks”.
      Note that existing devices will not be updated until you reboot or do a rm -f /dev/oracleasm/* and another rescan.

      Will make an update to the asm package when I get time so editing the script would no longer be needed.
      Let me know if this works for you!

      Thanks
      Bart

      1. Hi Bart,

        I tried option 1 using…

        oracleasm/crs mapper vg_ora12c1-asm_crs
        PATH=oracleasm:grid:asmadmin:0660

        but…

        [root@ora12c1 ~]# ls -la /dev/ora*
        total 0
        drwxr-xr-x 2 root root 60 Oct 4 13:30 .
        drwxr-xr-x 21 root root 3940 Oct 4 13:36 ..
        lrwxrwxrwx 1 root root 7 Oct 4 13:30 crs -> ../dm-5
        [root@ora12c1 ~]# asm list
        crs 4.00 GB [vg_ora12c1-asm_crs] dm-5

        I’m thinking of just failing back to before the above edits and reboot to see if the old ownership and groups come back so I can try a grid setup.

        Perhaps command line varbs for owner and group should be used so in 14x you don’t have to change the code much, nor edit any files.
        thanks,
        Terry

        1. Ah you’re using the LVM with logical volumes 🙂
          Did you try a reboot after making the changes?
          The permissions in this case are on /dev/mapper/* and not on /dev/oracleasm/* and managed by device mapper so safest way to get it right is reboot.
          Or if you’re willing to experiment a little, changing the VG offline and online again *might* work as well.

          Good luck
          Bart

          1. Hi Bart

            Yes, I rebooted a number of times, no change. I can’t take the volume offline as OEL 6.5 seems to want to take all of whatever drive you install on, so it has swap, home etc. I was using your example of a partitioned disk.

            This was easier on windows just limit the size of c and create a extended partition with many logical partitions.

            Trying asmca to use the /dev/dm device then might try using the new afd.

            Terry

  • Something seems to be pointing into thin air
    [root@ora12c1 ~]# ls -la /dev/dm-5
    brw-rw---- 1 grid asmadmin 252, 5 Oct 4 16:25 /dev/dm-5
    [root@ora12c1 ~]# ls -al /dev/oracleasm/*
    lrwxrwxrwx 1 root root 7 Oct 4 16:25 /dev/oracleasm/crs -> ../dm-5
    [root@ora12c1 ~]#

    1. Hi Terry,

      This looks like it works as it should?

      i.e. /dev/oracleasm/crs is a symlink to /dev/dm-5 and /dev/dm-5 has the correct permissions?

      I tried it myself and it seems to work just fine.
      You should have another symlink in /dev/vg_ora12c1/ btw (to keep the VG complete).

      FYI the symlink itself has owner/group root/root but that doesn’t matter. The actual mapper device has grid/asmadmin like you wanted. So using asmca on /dev/oracleasm/* should work.
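
      To check this from the shell, stat can report the ownership of the symlink itself and of the device it points to. A small sketch, assuming GNU coreutils stat (the -c and -L options); the function name show_owner is only for illustration.

      ```shell
      # show_owner PATH
      # Prints owner:group of PATH itself, then owner:group and octal mode
      # of the file that PATH resolves to.
      show_owner() {
        # without -L, stat describes the symlink itself
        printf 'link:   %s\n' "$(stat -c '%U:%G' "$1")"
        # with -L, stat follows the symlink to the real device node
        printf 'target: %s\n' "$(stat -L -c '%U:%G %a' "$1")"
      }
      ```

      For the listing above, show_owner /dev/oracleasm/crs should report root:root for the link and grid:asmadmin with mode 660 for the target.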

      BTW when I was implementing the feature to support mapper disks, I considered making it so that UDEV would create the logical volume in /dev/oracleasm/ but that would mean it would disappear from /dev/mapper – leaving an incomplete volume group. Decided to stick with the symlink method instead. Ideally I’d like to have a real block dev in both /dev/dm-* and /dev/oracleasm/* (with the same major/minor) but linux UDEV seems not to be able to do that. You can only have one real block dev and anything else must be symlinks.

      Regards
      Bart

    2. Terry,
      FYI – Check out the updated version. Added feature to add different default group. And some other bugfixes/improvements.

  • Thanks for this writeup Bart. Looking great. Hope I get to use it some time, over e.g. ASMlib.
    Thanks,
    Ed

  • Hi Bart,

    Couldn’t find the package, I guess you have removed it. Please help me with the package.

  • In Oracle Linux 7 scsi_id has moved to /usr/lib/udev. Also, NAME="devicename" no longer renames the device in UDEV; you can use SYMLINK="symlink" to create a symlink instead.
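
    As an illustration of the SYMLINK approach, a rule for such a system could be generated along these lines. This is only a sketch and not what the asm tool writes: make_symlink_rule is a made-up helper, the owner, group and mode are example values, and the WWID in the usage line is the one from the asmtab example earlier in the comments.

    ```shell
    # make_symlink_rule WWID NAME [SCSI_ID_PATH]
    # Prints one udev rule that adds a symlink /dev/oracleasm/NAME for the
    # whole disk whose scsi_id output equals WWID. Hypothetical helper;
    # OWNER, GROUP and MODE below are example values only.
    make_symlink_rule() {
      wwid=$1
      name=$2
      scsi_id=${3:-/usr/lib/udev/scsi_id}   # location reported for Oracle Linux 7
      # %%k becomes %k in the output, which udev replaces with the kernel name (e.g. sdb)
      printf 'KERNEL=="sd*", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", PROGRAM=="%s -g -u -d /dev/%%k", RESULT=="%s", SYMLINK+="oracleasm/%s", OWNER="grid", GROUP="asmadmin", MODE="0660"\n' \
        "$scsi_id" "$wwid" "$name"
    }
    ```

    For example, make_symlink_rule 36000c2927d96084fab336839b9b7ca68 disk1 prints a single rule line that could be placed in a rules file for testing.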

    1. Thanks Michael!

      Currently I have no support yet for RHEL7 and compatibles but will keep this in mind when I start working on v7.
      BTW a bit strange to find a system related command elsewhere than under /bin, /usr/bin, /usr/sbin or /sbin…

      1. I used your asm tool to build 4 disk groups and it went well. Then I migrated those same groups to AFD, but the DM device ownership didn’t go to root; still fine. So I checked ASM, and AFD is loaded. I figured I’d use afd_label to present 2 primary partitions to AFD and just migrate the old drive partitions to the new AFD ones.
        The first label operation went well (if you remember to use root); the second succeeded but immediately hung OEL 6.5 solid. A reboot hung too.

        I was wondering:
        1. If I just power off the 2 HDDs, will Linux just move on and not load the AFD device driver for them?
        2. Should I have used asmdisk to create udev entries for the 2 new partitions? The example seemed to suggest that was not needed, and that just
        asmcmd afd_label asmdsknm /dev/sdb2

        was all that was needed to alter the related disk group using ‘AFD:DAT2’.

        Seriously stuck in Seattle.

        1. Hi Terry,

          I haven’t tested the tool with AFD (assume you mean ASM filter driver). Maybe we can work together to see what’s needed to get it going.

          A few things that might help:

          – asmdisks is out of band, it does not do anything but create the 99-asm.rules file and trigger UDEV reloads. If you get stuck, you can delete the 99-asm.rules file, reload udev (“udevadm trigger”) and everything should be back to defaults. UDEV rules are a bit obscure but if you have trouble then manually editing the rules file should get you going. For AFD I don’t know what rules need to be defined, maybe google has some answers?

          – I burned myself once when accidentally creating an entry for the /dev/sda disk, which caused the system to hang at boot. I now have a safeguard but that only works if the bootdisk is sda. If your bootdisk is something else and you try to manipulate the bootdisk you might need to boot in recovery mode, remove the rules file and restart. Maybe I should make the bootdisk protection a bit more dynamic in the next version…

          – I would really like to get the tool working with AFD, so feel free to drop me a mail with the contents of /etc/asmtab, the rules file and any other config items you think help resolving this. Probably means I have to setup a VM with AFD myself to get it going but I could make some time this week to give it a shot.
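
          The reset described in the first point could look roughly like this. It is only a sketch: the location /etc/udev/rules.d/99-asm.rules is an assumption (only the file name is given above), and reset_asm_rules is a made-up name. With DRY_RUN=1 it prints the commands instead of running them.

          ```shell
          # reset_asm_rules [RULES_FILE]
          # Removes the generated rules file and asks udev to re-process devices.
          # Needs root when run for real; set DRY_RUN=1 to only print the commands.
          reset_asm_rules() {
            rules=${1:-/etc/udev/rules.d/99-asm.rules}   # assumed location
            run=
            if [ "${DRY_RUN:-0}" = 1 ]; then
              run=echo
            fi
            $run rm -f "$rules"
            $run udevadm trigger
          }
          ```

          Running it first with DRY_RUN=1 shows exactly what would be removed before anything is touched.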

          Best regards from Holland!

          1. Thanks for the reply.

            I don’t think it’s your udev defs; they’re rock solid. Getting the system back up seems to require getting whatever AFD udev entries exist out. I tried booting in single user mode, cd’ing to the gridhome/bin and running asmcmd afd_unlabel …, but it kept saying that it couldn’t run asmcmd. So I’m going to try running the grid bash profile and retrying asmcmd, and failing that, manually find and edit the AFD entries, then try a reboot.

            As to the root cause of the problem: 1. The afd config didn’t migrate the ASM disks that used your udev defs to AFD. 2. I’m thinking that’s because AFD assumes you’re using ASMlib, so it bypasses the defs instead of using the ASM disk string and the same code as mount to discover which disks to convert. None of the disks showed up in the AFD directory. 3. So, as we say in America, AFD disks don’t play well with others (raw defined disks); because of 2, the code entered an unimagined use case where AFD disks existed alongside raw volumes. The afd config says it creates udev entries, so I’m not sure there’s a use for asmdisk beyond getting Oracle to fix 2 so that disks that were defined by it get converted. My guess is they won’t want to write code to take out udev entries someone else created. I don’t work for Oracle any more, but I can try to run the above by someone there who works on ASM and get back to you if he answers.

            So, bottom line: without a code fix, those that have raw non-ASMlib disks can’t use AFD, short of migrating to a new server that has AFD-defined disks.

            Thanks Terry

          2. Also you might try running the afd config on one of your Dev servers to confirm my findings.

          3. Hi Bart,

            I believe this is what is needed to take drives with your asm tool into afd…

            as grid crsctl stop crs
            xor crsctl stop has

            as root

            migrate the existing drives into afd

            asmcmd afd_label DAT_0000 /dev/mapper/vg_ora12c1-dat --migrate
            asmcmd afd_label RDO_0000 /dev/mapper/vg_ora12c1-rdo --migrate
            asmcmd afd_label FRA_0000 /dev/mapper/vg_ora12c1-fra --migrate
            asmcmd afd_label CRS_0000 /dev/mapper/vg_ora12c1-crs --migrate

            but the filter is disabled by default so enable on each disk

            asmcmd afd_filter -e '/dev/mapper/vg_ora12c1-fra'
            asmcmd afd_filter -e '/dev/mapper/vg_ora12c1-dat'
            asmcmd afd_filter -e '/dev/mapper/vg_ora12c1-crs'
            asmcmd afd_filter -e '/dev/mapper/vg_ora12c1-rdo'

            asmcmd afd_scan
            asmcmd afd_lsdsk

            [root@ora12c1 ~]# asmcmd afd_lsdsk
            Connected to an idle instance.
            --------------------------------------------------------------------------------
            Label                     Filtering   Path
            ================================================================================
            FRA_0000                    ENABLED   /dev/mapper/vg_ora12c1-fra
            DAT_0000                    ENABLED   /dev/mapper/vg_ora12c1-dat
            RDO_0000                    ENABLED   /dev/mapper/vg_ora12c1-rdo
            CRS_0000                    ENABLED   /dev/mapper/vg_ora12c1-crs

            Then I used your asm tool to list and delete the old udev refs

            then as grid

            asmcmd dsset 'AFD:*'
            [grid@ora12c1 ~]$ asmcmd dsget
            parameter:AFD:*
            profile:AFD:*

            to tell asm to use only afd disks in the asm disk search string.

            NB afd does not work with non afd (e.g. asmlib, raw drives); if you try to mix and match it will
            hang linux. Also the asmcmd afd_configure only migrates asmlib disks, it will not migrate raw disks.

            crsctl start has | crs

            crsctl stat res -t
            --------------------------------------------------------------------------------
            Name           Target  State        Server                   State details
            --------------------------------------------------------------------------------
            Local Resources
            --------------------------------------------------------------------------------
            ora.CRS.dg
                           ONLINE  ONLINE       ora12c1                  STABLE
            ora.DAT.dg
                           ONLINE  ONLINE       ora12c1                  STABLE
            ora.FRA.dg
                           ONLINE  ONLINE       ora12c1                  STABLE
            ora.LISTENER.lsnr
                           ONLINE  ONLINE       ora12c1                  STABLE
            ora.RDO.dg
                           ONLINE  ONLINE       ora12c1                  STABLE
            ora.asm
                           ONLINE  ONLINE       ora12c1                  Started,STABLE
            ora.ons
                           OFFLINE OFFLINE      ora12c1                  STABLE
            --------------------------------------------------------------------------------
            Cluster Resources
            --------------------------------------------------------------------------------
            ora.cssd
                  1        ONLINE  ONLINE       ora12c1                  STABLE
            ora.diskmon
                  1        OFFLINE OFFLINE                               STABLE
            ora.evmd
                  1        ONLINE  ONLINE       ora12c1                  STABLE
            ora.utldb1.db
                  1        ONLINE  ONLINE       ora12c1                  Open,STABLE
            --------------------------------------------------------------------------------

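            SQL> select name,path from v$asm_disk;

            NAME
            ------------------------------
            PATH
            ------------------------------------------------------------------------------------------------------------------------------------
            FRA_0000
            AFD:FRA_0000

            DAT_0000
            AFD:DAT_0000

            RDO_0000
            AFD:RDO_0000

            CRS_0000
            AFD:CRS_0000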

            So there you have the steps you will need to get disks built under your asm tool / raw disks to migrate to afd. I’m guessing you’re going for asm -afd createdisk? AFD creates its own udev entries so there is no need for that code path when -afd is specified. Also I would recommend asm -afd migrate asm disk name to do all of the above steps, or better, read your files and don’t specify the drive name. CRS/HAS does need to be down for the migration.
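
            As a rough illustration only, the label and filter commands listed earlier could be generated per logical volume like this. The function name print_afd_migration is hypothetical, the LABEL_0000 naming simply mirrors the example above, and the function only prints the commands; CRS/HAS still has to be stopped first and the commands run as root.

            ```shell
            # print_afd_migration VG LV...
            # Prints (does not run) the asmcmd calls from the steps above for
            # each logical volume LV in volume group VG.
            print_afd_migration() {
              vg=$1
              shift
              for lv in "$@"; do
                # label = upper-cased LV name plus _0000, as in the example
                label="$(printf '%s' "$lv" | tr '[:lower:]' '[:upper:]')_0000"
                printf "asmcmd afd_label %s /dev/mapper/%s-%s --migrate\n" "$label" "$vg" "$lv"
              done
              for lv in "$@"; do
                printf "asmcmd afd_filter -e '/dev/mapper/%s-%s'\n" "$vg" "$lv"
              done
              printf 'asmcmd afd_scan\nasmcmd afd_lsdsk\n'
            }
            ```

            For example, print_afd_migration vg_ora12c1 dat rdo fra crs prints the same label and filter commands as above (in argument order), so they can be reviewed before being pasted into a root shell.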

            I haven’t rebooted as of yet but with 30+C weather on the way i’m sure the srv will crash from the heat again soon.

    1. Hi Steve,

      It currently works with EMC PowerPath or Linux native multipath.
      I have plans to support more exotic devices (EMC DSSD, ScaleIO) as they show up a bit different than generic Linux scsi volumes.

      If you want to use it for a specific scenario then let me know and we can see what’s possible.

    1. Hi David,

      Thanks for that, I wasn’t sure as I haven’t tested it. But if the disks are normal SCSI disks (/dev/sd*) and they respond nicely to ‘scsi_id -g /dev/’ (i.e. they return a UUID) then it should work.
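
      A quick way to test a device could look like the sketch below. The helper name has_disk_id is made up; /sbin/scsi_id is assumed as the default location (on Oracle Linux 7 it is reported to live in /usr/lib/udev, which can be passed as the second argument).

      ```shell
      # has_disk_id DEVICE [SCSI_ID_PATH]
      # Succeeds only if scsi_id -g returns a non-empty identifier for DEVICE.
      has_disk_id() {
        # default path is an assumption; pass /usr/lib/udev/scsi_id where needed
        wwid=$("${2:-/sbin/scsi_id}" -g "$1" 2>/dev/null) || return 1
        [ -n "$wwid" ]
      }
      ```

      Used as, for example, has_disk_id /dev/sdb && echo "usable", it gives a yes/no answer per device before an entry is created for it.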

      Regards

      1. Bart,

        Thank you, for your prompt reply:
        No, it doesn’t get any output from using scsi_id. I get blank output. Only if I use blkid do I get the output below. I wonder if I can use “blkid” instead.

        /dev/xvda1: UUID="acf88335-279c-47a9-bdcd-368341819ad5" TYPE="ext4"
        /dev/xvda2: UUID="VV0ARQ-YXDH-RLO9-N8PW-eY7i-2M3C-bibeBe" TYPE="LVM2_member"
        /dev/xvdg1: UUID="aQfv7b-OQfU-fFFb-wf76-eXPG-KRwD-PVQMZO" TYPE="LVM2_member"
        /dev/xvdf1: UUID="rtrtAN-dCLX-6Kgp-bMms-UXgm-TFL5-T73r25" TYPE="LVM2_member"

        -David

        1. Hi David,
          From the man page, ‘blkid’ looks at what’s actually on a disk instead of the disk metrics itself and therefore cannot be used. We need to have something that uniquely identifies the disk even if the entire disk is blank. /dev/sd* devices can use scsi_id for this purpose. My script also supports device-mapper (/dev/mapper/*), /dev/scini* (DellEMC ScaleIO) and /dev/power* (DellEMC PowerPath), as DellEMC is who I work for, but I’d be happy to include other drivers.
          xvda probably means you’re using XEN hypervisor as this is what uses xvd* devices. Couple of things we can do here.

          – Use Linux device mapper devices (you should be able to create an ASM disk from /dev/mapper/-; that is indirect, but you may find it useful for testing)
          – See if you can use KVM on AWS (not sure if they do that) – KVM should provide proper /dev/sd (scsi) devices although not sure if they will respond nicely to scsi_id -g .. requests
          – I could add support for xvd* devices but not sure if that is useful now that it looks like AWS is moving from XEN to KVM anyway. I don’t have a XEN system available but might have in a few weeks as I will be involved in a customer POC using it.

          Regards

          1. Bart,

            I really appreciate your feedback. I will check and see if we can use kvm on aws, since this is also a poc moving our Oracle rac to aws and I was not aware of kvm. I will investigate more on this since I’m new into the aws arena.

            Thank you,
            -david

    1. hi David,

      Only maybe if you did it as IaaS, as AWS keeps total North Korean style control on the hardware, plus Oracle will NOT support ASM on AWS. I know this from having worked on RAC/AWS, and yes, there is a MOS note on not supporting ASM on AWS. I do recommend an outside firm.

  • Given AFD is the default grid / ASM disk driver, I would recommend upgrading to AFD as part of the 18c upgrade.
    You label / stamp the drives using asmcmd afd_label name drive before starting the install, which creates the udev defs. This is possible as you unzip the media to GRID_HOME first, before running the grid install.
