This blog is about how to set up ASM on Linux.
Note: This content is heavily borrowed from the excellent blog at http://blog.yannickjaquier.com/linux/device-mapper-multipathing-and-asm.html
Automatic Storage Management (ASM) is Oracle’s alternative to Veritas Volume Manager and Veritas File System. It allows DBAs to manage their own disk space. It also takes away some of the complexity that used to surround performance when the DB wrote to disks using drivers that were written mostly with filesystems and files in mind, rather than tiny writes scattered across large, raw binary files and read back in an equally random pattern.
Features of ASM
What does this mean? ASM provides high-performance reads and writes to disk. It treats the disks as raw to bypass the caching and write-grouping features of the OS. This gets files onto and off of the disk faster.
- ASM (when configured with ASMlib) provides a consistent disk name.
Linux names disks based on the order that it sees them at boot time. In Linux, we are not guaranteed that the disk named /dev/sdg will always be known as /dev/sdg. If a disk in the middle dies or is slow to respond to the OS probe … oops, it’s now a different disk. (Imagine you referred to your kids in the order they came downstairs in the morning: “Kid1 wants oatmeal…”) By contrast, Solaris maps /dev/dsk/c0t0d0 to a hardware path and does not change this unless you force it to redo the device mapping. New devices are added to the end and you can prune out unused devices without shifting all disk names.
- Files are distributed across disks to reduce device contention
Since ASM manages all the disks, it can put files where it wants to. When it decides where to put a file, it tries to write it to one of the disks that is not as heavily used.
- Redundancy What does this mean? You can configure a disk group to keep 1 (external redundancy), 2 (normal) or 3 (high-redundancy) copies of a file. For normal (dual) redundancy, each disk is allocated to one of two ‘failure’ groups and files are written to both failure groups (you can even override this on a per-file basis). Another feature to increase uptime is that you can tell ASM that you plan to pull a disk (‘DROP DISK’) so it can copy the data off that disk before you pull it.
- Cooked filesystems You can create cooked filesystems (i.e. a typical filesystem with directories and files, not raw) from raw ASM space. ASM even provides a filesystem type for this purpose, ACFS, but you can use other types of filesystems.
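To see the disk-naming problem from the list above on a live system, here is a minimal sketch that prints each persistent symlink next to the kernel name it currently resolves to. The function name list_persistent_names is my own, and the default directory /dev/disk/by-id is the udev convention on most modern distributions rather than something this article sets up, so treat both as assumptions.

```shell
#!/bin/bash
# Print each persistent symlink in a directory with the kernel name it
# currently resolves to. Defaults to udev's /dev/disk/by-id (assumption:
# your distribution populates that directory).
list_persistent_names() {
    local dir="${1:-/dev/disk/by-id}"
    local link
    for link in "$dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        printf '%s -> %s\n' "$(basename "$link")" "$(basename "$(readlink -f "$link")")"
    done
}

list_persistent_names
```

Run it before and after a reboot: the left-hand names should stay put while the sdX names on the right are free to move.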
ASM Theory
ASM accepts several types of disk space to manage. You can give it an entire disk or LUN (with no partition table), a single partition, a file in an NFS mount or a logical volume (though this duplicates ASM functionality, so it is just overhead). My recommended rule of thumb is to create a single partition on a disk and use this partition for ASM (more later).
Disks are grouped into ‘disk groups’ to allow different features per disk group. One group may have the high-redundancy feature (where three copies are kept). Another group may be composed of physically faster disks (a good place to store redo logs) or slower disks (good for FRA). A single DB may use more than one disk group, and more than one DB can store files in the same disk group.
While ASM was written with binary files in mind, it was designed to store the physical files that are used by databases: control files, SPFILEs and logs, bitmaps, OCR etc. ASM comes with a template for each of these file types so you get the best performance, redundancy etc. Files which live in ASM start with a plus sign followed by the disk-group name (e.g. +data/orcl/controlfile/Current.256.541956473).
In addition, you can create regular filesystems on top of ASM, formatted with ACFS or even an OS filesystem. Interesting features of ACFS: simultaneous mount on multiple nodes, automount at ASM startup, dynamic resizing, snapshots, FS cleanup by a healthy member in case a cluster member dies while writing to ACFS. Oracle strongly prefers that DB-related files are stored directly in ASM, not on an ACFS filesystem.
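To make the ASM file name shown above a little more concrete, here is a small sketch that splits it into its parts. The function name parse_asm_name and the field labels are my own invention for illustration; the only input is the example name from this article.

```shell
#!/bin/bash
# Split a fully qualified ASM file name into its components.
# parse_asm_name and the output labels are hypothetical, for illustration.
parse_asm_name() {
    local name="${1#+}"                 # drop the leading plus sign
    local diskgroup db filetype rest
    IFS=/ read -r diskgroup db filetype rest <<< "$name"
    printf 'diskgroup=%s database=%s type=%s file=%s\n' \
        "$diskgroup" "$db" "$filetype" "$rest"
}

parse_asm_name '+data/orcl/controlfile/Current.256.541956473'
# prints: diskgroup=data database=orcl type=controlfile file=Current.256.541956473
```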
Best practices for ASM on Linux :
1. Let Linux manage your multipathing to your disks
As of DB 11.2.0.3, ASM does not have multi-pathing (managing multiple paths to a physical disk) built into it. This means we’ll need to use dm-multipath devices.
/etc/multipath.conf :
# Do not manage ram,raw,loopback,floppies,tape etc
# Do not multipath IDE devices or internal HP drives
blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
# user_friendly_names creates /dev/mpath*
# 'prio const' was 'prio_callout /bin/true'
# path_selector: bmarzins at redhat suggests using
# service-time instead of default round-robin
# checker_timer 180 obviates a similar rule in udev
defaults {
polling_interval 10
path_selector "service-time 0"
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
prio const
path_checker readsector0
rr_min_io 100
failback immediate
user_friendly_names yes
checker_timer 180
}
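If you are unsure which devices the blacklist above will catch, a rough check can be done with grep before you restart anything. This is only a sketch: is_blacklisted is a hypothetical helper, it covers just the first two devnode patterns (not the cciss one), and it approximates multipath's regex matching with grep -E rather than asking multipath itself.

```shell
#!/bin/bash
# Hypothetical helper: report whether a kernel device name matches the
# two simpler devnode patterns from the blacklist section above.
is_blacklisted() {
    local dev="$1"
    printf '%s\n' "$dev" |
        grep -Eq '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*|^hd[a-z]'
}

for dev in sda hda loop0 dm-13 sdae; do
    if is_blacklisted "$dev"; then
        echo "$dev: blacklisted"
    else
        echo "$dev: eligible for multipath"
    fi
done
```

With these patterns the sd* names stay eligible, which is what we want since they are the individual paths to the LUNs.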
Now enable and start the multipathd service
# chkconfig multipathd on
# service multipathd restart
# service multipathd status
Check that multipath is working. For each of your mpaths run :
[root@server1 ~]:# multipath -ll mpath0
mpath0 (3600601602b702d006218b7de8130e111) dm-13 DGC, RAID 5
[size=67G] [features=1 queue_if_no_path][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=4][active]
\_ 3:0:0:0 sdae 65:224 [active][ready] | HBA1
\_ 3:0:1:0 sdat 66:208 [active][ready] |
\_ 1:0:0:0 sdf 8:5 [active][ready] | HBA2
\_ 1:0:1:0 sdp 8:240 [active][ready] |
In case it is not obvious what this shows: sdae, sdat, sdf and sdp are the same disk discovered four times via four different paths to the disk. multipath presents mpath0 to the user and, if you use it, multipath distributes reads/writes across the four paths. The ‘DGC, RAID 5’ is the vendor and product string that the array reports for the LUN; here it tells us that the disk is physically a RAID5 LUN.
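When you have many mpaths, it helps to pull just the path devices out of that listing. The sketch below does so with awk; list_paths is a name I made up, and the sample input is the path section of the mpath0 output above with the HBA annotations removed.

```shell
#!/bin/bash
# Print the sd* path devices found in `multipath -ll` output on stdin.
# list_paths is a hypothetical helper name.
list_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^sd[a-z]+$/) print $i }'
}

# Sample input: the path lines from the mpath0 listing above.
list_paths <<'EOF'
\_ round-robin 0 [prio=4][active]
 \_ 3:0:0:0 sdae 65:224 [active][ready]
 \_ 3:0:1:0 sdat 66:208 [active][ready]
 \_ 1:0:0:0 sdf  8:5    [active][ready]
 \_ 1:0:1:0 sdp  8:240  [active][ready]
EOF
```

On a real system you would feed it live output instead, for example multipath -ll mpath0 | list_paths.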
Notes:
If you have multiple nodes sharing the disks, it is smart to pick a ‘master’ node and once you have configured the disks, copy the multipath bindings file [ /var/lib/multipath/bindings ] from that node to the others to ensure that they all map the same way.
Also, if your /var directory is on a different filesystem from your root, I suggest copying the multipath bindings file to /etc and pointing the bindings_file variable in /etc/multipath.conf at the new location. This avoids a situation where the /var filesystem is not yet mounted when multipath starts, so it cannot read the bindings file.
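After copying the bindings file around, it is worth confirming that every node really ended up with the same alias-to-WWID mappings. Here is a minimal sketch of such a check. The helper names are hypothetical, and it assumes the usual bindings layout of one "alias wwid" pair per line with # comments.

```shell
#!/bin/bash
# Normalise a bindings file: drop comments and blank lines, sort the rest.
# Assumes one "alias wwid" pair per line.
normalise_bindings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort
}

# Succeed only if both files contain the same alias-to-WWID mappings,
# regardless of line order or comments.
bindings_match() {
    [ "$(normalise_bindings "$1")" = "$(normalise_bindings "$2")" ]
}
```

A possible use, where node2 is a placeholder host name: fetch the other node's file with ssh node2 cat /var/lib/multipath/bindings > /tmp/node2.bindings, then run bindings_match /var/lib/multipath/bindings /tmp/node2.bindings && echo same.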
2. Present partitions, not raw disks to ASM
Instead of giving the entire disk to ASM, create a partition on each of the multipath disks which allocates all space on that disk. I often start my partition at sector 1 instead of sector 0. IMO, creating a partition is generally better than using the entire disk. It allows the OS to recognize that the disk is not a brand new disk. And forcing the partition to start at sector 1 wastes very little disk space but keeps the real data out of the way of any disk tools that might want to write boot code or disk labels etc. On one of the servers, for each disk, create a partition :
# fdisk /dev/mapper/mpathX
new partition from sector 1 to the max sector
Of course we realize that this change affects all of the members of the multipath (sdf, sdp, sdae and sdat in the example above). Now, since you’re likely in a multi-node environment, let’s update the cached view each node has of its disk partition tables with the latest partition table (since we may have updated the disks from a different node). To do so, on each node and for each mpath, run the following :
# partprobe /dev/mapper/mpathX
# kpartx -d /dev/mapper/mpathX
# kpartx -a /dev/mapper/mpathX
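Typing those three commands for every mpath on every node gets old quickly, so here is a sketch of a loop around them. refresh_partitions and the RUN switch are my own invention; by default the function only prints what it would do, and the device names in the example are placeholders.

```shell
#!/bin/bash
# Print (or, with RUN=1 in the environment, execute) the three refresh
# commands for every multipath device given as an argument.
refresh_partitions() {
    local dev cmd
    for dev in "$@"; do
        for cmd in "partprobe" "kpartx -d" "kpartx -a"; do
            if [ "${RUN:-0}" = 1 ]; then
                $cmd "$dev"          # word-splitting of $cmd is intentional
            else
                echo "$cmd $dev"
            fi
        done
    done
}

# Dry run against two placeholder devices.
refresh_partitions /dev/mapper/mpath0 /dev/mapper/mpath1
```

Once the dry-run output looks right, run it as root with RUN=1 set.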
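Aside : Listing WWID for disks (from Yannick)
for disk in /dev/sd*
do
disk_short=$(basename "$disk")
# Older scsi_id syntax; newer releases use the form from multipath.conf above:
# /lib/udev/scsi_id --whitelisted --device="$disk"
wwid=$(scsi_id -g -s "/block/$disk_short")
echo -e "Disk $disk_short \t has WWID: $wwid."
done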
# Sample Output
# Disk sdac has WWID: 3600601602b702d007f2a73fbc648e111
# Note that each disk in a multipath group shares the same WWID
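Since every path to the same LUN reports the same WWID, grouping the loop's output by WWID shows at a glance which sd devices belong together. A minimal sketch follows, assuming input lines of the form "disk wwid". group_by_wwid is a hypothetical name, and pairing these particular disks with these two WWIDs is only for illustration.

```shell
#!/bin/bash
# Read "disk wwid" pairs on stdin and print one line per WWID listing
# every disk that reported it.
group_by_wwid() {
    awk '{ disks[$2] = (disks[$2] ? disks[$2] " " : "") $1 }
         END { for (w in disks) print w ": " disks[w] }' | sort
}

# Illustrative input only: disk names and WWIDs taken from the samples above.
group_by_wwid <<'EOF'
sdae 3600601602b702d006218b7de8130e111
sdac 3600601602b702d007f2a73fbc648e111
sdat 3600601602b702d006218b7de8130e111
sdf 3600601602b702d006218b7de8130e111
sdp 3600601602b702d006218b7de8130e111
EOF
```

A WWID with fewer disks than you have paths is a hint that one path is down or was never zoned.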
3. For better results, limit the disks that ASM sees
Assuming that you have already created the oracleasm config file (and created ASM devices etc) by running (the optional ‘-I’ allows more detailed asm config questions)
# /usr/sbin/oracleasm configure -I
You can now update the /etc/sysconfig/oracleasm file to restrict ASM from looking at plain SCSI disks and force it to scan only multipath devices when it starts. The dm should be redundant, but we include it. Similarly, we could exclude IDE devices (hd), CDs (sr) and tape drives (st) if you have many of those attached. /etc/sysconfig/oracleasm:
ORACLEASM_SCANORDER="mpath dm"
ORACLEASM_SCANEXCLUDE="sd"
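After editing the file, a quick way to confirm what ASMlib will pick up at its next start is to print just the scan settings. This is a small sketch; show_scan_settings is a hypothetical helper and it does nothing more than grep the file.

```shell
#!/bin/bash
# Print the scan-related settings from an oracleasm sysconfig file.
# The path argument defaults to the file discussed above.
show_scan_settings() {
    local conf="${1:-/etc/sysconfig/oracleasm}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    grep -E '^ORACLEASM_SCAN(ORDER|EXCLUDE)=' "$conf"
}
```

Calling show_scan_settings with no argument on a configured node should echo back the two lines above.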
Alternatively, set the ‘ASM_DISKSTRING=..’ initialization parameter (which accepts ‘*’ wildcards) in the ASM instance’s parameter file to specify the disks that should be scanned. Typically ASM_DISKSTRING should point under /dev/oracleasm (for example /dev/oracleasm/disks/*) if you will be using ASMlib and point to /dev/asm if not.
Now you can create your ASM disks. Names must start with a letter. Convention says that names are all capitals and a number (leave room for growth). Some examples are DATA001, FRA_001 and OCR1. There are many ways to manage ASM disks: ASMCMD provides a command-line interface for DBAs, and OEM provides tools. The most sysadmin-friendly is ASMlib, which I recommend using. You can use the following ASMlib commands to manage disks:
# /etc/init.d/oracleasm createdisk VOL1 /dev/sdg1
# /etc/init.d/oracleasm listdisks
# /etc/init.d/oracleasm deletedisk VOL1
# /etc/init.d/oracleasm querydisk /dev/sdg1
Once you have created the disks on one node, you must discover them on the other nodes:
# /etc/init.d/oracleasm scandisks
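With more than a handful of LUNs, generating the createdisk commands is less error-prone than typing them. The sketch below follows the naming convention mentioned earlier (a prefix plus a zero-padded number). gen_createdisk is a made-up helper, the mpathNp1 partition names are placeholders for whatever kpartx created on your system, and it only prints the commands so you can review them first.

```shell
#!/bin/bash
# Print one createdisk command per partition, numbering the ASM disk
# names with three digits (DATA001, DATA002, ...).
gen_createdisk() {
    local prefix="$1"; shift
    local n=1 part
    for part in "$@"; do
        printf '/etc/init.d/oracleasm createdisk %s%03d %s\n' "$prefix" "$n" "$part"
        n=$((n + 1))
    done
}

# Placeholder partitions; substitute your own multipath partitions.
gen_createdisk DATA /dev/mapper/mpath0p1 /dev/mapper/mpath1p1
```

Review the output, then pipe it to sh as root on one node and run scandisks on the others.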
Once you have created and discovered disks, you can add them to ASM disk groups. Or you can create the disks and disk groups at the same time with the sample SQL statement below. You can also manage disks through the /usr/sbin/oracleasm command.
FWIW, here is a SQL statement (run against the ASM instance, as the alternative to #oracleasm createdisk) to create a diskgroup. I have not found a CLI way to do it through ASMlib. But you can invoke the GUI named ‘ASM Configuration Assistant (ASMCA)’.
FAILGROUP is a group of disks (say a tray) that are likely to fail at the same time. This hints ASM where to write duplicate copies of data (ie. not inside the same failure group).
CREATE DISKGROUP data NORMAL REDUNDANCY
FAILGROUP controller1 DISK
'/devices/diska1' NAME diska1,
'/devices/diska2' NAME diska2,
'/devices/diska3' NAME diska3,
'/devices/diska4' NAME diska4
FAILGROUP controller2 DISK
'/devices/diskb1' NAME diskb1,
'/devices/diskb2' NAME diskb2,
'/devices/diskb3' NAME diskb3,
'/devices/diskb4' NAME diskb4
ATTRIBUTE 'au_size'='4M',
'compatible.asm' = '11.2',
'compatible.rdbms' = '11.2',
'compatible.advm' = '11.2';
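Redundancy is not free, so it helps to estimate how much space a disk group like the one above actually gives you. Here is a back-of-the-envelope sketch: usable_gb is a hypothetical helper that simply divides raw capacity by the number of copies kept. It ignores ASM metadata and the free space ASM wants in reserve to re-mirror after a failure, and the 67 GB disk size is borrowed from the multipath sample earlier, not from the SQL.

```shell
#!/bin/bash
# Rough usable capacity of a disk group: raw size divided by the number
# of copies kept (1 external, 2 normal, 3 high). Integer GB only.
usable_gb() {
    local raw_gb="$1" redundancy="$2" copies
    case "$redundancy" in
        external) copies=1 ;;
        normal)   copies=2 ;;
        high)     copies=3 ;;
        *) echo "unknown redundancy: $redundancy" >&2; return 1 ;;
    esac
    echo $((raw_gb / copies))
}

# Eight disks in two failure groups, assuming 67 GB each.
usable_gb $((8 * 67)) normal
# prints: 268
```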
Rebalancing
This moves files around to different ASM disks to ensure that each ASM disk is filled to about the same amount. It’s like defragmenting a filesystem. You can set the ASM_POWER_LIMIT parameter to restrict how much effort ASM puts into this. You can manually fire off a rebalance with a low priority (max is 1024).
ALTER DISKGROUP data2 REBALANCE POWER 5;
If you want some really interesting info on ASMlib, download the source from the ASMLib support page on oss.oracle.com. It is very user-friendly, with lots of documentation and shell scripts.