This is an article I wrote a while ago (late 2009), shortly after EMC introduced Enterprise Flash Drives (EFDs). Although more tooling is available these days to automate storage tiering, the basic concepts are still valid, and the article may serve as a good explanation of database storage tiering and what we want to achieve with that strategy.
I recommend you read Flash Drives first to get some background knowledge before continuing with ILM.
Innovation with Flash Drives
The innovation in disk drive technology with Enterprise Flash Drives (EFDs, also known as Solid State Drives or SSDs) can solve the problem of poor random I/O performance on mechanical disk drives.
Probably the most interesting place to apply this new technology is RDBMSes (Relational Database Management Systems) such as Oracle, IBM DB2/UDB, PostgreSQL or Microsoft SQL Server.
In this article I will focus on Oracle, but much of the technology is also available for other databases.
In research on worldwide information growth, the focus is often on unstructured or semi-structured data (such as office documents, scanned papers, pictures, movie clips and other multimedia files). These indeed cause the enormous growth in information storage, but in the meantime, databases (with structured data) keep growing as well. My own experience is that existing production databases typically grow between 15% and 100% per year, depending on whether it’s just autonomous growth or growth caused by adding new business functionality. We have also been discussing compliance regulations for several years now, and today we see some of these actually being implemented – maybe partly due to the problems in the financial world, for which longer retention of transactional records is required.
Consider, for example, the medical world, where more and more information is stored in digital rather than analog form (or was previously not stored at all) and has to remain digitally available for many years.
Higher business availability requirements have caused more customers to consider and implement database replication, in order to achieve disaster recovery and better protection against logical or human errors. The typical mission-critical Oracle database I see at customers has (snapshot) copies for backup, replicated mirrors for disaster recovery, one or more acceptance replicas, maybe a test or “firefighting” environment, and often more than one development replica. As most of these are copies of production that have to be refreshed every now and then, one gigabyte of growth in native database capacity results in a multiple of that across the total infrastructure. If you include backup capacity, adding this one gigabyte to a database can result in the additional cost of ten or more gigabytes of total required capacity (storage and backup).
Larger databases are also harder to manage, typically slower (more I/O has to be performed before getting to the required data records) and more expensive (the performance impact needs to be compensated for by adding CPU power, memory, I/O bandwidth, maybe extra cluster nodes, and so on).
Now if I ask the typical Oracle DBA or database architect how much database capacity is wasted on data that is almost never needed, the answers vary from, say, 50 to 95 per cent. And most legacy – or even new – database applications (especially production) typically have one service level for all data: “tier 1” (high redundancy, high performance, high energy consumption and cooling requirements, short backup windows, quick incident response times, long backup retention, and so on).
Is there a way to reduce the size of these databases, or to use a tiered approach so that only the most critical and performance-hungry data sits on tier 1, while the rest is moved to lower-SLA storage, processors, backup and so on?
If we can do this in some way, then the effect on total efficiency (cost, manageability, performance, service levels) can be very large.
Relational database complexity
But RDBMS databases are hard to deal with in this respect. By comparison, a file system, email system or content management application has unique, distinct objects (files, emails, email attachments, documents and so on) that can be moved to another location, or deleted, without much trouble (as long as the metadata is updated with the new location of each object).
An RDBMS database is not so easy. The “R” in RDBMS stands for Relational – which already suggests that the records in the database have relations with other database data. Often these relations exist at the application level and are not directly visible from within the database. For example, if I were to delete a customer record, invoices pointing to that customer would be missing information – probably resulting in application errors and other problems.
This complexity has caused IT organizations to ignore databases for a long time and start instead with the “low-hanging fruit” of Information Lifecycle Management: files and email first, followed closely by content management systems.
But now that databases have become very large (terabyte databases are no exception any more, and I have seen customers building data warehouses of over 60 terabytes), the problems of a single-tier approach become prominent. As discussed before, the “spinning rust” disk drives have kept up with Moore’s law in terms of capacity and channel speeds, but random I/O performance has barely improved. If you were to read a 2-terabyte SATA drive using only random seeks – 8 milliseconds on average for every 8K block (a century compared to microprocessor speeds) – it would take you about 22 days to read all data on the drive, which shows that such large drives are not usable for high-performance databases. The faster Fibre Channel drives are currently available in 600 GB capacities, but suffer from the same basic problem. So even if you built a very large database (say 10 terabytes or more) on as few as around 34 drives (17 mirrored pairs of 600 GB), this would not offer the I/O performance we need.
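That 22-day figure is easy to verify with back-of-the-envelope arithmetic; as a sketch, using Oracle’s built-in `dual` table and the round numbers from the text:

```sql
-- 2 TB drive, 8 KB blocks, 8 ms average access time per random read
SELECT ROUND( (2e12 / 8192)   -- number of 8K blocks on the drive
            * 0.008           -- seconds per random read
            / 86400           -- seconds per day
            , 1) AS days_to_read
FROM dual;
-- ≈ 22.6 days
```

The same arithmetic shows why Flash helps so much: with random access times well below a millisecond, the time to touch the same number of blocks shrinks by an order of magnitude or more.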
What typically happens next is that administrators either use “short-stroking” (using only half or less of each drive’s capacity, typically the disk’s “outer edge”, as that offers the best performance) or buy smaller, faster, more energy-hungry drives (e.g. 146 GB 15,000 rpm instead of 600 GB 10,000 rpm) so that they get larger numbers of “spindles” and still have reasonable storage utilization (albeit artificial, because of the high cost per gigabyte).
A side effect is that, because the drives are larger than needed for the given capacity, the direct need to remove data from the database seems to disappear (“we have enough unused drive capacity anyway, why not use it?”). But this ignores the problems of database management (ever had to rebuild an index on a terabyte table?) and backup (how long does it take to restore a 10-terabyte database from tape, roll forward the archive logs, recover the database and finally restart the application servers to get back in business? And what is the cost of having to back up this database twice a day in a 3-hour backup window and keep the backups in the tape library for 60 days?)
Oracle has come up with technology that partly deals with these problems. To start with, it has offered database partitioning for years, allowing us to carve up large tables into chunks that can be managed individually. For example, splitting a terabyte table into 10 parts of 100 GB each would allow us to rebuild indexes one at a time – ten short offline slots on ten weekdays instead of one large maintenance slot during a weekend. If carefully planned, the partitions can be divided by region, by time period (last month versus older), by customer type, or by almost any selection criterion that is already in the database schema. This sometimes allows data warehouse queries to run faster (having to scan only one partition instead of the whole table), but it also introduces new complexity challenges (which criteria to use to separate partitions, index design, performance aspects, and so on).
One advantage is that (according to Oracle) partitioning is completely transparent to the application, so no rewrites are needed at the application level.
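As a sketch of what such partitioning looks like in Oracle (all table, column, partition and index names here are invented for illustration, not taken from any real schema):

```sql
-- A large "orders" table carved up into range partitions by date
CREATE TABLE orders (
    order_id    NUMBER       NOT NULL,
    customer_id NUMBER       NOT NULL,
    order_date  DATE         NOT NULL,
    amount      NUMBER(12,2)
)
PARTITION BY RANGE (order_date) (
    PARTITION orders_2009_h1 VALUES LESS THAN (TO_DATE('2009-07-01','YYYY-MM-DD')),
    PARTITION orders_2009_h2 VALUES LESS THAN (TO_DATE('2010-01-01','YYYY-MM-DD')),
    PARTITION orders_old     VALUES LESS THAN (MAXVALUE)
);

-- A LOCAL index is partitioned along with the table, so maintenance
-- can target one partition at a time instead of the whole table:
CREATE INDEX orders_date_idx ON orders (order_date) LOCAL;
ALTER INDEX orders_date_idx REBUILD PARTITION orders_2009_h1;
```

Queries that filter on the partitioning key only touch the relevant partitions (partition pruning), while existing INSERT and SELECT statements keep working unchanged.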
Partitioning combined with storage tiering – where two or more tiers are defined with different performance and price levels – allows one large Oracle table to be spread out over a portion of fast disk (Flash or fast FC) and a portion of efficient disk (slower, but cheaper and requiring less energy).
This is a relatively easy way to bring down the infrastructure cost of a large database without having to change any application or database function. All data is still available in the same database, only partitioned and served at different performance levels.
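As a sketch of how this looks in practice – assuming a range-partitioned table `orders` with a rarely accessed partition `orders_old`, a recent partition `orders_2009_h2`, a local index `orders_date_idx`, and two tablespaces `tier1_ts` (Flash or fast FC) and `tier2_ts` (cheap, slower disk); all names are invented for illustration:

```sql
-- Keep the current, "hot" partition on tier-1 storage
ALTER TABLE orders MOVE PARTITION orders_2009_h2 TABLESPACE tier1_ts;

-- Move the old, rarely accessed data to the cheaper tier
ALTER TABLE orders MOVE PARTITION orders_old TABLESPACE tier2_ts;

-- Moving a partition marks its local index partition unusable,
-- so rebuild that into the matching tier as well
ALTER INDEX orders_date_idx REBUILD PARTITION orders_old TABLESPACE tier2_ts;
```

The application still sees one `orders` table; only the physical placement – and hence the cost and performance – of each partition differs.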
EMC supports storage tiering with its recently introduced FAST (Fully Automated Storage Tiering) functionality. The current version can move storage volumes between different RAID levels, disk types, storage groups and so on, completely without application downtime or performance impact. A future version will allow much smaller pieces of disk (chunks of less than 1 megabyte) to be moved dynamically and in real time, based on application policies (defined by business service levels). The end result is that a tiered storage approach to databases becomes much more flexible, automatic and online – whereas today, tiering depends on a static design that requires manual changes, and possibly downtime, to change policies or move data around.
Note that tiering does not reduce the total size of the database (or of its copies and backups) – the logical view of the database size is still the same.