Dirty Cache

The best thing about being me… There are so many “me”s.

— Agent Smith, The Matrix Reloaded

One of our customers reported less than optimal space savings on XtremIO running Oracle. In order to test various scenarios with Oracle I was in search of a deduplication analysis method or tool – only to find out that there was nothing available that qualified.

TL;DR: QDDA is an Open Source tool I wrote to analyze Linux files, devices or data streams for duplicate blocks and compression estimates. It can quickly give you an idea of how much storage savings you could get using a modern All-Flash Array like XtremIO. It is safe to use on production systems and allows quick analysis of various test scenarios giving direct results, and even works with files/devices that are in use. No registration or uploading of your confidential data is required.

Methods / tools I have considered:

(Semi) commercially available data analyzers – they require you to run a test tool on a host, which generates a binary file that you have to upload to the vendor’s site, and then wait for the report to be returned by mail. The collection method is not clearly specified and iterations take a long time due to the upload-and-wait mechanism. I can’t hand such tools to customers either, because they require a partner login ID which customers probably don’t have.
A real XtremIO box – we had one in the lab, but it was shared with other test environments, and currently the management tools only give you overall numbers, not numbers for a particular set of volumes or files. Running a few quick scenarios becomes hard.
Setting up a dedupe-capable file system (such as BTRFS or – ouch – ZFS) and seeing what it reports for various scenarios. This may work, but you probably only get end results and not directly how the FS got there. It will probably either break, run out of memory, or slowly grind to a halt on larger data sets.
A Linux shell script that searches for unique blocks in some way or another. Slow and painful.
Freeware tools – I couldn’t find any. There are tools that smell similar but focus on other things, such as backup compression estimators and the like.

Now, what an All-Flash Array like XtremIO does is, at least in concept, pretty straightforward:

For every block that gets written, calculate some sort of hash
See if the hash already exists
If not, save the block to flash storage and the hash in metadata for later reference
If yes, increase the counter (and do some other complex pointer stuff not relevant for us)

So if you have a bunch of data blocks (a file or disk) and you count identical hashes, you can figure out the deduplication rate. I thought that a simple key-value store would be enough to do dedupe analysis. In the key-value store the key would be the block hash (checksum) and the value would be the number of times we find that same hash.
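The counting idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the QDDA code: the names `count_block_hashes` and `dedupe_ratio` are made up for this example, and an in-memory `Counter` stands in for the key-value store.

```python
import zlib
from collections import Counter

BLOCK_SIZE = 8192  # assumed block size for this sketch (8 KiB)

def count_block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> Counter:
    """Map each block's hash (key) to the number of times it occurs (value)."""
    counts = Counter()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        counts[zlib.crc32(block)] += 1
    return counts

def dedupe_ratio(counts: Counter) -> float:
    """Blocks scanned divided by distinct hashes stored."""
    total_blocks = sum(counts.values())
    return total_blocks / len(counts) if counts else 1.0

# Three blocks, two of them identical: 3 blocks map to 2 distinct hashes.
sample = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
counts = count_block_hashes(sample)
print(f"dedupe ratio: {dedupe_ratio(counts):.2f}")
```

A dictionary like this lives entirely in memory, which is fine for a toy example but is one reason to look for a persistent key-value store once the data sets get large.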

In the conversations with my colleagues I said I could probably write a basic tool to do this in a day or so. I did lots of shell scripting in the last years but I considered that not to be fast enough, so I decided to polish up my ancient C++ programming skills (last time I did serious work in C++ was in 1994, finalizing my Avionics degree).

I started catching up on C++, and for speed and simplicity I decided to use SQLite for the key-value store. It is usually already available on Linux systems and it is extremely fast, because it’s not a full database as you know it, with listeners and subprocesses and multi-user support and all the bells and whistles, but merely a set of library functions such that your C++ program becomes the database engine itself. Within a day I had a simple C++ program that opens a file, reads it block by block and updates a SQLite database with a k-v table accordingly, plus a simple report at the end showing how many blocks were processed versus how many hashes are stored in the database (the ratio between the two is the dedupe ratio).
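To give an idea of what such a k-v table could look like, here is a rough Python sketch using the standard `sqlite3` module. The table name `kv`, its columns and the `scan` function are assumptions made for this example; the real tool is written in C++ and its schema may well be different.

```python
import io
import sqlite3
import zlib

def scan(stream, db, block_size=8192):
    """Read the stream block by block and count each block hash in SQLite."""
    # Hypothetical schema: one row per distinct hash, with an occurrence counter.
    db.execute(
        "CREATE TABLE IF NOT EXISTS kv ("
        "hash INTEGER PRIMARY KEY, blocks INTEGER NOT NULL)"
    )
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        h = zlib.crc32(block)
        # Create the row if the hash is new, then bump its counter.
        db.execute("INSERT OR IGNORE INTO kv (hash, blocks) VALUES (?, 0)", (h,))
        db.execute("UPDATE kv SET blocks = blocks + 1 WHERE hash = ?", (h,))
        total += 1
    db.commit()
    return total

db = sqlite3.connect(":memory:")  # a file path would keep the results on disk
data = io.BytesIO(b"\x01" * 8192 * 2 + b"\x02" * 8192)
total = scan(data, db)
stored = db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
print(f"{total} blocks, {stored} hashes, dedupe ratio {total / stored:.2f}")
```

Because the counters live in a table, questions such as "how many blocks appear exactly twice" become a simple `GROUP BY` query on the `blocks` column.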

Because it was quickly implemented, I called it QDDA – The Quick and Dirty Deduplication Analyzer.

Also because it allows you to run various dedupe scenarios extremely quickly – so the name still covers the game.

For hashing I use CRC32. I started with SHA256 and other variants, but CRC32 is fast and requires only 4 bytes per hash where others need much more. The tradeoff is accuracy, as CRC32 has a relatively high chance of hash collisions. You could not accept that for storing data, but a 1 in 4294967296 (32 bits) chance that two different blocks end up with the same hash is perfectly acceptable for a data analysis tool.
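To put a number on the accuracy tradeoff, here is a back-of-the-envelope estimate. It reads the 1 in 4294967296 figure as the chance that any two given different blocks share a hash, and it assumes the hash values are spread uniformly, which CRC32 only approximates. The function name and the one-million-block data set are made up for illustration.

```python
def expected_false_matches(unique_blocks: int, hash_bits: int = 32) -> float:
    """Expected number of pairs of different blocks that share a hash value.

    Assumes a uniformly distributed hash: every pair of different blocks
    collides with probability 1 / 2**hash_bits.
    """
    pairs = unique_blocks * (unique_blocks - 1) / 2
    return pairs / 2 ** hash_bits

n = 1_000_000  # hypothetical data set: one million different 8K blocks (~7.6 GiB)
false_pairs = expected_false_matches(n)
print(f"expected colliding pairs: {false_pairs:.1f}")
print(f"as a fraction of all blocks: {false_pairs / n:.4%}")
```

Under these assumptions a few hundred blocks out of millions may be miscounted as duplicates, which barely moves a dedupe ratio reported to two decimals but would be unacceptable for a system that actually stores the data.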

For compression I started with gzip and scanning every 10th block (for performance) but it turns out LZ4 compression is much, much faster – at the expense of a slightly worse compression ratio. LZ4 is used where speed is preferred over high compression. LZ4 can scan at line rate (actually the hashing takes more compute time, roughly 4x in my tests).

It took me a few weeks to improve the tool to where it is now, and I have made it available as Open Source. It now also includes thin provisioning analysis, as well as two different compression ratio calculations. One is the standard LZ4 block compression, which gives you the “stream” compression ratio (such as when you compress the entire file during backups). The other one, which I call “bucket compression”, is modeled after what XtremIO uses internally: compressed blocks under 2K go into 2K buckets, blocks under 4K go into 4K buckets, and the rest go into regular 8K blocks. This results in a somewhat lower overall compression ratio but avoids serious issues with fragmentation during writes to compressed blocks.
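The bucket idea can be sketched as follows. Note the assumptions: Python's standard `zlib` stands in for LZ4 (which is not in the standard library), a block that fits a bucket exactly is placed in that bucket, and the names `bucket_for` and `bucket_usage` are invented for this example. It is not the code QDDA or XtremIO uses.

```python
import random
import zlib

BLOCK_SIZE = 8192
BUCKETS = (2048, 4096, 8192)  # bucket sizes in bytes

def bucket_for(compressed_len: int) -> int:
    """Smallest bucket that holds the compressed block."""
    for size in BUCKETS:
        if compressed_len <= size:
            return size
    return BUCKETS[-1]  # did not shrink at all: store as a regular 8K block

def bucket_usage(blocks) -> dict:
    """Count how many blocks land in each bucket size."""
    usage = {size: 0 for size in BUCKETS}
    for block in blocks:
        usage[bucket_for(len(zlib.compress(block)))] += 1  # zlib, not LZ4
    return usage

rng = random.Random(0)
noise = lambda n: bytes(rng.getrandbits(8) for _ in range(n))  # incompressible

blocks = [
    b"x" * BLOCK_SIZE,                          # highly compressible
    noise(3000) + b"x" * (BLOCK_SIZE - 3000),   # compresses to a bit over 3000 bytes
    noise(BLOCK_SIZE),                          # incompressible
]
usage = bucket_usage(blocks)
net = sum(size * count for size, count in usage.items())
print(usage, f"bucket compression ratio {len(blocks) * BLOCK_SIZE / net:.2f}")
```

Rounding every block up to its bucket size wastes some space compared to stream compression, which is why the bucket ratio comes out lower.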

I also added a throttling mechanism (default limit 200MB/s) to prevent accidental runs from starving the I/O of a production system, and the option to work with Unix pipes instead of just files or block devices. This allows the tool to run as a non-root user while another (privileged, or remote) command reads the data from disk and feeds it to the analyzer.
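One simple way to implement such a throttle is to compare the bytes read so far against the elapsed time and sleep whenever the reader gets ahead of the limit. The sketch below shows that approach; it is an assumption about how a throttle could work, not a description of QDDA's internals, and `throttle_delay` and `read_throttled` are hypothetical names.

```python
import io
import time

LIMIT_BPS = 200 * 1000 * 1000  # 200 MB/s, matching the default limit mentioned above

def throttle_delay(bytes_done: int, elapsed: float, limit_bps: float) -> float:
    """Seconds to sleep so the average throughput stays at or below limit_bps."""
    return max(0.0, bytes_done / limit_bps - elapsed)

def read_throttled(stream, block_size=8192, limit_bps=LIMIT_BPS):
    """Yield blocks from any binary stream (file, device or pipe), rate limited."""
    start = time.monotonic()
    done = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        done += len(block)
        delay = throttle_delay(done, time.monotonic() - start, limit_bps)
        if delay > 0:
            time.sleep(delay)
        yield block

# An in-memory stream stands in for a pipe here; passing sys.stdin.buffer
# instead would let another (privileged) command feed the data.
blocks = list(read_throttled(io.BytesIO(bytes(8192 * 4))))
print(len(blocks), "blocks read")
```

Because the reader only needs a byte stream, the process doing the analysis never has to open the block device itself.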

Lastly I modified the code to accept the blocksize as a parameter (default is 8K but you may experiment with, for example, 4K or 16K block sizes).

Here is a report of a previous test run, on a VMware VM with 6 Oracle ASM devices holding a database and an RMAN backup set.

[bart@outrun01 ~]$ qdda -f /var/tmp/q.db 
qdda 1.4.2 - The Quick & Dirty Dedupe Analyzer
blocksize           =           8 KiB
total               =    27648.00 MiB (   3538944 blocks)
free                =    15875.69 MiB (   2032088 blocks)
used                =    11772.31 MiB (   1506856 blocks)
unique              =     6559.70 MiB (    839642 blocks)
deduped 2x          =     2205.86 MiB (    282350 blocks)
deduped 3x          =       22.11 MiB (      2830 blocks)
deduped 4x          =       36.87 MiB (      4719 blocks)
deduped >4x         =       96.55 MiB (     12359 blocks)
deduped total       =     8921.09 MiB (   1141900 blocks)
stream compressed   =     3559.90 MiB (     60.10 %)
compress buckets 2k =      799.27 MiB (    409227 buckets)
compress buckets 4k =      799.02 MiB (    204550 buckets)
compress buckets 8k =     4125.96 MiB (    528123 buckets)
total compressed    =     5724.25 MiB (    732704 blocks)
                      *** Summary ***
percentage used     =           42.58 %
percentage free     =           57.42 %
deduplication ratio =            1.32
compression ratio   =            1.56
thin ratio          =            2.35
combined            =            4.83
raw capacity        =    27648.00 MiB
net capacity        =     5724.25 MiB

Let the numbers sink in for a moment, but here is the explanation: we scanned 27GB of data, of which roughly 11.7GB holds real data; the rest (about 15.9GB) is unallocated (zeroed).

6.5GB of all scanned data is unique and cannot be deduplicated (Oracle hardly ever writes the same 8K block twice). 2.2GB appears twice (this is the effect of an RMAN backup-as-copy to the same disk pool). Blocks appearing 3x or more are rare in this test run, and I guess their origin is not Oracle data blocks but other stuff (ASM metadata maybe, or possibly archive logs that are partly equal to redo logs).

After deduplication the required capacity drops from 11.7GB to about 9GB. Backing up the entire disks would require 3.5GB, but that assumes we compress the data as a stream. The real LZ4 compress tool would actually report an even higher compression ratio, because it also counts zero blocks and can compress against previously “seen” data, so our numbers are a bit conservative. GZIP or BZIP would get you even higher compression, but at a high performance penalty.

If we compress into buckets (the XtremIO method) we still get down to 5.7GB (a 1.56:1 compression ratio). The thin ratio (total versus used blocks) is pretty high, but it gives you an idea of how much a set of disks is underprovisioned.
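The summary ratios can be reproduced from the capacity lines of the report. The formulas below are inferred from the reported numbers, not taken from the QDDA source, so treat them as a plausible reading rather than the tool's definition.

```python
# Values copied from the report above (all in MiB).
total_mib = 27648.00     # total
used_mib = 11772.31      # used (non-zero blocks)
deduped_mib = 8921.09    # deduped total
stream_mib = 3559.90     # stream compressed
net_mib = 5724.25        # total compressed (buckets)

dedupe = used_mib / deduped_mib        # used vs. after dedupe
compression = deduped_mib / net_mib    # after dedupe vs. after bucket compression
thin = total_mib / used_mib            # total vs. used
combined = total_mib / net_mib         # raw capacity vs. net capacity
stream_saving = 100 * (1 - stream_mib / deduped_mib)

print(f"percentage used     = {100 * used_mib / total_mib:.2f} %")
print(f"deduplication ratio = {dedupe:.2f}")
print(f"compression ratio   = {compression:.2f}")
print(f"thin ratio          = {thin:.2f}")
print(f"combined            = {combined:.2f}")
print(f"stream saving       = {stream_saving:.2f} %")
```

The combined ratio is simply the product of the other three, since the intermediate capacities cancel out.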

Performance (Intel i5-4440):

bart@workstation ~ $ qdda -p
Deleting /var/tmp/qdda.db
Test set:             131072 random 8k blocks (1024 MiB)
Hashing:              959097 microseconds,    1119.53 MB/s
Compressing:         1001426 microseconds,    1072.21 MB/s
DB insert:            270975 microseconds,    3962.51 MB/s
Reading:              112896 microseconds,    9510.89 MB/s
Total:               2344394 microseconds,     458.00 MB/s

Expect to get roughly 500MB/s on regular PC hardware. I expect much more on fast server hardware. Note that you need to disable throttling to get more than 200MB/s; the limit is there to prevent users from accidentally shooting themselves in the foot with I/O.
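For those who want to check the arithmetic of the performance test: the rates are consistent with decimal megabytes (bytes per microsecond), and the total time is the sum of the four stages. The small sketch below recomputes them from the timings shown above; the helper name `mb_per_s` is made up for this example.

```python
TEST_BYTES = 131072 * 8192  # 131072 blocks of 8 KiB = 1024 MiB

# Stage timings in microseconds, copied from the output above.
stage_us = {
    "Hashing": 959097,
    "Compressing": 1001426,
    "DB insert": 270975,
    "Reading": 112896,
}

def mb_per_s(nbytes: int, microseconds: int) -> float:
    """Bytes per microsecond equals (decimal) megabytes per second."""
    return nbytes / microseconds

for stage, us in stage_us.items():
    print(f"{stage + ':':<14}{us:>10} microseconds, {mb_per_s(TEST_BYTES, us):>10.2f} MB/s")

total_us = sum(stage_us.values())
print(f"{'Total:':<14}{total_us:>10} microseconds, {mb_per_s(TEST_BYTES, total_us):>10.2f} MB/s")
```

Since the stages run one after another in this test, the overall rate is bounded by hashing and compression, which together account for most of the total time.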

I have set up a Wiki manual page (shortlink: http://outrun.nl/wiki/qdda) with more info and download instructions. Over the coming period, if time allows, I intend to use the tool to blog on various dedupe scenarios with Oracle.

Happy analyzing!
