Quick post to announce that QDDA version 2.2 has been published on GitHub and in the Outrun-Extras YUM repository.
Reminder: The Quick and Dirty Dedupe Analyzer is an open source Linux tool that scans disks or files block by block to find duplicate blocks and compression ratios, so it can report, in detail, the data reduction rate you can expect on a storage array that offers deduplication and compression. It can be downloaded as a standalone executable (QDDA download), as an RPM package via YUM, or compiled from source (QDDA Install).
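To give an idea of how block-level dedupe analysis works in principle, here is a minimal C++ sketch, not QDDA's actual code: it hashes each fixed-size block read from stdin and counts duplicates (the block size, and the use of std::hash instead of a strong cryptographic hash, are simplifying assumptions):

```cpp
// Hypothetical sketch of block-level duplicate counting, not QDDA's real code.
// Blocks that hash identically more than once would dedupe on an array
// using the same block size. Run e.g. as: ./sketch < somefile
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    const size_t blocksize = 8192;  // assumed block size; arrays differ
    std::unordered_map<size_t, uint64_t> counts;
    std::vector<char> buf(blocksize);
    uint64_t blocks = 0;

    while (std::cin.read(buf.data(), blocksize)) {
        ++blocks;
        // std::hash is good enough for a sketch; a real tool would use a
        // stronger hash to keep collision probability negligible.
        size_t h = std::hash<std::string>{}(std::string(buf.data(), blocksize));
        ++counts[h];
    }
    uint64_t unique = counts.size();
    if (unique)
        std::cout << blocks << " blocks, " << unique << " unique, dedupe ratio "
                  << double(blocks) / unique << "\n";
    return 0;
}
```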
QDDA 2.2 adds:
- DellEMC VMAX and PowerMAX support (using the DEFLATE compression algorithm)
- bash-completion (press TAB on the command line; RPM version only)
- Improved options for custom storage definitions
- Internal C++ code improvements (not relevant to the normal user)
Note to other storage vendors: If you’d like your array to be included in the tool, drop me a note with dedupe/compression algorithm details and I’ll see what is possible.
Background:
QDDA up to version 2.0 used only LZ4 compression, as it produces compression estimates very close to what DellEMC XtremIO uses internally; LZ4 is also used in many other flash arrays and some software-defined storage products. PowerMAX uses DEFLATE (the algorithm used in ZIP), which achieves a better compression ratio but requires much more CPU power to compress the data. During development I found this increased CPU load so severe (up to 20 times higher) that QDDA slowed almost to a halt when scanning with DEFLATE. So I changed the DEFLATE analysis to compress, at random, only 1 out of every 20 blocks (the compress interval); when the scan is done, the sum of the sampled compressed sizes is multiplied by 20 (to be precise, by the exact ratio of total to sampled blocks, to compensate for variations in the random sampling). If the dataset is large enough this turns out to be very accurate, and analysis throughput is back to the levels we were used to with LZ4.
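The sampling idea looks roughly like the following minimal C++ sketch (again not QDDA's actual code; the block size, the 1-in-20 interval, and the zlib settings are assumptions for illustration):

```cpp
// Hypothetical sketch of sampled compression estimation, not QDDA's real code.
// Compresses roughly 1 in every 'interval' blocks with zlib (DEFLATE) and
// extrapolates the total compressed size. Build with: g++ sketch.cpp -lz
#include <zlib.h>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const size_t blocksize = 8192;   // assumed block size
    const int interval = 20;         // sample ~1 in 20 blocks

    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> dice(1, interval);

    uint64_t blocks = 0, sampled = 0, sampled_bytes = 0;
    std::vector<unsigned char> buf(blocksize);
    std::vector<unsigned char> out(compressBound(blocksize));

    while (std::cin.read(reinterpret_cast<char*>(buf.data()), blocksize)) {
        ++blocks;
        if (dice(rng) != 1) continue;   // skip ~19 out of 20 blocks
        ++sampled;
        uLongf outlen = out.size();
        if (compress2(out.data(), &outlen, buf.data(), blocksize,
                      Z_DEFAULT_COMPRESSION) == Z_OK)
            sampled_bytes += outlen;
    }
    if (sampled) {
        // Scale by the exact ratio of total to sampled blocks rather than
        // blindly by the interval, to correct for sampling variance.
        double estimate = double(sampled_bytes) * blocks / sampled;
        std::cout << "estimated compressed size: "
                  << uint64_t(estimate) << " bytes\n";
    }
    return 0;
}
```

Because the sampled blocks are picked at random, the estimate converges on the true compressed size as the number of blocks grows, while only a fraction of the CPU-heavy DEFLATE calls are made.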
Note that VMAX and PowerMAX don't immediately compress all data, and it is impossible to predict how much of the data will be compressed at any given time, so QDDA simply reports the reduction rates for all data (as if everything were compressed and deduped immediately).
More info:
- The Quick and Dirty Deduplication Analyzer
- Time for a change: Migrating Exadata to a modern All-Flash array
- Announcing QDDA 2.0
- QDDA manual
Hope you find it useful, and I'd appreciate it if you let me know your experiences!