It’s almost a year since I blogged about qdda (the Quick & Dirty Dedupe Analyzer).
qdda is a tool that scans any Linux disk or file (or multiple disks) and predicts the potential thin provisioning, dedupe and compression savings you would get by moving that disk or file to an all-flash array such as DellEMC XtremIO or VMAX All Flash. In contrast to similar (usually vendor-supplied) tools, qdda runs completely independently. It does NOT require registration or sending a binary dataset back to the mothership (which would be a security risk). Anyone can inspect the source code and run it, so there are no hidden secrets.
It’s based upon the most widely deployed database engine, SQLite, and uses MD5 hashing and LZ4 compression to produce data reduction estimates.
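To make the idea concrete, here is a minimal sketch of block-level data reduction estimation. It is not qdda's actual code: the 16 KiB block size and the function name `estimate_reduction` are assumptions for illustration, the hashes live in an in-memory counter rather than SQLite, and `zlib` from the Python standard library stands in for LZ4 (which is not in the standard library), so the compressed sizes will differ from what LZ4 would report.

```python
import hashlib
import zlib
from collections import Counter

BLOCK_SIZE = 16 * 1024  # assumed block size for illustration, not taken from qdda


def estimate_reduction(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Estimate thin, dedupe and compression savings for a byte string."""
    zero_block = bytes(block_size)
    hashes = Counter()      # MD5 digest -> number of blocks with that content
    compressed_bytes = 0    # compressed size of the unique blocks only
    total_blocks = 0
    zero_blocks = 0

    for offset in range(0, len(data), block_size):
        # Pad a short trailing block with zeroes so every block has equal size.
        block = data[offset:offset + block_size].ljust(block_size, b"\0")
        total_blocks += 1
        if block == zero_block:
            zero_blocks += 1    # all-zero blocks count as thin (unallocated)
            continue
        digest = hashlib.md5(block).digest()
        if digest not in hashes:
            # First time this content is seen: it would actually be stored.
            # zlib is a stand-in for LZ4 here.
            compressed_bytes += len(zlib.compress(block))
        hashes[digest] += 1

    unique_blocks = len(hashes)
    used_blocks = total_blocks - zero_blocks
    return {
        "total_blocks": total_blocks,
        "zero_blocks": zero_blocks,
        "unique_blocks": unique_blocks,
        "dedupe_ratio": used_blocks / unique_blocks if unique_blocks else 0.0,
        "compressed_bytes": compressed_bytes,
    }


if __name__ == "__main__":
    # One empty block, two identical blocks and one different block.
    sample = bytes(BLOCK_SIZE) + b"A" * (2 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
    print(estimate_reduction(sample))
```

Only the first copy of each block is compressed, because duplicates would not be stored a second time on a deduplicating array.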
The reason it took a while to follow up is that I spent many evening hours almost completely rewriting the tool. A summary of the changes:
- Runs entirely as a non-privileged user (e.g. ‘nobody’) to make it safe to run on production systems
- Increased the hash to 60 bits so it scales to at least 80 terabytes without compromising accuracy
- Decreased the database space consumption by 50%
- Multithreading: separate readers and workers plus a single database updater, which allows qdda to use multiple CPU cores
- Many other major performance improvements (qdda has been shown to scan data at about 7 GB/s on a fast server; the bottleneck was I/O, and in theory it could handle double that bandwidth before maxing out on database updates)
- Very detailed embedded man page (manual). The qdda executable itself can show its own man page (on Linux with ‘man’ installed)
- Improved standard reports and detailed reports with compression and dedupe histograms
- Option to define your own custom array definitions
- Removed system dependencies (SQLite, LZ4 and other libraries) so that qdda runs on almost any Linux system and can be downloaded as a single executable (no more need to install RPM packages)
- Many other small improvements and additions
- Completely moved to GitHub, where you can also download the binary
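The 60-bit hash item in the list above can be sanity-checked with a birthday-bound estimate. The sketch below is my own back-of-the-envelope calculation, not something taken from the qdda source: it assumes a 16 KiB block size, a uniformly distributed hash, and the worst case in which every block on the scanned capacity is unique. The function name `expected_collisions` is made up for this example.

```python
def expected_collisions(capacity_bytes: int, block_size: int, hash_bits: int) -> float:
    """Expected number of colliding block pairs (birthday approximation).

    Assumes a uniformly distributed hash and that every block is unique,
    which is the worst case for false dedupe matches.
    """
    blocks = capacity_bytes // block_size
    pairs = blocks * (blocks - 1) // 2   # number of distinct block pairs
    return pairs / 2 ** hash_bits        # each pair collides with probability 2^-bits


if __name__ == "__main__":
    TIB = 2 ** 40
    KIB = 2 ** 10
    # 80 TiB of unique data, assumed 16 KiB blocks, 60-bit hash.
    print(expected_collisions(80 * TIB, 16 * KIB, 60))
```

Under these assumptions, 80 TiB is about 5.4 billion blocks and the estimate comes out at roughly a dozen colliding pairs, which is negligible against that many blocks. Because the number of pairs grows with the square of the block count, each extra bit in the hash halves the expected collisions.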
Read the overview and animated demo on the project homepage here: https://github.com/outrunnl/qdda
HTML version of the detailed manual page: https://github.com/outrunnl/qdda/blob/master/doc/qdda.md
As qdda is licensed under the GPL, it comes with no guarantees of any kind. My recommendation is to use it for learning purposes or for a first what-if analysis; if you’re interested in data reduction numbers from the vendor, ask them for a formal analysis using their own tools. That said, I did a few comparison tests and the data reduction numbers were within 1% of the results from vendor-supported tools. The man page has a section on accuracy explaining the differences.