Announcing qdda 2.0

It’s almost a year since I blogged about qdda (the Quick & Dirty Dedupe Analyzer).

qdda is a tool that lets you scan any Linux disk or file (or multiple disks) and predicts the potential thin provisioning, dedupe and compression savings you would get by moving that disk/file to an All-Flash array such as DellEMC XtremIO or VMAX All Flash. In contrast to similar (usually vendor-supplied) tools, qdda runs completely independently. It does NOT require registration or sending a binary dataset back to the mothership (which would be a security risk). Anyone can inspect the source code and run it, so there are no hidden secrets.

It’s based upon the most widely deployed database engine, SQLite, and uses MD5 hashing and LZ4 compression to produce data reduction estimates.

The reason it took a while to follow up is that I spent many evening hours almost completely rewriting the tool. A summary of the changes:

  • Runs completely as a non-privileged user (i.e. ‘nobody’), making it safe to run on production systems
  • Increased the hash to 60 bits so it scales to at least 80 terabytes without compromising accuracy
  • Decreased database space consumption by 50%
  • Multithreading: separate reader and worker threads feed a single database updater, allowing qdda to use multiple CPU cores
  • Many other huge performance improvements (qdda has been demonstrated to scan data at about 7 GB/s on a fast server; the bottleneck was I/O, and it could theoretically handle double that bandwidth before maxing out on database updates)
  • Very detailed embedded man page: the qdda executable can display its own manual (on Linux with ‘man’ installed)
  • Improved standard reports, plus detailed reports with compression and dedupe histograms
  • Option to define your own custom array definitions
  • Removed system dependencies (SQLite, LZ4 and other libraries), so qdda runs on almost any Linux system and can be downloaded as a single executable (no more need to install RPM packages)
  • Many other small improvements and additions
  • Completely moved to GitHub, where you can also download the binary
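A quick birthday-bound calculation shows why 60 bits is enough at that scale. The block size below is an assumption for illustration (qdda's actual block size may differ); the point is the order of magnitude:

```python
HASH_BITS = 60
BLOCK_SIZE = 16 * 1024        # assumed block size for illustration
CAPACITY = 80 * 10**12        # 80 TB scanned

n = CAPACITY // BLOCK_SIZE    # number of blocks (~4.9 billion)
space = 2 ** HASH_BITS        # number of possible hash values

# Birthday approximation: expected colliding pairs ~ n^2 / (2 * space)
expected_collisions = n * n / (2 * space)
error_fraction = expected_collisions / n  # relative error in the dedupe estimate

print(f"blocks scanned:      {n:,}")
print(f"expected collisions: {expected_collisions:.1f}")   # roughly 10
print(f"error fraction:      {error_fraction:.1e}")        # order of 1e-9
```

Around ten false duplicates among roughly five billion blocks is a relative error on the order of 10⁻⁹, which is negligible compared to the other approximations in any data reduction estimate.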

Read the overview and animated demo on the project homepage here: https://github.com/outrunnl/qdda

HTML version of the detailed manual page: https://github.com/outrunnl/qdda/blob/master/doc/qdda.md

As qdda is licensed under the GPL, it comes with no guarantees whatsoever. My recommendation is to use it for learning purposes or for a first what-if analysis; if you need data reduction numbers from a vendor, ask them for a formal analysis using their own tools. That said, I did a few comparison tests and the data reduction numbers were within 1% of the results from vendor-supported tools. The man page has a section on accuracy explaining the differences.


Time for a change: Migrating Exadata to a modern All-Flash array


With Oracle’s uncertain SPARC future, the rise of very fast and capable All-Flash arrays, and existing Exadata customers looking to refresh their hardware, I increasingly get questions about what platform we can offer as an alternative to Exadata or SuperCluster. A big challenge can be breaking away from the lock-in effect of HCC (Hybrid Columnar Compression, although I’d like to call it Hotel California Compression), as it seems hard to estimate how much storage capacity is needed on other storage when migrating to other platforms. Note that any storage could theoretically use HCC, but Oracle deliberately disabled it on anything other than Exadata, the ZFS appliance or Pillar (huh?) storage.

As far as I know, there is no direct query to figure out how big an HCC-compressed table would get after decompressing. HCC tables can get very large and the compression ratio can be fairly high, which makes sizing a new environment a challenge.

So in order to provide a reasonable guesstimate, I created a typical scenario in my VMware home lab to estimate generic storage requirements for HCC-compressed environments.
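One way to get a rough number, sketched here as my own assumption rather than the method from the post: decompress a representative sample of the HCC table into an uncompressed copy and extrapolate the expansion ratio. All names and figures below are hypothetical illustrations, not output from a real system.

```python
def estimate_uncompressed_size(table_hcc_bytes, sample_hcc_bytes, sample_plain_bytes):
    """Extrapolate an HCC table's uncompressed size from a sampled copy.

    The idea: copy a sample of the table into a NOCOMPRESS table
    (e.g. CREATE TABLE ... NOCOMPRESS AS SELECT ... SAMPLE(1)),
    compare the segment sizes, and scale the observed expansion
    ratio up to the whole table.
    """
    expansion = sample_plain_bytes / sample_hcc_bytes
    return table_hcc_bytes * expansion

GB = 2**30
# Hypothetical example: a 100 GB HCC segment whose 1 GB sample grows
# to 8 GB when stored uncompressed -> roughly 800 GB raw capacity.
estimate = estimate_uncompressed_size(100 * GB, 1 * GB, 8 * GB)
print(f"estimated uncompressed size: {estimate / GB:.0f} GB")
```

The accuracy obviously depends on how representative the sample is; skewed data or partially compressed tables may warrant decompressing complete tables in a test environment instead.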


Six alternatives to replace licensed Oracle database options

More and more I hear of companies looking into replacing expensive licensed database options with more cost-effective alternatives, for various reasons: simple cost reduction, avoiding lock-in, or other strategic considerations. To kickstart the creative thinking, here are a few hardware- and software-based methods to achieve cost reduction on database options (on top of Oracle Enterprise Edition). I am assuming the base product stays the same (Oracle Enterprise Edition); replacing EE itself with alternatives can be very cost-effective, but requires serious changes in architecture and management, so I’ll leave that for another time.


Five reasons to choose VMware vs Oracle Multitenant

While busy preparing another blog post on my dedupe analyzer tool, I was prompted to write a quick hotlist of reasons why one would strategically choose VMware virtualization over the Oracle Multitenant option if the goal is to achieve operational and cost benefits. There may be other good reasons to go for Oracle Multitenant, but cost savings is not one of them.

On Twitter I claimed I could give five reasons why, so here we go…

WARNING: Controversial topic – further reading at own risk 😉


The Quick and Dirty Dedupe Analyzer – Part 1 – Hands on

As announced in my last blog post, qdda is a tool that analyzes potential storage savings by scanning data and producing deduplication, compression and thin provisioning estimates. The results are an indication of whether a modern All-Flash Array (AFA) like Dell EMC XtremIO would be worth considering.

In this (lengthy) post I will go over the basics of qdda and run a few synthetic test scenarios to show what’s possible. The next posts will cover more advanced scenarios, such as running against Oracle database data, multiple nodes, and more exotic ones such as ZFS storage pools.

[ Warning: Lengthy technical content, Rated T, parental advisory required ]
