With Oracle’s uncertain SPARC future, the rise of very fast and capable All-Flash arrays, and existing Exadata customers looking to refresh their hardware, I increasingly get questions about what platform we can offer as an alternative to Exadata or SuperCluster. A big challenge can be to break away from the lock-in effect of HCC (Hybrid Columnar Compression, although I’d like to call it Hotel California Compression), as it seems hard to estimate how much storage capacity is needed when moving to other storage or migrating to other platforms. Note that any storage could theoretically use HCC, but Oracle deliberately disabled it on anything other than Exadata, ZFS appliance or Pillar (huh?) storage.
As far as I know, there is no direct query to figure out how big a HCC compressed table would get after decompressing. HCC tables can get very large and the compression ratio can be fairly high which makes sizing a new environment a challenge.
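As a starting point you can at least list which tables are HCC compressed and how much space they currently occupy in compressed form. This is a generic sketch (not one of my helper scripts) and only covers non-partitioned tables – for partitioned tables you would query dba_tab_partitions instead:

SYS:DB01 > select t.owner, t.table_name, t.compress_for,
                  round(sum(s.bytes)/1024/1024) size_mb
           from   dba_tables t
           join   dba_segments s on s.owner = t.owner and s.segment_name = t.table_name
           where  t.compress_for in ('QUERY LOW','QUERY HIGH','ARCHIVE LOW','ARCHIVE HIGH')
           group  by t.owner, t.table_name, t.compress_for
           order  by 4 desc;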
So in order to provide a reasonable guesstimate, I created a typical scenario on my VMware homelab to estimate generic storage requirements for HCC compressed environments.
Test description
I wanted to evaluate the scenario where a customer has a large amount of HCC compressed data, and wants to find the HCC compression ratio in order to calculate the storage requirements when moving back to non-HCC compressed storage. Note that although the new storage array may not support HCC, it may use other compression methods.
The idea is to create a test table in a separate tablespace (and datafile), copy data from the HCC tables into the test table, and then measure the size of the newly created datafile to find the uncompressed size. As the new tablespace may become huge, we can sample a percentage of the HCC table instead to calculate our estimate. We then run an Open Source dedupe and compression analysis tool (QDDA – which I created for purposes like this) to scan the data for compressibility on an All-Flash array like DellEMC XtremIO or DellEMC VMAX All-Flash, although the method is not limited to those platforms – it would work for any array that can perform inline compression.
I created a fairly large dataset on standard ASM diskgroups using Swingbench to simulate more or less realistic data, then loaded the largest table into a HCC compressed table which forms the basis of further testing. We then delete the real Swingbench schema, as it was only used to generate data, and pretend we only have a database with HCC data.
Lab setup
- A VM with 8 cores and 8 GB memory (db01.lan)
- Oracle DB (11.2.0.4.0, patched to the latest patch update) on Linux using ASM. I deliberately did not use Oracle 12, as some customers may run older versions and I wanted to make sure the method works on those as well
- ASM on both Flash and spinning disk (VMware VMDK)
- ZFS Storage Appliance Simulator for VMware version 2013.06.05.7.0, release date 2017-3-22 (zfssa.lan)
- Swingbench 2.5.0.971
As I don’t have a real Exadata at home (or other real Oracle storage such as a ZFS storage appliance or Pillar), I used the ZFSSA simulator to run HCC. In order to do this, the ZFSSA needs to have SNMP enabled, which really was the only thing I had to set up to make it work. There are good blogposts, such as Installing Oracle ZFS Storage Appliance simulator for your virtual storage requirements, that describe how Oracle queries the storage using SNMP to see if it is “true” Oracle ZFS storage. Also note that there is a bug in some versions that causes ORA-64307 when performing some operations on a ZFS based file system – so patch the database to the latest version if needed.
In my case, after enabling SNMP, this is how my simulator responds to the following command:
# snmpget -v1 -c public zfssa.lan 1.3.6.1.4.1.42.2.225.1.4.2.0
iso.3.6.1.4.1.42.2.225.1.4.2.0 = STRING: "Sun Storage 7000"
As described in other blogposts, Oracle enables HCC when the SNMP query returns “Sun Storage 7xxx”, which is the case here, so we can use it for simulation tests.
Preparation
On the ZFSSA simulator, I created a default ZFS pool named zpool and a share named default/oradata which is then exported as NFS. Further settings for oradata (mostly according to best practices to get reasonable performance):
- update access time on read disabled
- database record size 8K
- restrict ownership change disabled
- compression disabled
For comparisons I also created a local XFS filesystem where we can store the non-HCC enabled tables so we can easily measure tablespace sizes. So I have (in addition to ASM diskgroups):
[oracle@db01 zfs](DB01) $ df | grep oradata
/dev/mapper/data-odata    8378368    32928   8345440   1% /oradata/xfs
zfssa:/export/oradata    35899840       32  35899808   1% /oradata/zfs
In order to get HCC working on the ZFSSA we need to enable DirectNFS – there are various articles and blogposts that tell you how to do this.
My /etc/fstab entry:
zfssa:/export/oradata /oradata/zfs nfs bg,hard,rsize=32768,wsize=32768,vers=3,nointr,timeo=600,tcp,noatime 0 0
/etc/oranfstab:
server: zfssa
path: 192.168.10.31
export: /export/oradata mount: /oradata/zfs
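Once dNFS is enabled (on 11.2 this means relinking the Oracle binary with make -f ins_rdbms.mk dnfs_on in $ORACLE_HOME/rdbms/lib) and the database has opened datafiles on the NFS mount, a quick sanity check is to query the dNFS views – if these return no rows, HCC on the ZFSSA will not work:

SYS:DB01 > select svrname, dirname from v$dnfs_servers;
SYS:DB01 > select filename from v$dnfs_files;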
For the test we create a number of tablespaces: SOE for Swingbench data, HCC for HCC data, FLAT for uncompressed copies of HCC data, and OLTP for OLTP compressed copies of HCC data. For the tablespaces we want to evaluate, we set the initial size and increments to 1M so we get reasonably accurate file sizes that don’t contain too much empty space.
Creating the tablespaces for Swingbench and HCC first:
SYS:DB01 > create tablespace SOE datafile '+FLASH' size 100M autoextend on next 100M maxsize unlimited;
SYS:DB01 > create tablespace HCC datafile '/oradata/zfs/hcc.dbf' size 1M autoextend on next 1M maxsize unlimited default compress for query high;
For the test we configure a Swingbench SOE (Sales Order Entry benchmark) dataset on ASM using scale=16, which results in about 25GB of data in the SOE tablespace. This is the combined size of tables and indexes.
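For reference, the SOE schema is created with the oewizard utility that ships with Swingbench. The exact flags differ a bit per Swingbench version (check oewizard -h) and the connect string below is just an example for my lab, but a command line invocation looks roughly like this:

$ ./oewizard -cl -create -cs //db01/DB01 -u soe -p soe -dba system -dbap <dba password> -ts SOE -scale 16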
SYS:DB01 > @ls-datafiles

NAME                       MB      BLKSZ FILENAME
---------------- ------------ ---------- ------------------------------------------------------------
SYSTEM                  270.0       8192 +DATA/db01/datafile/system.263.969266913
SYSAUX                  380.0       8192 +DATA/db01/datafile/sysaux.264.969267017
UNDOTBS1                795.0       8192 +DATA/db01/datafile/undotbs1.265.969267105
SOE                  25,800.0       8192 +FLASH/db01/datafile/soe.256.969272821
HCC                       1.0       8192 /oradata/zfs/hcc.dbf
TEMP                 10,240.0       8192 +DATA/db01/temp01.dbf
CONTROLFILE              16.8      16384 +DATA/db01/control1.ctl
CONTROLFILE              16.8      16384 +DATA/db01/control2.ctl
REDO GROUP 1             32.0        512 +REDO/db01/redo01.log
REDO GROUP 2             32.0        512 +REDO/db01/redo02.log
REDO GROUP 3             32.0        512 +REDO/db01/redo03.log
                 ------------
Total                37,615.6
We want to figure out how much capacity would be required for HCC vs non-HCC, where we can distinguish between not compressed at all (this is how the Swingbench data was created) and OLTP compression. The Swingbench data is only there because I wanted to have some reasonably realistic data in the database.
Listing the table sizes for the SOE user:
SYS:DB01 > @object-sizes

OWNER                SEGMENT                   TYPE          SIZE_MB TABLESPACE_NAME
-------------------- ------------------------- ---------- ---------- ------------------------------
SOE                  ORDER_ITEMS               TABLE         4,476.0 SOE
SOE                  ORDERS                    TABLE         2,570.0 SOE
SOE                  ADDRESSES                 TABLE         2,396.0 SOE
SOE                  CUSTOMERS                 TABLE         2,192.0 SOE
SOE                  CARD_DETAILS              TABLE         1,577.0 SOE
SOE                  ORDER_ITEMS_PK            INDEX         1,439.0 SOE
SOE                  LOGON                     TABLE         1,348.0 SOE
SOE                  ITEM_ORDER_IX             INDEX         1,288.1 SOE
SOE                  ITEM_PRODUCT_IX           INDEX         1,141.6 SOE
SOE                  CUST_EMAIL_IX             INDEX           629.1 SOE
SOE                  ORD_ORDER_DATE_IX         INDEX           484.8 SOE
SOE                  ORD_WAREHOUSE_IX          INDEX           456.4 SOE
SOE                  CARDDETAILS_CUST_IX       INDEX           452.6 SOE
SOE                  ADDRESS_CUST_IX           INDEX           452.5 SOE
SOE                  CUST_FUNC_LOWER_NAME_IX   INDEX           439.7 SOE
SOE                  ORD_CUSTOMER_IX           INDEX           431.3 SOE
SOE                  ADDRESS_PK                INDEX           426.5 SOE
SOE                  CARD_DETAILS_PK           INDEX           426.3 SOE
SOE                  ORDER_PK                  INDEX           406.1 SOE
SOE                  ORD_SALES_REP_IX          INDEX           382.3 SOE
SOE                  CUST_DOB_IX               INDEX           339.7 SOE
SOE                  CUSTOMERS_PK              INDEX           284.5 SOE
SOE                  CUST_ACCOUNT_MANAGER_IX   INDEX           268.2 SOE
SOE                  INVENTORIES               TABLE           176.0 SOE
SOE                  INVENTORY_PK              INDEX            18.0 SOE
SOE                  INV_PRODUCT_IX            INDEX            16.7 SOE
SOE                  INV_WAREHOUSE_IX          INDEX            16.7 SOE
                                                        ----------
Total MiB                                                 24,535.0

27 rows selected.
Note that the object-sizes script only lists objects of at least 1MB in size.
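The @ scripts used throughout this post are just small SQL*Plus helpers; a simplified query that produces a similar listing straight from dba_segments could look like this:

SYS:DB01 > select owner, segment_name, segment_type, round(bytes/1024/1024,1) size_mb, tablespace_name
           from   dba_segments
           where  owner = 'SOE' and bytes >= 1024*1024
           order  by bytes desc;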
As ORDER_ITEMS is the largest table, let’s use that one to simulate HCC compression. We copy the data into a HCC compressed table and then copy it back to non-compressed and OLTP-compressed tablespaces.
Let’s make an empty HCC table and fill it with data from order_items:
SYS:DB01 > create table soe.order_items_hcc tablespace HCC as select * from soe.order_items where 0=1;
SYS:DB01 > alter table soe.order_items_hcc compress for query high;
Verify if it is HCC enabled:
SYS:DB01 > select tablespace_name,table_name,compression,compress_for from dba_tables where compression != 'DISABLED';

Tablespace   Table                          Compress  COMPRESS_FOR
------------ ------------------------------ --------- ------------
HCC          ORDER_ITEMS_HCC                ENABLED   QUERY HIGH
Now let’s load data into the HCC compressed table by copying from order_items, using the ‘append’ hint so we get direct path load (which is what’s needed to get good compression on HCC):
insert /*+append */ into soe.order_items_hcc select * from soe.order_items;
Note that we could do more optimization like pre-sorting the data to get higher compression ratios but I want to keep it simple for now.
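For example, a pre-sorted direct path load would look like the statement below. PRODUCT_ID is just an illustration – the best sort key is whatever column gives the most repeating values within each compression unit:

insert /*+append */ into soe.order_items_hcc select * from soe.order_items order by product_id;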
Verify row count:
SYS:DB01 > select count(*) from soe.order_items union all select count(*) from soe.order_items_hcc;

  COUNT(*)
----------
  68631927
  68631927
So we can delete the original Swingbench data now:
SYS:DB01 > drop tablespace soe including contents and datafiles;
At this point we have a HCC tablespace that is roughly 830MB, and we pretend to have forgotten how big the original size was – as this is what most customers experience after running on HCC for a long time. Given only a HCC table, how do we get the compression ratio?
Estimating HCC compression ratio
Simply put, we can do this by loading the data back into non-HCC tables (I use two tables here: one without compression and the other with OLTP compression).
SYS:DB01 > create tablespace FLAT datafile '/oradata/xfs/flat.dbf' size 1M autoextend on next 1M maxsize unlimited;
SYS:DB01 > create tablespace OLTP datafile '/oradata/xfs/oltp.dbf' size 1M autoextend on next 1M maxsize unlimited;
SYS:DB01 > create table soe.order_items_flat tablespace FLAT as select * from soe.order_items_hcc where 0=1;
SYS:DB01 > create table soe.order_items_oltp tablespace OLTP as select * from soe.order_items_hcc where 0=1;

-- And we need to set the compression types before loading data:
SYS:DB01 > alter table soe.order_items_oltp compress for oltp;
Verify:
SYS:DB01 > select tablespace_name,table_name,compression,compress_for from dba_tables where compression != 'DISABLED';

Tablespace   Table                          Compress  COMPRESS_FOR
------------ ------------------------------ --------- ------------
OLTP         ORDER_ITEMS_OLTP               ENABLED   OLTP
HCC          ORDER_ITEMS_HCC                ENABLED   QUERY HIGH
We copy the data back to non-HCC compressed tables:
SYS:DB01 > insert /*+append */ into soe.order_items_flat select * from soe.order_items_hcc;
SYS:DB01 > insert /*+append */ into soe.order_items_oltp select * from soe.order_items_hcc;
A file list of the datafiles shows:
-rw-r-----  1 oracle oinstall 4765786112 Feb 28 14:36 /oradata/xfs/flat.dbf
-rw-r-----  1 oracle oinstall 3356499968 Feb 28 14:36 /oradata/xfs/oltp.dbf
-rw-r-----+ 1 oracle oinstall  828383232 Feb 28 14:14 /oradata/zfs/hcc.dbf
Verify with the total size versus the used space:
SYS:DB01 > @ls-datafile-free

Tablespace Name      File Name               Size (M)   Used (M)   Free (M)     % Used
-------------------- --------------------- ---------- ---------- ---------- ----------
FLAT                 /oradata/xfs/flat.dbf       4545       4287        256         94
HCC                  /oradata/zfs/hcc.dbf         790        752         37         95
OLTP                 /oradata/xfs/oltp.dbf       3201       3008        192         94

SYS:DB01 > @ts-sizes

TABLESPACE           Owner                TYPE            OBJECTS      SIZE_MB
-------------------- -------------------- ------------ ---------- ------------
FLAT                 SOE                  TABLE                 1      4,287.1
HCC                  SOE                  TABLE                 1        752.0
OLTP                 SOE                  TABLE                 1      3,008.0
                                                                  ------------
Total MiB                                                              8,047.1
So we can now calculate the compression ratios, using the consumed size of the FLAT tablespace (4287 MiB) as the non-compressed baseline.
So HCC compression ratio = 4287 / 752 = 1 : 5.7 (or 82.5%, not bad)
OLTP compression = 4287 / 3008 = 1 : 1.425 (or 29.9%).
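Instead of the helper scripts you can also pull the segment sizes straight from dba_segments and let the database do the math. Note that dba_segments reports allocated rather than used space, so the numbers will differ slightly from the figures above (generic sketch):

SYS:DB01 > select max(decode(segment_name,'ORDER_ITEMS_FLAT',bytes)) /
                  max(decode(segment_name,'ORDER_ITEMS_HCC' ,bytes))  hcc_ratio,
                  max(decode(segment_name,'ORDER_ITEMS_FLAT',bytes)) /
                  max(decode(segment_name,'ORDER_ITEMS_OLTP',bytes))  oltp_ratio
           from   dba_segments
           where  owner = 'SOE'
           and    segment_name in ('ORDER_ITEMS_FLAT','ORDER_ITEMS_HCC','ORDER_ITEMS_OLTP');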
Sampling HCC data
In this case we uncompressed the entire HCC table. HCC tables are often very large even when compressed, and uncompressing them may require huge amounts of diskspace that may not be available. So instead we can take a random sample of the HCC data. Note that this sample tablespace can be placed on any file system, even on a local disk somewhere, as it’s only a dummy copy of the data.
Let’s create another flat file tablespace:
SQL> create tablespace SAMPLE datafile '/oradata/xfs/sample.dbf' size 1M autoextend on next 1M maxsize unlimited;
SQL> create table soe.order_items_sample tablespace SAMPLE as select * from soe.order_items_hcc where 0=1;
Let’s say we want to sample 5% of all rows. 5% is 5/100, so we want to select num_rows * (5/100). In order to avoid selecting only the first 5% of the rows we will use a randomizer. Needless to say, you can pick any percentage as long as it’s enough to be statistically accurate.
SQL query to select a random 5% of the rows in another table:
INSERT /*+append */ INTO soe.order_items_sample
SELECT * from soe.order_items_hcc
WHERE rownum < (select floor(count(*)*5/100) FROM soe.order_items_hcc)
ORDER BY dbms_random.value();
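As a side note: Oracle also has a built-in SAMPLE clause that pulls an approximate percentage of rows directly, which could be used as an alternative to the rownum construction above (the sampled row count will then be approximate rather than exact):

INSERT /*+append */ INTO soe.order_items_sample SELECT * FROM soe.order_items_hcc SAMPLE (5);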
Verify rowcount and show new datafile sizes:
SQL> select count(*) from soe.order_items_hcc union all select count(*) from soe.order_items_sample;

  COUNT(*)
----------
  68631927
   3431595

@ls-datafile-free

Tablespace  File Name                                   Size (M)   Used (M)   Free (M)     % Used
----------- ------------------------------------------- -------- ---------- ---------- ----------
FLAT        /oradata/xfs/flat.dbf                           4545       4287        256         94
HCC         /oradata/zfs/hcc.dbf                             790        752         37         95
OLTP        /oradata/xfs/oltp.dbf                           3201       3008        192         94
SAMPLE      /oradata/xfs/sample.dbf                          227        216         10         95
SYSAUX      +DATA/db01/datafile/sysaux.264.969267017         390        361         28         93
SYSTEM      +DATA/db01/datafile/system.263.969266913         270        264          4         98
UNDOTBS1    +DATA/db01/datafile/undotbs1.265.969267105       795        310         11         39

7 rows selected.
Here our uncompressed sample is 216MB. We need to multiply by 20 (i.e. divide by 5%) to get the estimated uncompressed size of the entire table: 216 * 20 = 4320 MiB, which is slightly more than our full decompressed table (4287).
So our compression ratio based on the sample is 4320 / 752 = 1 : 5.7, which matches our calculation based on the full decompress.
Storage requirements
Now that we found the HCC compression ratio, the next question is how much capacity is required after moving away from HCC capable storage completely. In order to answer this question we need to consider more than just the HCC compression ratio as not all data can be compressed with HCC – so for the total database size we need to include other factors as well.
The first thing to be aware of is that modern All-Flash arrays like DellEMC XtremIO or VMAX can compress data inline at the storage level. There are some pros and cons to this approach. The most important benefit is that the storage doesn’t know much about database binary formats and datafile types, so it attempts to compress ALL data (usually with good results) – including redo logs, temp tables, indexes, rollback, archive logs and so on. The disadvantage is that storage compression cannot store database data in columnar format, so you can’t achieve equally high compression ratios – but that may not be too much of a problem, as we will see.
Can we predict what our compression ratio would be on such a compression-enabled array? For this I wrote the QDDA tool so let’s use it here on our flat file datafiles to get an idea.
As a side note, QDDA uses the Open Source LZ4 algorithm – which is – depending on the storage architecture – not exactly what some real world arrays will use, but the difference is very small.
Also note that QDDA scans data block by block (default 16K, as XtremIO X2 stores data in 16K blocks) and compresses each block separately – which gets you a lower compression ratio than if you would compress the entire file in one go (like zip or gzip would do). This reflects how storage systems compress data in such a way that random access performance can be guaranteed.
The dedupe ratio on a single Oracle tablespace is usually 1 (no dedupe) as we will see.
[oracle@db01 ~](DB01) $ qdda /oradata/xfs/flat.dbf
qdda 1.9.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File 01, 290881 16k blocks (4545 MiB) processed, 107/200 MB/s, 63 MB/s avg
Merging 290881 blocks (4545 MiB) with 0 blocks (0 MiB)
Indexing in 1.54 sec (189372 blocks/s, 2958 MiB/s), Joining in 1.61 sec (180688 blocks/s, 2823 MiB/s)
                      *** Overview ***
total               =     4545.02 MiB (    290881 blocks)
used                =     4545.02 MiB (    290881 blocks)
deduped             =     4545.02 MiB (    290881 blocks)
allocated           =     2155.86 MiB (    137975 blocks)
                      *** Details ***
Compression method  = XtremIO X2
blocksize           =          16 KiB
free (zero)         =        0.00 MiB (         0 blocks)
compress pre dedup  =     2011.47 MiB (     55.74 %)
merged by dedupe    =        0.00 MiB (         0 blocks)
compress post dedup =     2011.47 MiB (     55.74 %)
unique data         =     4545.02 MiB (    290881 blocks)
duped 2:1           =        0.00 MiB (         0 blocks)
duped >2:1          =        0.00 MiB (         0 blocks)
duped total         =        0.00 MiB (         0 blocks)
                      *** Summary ***
percentage used     =      100.00 %
percentage free     =        0.00 %
deduplication ratio =        1.00
compression ratio   =        2.11
thin ratio          =        1.00
combined            =        2.11
raw capacity        =     4545.02 MiB
net capacity        =     2155.86 MiB
We can see that the storage compression ratio is estimated to be 1 : 2.11 for the flat (decompressed) table.
Let’s do the same for the OLTP compressed table:
[oracle@db01 ~](DB01) $ qdda /oradata/xfs/oltp.dbf
qdda 1.9.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File 01, 204865 16k blocks (3201 MiB) processed, 107/200 MB/s, 43 MB/s avg
Merging 204865 blocks (3201 MiB) with 0 blocks (0 MiB)
Indexing in 0.91 sec (225425 blocks/s, 3522 MiB/s), Joining in 1.10 sec (185972 blocks/s, 2905 MiB/s)
                      *** Overview ***
total               =     3201.02 MiB (    204865 blocks)
used                =     3201.02 MiB (    204865 blocks)
deduped             =     3201.02 MiB (    204865 blocks)
allocated           =     2310.84 MiB (    147894 blocks)
                      *** Details ***
Compression method  = XtremIO X2
blocksize           =          16 KiB
free (zero)         =        0.00 MiB (         0 blocks)
compress pre dedup  =     2224.31 MiB (     30.51 %)
merged by dedupe    =        0.00 MiB (         0 blocks)
compress post dedup =     2224.31 MiB (     30.51 %)
unique data         =     3201.02 MiB (    204865 blocks)
duped 2:1           =        0.00 MiB (         0 blocks)
duped >2:1          =        0.00 MiB (         0 blocks)
duped total         =        0.00 MiB (         0 blocks)
                      *** Summary ***
percentage used     =      100.00 %
percentage free     =        0.00 %
deduplication ratio =        1.00
compression ratio   =        1.39
thin ratio          =        1.00
combined            =        1.39
raw capacity        =     3201.02 MiB
net capacity        =     2310.84 MiB
The ratio is 1.39, which is less than for flat (as expected), but we can still compress OLTP compressed data further. However, the net (required) capacity is MORE (2310 MiB) than if we hadn’t compressed at all (2155 MiB), so overall storage efficiency is better when NOT using OLTP compression at all.
Let’s run QDDA also on our sample to see if we get similar ratios:
[oracle@db01 ~](DB01) $ qdda /oradata/xfs/sample.dbf
qdda 1.9.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File 01, 14529 16k blocks (227 MiB) processed, 63/200 MB/s, 69 MB/s avg
Merging 14529 blocks (227 MiB) with 0 blocks (0 MiB)
Indexing in 0.04 sec (344093 blocks/s, 5376 MiB/s), Joining in 0.07 sec (195474 blocks/s, 3054 MiB/s)
                      *** Overview ***
total               =      227.02 MiB (     14529 blocks)
used                =      227.02 MiB (     14529 blocks)
deduped             =      227.02 MiB (     14529 blocks)
allocated           =      120.12 MiB (      7688 blocks)
                      *** Details ***
Compression method  = XtremIO X2
blocksize           =          16 KiB
free (zero)         =        0.00 MiB (         0 blocks)
compress pre dedup  =      107.97 MiB (     52.44 %)
merged by dedupe    =        0.00 MiB (         0 blocks)
compress post dedup =      107.97 MiB (     52.44 %)
unique data         =      227.02 MiB (     14529 blocks)
duped 2:1           =        0.00 MiB (         0 blocks)
duped >2:1          =        0.00 MiB (         0 blocks)
duped total         =        0.00 MiB (         0 blocks)
                      *** Summary ***
percentage used     =      100.00 %
percentage free     =        0.00 %
deduplication ratio =        1.00
compression ratio   =        1.89
thin ratio          =        1.00
combined            =        1.89
raw capacity        =      227.02 MiB
net capacity        =      120.12 MiB
The ratio is a little lower than when running on the full ‘flat’ table: 1.89. I’m not sure why at this point, but my expectation is that the random selection of rows makes the data less compressible. More on that later.
Another effect when moving away from Exadata HCC is the raw capacity required for each GB of data. On Exadata your data is either mirrored or triple-mirrored (normal or high redundancy), so if the entire database is, say, 10TB in size, it requires 20TB or even 30TB of diskspace (ignoring other Exadata storage overhead for a moment, such as reserved space). On a good AFA the RAID overhead would be something like 12.5% or less, instead of 100% or 200% as on Exadata.
Let’s say for the sake of the example that the size of the HCC data equals the size of the other tablespaces. This is because (at least in my experience) HCC data is usually not larger than the non-HCC data in the database, simply because indexes on HCC compressed tables are not HCC compressed, and you need space for TEMP, REDO and OLTP-style tables too, as well as archive logs and maybe backup files. Let’s be conservative and assume 1:2 compression for our new array, and 12.5% RAID/protection overhead. But you can of course run the numbers for your own environment.
Sizes in GiB
HCC ratio                   1 : 6
All Flash raw/usable ratio  1 : 1.125
All Flash comp ratio        1 : 2

                              Exadata      Other
Redo logs                         100        100   (same)
Temp tables                       500        500   (same)
Non-HCC tablespaces              5000       5000   (same, includes index for HCC tables)
HCC tablespaces                  5000          0   (only on Exadata)
Decompressed HCC                    0      30000   (size x 6)
Database size                   10600      35600   (sum of database files)
Storage size HIGH               31800              (due to ASM high redundancy)
Storage size NORMAL             21200              (due to ASM normal redundancy)
Storage size All Flash                     20025   (after storage compression and parity overhead)
So a 10.6 TB database requires 32TB Exadata capacity but only 20TB on a DellEMC All-flash. Even if the Exadata was using normal redundancy the storage requirements would be roughly similar even when we cannot use HCC.
So in many cases we can fit a HCC compressed database on All-Flash storage without too much trouble. There are edge cases of course – where customers have normal redundancy and huge archival tables on HCC with a much higher compression ratio because of pre-sorted direct path loading, and maybe no indexes at all – so we may need a bit more capacity.
Quick method to calc projected capacity
- Take the existing total DB size (datafiles, redo, temp, undo)
- Multiply by the redundancy level (2 or 3) – this is the Exadata raw storage required
- Take the size of the HCC tables, multiply by (HCC ratio minus 1) and add this to the total DB capacity
- Multiply by the parity factor (i.e. 1.125 or similar) and divide by the expected storage array compression ratio (usually between 2 and 2.5 for XtremIO, YMMV) – this is the All-Flash raw storage required
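As a worked example with the numbers from the sizing table above (10,600 GiB total database size of which 5,000 GiB is HCC, HCC ratio 6, parity factor 1.125, array compression ratio 2) – a quick calculation straight from SQL*Plus:

SYS:DB01 > select 10600 * 3                          exadata_raw_high,
                  10600 * 2                          exadata_raw_normal,
                  (10600 + 5000*(6-1)) * 1.125 / 2   all_flash_raw
           from dual;
-- gives 31800, 21200 and 20025 GiB respectively, matching the sizing table above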
Back to the question of why we get a slightly lower compression ratio on the sampled table compared to the full decompressed table.
QDDA has a histogram report function, so let’s run that on the flat.dbf datafile (after running qdda /oradata/xfs/flat.dbf to load the data):
[oracle@db01 ~](DB01) $ qdda -x
qdda 1.9.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File  Blksz  Blocks     MiB  Filename
   1     16  290881    4545  /oradata/xfs/flat.dbf
Dedupe histogram:
Dupcount            Blocks             Bytes
       1            290881        4765794304
Compression Histogram (XtremIO X2):
Bucket(KiB)        Buckets            Blocks             Bytes
          1          17052              1065          17461248
          2              0                 0                 0
          3              0                 0                 0
          4              6                 1             24576
          5              5                 1             25600
          6              0                 0                 0
          7              2                 0             14336
          8         273816            136908        2243100672
          9              0                 0                 0
         10              0                 0                 0
         11              0                 0                 0
         12              0                 0                 0
         13              0                 0                 0
         15              0                 0                 0
         16              0                 0                 0
Here we see the distribution of compressed blocks (for XtremIO X2 in this case – note that the missing 14K bucket is not a bug). The 16K blocks that get compressed into less than 1K are usually empty Oracle blocks in the tablespace – 17052 16K chunks (“buckets”) that require 1065 16K blocks after compression. The vast majority of all Oracle blocks get compressed into an 8K bucket (which is exactly 1:2).
Running the same for sample.dbf (remember to scan the correct file first):
[oracle@db01 ~](DB01) $ qdda -x
qdda 1.9.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File  Blksz  Blocks     MiB  Filename
   1     16   14529     227  /oradata/xfs/sample.dbf
Dedupe histogram:
Dupcount            Blocks             Bytes
       1             14529         238043136
Compression Histogram (XtremIO X2):
Bucket(KiB)        Buckets            Blocks             Bytes
          1            822                51            841728
          2              1                 0              2048
          3              0                 0                 0
          4              0                 0                 0
          5              8                 2             40960
          6              1                 0              6144
          7              1                 0              7168
          8           1089               544           8921088
          9          12607              7091         116186112
         10              0                 0                 0
         11              0                 0                 0
         12              0                 0                 0
         13              0                 0                 0
         15              0                 0                 0
         16              0                 0                 0
Here you see that a significant number of 16K disk blocks get compressed into 9K buckets instead of 8K, which explains the lower compression ratio. Another cause is that fewer empty table blocks exist in the tablespace (these get compressed to 1K).
To verify that this is indeed caused by randomizing our data, let’s do a full decompress again but also randomize the data:
SYS:DB01 > create tablespace RANDOM datafile '/oradata/xfs/random.dbf' size 1M autoextend on next 1M maxsize unlimited;
SYS:DB01 > create table soe.order_items_random tablespace random as select * from soe.order_items_hcc where 0=1;
SYS:DB01 > insert /*+append */ into soe.order_items_random select * from soe.order_items_hcc order by dbms_random.value();
Running a compress histogram on the random.dbf tablespace (after scanning it of course) we get this:
[oracle@db01 ~](DB01) $ qdda -x
qdda 1.9.2 - The Quick & Dirty Dedupe Analyzer
Use for educational purposes only - actual array reduction results may vary
File  Blksz  Blocks     MiB  Filename
   1     16  290881    4545  /oradata/xfs/random.dbf
Dedupe histogram:
Dupcount            Blocks             Bytes
       1            290881        4765794304
Compression Histogram (XtremIO X2):
Bucket(KiB)        Buckets            Blocks             Bytes
          1          17051              1065          17460224
          2              0                 0                 0
          3              0                 0                 0
          4              0                 0                 0
          5             11                 3             56320
          6              0                 0                 0
          7              1                 0              7168
          8            117                58            958464
          9         273701            153956        2522428416
         10              0                 0                 0
         11              0                 0                 0
         12              0                 0                 0
         13              0                 0                 0
         15              0                 0                 0
         16              0                 0                 0
So indeed the difference in compression ratio is caused by randomizing the table rows. This effect may or may not occur depending on how you load the HCC data during the final conversion. You could sort the data by a certain column, which may give a better storage compression ratio. Experiment a bit to see what’s possible.
Conclusion
Estimating real HCC compression ratios is not a trivial task, but with a bit of creativity it can be done. QDDA is a tool that can further help estimate the required storage capacity when migrating away from HCC compressed data. Although HCC in itself can offer very high compression ratios due to the nature of columnar storage (which can also be found in some other database platforms), the storage savings are largely negated by Exadata normal or high redundancy. Much more sophisticated data protection methods, as well as the capability to compress ALL data at the storage level, make the business case for migration something worthy of investigation.
This post first appeared on Dirty Cache by Bart Sjerps. Copyright © 2011 – 2018. All rights reserved. Not to be reproduced for commercial purposes without written permission.