In the previous post, we discussed the current state of many existing Oracle environments. Now we need a method to accurately measure or define the workload characteristics of a given database platform.
In our customer engagements we frequently ask details about databases, for example, “How much CPU power does the database require”, “how large is your production database”, “How much bandwidth is needed”.
As an illustration, let’s say that the answer to the database size question is: 5 Terabytes.
What do we know at this point? We have a rough idea but still a lot of missing pieces of the puzzle.
Database information gathering
A database size of 5 Terabytes:
- Is this Engineering (powers of 2, TiB) or marketing (powers of 10, TB) Terabytes?
- Is this the size of the core database files (tablespaces, logs, etc) or does it also include archive logs, binaries, exports, dumps, maybe even backups? Or the size of the file systems or disk groups?
- Or is it the space allocated on file systems or ASM disk groups?
- Is this allocated capacity or actual space consumption?
- Compressed or non-compressed? What compression ratio? Do we even have a way to find out?
- Usable capacity or the raw disk capacity before applying redundancy (RAID)?
- Including snapshots, clones, standby databases?
- Is this what the administrator from our customer THINKS how big the database is? Or was there an accurate measurement? How was it measured? When?
- How often is the answer simply, “I don’t know, actually…” followed by a rough estimate?
With storage I/O (IOPS, bandwidth, response times) there are also challenges:
- Peak or average?
- Measured from the database (such as ‘physical reads’ or ‘physical read IO requests’) or measured from the OS (tps, reads per second? Blocks per second? Average, peak? Per disk, total?)
- Inflated by mirroring (redo logs, redundancy)?
- Measured over what time period (interval)?
- Measured before or after queuing (service time vs. wait time, OS vs application/database)?
But maybe the hardest part is defining a metric for CPU power (whatever that may indicate) and a way to measure and compare it.
The question to “how much CPU power is required” usually remains mostly unanswered. Best case the requirement is based on a number of CPUs, or maybe Megahertz, usually ignoring the huge speed differences between various CPU models. There is simply no well-defined standard.
So, in order to perform a good system sizing, we need some data about existing systems. Often, this is provided in some form of Excel sheet or other document, listing a bunch of hosts or databases, sometimes with capacity or workload stats or requirements. Each time we get this from customers the format is different.
Sometimes the Excel sheets simply contain errors, missing data, or ambiguous descriptions. Often it takes at least a while to understand what the documents actually mean – as they are all completely different.
A better way to get Oracle workload information is to ask for Oracle AWR (Automatic Workload Repository) reports. Oracle, by default, stores workload history in the database with an interval of 1 hour, and keeps it for 7 days (often, this retention is increased by DBAs). Very detailed reports can be created for any time between two snapshots in the database. It requires a license but a free alternative (Statspack) is also available.
At least these reports are consistent and – more or less – standardized, but they lack some vital data. Besides, our customers frequently find unintended, but creative ways to make the reports hard to work with or simply unusable. A summary of problems I have experienced:
- Creating a report with an interval of many hours (sometimes I even get a single AWR covering a time period of multiple weeks). It’s like having to find the root cause of a Monday morning traffic jam, while all data you have is the average number of cars per hour for an entire week. Mission impossible.
- Creating the report in text format. Yes, in contrast to html, such a report is now human- readable without a browser, but much harder for automated tools to process (especially if there are many). Besides, some values in those reports are replaced with meaningless ‘#####’ marks because they are too large to fit in some columns. Note that Statspack reports are in text format by design, hence have the same problem.
- Strange language/regional settings, making the dates displayed in some language that automated tools have troubles with, or causing confusion due to time zone offsets
- Artifacts from the SQL*Plus session such as errors or status messages in between the report
- Creating “RAC” reports instead of per-instance, making it near-impossible to figure out which RAC nodes drives the resource consumption, and requiring different parsing methods
- No information on data sizes (tablespaces, datafiles, disks, segments etc)
- No information on CPU type (only the number of cores/sockets/threads is listed)
- No other system information like file systems, disks, networking, memory, running processes, kernel settings
- No information on other running instances on the same host, unless the reports are generated for all instances, ideally for the same time period
Even when created without any problems, a single AWR report only provides information for a short period of time (usually 1 hour). Only if the report is created during peak workload, we can use it for resource sizing, and then only if the database would run on dedicated hardware, not being influenced by other databases or applications. The chance that the single 1-hour report covers the exact workload peak of a system is fairly small and therefore I consider it unreliable for sizing.
Workloads usually show a weekly pattern where peaks happen daily or weekly on the same day of the week. For this reason, I usually requested 7 days of workload data, but then found that this period was not large enough, as people tend to generate the AWRs during workdays – giving you 6 full and two half-day periods instead of 7 full days.
Also, if you want to compare or aggregate workload from multiple databases, the time period for both need to have at least a 7-day overlap. For this reason I currently recommend creating at least 10 days worth of workload stats – usually good enough to have a single 7-day period of workload data from multiple databases.
In order to produce a reliable set of AWR or Statspack workload reports, at Dell, we have used a collection of SQL scripts for a long time. A DBA runs the main script on each database instance, providing the desired number of hours for which to generate reports as a parameter. More recent versions also produce summaries of database sizes and some other details.
Although this works well for a single or a handful of databases, it starts to become problematic when we want to do sizings for much larger environments. Imagine a few hundred databases for which a DBA must run the script with the right parameter on each one, then somehow archive or upload the hundreds to thousands of generated reports manually (the largest workload dataset I have received from a customer so far is for roughly 400 databases, times 24 hours, times 10 days).
Also, the SQL scripts still do not provide OS information such as CPU type, file systems, etc. And we can only hope not a single important instance was missed by mistake.
Another issue is that AWR reports usually contain portions of SQL statements, of which some of my customers expressed their concern that this is a potential security problem – prohibiting them to share with 3rd party companies like Dell.
Dbcollect is a tool I started to develop since late 2019, as a continuation of the framework provided by these SQL scripts. As it is written in Python rather than just SQL, it allows for a more sophisticated way of gathering database host data:
- A Single “executable” that can be executed on a host, usually without providing parameters, 100% Open Source and transparent
- It picks up ALL running Oracle instances on any host
- It can remove ALL SQL code from AWR reports
- It cleans up temporary files and generates a single ZIP archive containing all reports and configuration data
- It picks up SAR/Sysstat information in addition to AWR/Statspack
- It runs a set of additional database queries, providing detail on storage sizes, configuration settings, compression ratios and other things
- It provides a lot of additional OS information such as CPU, file systems, disks, interfaces and more.
- Predicts estimated run time across multiple threads
- Logs any errors and aborts if the correct license to generate AWR reports is not detected
One disadvantage is that it now requires Python on Linux and does not work on Windows (yet). For those we still have the classic scripts.
Dbcollect can be used as an alternative to the Dell Statspack/AWR scripts, as it generates all the files needed to feed into Dell Live Optics and/or our Dell internal Splunk-based Oracle workload reporting engine.
But stay tuned for what else I am using this for. Currently dbcollect supports Linux (all RHEL 6 and later versions and variants including Oracle Linux), AIX, and Solaris (HP-UX should be straightforward but I haven’t encountered customers using HP-UX recently).
Dbcollect provides an easy, standardized, automated way for our customers to gather database workload data, in a format that is consistent and well-defined, such that we can reliably and easily process the data into workload reports and do proper system sizings.
As the code and the resulting ZIP files are completely transparent, this may be useful for other vendors, too (isn’t Open Source great :). Dell competitors: Feel free to reach out if you find it useful.
Now that we standardized the data collection procedure, we also need something to rate different processors in terms of how much workload they can handle. And there’s a lot of different metrics and benchmarks out there.
Of course we have MIPS (Million Instructions per Second) and FLOPS (Floating Point Operations per Second). Historically we have Dhrystone. Linux even provides a metric called BogoMIPS (millions of times per second a CPU can do absolutely nothing). There are a zillion more benchmarking tools available, some commercial, some freeware.
So why not use those? Well, first of all, databases are mostly datamovers (shifting blocks of data between memory and disk), and mostly integer calculation oriented, so FLOPS is not very relevant. “Million Instructions per second” completely ignores other performance factors such as pipelining, caches, etc. So we would need something that gives us a more relevant definition of how well a processor performs – ideally in the context of transaction processing rather than focused on CPU calculation.
In addition, we need some kind of metric supported by the majority of vendors, independent, repeatable and trusted.
Thinking relational databases, naturally a transactional benchmark like TPC-C would make sense. Historically I have used it as it is a transactional benchmark and therefore ideal in theory, but the practical problem is that not a lot of different systems and platforms are being published, especially recently.
SPEC is not suffering from this issue as nearly every CPU intended for servers is listed, usually many times, by different vendors, and used in different server configurations, which is why I – and many others in the industry – are using this now as next-best alternative.
As databases (with some exceptions) are usually sized on throughput rather than response time, and usually heavily multi-threaded/multi-processing, I believe the best SPEC benchmark to compare systems is SPECrate® 2017 Integer.
Although not directly a transactional benchmark, it has proven to be good enough to use for database sizings (with some caveats as we will discuss later). From SpecRate Integer, I am using the benchmark’s “Baseline” number, as it is more consistent and relevant than “Peak” numbers (as it is too heavily optimized and therefore less consistent). The SPECrate® 2017 Integer Baseline is a good (albeit not perfect) metric to indicate the overall CPU power of a server related to DBMS workloads.
However, as Oracle is frequently licensed by CPU core, and we look for ways to consolidate databases on systems with either more (scale-up) or less (scale-out) clusters, the question arises how we can compare systems with different core counts. So I simply divide the Baseline number by the number of physical cores. Remark that this is fundamentally different from SPECSpeed as this is particularly oriented to single-threaded processes.
As an acronym for this metric I came up with the term SPEC Int Rate Per Core (SPIRC) but that doesn’t pronounce very well, so (as a pun to Linux’ BogoMIPS), I’m spelling it backwards, resulting in CRIPS
SpecIntRateperCore spelled backwards
How do we compare this to the older SPEC2006 benchmark (now obsolete) as many older systems only have a SPEC2006 publication?
Based on some (internet) research, I’m using a conversion factor of 10 for the baselines (which is, again, far from perfect and SPEC actively discourages this – but in reality it works well enough and I know of no better way).
Real world example
How does this work in reality? Say we have a customer running a 3-node cluster with 5 year old Intel Xeon E5-2699 v4 processors (2 sockets per node). We want to see how much CPU cores we need if we would replace this with a modern system.
Step 1. Get the CRIPS rating
For 2 sockets Intel Xeon E5-2699 v4, the SPECrate2017 Baseline is rougly 176. There are slight variations between various publications – I usually take the average of the top highest entries. Dividing by the number of cores we get a CRIPS rating of 4.0 (176/44). As we have 3 nodes (132 cores total) our total CRIPS number for the cluster is 528 (same as 3x the baseline of 176).
Step 2. Compare a different, modern CPU
Currently the Dell server with 2 Intel sockets offering the highest CRIPS rating is using the Intel Xeon Gold 6250, with a Baseline of 155 (lower than the 2699 v4 but with far fewer cores). The CRIPS rating is 155/16 = 9.6875. AMD offers some CPU’s with even higher CRIPS ratings but this customer preferred Intel.
Step 3. Find required number of CPU cores for new system
How many CPU cores would we need to achieve at least the same total CPU power? 528/9.6875 = 54.5 cores. Using a 4-node cluster based on the Xeon 6250 (64 cores) we would have a total CRIPS of 620 (17% more power).
The total workload could run on 64 instead of 132 cores, allowing us to achieve significant future cost savings. So we have more servers (4 instead of 3) but less, more efficient CPUs.
Scale-up is also an option, with note that servers with higher core counts tend to have lower CRIPS ratings.
Example: Intel Xeon Gold 6346, baseline of 281, CRIPS rating of 8.78 ((281/32, less than the 6250 but still very good). With only 2 nodes we have a total CRIPS of 562, more than enough to achieve the required 528.
And we haven’t even looked at actual vs peak workload. I will save that for one of my future posts.
This post first appeared on Dirty Cache by Bart Sjerps. Copyright © 2011 – 2021. All rights reserved. Not to be reproduced for commercial purposes without written permission.
4,469 total views