In the previous post, I announced Maximum Efficiency Architecture – a methodology for achieving optimal cost-efficiency for (Oracle) databases whilst also maintaining (or even improving) business service levels. In this post we will review the current state of typical database landscapes.

From our conversations with many customers, as well as from looking at their performance data, we have come to the following findings and conclusions.

Most customers still on bare metal

I often ask customers the following question: “What is the current level of virtualization and implementation of public, private or hybrid cloud?”

A very common answer:

“Oh, we virtualized nearly everything!… eh, well, except Oracle…”

— Many customers

So, the large majority of Oracle databases are still deployed on bare metal systems without any kind of virtualization, with some databases on high-end UNIX systems using LPARs (logical partitions) or containers (also named Zones or Workload Partitions).
Adoption of Oracle on true virtualization (VMware, KVM, OracleVM/XEN and others) is increasing – but still not very common.

Bare metal systems have poor CPU efficiency

Bare metal servers running Oracle databases usually have an average CPU utilization somewhere between 5% and 15% – based on the numerous performance reports (mostly Oracle AWR or Statspack) we have received over the years.
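As a minimal sketch of how such a figure could be derived – assuming host CPU utilization samples (for example hourly values from AWR/Statspack snapshots or sar) have been exported to a hypothetical cpu_samples.csv file with a cpu_pct column – the samples can be summarized like this:

```python
# Minimal sketch: summarize exported host CPU utilization samples.
# Assumes a hypothetical "cpu_samples.csv" with a header row and a "cpu_pct"
# column holding one utilization percentage per snapshot interval.
import csv
import statistics

def summarize(path: str) -> None:
    with open(path, newline="") as f:
        samples = [float(row["cpu_pct"]) for row in csv.DictReader(f)]
    print(f"samples : {len(samples)}")
    print(f"average : {statistics.mean(samples):.1f}%")
    print(f"95th pct: {statistics.quantiles(samples, n=100)[94]:.1f}%")
    print(f"peak    : {max(samples):.1f}%")

if __name__ == "__main__":
    summarize("cpu_samples.csv")
```

Comparing the average against the 95th percentile and the peak immediately shows how much of the purchased capacity sits idle most of the time.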

We can identify several root causes and reasons for this:

  • Running bare metal servers has a serious flaw: there is no good way to dynamically move one workload (usually a database instance) to another host without downtime. This means that the entire server needs to be sized for peak workload, such that business performance SLAs are always met for all instances combined on the machine. If the resource consumption of all processes on one machine exceeds the available processing resources, performance problems start to emerge.
  • As database workloads are known to be extremely dynamic – varying from idle during weekends to 100% load during peak hours – the net result is low average utilization even if it occasionally hits very high levels (see the sketch after this list). Remember – you pay for infrastructure all the time, not only when it’s being used.
  • No good database application sizing methods exist, often resulting in heavily oversized servers.
  • Application architects responsible for infrastructure sizing are usually not responsible for infrastructure and license cost – so they tend to drastically oversize just to make sure they will never be blamed for performance issues due to lack of processing power.
  • Administrators tend to focus on mission critical production machines, which usually run at somewhat higher CPU load, and ignore the rest of the database landscape: test and development, staging, data warehouse/analytics, training, acceptance and disaster recovery systems.
  • Some systems run non-database transaction tasks such as management agents, backups, replication, ETL, and sometimes even middleware and/or application servers on the same host. This artificially increases reported CPU utilization, but these CPU cycles are not assigned to database transaction processing.
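To make the peak-sizing effect concrete, here is a small illustrative sketch – all demand figures and the headroom factor are made-up assumptions, not measurements. Sizing each bare metal host for its own peak, plus the usual safety margin, requires far more cores and yields far lower average utilization than sizing one shared pool for the peak of the combined demand:

```python
# Illustrative sketch with made-up numbers: per-host peak sizing versus pooled sizing.
# Each workload is a list of CPU core demand samples over the same 12 intervals.
workloads = {
    "oltp_prod": [1, 1, 1, 2, 14, 16, 12, 3, 1, 1, 1, 1],   # daytime peak
    "dwh":       [1, 1, 1, 1, 1, 1, 2, 2, 10, 14, 8, 1],    # nightly batch peak
    "test_dev":  [0, 0, 1, 2, 3, 2, 2, 1, 0, 0, 0, 0],
}
HEADROOM = 2.0  # assumed oversizing factor applied "to be safe"

# Bare metal: every host is sized for its own peak demand, plus headroom.
dedicated_cores = sum(max(demand) * HEADROOM for demand in workloads.values())

# Consolidated: one shared pool sized for the peak of the combined demand.
combined = [sum(interval) for interval in zip(*workloads.values())]
pooled_cores = max(combined)

avg_demand = sum(combined) / len(combined)
print(f"dedicated: {dedicated_cores:.0f} cores, "
      f"{100 * avg_demand / dedicated_cores:.0f}% average utilization")
print(f"pooled   : {pooled_cores} cores, "
      f"{100 * avg_demand / pooled_cores:.0f}% average utilization")
```

Even this simple example lands the dedicated hosts in the 5–15% average utilization range we typically see, while the pooled alternative gets by with far fewer cores at much higher utilization.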

We will go into more detail about these causes in the following blog posts.

Tactical versus strategic use of private cloud technology

Even when organizations have adopted a virtualization and private cloud strategy for running database workloads, very often this is only done to allow quicker deployment of new systems and some reduction in hardware cost. It’s a no-brainer that deploying a new virtual machine with a database on an existing private cloud is much faster than deploying a new bare metal server and getting it ready for a production database workload.

Still, many customers that are at this level have not fully unlocked the true potential of virtualization as a foundation for achieving much more cost efficiency, as we will explore further during this series of blog posts.

Aging systems and Moore’s law

Moore’s law has given us faster CPU cores as well as more CPU cores per socket and per server, and very often this positive effect is not considered when buying new hardware. We frequently encounter databases that require only a few CPU cores, yet run on dual-socket, high core count machines. Servers with 48 cores running only one or two databases are not uncommon.

In other cases, aging systems are simply replaced “like-for-like”, such that the total number of processors in the new systems equals that of the older ones – without looking deeper into the performance differences.
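A back-of-envelope sketch of what such a deeper look could involve – comparing per-core throughput rather than core counts. The performance ratio below is an assumption for illustration, not a benchmark result:

```python
import math

# Back-of-envelope sketch: how many cores on the new platform match the old one?
# The per-core performance ratio is an assumed, illustrative number; in practice
# it would be derived from published per-core benchmark results for both platforms.
old_cores = 48
perf_ratio = 2.5  # assumed new-core versus old-core throughput ratio

equivalent_new_cores = math.ceil(old_cores / perf_ratio)
print(f"{old_cores} old cores need about {equivalent_new_cores} new cores")  # about 20
```

Fewer cores for the same throughput matters beyond hardware cost alone, since Oracle database licensing is typically tied to the number of processor cores.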

UNIX versus x86

Although most Oracle workloads these days run on 64-bit x86 commodity hardware, some customers keep their databases on high-end UNIX systems. HP-UX based Itanium servers are no longer supported, and Intel Xeon based HP Superdome systems running Linux have become a small niche that is hardly relevant anymore. SUN SPARC is still more common but may share the same fate.

IBM pSeries systems (running POWER processors with AIX as the operating system) are still quite common.

Lack of real-world application sizing tools

Unlike, for example, SAP ERP, where the standard for performance sizing is SAPS (see SAP Benchmarks), for Oracle there is no well-established way to calculate application performance for sizing purposes. When deploying new systems, estimating the required performance is therefore a bit of a fuzzy exercise, based on previous experience with similar applications, extrapolated to the number of users, database size, and so on. I wrote about this before – in 2013! – in Getting the most out of your server resources.

However, even with the same business application and the same number of end users, performance requirements may vary greatly due to the way an organization uses the application, the amount of custom code, interfaces, and other factors.
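Purely as an illustration of that fuzzy extrapolation – every factor below is a made-up assumption, not a validated sizing method – such an estimate often boils down to something like this:

```python
import math

# Illustrative sketch of a "fuzzy" sizing extrapolation; all factors are assumptions.
reference_users = 200      # user count of a similar, already measured application
reference_peak_cores = 8   # measured peak CPU demand of that reference system

target_users = 500
custom_code_factor = 1.3   # assumed overhead for customizations and interfaces
safety_margin = 1.2        # assumed headroom for growth and estimation error

estimated_cores = math.ceil(
    reference_peak_cores * (target_users / reference_users)
    * custom_code_factor * safety_margin
)
print(f"rough estimate: {estimated_cores} cores")  # 8 * 2.5 * 1.3 * 1.2 -> 32 cores
```

The point is not the particular numbers but the shape of the exercise: every factor is a judgment call, which is exactly why such estimates tend to err heavily on the safe (oversized) side.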

Lock-in

Some customers have limited flexibility in deploying virtualization options that would give them better workload management capabilities. Usually this is due to platform choices: pre-engineered database appliances typically offer a limited number of choices for hypervisors, containers, storage platforms and operating systems.

This can be a conscious choice, a tradeoff between flexibility and other features (for example: performance, ease of use, time to deploy, certain unique features), but in many cases the decision was based on other factors (vendor preference, perceptions, price pressure).

Contracts and license agreements

Many customers have enterprise license agreements (ELAs/ULAs) in place for software, allowing them to install and run the software on as many systems as they like. This often leads to the perception that it doesn’t matter how many systems are deployed, because there is no financial charge for additional ones.

At the very least, however, more deployments lead to higher hardware and infrastructure cost.
Also, in many cases such license agreements are more akin to a delayed payment plan: at some point the contract may need to be renewed under conditions based on the actual deployment. Planning a few years ahead can make a lot of difference.

Summary

Looking at the current state of many Oracle deployments, there is a huge potential for efficiency optimizations. To measure and define current and future states, we need to agree on how we define (and measure) things like performance, efficiency, capacity and more.

In the next post, we will discuss methods and metrics for defining capacity and performance.

This post first appeared on Dirty Cache by Bart Sjerps. Copyright © 2011 – 2021. All rights reserved. Not to be reproduced for commercial purposes without written permission.
