One request I got back after my series on Oracle RAC stretched clusters was whether I could summarize again why anybody would choose VPLEX for storage replication over other solutions. My attempt was to describe the principles of VPLEX in enough detail for techies to understand it. For non-geeks, I will try to explain it as briefly as possible.
With VPLEX, you install Oracle (Stretched) RAC as if it were a local cluster. No messing around with ASM mirroring, figuring out how to configure voting disks, etc. Everything looks just like any other RAC implementation. No more, no less. It also means that VPLEX does not magically enhance Oracle’s RAC features. It just allows RAC to work flawlessly across distance.
By the way, the EMC white paper Oracle Extended RAC with EMC® VPLEX™ Metro – Best Practices Planning was published a few days ago!
So what are the advantages, from a business perspective?
- I would say the most important business reason to run stretched clusters on VPLEX is reduced risk. VPLEX hides the complexity of storage replication from the database (and therefore from the storage and database administrators). That reduces the risk of data corruption and of servers not failing over when you most need them, and it reduces performance issues in normal operation, in a failure state, and when recovering from failures. Could you do this without VPLEX? Certainly, if your administrators don’t make configuration mistakes, if the clusterware does not have bugs or design flaws, if your storage system behaves as expected, and if you make all configuration changes (adding servers or storage, upgrades, etc.) correctly. Theory and reality often don’t match.
- It also makes failover testing fairly easy (potentially while the cluster keeps running, and ideally without even breaking the remote replication). This means you’re protected against disasters even during D/R testing. Caveat (I must be honest): this is future functionality in VPLEX (and I will be glad to push our engineers to drive this functionality if I can give them feedback from customers who really need it).
- It is application independent. I focused my blog posts on Oracle RAC for two reasons. First, it is my job to deal with Oracle technology. Second, Oracle RAC is (as far as I am aware) the only database that can be deployed as a truly active/active cluster (meaning all cluster nodes can access the same database data simultaneously). If there is another database or application that behaves in an active/active manner (and there must be some around), it will probably benefit from VPLEX in the same way. But even active/passive cluster configurations can benefit from VPLEX. For example, Microsoft clusters (MSCS) can automatically fail over based on the quorum disks defined in VPLEX, where normally they would need to be manually restarted after certain failures that cannot be resolved with the standard quorum disk or node majority mechanisms. In particular, VPLEX allows VMware environments to benefit from the access-anywhere behavior of storage volumes. This was initially targeted more at cloud application mobility, and to a lesser extent at high-availability clustering. But VMware offers clustering features as well (VMware HA and DRS, on top of VMotion). So VPLEX allows building a stretched VMware cluster where failure of a complete datacenter results in all failed virtual machines being automatically restarted at the surviving datacenter location, without manual intervention and without risk of split-brain issues.
- It opens the possibility to create application-level consistency between multiple environments with what we call “Consistency groups”. Application-level consistency is a tricky topic and most of the people I talk to are not fully aware of the implications. So hold me to my promise that I will write another blog post on this sometime.
Disadvantages? Probably. Here are the objections I expect from my customers:
- Vendor lock-in. EMC is the only vendor who can do this. Big problem? I don’t think so. It’s not like a business application that, once rolled out, you can never get rid of. If, for reasons I cannot imagine, you no longer wanted EMC as a trusted vendor, you could remove VPLEX and either continue without stretched clusters or implement them using another (IMO, inferior) technology.
- Price. Well, that’s a good one. I praise the fact that I’m not in the sales department and therefore don’t know too much about our pricing, but I bet VPLEX is not completely free. But in relation to network bandwidth between the locations? Oracle database and apps licensing? And the potential performance benefit? If you do the math and include everything, I would be surprised if you could not justify the investment.
By the way, what is the business value of risk reduction?
Update 19-10-2011: Found a good post on VPLEX from a colleague (Richard Anderson) here.
Great perspective.
Have you done any testing with virtualized RAC? Would you consider it a benefit that the CPU cycles consumed by mirroring data (reduced IO workload) are offloaded from the RAC nodes?
Hi Don,
I tested RAC on my old 32-bit ESX server, more for myself, to get a bit of hands-on experience. My own small system is not representative of real performance tests. The tests with RAC and stretched RAC on VPLEX were done by my colleagues in EMC Engineering, not by me personally (but I was one of the guys who drove them to start developing this, because my customers, mostly running large, mission-critical enterprise databases, were frequently asking for such a solution).
On your question: yes, I absolutely believe offloading CPU load is an advantage. You could argue that the CPU overhead of doing storage mirroring is negligible (say, 2 to 5%). But bear in mind that, for people who have licensed Oracle by CPU, these are extremely expensive CPU cycles that, IMHO, are better spent on application processing than on dumb storage replication.
Saving a few percent on CPU licenses may not sound like much, but expressed in license cost it can be significant.
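To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. Only the 2 to 5% overhead range comes from my estimate above. The core count, the per-core licensing factor and the price per license are made-up placeholders and not actual Oracle list prices, so plug in your own numbers.

```python
def mirroring_license_cost(cores, core_factor, price_per_license, overhead_fraction):
    """Estimate the license cost tied up in CPU cycles spent on host-based mirroring.

    All inputs are supplied by the caller; nothing here reflects real pricing.
    """
    licenses = cores * core_factor              # number of CPU licenses needed
    total_license_cost = licenses * price_per_license
    return total_license_cost * overhead_fraction


# Hypothetical cluster: every value below is an assumption for illustration.
CORES = 64                  # total cores across all RAC nodes (assumed)
CORE_FACTOR = 0.5           # licenses required per core (assumed)
PRICE_PER_LICENSE = 50_000  # cost per CPU license (assumed placeholder)

low = mirroring_license_cost(CORES, CORE_FACTOR, PRICE_PER_LICENSE, 0.02)
high = mirroring_license_cost(CORES, CORE_FACTOR, PRICE_PER_LICENSE, 0.05)

print(f"License cost spent on mirroring: {low:,.0f} to {high:,.0f}")
```

The point is not the exact figure but the shape of the calculation: the overhead percentage is multiplied by the full license bill, so even a small fraction scales with the size of the cluster.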
And there are many more ways to improve CPU utilization, drive down infrastructure cost and make business applications run much more effectively.
(That’s why I named my blog “Dirty Cache”, by the way 😉)