Dirty Cache

I have written before about managing database performance issues, and the topic is as hot and alive as ever, even with today’s fast processors, huge memory sizes and enormous bandwidth to storage and networks.

warning: Rated TG (Technical Guidance required) for sales guys and managers 😉

A few recent conversations with customers showed other examples of miscommunication between IT teams, resulting in problems not being solved efficiently and quickly.
In this case, the problem was around Oracle REDO log sync times. Some customers had a whole bunch of questions for me on what EMC’s best practices are, how they enhance or replace Oracle’s best practices, and in general how they should configure REDO logs in the first place to get the best performance. The whole challenge is complicated by the fact that more and more organizations are using EMC’s FAST-VP for automated tiering and performance balancing of their applications, and some of the questions were around how FAST-VP improves (or messes up) REDO log performance.

So here is a list of guidelines and insights on REDO logs in general and on FAST-VP in particular – but first a tricky statement (which may land me in serious trouble – but heck, I’m Dutch, stubborn and I do it anyway).

Statement: On EMC storage, you should always have redo writes around 1 millisecond or less.

There. I said it. Any exceptions? Yes, a few, the most important one being: if you are using synchronous replication you have to add the round-trip latency to that 1ms. But then still you should see write response times below, let’s say, 3 ms or so.

The other notable exception is if you are really hammering the system with (large) writes (i.e. a data warehouse load or other sort of bulk load action).

Compare that to the days (hopefully long gone) of JBOD, where a write – without write cache – nearly always required a disk seek and therefore easily averaged way above 10 milliseconds.

Why do I say this? Because on EMC you should always write to storage cache. The resulting disk I/O is a background process and should not influence the write at all. Well, I have heard some of my customers respond, what happens if your cache cannot flush writes to disk fast enough? The answer is that it can happen, but not under normal circumstances. If it does, something is wrong and should be fixed.

Strangely enough, when I made this statement in presentations for database admins (DBAs), they looked pleasantly surprised on all occasions, some thanked me for it, and it seemed as if someone had finally listened to their prayers… Maybe at EMC we are normally too cautious to make such comments?

So, what kind of stuff challenges our targeted millisecond?

Queuing

Let’s say you create one large filesystem and put all your database files in there: redo logs, data files, indexes, temp tables, rollback and all of the other stuff. The filesystem uses a set of disk volumes (LUNs) but without any separation between the different data types. Now assume your database is hammered with mixed workloads. A reporting user starts a heavy query – resulting in a full table scan – and the query puts a bunch of large-IO read requests on the disk queues. Let’s say a certain volume in the filesystem has 10 outstanding large read requests (say, 128K each) on the queue, totaling over 1 MB.

Now another user enters customer information on a web form in the application and submits the request (“save”). The save results in a commit for the updated small piece of data. We said a REDO write should take around 1 ms or less; a REDO log sync (the result of a commit) generates a set of these writes, so the total redo sync time to be expected is maybe 5 ms. But the redo writes are queued behind the 10 outstanding large read I/Os, so before the REDO writes are processed they have to wait for the ~1 MB (or much more) of reads to complete. The redo log sync time in the database might then be reported as 50 milliseconds, while the storage guys report “all quiet on the western front” because in their view of the world the REDO writes were all serviced in around 1 ms or less.

Solution: Create dedicated volumes (LUNs at the host levels) at least for redo logs. This gives dedicated queues for the redo I/O and data I/O will not interfere. The I/O’s will now be queued at HBA level so if that’s a concern, use dedicated HBA’s (but only if needed as this adds cost and complexity). I was told that on the HBA and storage the IOs are not processed just sequentially, but instead, the FA processes will pick the right IOs from the queue at will, so the queuing as described is less of an issue.
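To put some numbers on the queuing example, here is a minimal Python sketch of a strictly first-in-first-out queue. The function name and the 4.5 ms service time per 128K read are assumptions, picked only so that the result matches the 50 ms from the example; real queues (and the FA processes mentioned above) do not necessarily behave this simply.

```python
def redo_sync_time_ms(queued_reads, read_service_ms, redo_writes, redo_write_ms):
    """Host-observed log sync time when redo writes wait behind queued reads.

    Simplified FIFO model: every queued read must finish before the first
    redo write is serviced, and the redo writes are then serviced one by one.
    """
    wait_ms = queued_reads * read_service_ms      # time spent behind the reads
    service_ms = redo_writes * redo_write_ms      # what the storage side reports
    return wait_ms + service_ms


# Assumed numbers: 10 queued 128K reads at 4.5 ms each, 5 redo writes at 1 ms each.
shared = redo_sync_time_ms(queued_reads=10, read_service_ms=4.5,
                           redo_writes=5, redo_write_ms=1.0)
dedicated = redo_sync_time_ms(queued_reads=0, read_service_ms=4.5,
                              redo_writes=5, redo_write_ms=1.0)

print(f"shared queue:    {shared} ms")
print(f"dedicated queue: {dedicated} ms")
```

The storage array only sees the service part (about 1 ms per write); the database sees the wait plus the service time, which is why the two teams report different numbers.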

Mixed workloads

I said before that if the disks cannot keep up with redo writes then something is wrong. When does this happen? REDO I/O is normally near 100% writes and sequential. So if you create a RAID group (of any RAID level) that holds only the REDO logs, the normal IOPS sizing for the disk type is no longer valid. For example, a 10,000 rpm drive can do about 150 random IOPS. Note the word random! If you do sequential writes, the number of IOPS can be much, much higher. A RAID-5 3+1 set has 4 spindles and can normally handle either 4 x 150 random read IOPS or fewer than 2 x 150 random write IOPS (as every write has to be written twice, plus the overhead on the disks for parity calculations).

Now if you only have a sequential write workload on the disk, there are hardly any seeks and the disk is limited by pure throughput. A 10K rpm disk can do maybe 50 MByte/s, which is theoretically about 6000 8K writes per second. No sweat. As in RAID-1 both drives have to do the same work, a RAID-1 disk set can also do the 6000 8K IOPS. A RAID-5 3D+1P set hammered with pure sequential writes will – at least with EMC – keep all data for a RAID stripe in cache until it can calculate the parity in memory, and then write out the whole stripe at once to all disks. So the 3+1 group can do 3 x 6000 8K IOPS, with the “4th” spindle handling the parity (note that the actual workload will probably be much lower with 8K writes due to other bottlenecks; the numbers here are purely for illustration and learning).
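The back-of-the-envelope arithmetic behind these numbers can be written down as a short Python sketch. The constants are the illustrative figures from the text (150 random IOPS and 50 MByte/s per 10K rpm drive), not measured values, and the variable names are made up for the example.

```python
RANDOM_IOPS_PER_DISK = 150                   # illustrative figure for a 10K rpm drive
SEQ_THROUGHPUT_BYTES = 50 * 1000 * 1000      # ~50 MByte/s sequential, also illustrative
WRITE_SIZE_BYTES = 8 * 1024                  # 8K redo-style writes


def sequential_writes_per_sec(throughput_bytes, io_size_bytes):
    """Upper bound on sequential writes per second when throughput is the only limit."""
    return throughput_bytes // io_size_bytes


# Random workload on a RAID-5 3+1 group (4 spindles).
raid5_random_reads = 4 * RANDOM_IOPS_PER_DISK            # all spindles serve reads
raid5_random_writes_max = 2 * RANDOM_IOPS_PER_DISK       # "fewer than" this in practice

# Pure sequential 8K writes.
per_disk_seq = sequential_writes_per_sec(SEQ_THROUGHPUT_BYTES, WRITE_SIZE_BYTES)
raid1_seq = per_disk_seq                 # both mirror halves write the same data
raid5_full_stripe_seq = 3 * per_disk_seq  # 3 data spindles, parity on the "4th"

print("RAID-5 3+1 random reads :", raid5_random_reads)
print("RAID-5 3+1 random writes: <", raid5_random_writes_max)
print("8K sequential writes/s per disk:", per_disk_seq)
print("RAID-1 sequential writes/s     :", raid1_seq)
print("RAID-5 3D+1P full-stripe writes/s:", raid5_full_stripe_seq)
```

The per-disk figure comes out slightly above 6000, which is where the "about 6000" in the text comes from.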

Now consider the same RAID group where all data is shared on the disk: indexes, data, TEMP, REDO, etc. A bunch of REDO log flushes come in and write megabytes of data to the storage cache, to be flushed later. The disk starts writing the first pieces of data, only to be interrupted by a read request for an index, followed by a few more for data rows in the table, before the rest of the REDO data can be flushed. Because of the reads, the physical disk had to move its heads, and to complete the REDO flush another seek is required to reposition them. But before the REDO flush finishes, a random write comes in for TEMP, followed by a bunch of random TEMP reads. REDO has to wait and the disks are moving their heads again. The mix of data types at the physical disk level therefore messes up the nice spinning-rust-friendly workload, and the number of “dirty” redo blocks in storage cache starts to add up. At a certain point some high water mark is reached (“Write Pending limits” in Symmetrix) and REDO sync times start to suffer. But even if this does not happen, the redo writes also generate disk seeks that interfere with random read requests. It seems as if REDO writes are doing fine, but under the covers they mess up random read performance for data files.

Solution: Create dedicated disk groups for REDO logs (note that not all of my colleagues might agree and I bet the last word hasn’t been spoken in this discussion 😉 )

Remote replication

If you use synchronous replication (most of my customers have SRDF for this) then the bottleneck for writes will quickly move to the replication link. In SRDF, a LUN (volume) can accept (depending on microcode level and other parameters) 4 concurrent writes and no more. If the database throws 10 small (sequential) writes against a REDO volume, then this volume can accept and service only 4 (not sure if EMC Symmetrix Engineering increased the concurrency in recent microcode versions so I might not be accurate here – again, read my numbers as an example and verify the real numbers with your EMC technical support guys). The other 6 are kept on the queue until the remote storage system has responded that it has accepted the writes in good state. So even if the storage cache could accept hundreds of writes without waiting for disk flush, the limitation is now the logical SRDF link. The problem can be very subtle, as I have seen customers increasing the number of REDO volumes in an attempt to increase concurrency without results. This can happen because in Oracle there is always one redo log (group) active and within the redo log all writes might still go to one small part of a volume – even if there are many volumes configured. Note that performance analysis tools are notoriously unreliable here because within their interval of, say, 10 seconds, the hotspot might have moved hundreds of times and the workload averages out to acceptable levels, making you think the problem is elsewhere.

Solution: Switch to asynchronous replication (my favourite). Good enough for 98% of all applications (with the notable exception of financial transaction processing, where a second of data represents large amounts of Dirty Cash 😉 )
Alternative solution: use striping or other means to drive more parallelism, use priority controls, try to get less large IOs vs many small, etc.
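A rough way to see the effect of the concurrency limit is the following Python sketch. It assumes the writes are processed in lockstep "waves" of at most 4, each wave costing one round trip, and it uses an assumed 2 ms round-trip time; both are simplifications for illustration, and as said above, the real per-volume limit should be verified with EMC.

```python
import math


def sync_write_time_ms(writes, max_concurrent, round_trip_ms):
    """Time to complete a burst of writes when only max_concurrent can be in flight.

    Simplified model: writes are sent in waves and each wave takes one full
    round trip to the remote storage system before the next wave can start.
    """
    waves = math.ceil(writes / max_concurrent)
    return waves * round_trip_ms


ROUND_TRIP_MS = 2.0   # assumed link round trip, for illustration only

limited = sync_write_time_ms(writes=10, max_concurrent=4, round_trip_ms=ROUND_TRIP_MS)
unlimited = sync_write_time_ms(writes=10, max_concurrent=10, round_trip_ms=ROUND_TRIP_MS)

print(f"10 writes, 4 concurrent : {limited} ms")
print(f"10 writes, 10 concurrent: {unlimited} ms")
```

In this model the burst takes three round trips instead of one, regardless of how fast the storage cache itself is.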

Shared resources

Bit of a no-brainer, but if you share logical or physical resources (such as host bus adapters, front-end ports, SAN, etc.) with other workloads (other servers, other databases, etc.) then the REDO writes might end up further back in the queue before getting served. The idea is similar to what I said above about mixing data/index/log on single devices, but this relates to complete databases or applications. For best redo log performance, make sure no other processes can interfere.

Solution: Isolate competing resources and give dedicated resources (logical or – if needed – physical) where required. Make sure you understand the performance tools and don’t let them fool you.

Over-aggressive striping

What happens if you stripe in storage (striped metavolumes, RAID-10, etc.), then stripe in the volume manager (Linux/Unix LVM striped logical volumes or striped “md” multi-devices), and then on top of that use ASM fine striping or file system striping? The sequential write will be chopped into pieces and offered to storage as a bunch of seemingly unrelated small writes, causing many random seeks. The storage system will have no way to detect and optimize for sequential streams. EMC recommends striping at no more than two levels (not including the implicit RAID-5 striping). My personal view is to move away from striping completely, especially if you use ASM, FAST-VP (on Symmetrix) or both.

Solution: No – or limited – striping, and maybe use larger stripe sizes.

IO Chopping™

A term I invented myself for any breaking of large I/Os into smaller pieces. I have seen that both the Linux OCFS2 and EXT3 file systems, as well as the Linux I/O multipath feature, can break large I/Os into smaller pieces (tip: EMC’s PowerPath does not do this). A single 1MB write could be carved up into 256 x 4K pieces. Needless to say, this causes huge and unnecessary overhead. Not sure if these were Linux bugs or features (meaning they work as designed). Just something to verify in case of suspicion. Also be aware that wrong disk alignment can cause similar problems for some (not all) of the writes.

Solution: Ditch any layer that chops IO and use an alternative that doesn’t (personal experience: replace EXT3 with ASM, replace the Linux IO balancer with PowerPath)
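The 1MB-into-4K example is simple arithmetic, shown here as a small Python sketch. The `chop` helper is hypothetical and only models the splitting; it says nothing about how a particular filesystem or multipath driver actually does it.

```python
def chop(io_size_bytes, piece_bytes):
    """Split one large I/O into pieces of at most piece_bytes each."""
    full, rest = divmod(io_size_bytes, piece_bytes)
    return [piece_bytes] * full + ([rest] if rest else [])


pieces = chop(1024 * 1024, 4 * 1024)   # one 1MB write chopped into 4K pieces

print("I/Os issued to storage:", len(pieces))
print("bytes written in total:", sum(pieces))
```

The amount of data is the same, but the storage system now has to handle 256 requests instead of one.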

RAID disk failures/Rebuilds

Especially in RAID-5, if a disk in the group is broken, then the rebuild or the invoking of a hot spare might cause serious overhead. You cannot really avoid this except by moving to RAID-1 or RAID-6. As said in earlier posts, EMC’s hot spare and disk scrubbing technologies attempt to isolate the failing disk and invoke the hot spare before the disk fails, thereby avoiding most of the overhead. If this is still a concern, use RAID-1.

Solution: Use EMC instead of cheapie-cheapie gigabytes 😉
Additional solution: Use RAID-1 if you are concerned about this.

Excessive workloads

Well, every system has a breaking point. If your system can handle 100 but you give it 150, it will slow down. Even if you configured everything by the book.

Solution: Optimize your applications (and users 😉 )
Alternative solution: buy faster iron (think EMC Flash, VFCache, etc)

A few notes on FAST-VP (Symmetrix)

If you use (or plan to use) FAST-VP on Symmetrix (VMAX) then you need to be aware of how this works. Without going into too much detail on the FAST-VP algorithms, it works by measuring performance statistics on chunks of data either 768K or 7.5 MB in size. If you do not separate data types at the host level, then Oracle ASM, Unix/Linux filesystems and additional features (striping, etc.) will potentially store REDO log data as well as other data in the same 768K or 7.5 MB chunks. Now if one of these chunks is hammered by redo writes as well as other random/sequential, large/small block, read/write workloads, then the FAST-VP algorithms will have a very hard time figuring out what workload profile this chunk of data has. It will probably move the whole chunk to flash drives if there is heavy data I/O and take some of the REDO blocks along, resulting in the wrong type of data clogging the expensive flash drives.

My recommendation would be to always separate LUNs on the host level for at least REDO, DATA, ARCH and – depending on how much you want to optimize – also for DATA vs INDEX, separate TEMP, separate rollback/UNDO etc. The FAST-VP algorithms are best in breed and EMC has spent a lot of R&D to make it work, but a little help up front from the database engineers at our customers will not hurt 😉

Another recommendation is to increase the default ASM AU (Allocation Unit) size from 1MB to at least 8MB (preferably 16MB or even higher). This forces the database to put similar data and hot spots together, allowing FAST-VP to make even better decisions about which chunks of data to move around and where.
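To illustrate why a larger AU helps, here is a toy Python model. It assumes the worst case where consecutive allocation units on a LUN alternate between two data types (say REDO and DATA), and it counts how many 7.5 MB chunks end up holding both. The alternating layout, the 240 MB LUN size and the function name are all assumptions for the sake of the example; real ASM allocation and the real FAST-VP algorithms are more involved.

```python
CHUNK_KB = 7680   # 7.5 MB chunk size from the text


def mixed_chunk_fraction(lun_kb, au_kb, chunk_kb=CHUNK_KB):
    """Fraction of chunks that contain more than one data type.

    Toy model: allocation unit number i holds data type i % 2, so any chunk
    that overlaps two neighbouring AUs holds mixed data.
    """
    chunks = lun_kb // chunk_kb
    mixed = 0
    for c in range(chunks):
        first_au = (c * chunk_kb) // au_kb
        last_au = ((c + 1) * chunk_kb - 1) // au_kb
        kinds = {au % 2 for au in range(first_au, last_au + 1)}
        if len(kinds) > 1:
            mixed += 1
    return mixed / chunks


LUN_KB = 240 * 1024   # assumed 240 MB LUN, a whole number of chunks and AUs

small_au = mixed_chunk_fraction(LUN_KB, au_kb=1 * 1024)    # default 1MB AU
large_au = mixed_chunk_fraction(LUN_KB, au_kb=16 * 1024)   # 16MB AU

print(f"1MB AU : {small_au:.0%} of chunks hold mixed data")
print(f"16MB AU: {large_au:.0%} of chunks hold mixed data")
```

In this worst-case layout every chunk is mixed with a 1MB AU, while with a 16MB AU fewer than half are; separating the data types onto their own LUNs removes the mixing altogether.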

I also got the question whether to create separate FAST-VP pools for these different data types within the same database. Honestly I cannot tell, I bet it depends again on how much effort you are willing to spend and how much additional benefit you will get from it. YMMV 😉
That said, if many more customers struggle with this, I will pick it up with engineering and see if we can create some guidelines on this. My intuitive answer is that FAST-VP is designed to make life of admins as easy as possible (note the A in FAST stands for “Automated”) – which means not too many knobs should need tweaking.

I cannot think of other problems in the I/O stack but there will probably be more. If you use EMC, have followed all my guidelines and still see high redo write times, let me know and I will try to help out (or throw the problem over the wall, to be picked up by one of my colleagues)…

Managing REDO log performance

Bart Sjerps 2012-05-20 Oracle, Performance 14 Comments

14 thoughts on “Managing REDO log performance”

Efstathios Efstathiou says:

2012-09-04 at 23:44

Hi,

as said IO-Chopping (I like that term) occurs, when writing to a ext3/4 or ocfs2 filesystem.

Writes get broken up in ext3 block size (4k).

While this is not an issue if you have filesystemio_options=none, meaning writes go to the filesystem cache, it can cause a severe performance drop if you are issuing direct i/o against ext3 (filesystemio_options=setall or directIO).

It can easily be tracked by using iostat -x and checking the avgrq-sz column. When writing to an ext3 filesystem using direct i/o, avgrq-sz will be 8.0*512bytes=4k.

In contrast, when doing the same with ASM, i/o requests are merged to 512*512bytes=256k.
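As a quick sanity check of the arithmetic above, a minimal Python sketch (the helper name is made up for illustration; avgrq-sz is taken to be in 512-byte sectors, as in the calculation above):

```python
SECTOR_BYTES = 512   # avgrq-sz is reported in 512-byte sectors


def avg_request_kib(avgrq_sz_sectors):
    """Convert an iostat avgrq-sz value (sectors) to KiB per request."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


ext3_direct = avg_request_kib(8.0)    # ext3 with direct i/o
asm = avg_request_kib(512.0)          # ASM, merged requests

print(f"ext3 direct i/o: {ext3_direct} KiB per request")
print(f"ASM            : {asm} KiB per request")
```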

We have done some tests today with direct i/o against ext3 and ASM using a EMC Vmax FC LUN.

– Ext3 write speed at 4k i/o size was 25-30 MB/s, which is very frustrating if you have to create 1TB of tablespaces …
– ASM write speed at 256k i/o size was 200 MB/s (DBA likes… reminds me of the old days on Solaris with Veritas and ODM/Quick I/O).

So speaking of chopping / slicing, we can say that when using direct i/o, ASM gives you 700g porterhouse steak of I/O, while you get a 30g beef carpaccio with ext3 on some arrays 😉

So, when using ext3, you may be better off adding some more RAM and letting i/o pass through the fs-cache, even though this has some disadvantages like double buffering and a lot of dirty memory pages that need to be flushed all the time.*

As always I am thankful for any corrections on mistakes / errors.

Regards

Efstathios

*(hint: use nmon for linux to monitor vm_dirty_pages in memory tab => press m, great tool).

Noons says:

2012-11-02 at 05:47

Great post! Really useful stuff and good to know I’m not the only one advocating these things.

On the issue of dedicating a LUN to the redos – and potentially wasting the disk space as redos hardly ever fill up even the smallest LUNs – what I do is merge into the LUN tablespaces that are low-I/O in Oracle. That is: SYSTEM, and maybe (not in all cases) SYSAUX and perhaps a tablespace of DIMENSION tables for DW.

They are usually mostly read-oriented and not very big (hitting the combined db and SAN cache really well) so they won’t greatly contend with the redo writing. This way one can get a “dedicated” LUN that is mostly full and still at top speed for redos.

Of course: there are ALWAYS exceptions, so this is NOT a rule of thumb to use everywhere! But for example in our DW – where changes to definitions of tables (SYSTEM tablespace) are not that common – it works famously!

1. Bart Sjerps says:
  
  2012-11-06 at 15:13
  
  Thanks for your comment, and good to hear you agree 🙂
  
  As said, strangely enough I seem to be one of the few in EMC who is outspoken in these matters (some others tend to be a bit conservative because saying the wrong words might backfire)
  
  I guess combining redo with other stuff is no problem, as long as you don’t plan to use EMC (or other storage vendors) snapshot/cloning/remote replication technology. Because in that case the LUN that holds REDO also holds SYSTEM and that makes snap based database restores “challenging” 😉
  
  Just something to be aware of.
  
  1. Noons says:
    
    2012-11-06 at 23:00
    
    Indeed! Thanks for the heads-up. To minimize that problem I put a copy of the control file in the same LUN as the redos: that way the latest info needed to recover the db is always together.
    But I agree entirely: replication opens up a lot of cans that need to be addressed. The most important one being what should be in a consistency group. I have my own purgatory with that in a few other dbs…
    
    For our DW I replicate asynchronously only the FRA: it has all I need to restore the db and I backup the redo logs into it every 2 hours, just before it gets replicated to our disaster recovery site. Because only the redos changed, the async replication is very fast.
    Nice and easy, and the restore time in the DR site is not overly extensive – I can have it running there in <4 hours worst case – full restore.
    If it ever becomes needed I can have it restored and in standby, ready to be pushed forward with the redo logs every two hours. Sort of a poor man's Dataguard achieved with just SAN technology, release 7 standby db and a little bit of SRDF scripting and good old shell scripts. Works a treat! 😉
    
Frank A. says:

2013-02-13 at 18:37

Is there an Oracle best practice on the number of redo log write commits per second that an oracle data base could/should process per second? I am seeing an Oracle DB that is being load tested and they are testing for an hour straight with a sustained 1000 redo log write commits per second with 17 log switches per hour. They are naturally experiencing latency. What are the Oracle best practices for log switches per hour and redo write commits per second?

1. Bart Sjerps says:
  
  2013-02-14 at 09:30
  
  Hi Frank,
  
  Just sent you a direct email… but quickly for the other readers:
  – Commits are done by the application and tuning a database does not change that
  – Make sure you do asynchronous I/O
  – Make sure writes are handled by storage cache not depending on physical disk writes (i.e. if write cache is full you’re in trouble)
  – With high commit rates like this you might want to increase the redo writer priority (or make it realtime)
  – Like mentioned in my post, prevent other I/O blocking the I/O queue for redo writes (i.e. make separate queues and eventually even physical channels just for redo)
  
  Please let me know if any of these guidelines helped or if I missed something (which is not unlikely 😉 )
  
singaravelan says:

2013-08-14 at 02:45

Hi Bart,

Thanks for such a clear explanation.

We generally create a separate Diskgroup in ASM to place the online redologs.
Need one clarification with regards to the number of LUNs to allocate for a dedicated REDO Diskgroup and how to size those LUNs.

We have multiple databases and the redolog file size varies depending on whether it is an OLTP or DSS system. For the OLTP databases we size them to be 100MB or 200MB. On our DSS systems we size them to be 4GB or 8GB.

Note:- We are using EMC VMAX 40K with 4 engines.

Thanks and Regards,
Singaravelan

Bart Sjerps says:

2013-08-15 at 22:39

Hi Singaravelan,

Creating a separate diskgroup for redo is certainly a good thing like I explained. Remains the question: how many LUNs and how large…
Ask different people of EMC and you might get different answers. Mine is a personal view and by no means the ultimate truth – I will explain my recommendations but I recommend you contact your EMC system engineer to check his/her thoughts on the matter.

My recommendation is to keep LUNs for redo small. Why? I see no good reason to create diskgroups with lots of unused space. Besides, the smaller the LUN, the more chance is that when overwriting a redo file again you might get a cache “hit” i.e. you overwrite data that wasn’t even flushed to disk yet – resulting in less physical disk I/O.
But bigger LUNs are not a problem. If your organization standardized on a certain (large) LUN size then by all means stick to that instead of introducing a new standard size.

Another thing you should be aware of is that Oracle no longer creates fine-striped redo logs as of version 11.2. EMC’s senior database architect wrote a post on that on http://emc.com/everythingoracle :
https://community.emc.com/people/DarrylBSmith/blog/2012/01/06/where-did-that-fine-grained-striping-go

Anyway, for redo, LUN size does not matter too much. As long as flushing “dirty cache” to physical disk is fast enough to not fill up cache, you should have sub-millisecond resp times regardless.
Like mentioned in my post, I think it’s more important to have dedicated physical disks for redo (even if shared with redo from multiple dbs) than to worry about LUN sizes. That guarantees fast cache flushes not influenced by other DB activity such as table scans, TEMP IO and index reads.

How many LUNs? Enough to make sure there is little or no queueing. I’d say 4 is typically OK but with real high write intense workloads you might need more. Only (stress)testing will tell. (use SLOB – http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/ if you need to work it out)

Hope this helps..
Regards
Bart

Ridwaan Kaka says:

2013-10-23 at 10:32

Hi

we are seeing log file syncs and at times the total system hangs for over 15 minutes.
we also notice the following in
warning: log write elapsed time 849ms, size 22037KB
Any ideas for us

1. Bart Sjerps says:
  
  2013-10-23 at 15:41
  
  Hi Ridwaan,
  
  Seems like you have a serious issue there. Not sure what the problem is but the numbers are way too high. Did you check the recommendations in my blogpost?
  
Alexey says:

2014-02-27 at 13:31

Hi Bart!

Unfortunately these your recommendations (Create dedicated disk groups for REDO logs, Isolate competing resources and give dedicated resources, No – or limited – striping, and maybe use larger stripe sizes.) are not related enough to real life and real EMC arrays. It may be more related to host side, but not to storage.

I`m EMC employee. PS specialist. And our customers who implemented VMAX according these rules have performance problems. When we (PS) say to him how to do correctly he says: “Look at this blog, this is EMC guy, so I`m right”.

Creating dedicated disk groups should be avoided. When one group will gain performance of all disk in it, it will go to cache. When cache will filled up ALL LUNs of array will be affected. But not all disks will be overloaded and system at all will have enough resources to satisfy application needs. So best way is maximum consolidation of disk resources for faster parallel processing rather than disk splitting by groups.

Isolation of resources is possible at array cache level. And it is better to segregate cache by application rather than by LUNs purpose. Just because if you isolate DATA LUNs and it will be not enough performance for it, performance for REDO of the same application will no matter.

Striping is a good solution. All tests show that for general workloads two levels of striping are better than one level or three levels. And one level of striping is better than no striping. And striped metas are a good solution to increase a LUN’s queue depth and to increase parallelism in case of SRDF or local replication.
In a thin provisioning environment we already have one level of striping. So the customer should avoid host striping if he uses striped metas. But not in case of replication. If one has replication (local or remote), striped metas should be used instead of concatenated!
And there are a lot of cases when using a striped meta instead of a simple device of the same size increases performance just because of the increased queue depth. Especially in case of replication.

And of course this is my personal opinion not EMC. 🙂 And please excuse me for my non-native English.

1. Bart Sjerps says:
  
  2014-02-27 at 15:52
  
  Hi Alexey,
  
  Few remarks:
  
  I just reviewed a soon to be published EMC white paper on extreme performance of Oracle on EMC. They did exactly what I am recommending (not because I say so but probably because it gives the best results): separate disk pool for REDO. So at least performance engineers agree with me on this. Why your customers have problems is hard to diagnose from your comment (feel free to drop me an email to further discuss – I’m easy to find on the internal EMC maillist 😉
  
  But a few thoughts (also for other readers):
  
  – Remember disk alignment
  – If you use SRDF (sync) or any other SAN based sync replication, then you need to stripe REDO logs at the host level with a fine stripe size. With ASM you need to have fine striping enabled. Otherwise SRDF will not be able to handle enough parallel writes and you create a bottleneck. If you’re not using SAN mirroring then it should not matter too much but there are different opinions on this one.
  – I’m not a fan of dedicated disk groups but REDO is a special case because a) Oracle is very sensitive for slow REDO response and b) REDO is 100% write sequential so it makes sense to separate it out to make it spinning-disk friendly
  – If your (write) cache fills up because of some other process (DWH load? large DB restore?) then REDO log might suffer as well
  – Did I mention disk alignment?
  – Have you reviewed the official EMC whitepapers and best practices guides? Or do you just trust my blogpost? 😉
  – Did you set up the databases as described in those papers?
  – Are you sure the infrastructure was sized for the workload? If your system is undersized you can mess with striping and other stuff all you want but it will never work for you.
  
  You suggest segregating by application. I’ve written a blogpost on that before basically saying it is not very wise to pull the handbrakes (cache prioritizing) on your expensive Ferrari (Oracle) because another application has priority. Better is to provide enough IOPS and bandwidth.
  
  Last but not least: If you have performance problems then find the bottleneck (root cause). If you haven’t found it then any comment on what configuration setting or striping strategy is better over another does not make sense – you might have an undiscovered problem somewhere else in the stack.
  Sounds like a no-brainer but I have literally visited customers who were complaining heavily about performance issues, say that they have been tuning for over a year and still couldn’t get the system to perform as expected- then I asked basic stuff like total peak IOPS numbers and read/write ratio and they didn’t have a clue… go figure.
  Again if you want me to help out, drop me an email – and tell me where the bottleneck is, see if we can fix it.
  
  1. Alexey says:
    
    2014-02-28 at 11:35
    
    Hi Bart, it`s me again. 🙂
    
    Please provide the name of white paper. Because if I can`t read this stuff (just because I don`t actually know what document you are talking about) I can`t say anything about it. And everyone can say that “they did exactly what I say”.
    
    Now thoughts.
    
    Alignment can be a problem only in the Linux world. Not UNIX or modern Windows. But it is already excluded in my case.
    
    Striping at the host level is better than at the array level, of course. But Oracle admins prefer to get 10 LUNs of 1TB rather than 100 LUNs of 100GB. So storage admins have no choice other than to implement striping or concatenation (don’t forget about the 240 GB limit of a SYMMETRIX device) on the array side.
    And, one more time, a striped meta is much better than a concatenated meta in case of any kind of array replication. SRDF and TF have separate streams for each meta member of a striped meta.
    
    About replication at all. You said about SRDF/A. And what about zero RPO?
    
    Separated disk group? No way! Separated LUNs? Yes, of course.
    Storage admins don`t want to separate REDO at disk level. Because they lose space. For example, for separated 8 REDO LUNs of 10 GB each we use 8 physical disks by performance reason. But disks now have size of 450GB and more. So in RAID1 case there is 8 * 400 / 2 = 1600 GB of available space. But used space is only 8 * 10 = 80 GB. Effective capacity is only 5% (excluding RAID overhead). Are you kidding?
    
    A sequential stream is going to be random on the array side when there is host level striping. Thin Provisioning on VMAX also contributes to turning a sequential host stream into a random stream at the back-end level.
    
    A little bit about cache. I’m talking about array cache, not host. A VMAX can be very big and customers prefer to consolidate different applications on it. One array is easier to manage than ten. So what does production do if a backup application or a test/development application takes all storage resources? VMAX has a lot of possibilities to isolate or limit some applications to give maximum performance to another one. And Cache Partitioning is one of the ways to do this.
    Our customer wants to create cache partitions for REDO, DATA and other after reading your blog (resource separation idea). It`s not good.
    
    May be there is some misunderstanding. Our customer is believe to your blog more than Professional Service engineers and white papers.
    
    Sizing. It is an ideal world when you can size the storage correctly. But it`s all about money. Not all customers have enough money to buy correctly sized array. And on customer side there are situation when an array is bought for one task but it is used for another.
    And there can be a difference between sized and configured. Just because admin decided.
    
    About my customer. We know the root cause of the performance problem. We know how to solve it. But the customer doesn’t want to implement the recommendations just because of your blog. For example, he overloads SATA disks instead of using FAST VP because of segregation of LUNs by function. Or, because you are saying “no striping”, all metas are configured as concatenated. It’s bad for his SRDF implementation.
    
    The main thing that I want to say is each customer, each application, each particular case have it`s own recommendations. There is no silver bullet for all. And each recommendation must have described explanations and restrictions.
    
    1. Bart Sjerps says:
      
      2014-03-05 at 16:51
      
      Drop me an email and we will look further into it.
      