Another Hyper-V Implementation Mistake – Too Many CSVs

In the PowerPoint that I posted yesterday, I mentioned that you should not go overboard with creating CSVs (Cluster Shared Volumes).  In the last two weeks, I’ve heard of several people who have.  I’m not going to play the blame game.  Let’s dig into the technical side of things and figure out what should be done.

In Windows Server 2008 Hyper-V clustering, we did not have a shared disk mechanism like CSV.  Every disk in the cluster had a single owner at any one time.  Realistically (and as required by VMM 2008) we had to have one LUN/cluster disk for each VM.

That went away with CSV in Windows Server 2008 R2.  We can size our storage (IOPS from MAP) and plan our storage (DR replication, backup policy, fault tolerance) accordingly.  The result is that you can have lots of VMs and virtual hard disks (VHDs) on a single LUN.  But for some reason, some people are still putting just one VM, and even just one VHD, on each CSV.

An example: someone is worried about disk performance, so they spread the VHDs of a single VM across 3 CSVs on the SAN.  What does that gain them?  In reality: nothing.  It is actually a negative.  Let’s look at the first issue:

SAN Disk Grouping is not like Your Daddy’s Server Storage

If you read some of the product guidance on big software publishers’ support sites, you can tell that there is still some confusion out there.  I’m going to use HP EVA lingo because it’s what I know.

If I had a server with internal disks, and wanted to create three RAID 10 LUNs, then I would need 6 disks.

[Image: six internal server disks arranged as three pairs, each pair presented as a separate LUN]

The first pair would be grouped together to make LUN1 at a desired RAID level.  The second pair would be grouped together to make the second LUN, and so on.  This means that LUN1 is on a completely separate set of spindles to LUN2 and LUN3.  They may or may not share a storage controller.

A lot of software documentation assumes that this is the sort of storage that you’ll be using.  But that’s not the case with a cluster on a hardware SAN.  You have to use the storage it provides, and it’s usually nothing like the storage in a server.

By the way, I’m really happy that Hans Vredevoort is away on vacation and probably will miss this post.  He’d pick it to shreds :)

Things are kind of reversed.  You start off by creating a disk group (HP lingo!).  This is a set of disks that will work as a team, and there is often a minimum number of disks required.

[Image: a disk group made up of a set of physical disks working as a team]

From there you will create a virtual disk (not a VHD – it’s HP lingo for a LUN in this type of environment).  This is the LUN that you want to create your CSV volume on.  The interesting thing is that each virtual disk in the disk group spans every disk in the disk group.  How that spanning is done depends on the desired RAID level: RAID 10 will stripe across mirrored pairs of disks, and RAID 5 will stripe across all of the disks.  That gives you the usual performance costs/benefits of those RAID levels and the expected usable capacity.

In the image below, you can see that two virtual disks (LUNs) have been created in the disk group.  The benefit of this approach is that each virtual disk has many more spindles to work with.  The sales pitch is that you are getting much better performance than the alternative of internal server storage.  Compare LUN1 from above (2 spindles) with vDisk1 below (6 spindles).  More spindles = more speed.

I did say it was a sales pitch.  You’ve got other factors like SAN latency, controller cache/latency, vDisks competing for disk I/O, etc.  But most often, the sales pitch holds fairly true.

[Image: two virtual disks (vDisk1 and vDisk2) in the disk group, each spanning all of the group’s spindles]

If you think about it, a CSV spread across a lot of disk spindles will have a lot of horsepower.  It should provide excellent storage performance for a VM with multiple VHDs.
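To make the spindle arithmetic concrete, here is a minimal Python sketch comparing the internal-storage layout from earlier (one LUN per pair of disks) with a SAN disk group whose virtual disks span every spindle.  The disk count and per-disk capacity are invented for illustration; real disk groups, controllers, and RAID implementations will differ.

```python
disks = 6            # physical spindles in the example (invented figure)
disk_size_gb = 300   # capacity per spindle (invented figure)

# Internal server storage: three mirrored pairs, one LUN per pair.
internal_luns = 3
spindles_per_internal_lun = disks // internal_luns       # 2 spindles each
usable_per_internal_lun_gb = disk_size_gb                # a mirror of one pair

# SAN disk group: every virtual disk (LUN) spans all spindles in the group.
spindles_per_vdisk = disks                               # 6 spindles each
usable_group_raid10_gb = disks * disk_size_gb // 2       # stripe of mirrors
usable_group_raid5_gb = (disks - 1) * disk_size_gb       # one disk of parity

print(f"Internal LUN: {spindles_per_internal_lun} spindles, "
      f"{usable_per_internal_lun_gb} GB usable")
print(f"Disk group vDisk: {spindles_per_vdisk} spindles; group usable capacity: "
      f"RAID 10 = {usable_group_raid10_gb} GB, RAID 5 = {usable_group_raid5_gb} GB")
```

The exact capacity numbers matter less than the spindle counts: each vDisk in the group gets the I/O of all six spindles instead of just its own pair.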

A MAP assessment is critical.  I also pointed out in that PowerPoint that customers/implementers are not doing this.  It is the only true way to plan storage and to decide between VHD and passthrough disk.  Gut feeling, “experience”, and “knowledge of your network” are a bunch of BS.  If I hear someone saying “I just know I need multiple physical disks or passthrough disks” then my BS-ometer starts sending alerts to OpsMgr – can anyone write that management pack for me?

Long story short: a CSV on a SAN with this type of storage offers a lot of I/O horsepower.  Don’t think old school because that’s how you’ve always thought.  Run a MAP assessment to figure out what you really need.
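As a simple illustration of what that planning looks like with measured numbers rather than gut feeling, here is a Python sketch that sums per-VM peak IOPS (the kind of figures a MAP assessment produces) and compares the total against a rough estimate of what a disk group can deliver.  All of the VM names, IOPS figures, and per-spindle estimates are invented; plug in your own measurements and your vendor’s numbers.

```python
# Peak IOPS per candidate VM, e.g. taken from a MAP assessment (invented figures).
vm_peak_iops = {"web01": 180, "sql01": 950, "file01": 420, "app01": 250}

spindles_in_disk_group = 16
iops_per_spindle = 150        # rough planning figure for a 10K spindle (assumption)
raid10_write_penalty = 2      # each write costs two physical I/Os in RAID 10

required = sum(vm_peak_iops.values())
# Pessimistic estimate: assume every I/O is a write and pays the RAID penalty.
available = spindles_in_disk_group * iops_per_spindle // raid10_write_penalty

print(f"Required peak IOPS: {required}, rough disk group capability: {available}")
print("OK to co-locate on one CSV" if required <= available
      else "Needs more spindles, or split across disk groups")
```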

Persistent Reservations

Windows Server 2008 and 2008 R2 Failover Clustering use iSCSI3 persistent reservations (PRs) to access storage.  Each SAN solution has a limit on how many PRs they can support.  You can roughly calculate what you need using:

PRs = Number of Hosts * Number of Storage Channels per Host * Number of CSVs

Let’s do an example.  We have 2 hosts, with 2 iSCSI connections each, with 4 CSVs.  That works out as:

2 [hosts] * 2 [channels] * 4 [CSVs] = 16 PRs

OK, things get more complicated with some storage solutions, especially modular ones.  Here you really need to consult an expert (and I don’t mean Honest Bob who once sold you a couple of PCs at a nice price).  The key piece may end up being the number of storage channels.  For example, each host may have 2 iSCSI connections, but it maintains those connections to every module in the SAN, multiplying the number of channels.

Here’s another example.  There is an iSCSI SAN with 2 storage modules.  Once again, we have 2 hosts, with 2 iSCSI connections each, with 4 CSVs.  This now works out as:

2 [hosts] * 4 [channels –> 2 modules * 2 iSCSI connections] * 4 [CSVs] = 32 PRs

Add 2 more storage modules and double the number of CSVs to 8 and suddenly:

2 [hosts] * 8 [channels –> 4 modules * 2 iSCSI connections] * 8 [CSVs] = 128 PRs
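If you want to repeat that arithmetic for your own cluster, here is a minimal Python sketch of the rough formula above.  The function name is my own; your storage vendor may calculate PRs with a more demanding formula, so treat this as a lower bound only.

```python
def estimate_persistent_reservations(hosts, iscsi_connections_per_host,
                                     storage_modules, csvs):
    """Rough estimate: hosts * storage channels per host * CSVs.

    Channels per host = iSCSI connections per host * storage modules,
    because each host maintains connections to every module in a modular SAN.
    """
    channels_per_host = iscsi_connections_per_host * storage_modules
    return hosts * channels_per_host * csvs

# The three worked examples from above:
print(estimate_persistent_reservations(2, 2, 1, 4))   # 16 PRs
print(estimate_persistent_reservations(2, 2, 2, 4))   # 32 PRs
print(estimate_persistent_reservations(2, 2, 4, 8))   # 128 PRs
```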

Your storage solution may actually calculate PRs using a formula with higher demands.  But the question is: how many PRs can your storage solution handle?  Deploy too many CSVs and/or storage modules and you may find that you have disks disappearing from your cluster.  And that leads to very bad circumstances.

You may find that a storage firmware update increases the number of required PRs.  But eventually you reach a limit that is set by the storage manufacturer.  They obviously cripple the firmware to create a reason to buy the next model up.  But that’s not something you want to hear after spending €50K or €100K on a new SAN.

The way to limit your PR requirement is to deploy only the CSVs you need.
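Turning that around, a quick sketch of the same rough formula can tell you approximately how many CSVs you can afford before hitting a given PR limit.  The 256 PR limit below is an invented figure; get the real number from your storage vendor.

```python
def max_csvs_within_pr_limit(pr_limit, hosts, iscsi_connections_per_host,
                             storage_modules):
    """Invert the rough PR formula: how many CSVs before the limit is exceeded?"""
    channels_per_host = iscsi_connections_per_host * storage_modules
    return pr_limit // (hosts * channels_per_host)

# Hypothetical SAN with a 256 PR limit, 2 hosts, 2 iSCSI connections, 4 modules:
print(max_csvs_within_pr_limit(256, 2, 2, 4))   # at most 16 CSVs
```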

Undoing The Damage

If you find yourself in a situation with way too many CSVs, then you can use SCVMM Quick Storage Migration to move VMs onto fewer, larger CSVs, and then remove the empty ones.

Recommendations

Slow down to hurry up.  You MUST run an assessment of your pre-virtualisation environment to understand what storage to buy.  You also use this data as a factor when planning CSV design and virtual machine/VHD placement.  Like my old woodwork teacher used to say: “measure twice and cut once”.

Take that performance requirement information and combine it with backup policy (1 CSV backup policy = 1 or more CSVs, 2 CSV backup policies = 2 or more CSVs, etc.), fault tolerance (place clustered or load-balanced VMs on different CSVs), and DR policy (different storage-level VM replication policies require different CSVs).
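To show how those policies, rather than a one-VM-per-CSV habit, drive the CSV count, here is a back-of-the-envelope Python sketch.  The policy counts are placeholders, and the combination rule (one CSV per distinct backup/DR combination, plus extra CSVs to split fault-tolerant sets) is my own simplified reading of the guidance above, not a formal rule.

```python
backup_policies = 2           # e.g. "daily" and "hourly" host-level backup schedules
dr_replication_policies = 2   # e.g. "replicated to DR site" and "not replicated"
fault_tolerant_sets = 1       # guest clusters / NLB sets whose members must be split

# One CSV per distinct backup/DR combination that is actually in use,
# plus at least one extra CSV per fault-tolerant set so its members can be
# separated (simplifying assumption: each set shares one backup/DR combination).
minimum_csvs = backup_policies * dr_replication_policies + fault_tolerant_sets

print(f"Minimum CSVs implied by the policies: {minimum_csvs}")
```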

15 thoughts on “Another Hyper-V Implementation Mistake – Too Many CSVs”

  1. Very good article. We are in the planning stages of our own new Hyper-V virtualization project for our organization, and this was helpful. I am wondering how node growth should be planned when considering Cluster Shared Volumes, since each CSV is owned/coordinated by a host. For example, if you start out with 2 hosts and 4 CSVs, and then add an additional host to the cluster, can it access the same CSVs that are already defined and host its own VMs?

    1. Steve, each CSV has 1 CSV coordinator. It delegates rights to each other host in the cluster so they can run VMs that are stored on that CSV. The thing you do have to watch out for when you add hosts/CSVs is the usage of persistent reservations and the limit on how many of those PRs the SAN can handle.

  2. Disk IO is so important to any VM implementation. Without access to large and complex storage networks, I have only really been able to use server host direct attached storage.
    Many times I “feel” (unscientific, I know) that separate LUNs do give me better performance. As, though a LUN configured with a significant number of disks is able to access multiple spindles, it is still being accessed by multiple VMs. So one advantage negates the other.
    Surely, for best IO with SQL databases in mind and rapid access to data in a smaller business environment, it is best to just allocate physical disks and assign them as a LUN for each VM.

    1. You’ll gain maybe 2% over the speed of fixed VHD, and lose access to features like easy host/storage level backup, Live Storage Migration (and Shared Nothing Live Migration), Hyper-V Replica, and on … and on … and on.

      Using passthrough disk = FAIL in my opinion. #2 reason we virtualise (and stats prove it) is flexibility. Passthrough disks (raw device mapping) are not flexible.

      Just follow my advice.

  3. Hi there,

    Must say, this whole concept of CSVs can be quite confusing for a DBA…

    So if I was consolidating a number of physical SQL Servers, are you saying that if I size a single CSV correctly from an IOPS point of view, it can be serviced just as well as when my physical servers had multiple LUNs to themselves? Also, are we mixing log and data files from multiple VMs on the same CSV (which have different IO patterns), or would you recommend creating a log CSV and a data file CSV and then splitting the files for each VM accordingly?

    1. I’m not a SQL Guy so I don’t know the traits of its behaviour. But … having 1 LUN made of the same number/type of physical disks as 10 LUNs in the same disk group should give the same performance. The spindles are spinning the same way, and the heads are making the same hits, no matter the number of LUNs. One could argue that 1 LUN might be more efficient.

      1. Beware of the differences between average IOPS, average throughput, peak IOPS, peak throughput, interaction effects, and your business requirements.

        In short, if you have minimum response time service level requirements (i.e. must respond within X milliseconds, or a big report that normally takes 7 seconds must respond within 10 seconds), you should beware of Setup1 below, since the other VMs sharing the physical disks can degrade your own VM’s performance.

        As an example, if we have:
        Setup1: 12 physical disks, 1 RAID group, multiple LUNs, and 3 VMs all ending up sharing the same 12 physical disks
        Setup2: 12 physical disks, 3 RAID groups, no shared physical disks between RAID groups, and 3 VMs which in reality have dedicated physical disks

        Then for your single example virtual SQL Server:

        Setup1 will give much higher potential peak throughput (unless bandwidth limited, which you usually are in a big SAN environment), and much higher potential peak IOPS. However, all three VMs share the same peak for both – someone running large random (or sequential) operations on the other two VMs will drop your SQL Server’s actual current peak throughput and IOPS drastically, and it’s difficult to diagnose, showing up as intermittent high latency and/or low throughput. Thus, while your “theoretical average” is the same as Setup2, your actual performance will vary between much lower and much higher based on what the other OSes sharing the same physical disks are doing.

        Setup2 isolates the physical disks. You’ll have much lower peak IOPS and throughput, a similar “theoretical” average IOPS and throughput, but much more controlled and predictable performance. The two other VMs will mainly affect SQL Server performance if the aggregate throughput exceeds the bandwidth of any one segment of the storage path (highly likely if you have 1Gbps iSCSI, and likely for big sequential data ops (backups, restores, some ETL, some big aggregate reporting/data warehouse work, some maintenance ops) even in an 8Gbps FC or 10Gbps iSCSI environment).

        Setup2 wil

  4. Hi Aidan,
    Thanks for this good article, but I have some questions that hopefully you can address.
    Our SAN has a 2TB limitation for iSCSI connections, which means that each of our CSVs has a 2TB cap.
    We have a total of 8 LUNs.
    Is there a workaround to get Windows 2012 R2 Hyper-V Replica to point to multiple CSVs?
    Or is the only solution to create 1 large 20TB LUN?

    Thanks for any feedback on my questions.

    1. The only solution I have seen so far is to create 1 target LUN per source. Good question though; I’ll ask the product group.

  5. Hi Aidan,

    Nice article. Regarding this statement in the article….
    “Windows Server 2008 and 2008 R2 Failover Clustering use iSCSI3 persistent reservations (PRs) to access storage. Each SAN solution has a limit on how many PRs they can support.”

    Does this apply to an FC SAN?

    1. Yes. Typo by me. It should read SCSI3. Note that WS2012 completely changed the PR usage so it’s MUCH more scalable.

  6. Your article states “If I had a server with internal disks, and wanted to create three RAID 10 LUNs, then I would need 6 disks.” I don’t believe this is correct. It would take 6 disks for RAID1. RAID10 requires a minimum of 4 disks for a LUN as it is striped mirrors. In your example you’d need 12 disks, not six.

  7. Hi Aidan,

    I have one question regarding the MAP Toolkit. I’m administrating a Hyper-V cluster with VMM, and this environment is a clusterfuck since the guys who implemented it did everything wrong. I’m just trying to correct all of this, and so far I have done so successfully. I’m now on the last part, which is updating SCVMM and sorting out the VMs and the CSVs, but I don’t know how applicable the MAP assessment is in an environment that is already in production. I’m asking because I want to organize the VMs and CSVs, and I don’t know how many VMs I need to add per CSV/LUN.

    Thanks.

    1. MAP won’t help much. There is no limit of VMs per LUN. You’ll have to do good old-fashioned digging for info, comparing against LUN potential, etc.
