Why I Dislike Dynamic VHD in Production

In this post, I’m going to try to explain why I recommend against using Dynamic VHD in production.

What is Dynamic VHD?

There are two types of VHD you may use in production:

  • Fixed: This is where all of the allocated storage is consumed at once.  For example, if you want 60 GB of virtual machine storage, a VHD file of around 60 GB is created, consuming all of that storage at once.
  • Dynamic: This is where the VHD will only consume as much storage as is required, plus a little buffer space.  If you allocate 60 GB of storage, a tiny VHD is created.  It will grow in small chunks to accommodate new data, always leaving a small amount of free space.  It kind of works like a SQL Server database/log file.  Eventually the VHD will reach 60 GB and you’ll run out of space in the virtual disk.  (See the sketch after this list.)
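
To make that concrete, here’s a minimal sketch using the New-VHD cmdlet from the Hyper-V PowerShell module.  Note that the module only arrived in later versions of Windows Server; in the W2008 R2 era you’d use the New Virtual Hard Disk wizard or WMI instead, and the paths below are just examples.

# Fixed: all 60 GB of the file is allocated up front
New-VHD -Path 'C:\ClusterStorage\Volume1\VM01\Data.vhd' -SizeBytes 60GB -Fixed

# Dynamic: the file starts tiny and grows in blocks as data is written
New-VHD -Path 'C:\ClusterStorage\Volume1\VM02\Data.vhd' -SizeBytes 60GB -Dynamic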

With Windows Server 2008, we knew that Dynamic VHD was just too slow for production.  The VHD grew in very small chunks, and when a burst of writes required lots of growth at once, the result was storage write latency.

Windows Server 2008 R2

We were told that was all fixed when Windows Server 2008 R2 was announced.  Trustworthy names stood in front of large crowds and told us how Dynamic VHD would nearly match Fixed VHD in performance.  The solution was to increase the size of the chunks that were added to the Dynamic VHD.  After RTM there were performance reports that showed us how good Dynamic VHD was.  And sure enough, this was all true … in the perfect, clean, short-lived lab.

For now, let’s assume that the W2008 R2 Dynamic VHD can grow fast enough to meet write activity demand, and focus on the other performance negatives.

Fragmentation

Let’s imagine a CSV with 2 Dynamic VHDs on it.  Both start out as small files:

[Image: a CSV volume holding two small Dynamic VHD files]

Over time, both VHDs will grow.  Notice that the growth is fragmenting the VHDs.  That’s going to impact reads and overwrites.

[Image: the same CSV after growth, with the two VHDs fragmented across the volume]

And over the long term, it doesn’t get any better.

[Image: the CSV over the long term, with both VHDs heavily fragmented]

Now imagine that with dozens of VMs, all with one or more Dynamic VHDs, all getting fragmented.

The only thing you can do to combat this is to run a defrag operation on the CSV volume.  Realistically, you’d have to run the defrag at least once per day.  Defrag is an example of an operation that kicks the CSV into Redirected Mode (or Redirected Access).  And unlike backup, it cannot make use of a Hardware VSS Provider to limit the impact of that operation.  Big and busy CSVs will take quite a while to defrag, and you’re going to impact the performance of production systems.  And you really need to be aware of what that impact would be on multi-site clusters, especially those that are active(site)-active(site).
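
If you do go down the scheduled defrag road, a rough sketch of the sort of job I mean is below.  This just uses the in-box defrag.exe against the CSV mount point; the path is an example, and remember the defrag pass itself will force Redirected Mode, so schedule it for a quiet window on the coordinator node.

# Analyse first to see whether a full pass is even needed
defrag.exe C:\ClusterStorage\Volume1 /A /U /V

# The full defrag pass - run it in a maintenance window
defrag.exe C:\ClusterStorage\Volume1 /U /V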

Odds are you probably should be doing the occasional CSV defrag even if you use Fixed VHD.  Stuff gets messed up over time on any file system.

Storage Controllers

I am not a storage expert.  But I talked with some Hyper-V engineers yesterday who are.  They told me that they’re seeing SAN storage controllers that really aren’t dealing well with Dynamic VHD, especially if LUN thin provisioning is enabled.  Storage operations are being queued up, leading to latency issues.  Sure, Dynamic VHD and thin provisioning may reduce the amount of disk you need, but at what cost to the performance/stability of your LOB applications, operations, and processes?

CSV and Dynamic VHD

I became aware of this one a while back thanks to my fellow Hyper-V MVPs.  It never occurred to me at all – but it does make sense.

In scenario 1 (below) the CSV1 coordinator role is on Host1.  A VM is running on Host1, and it has Dynamic VHDs on CSV1.  When that Dynamic VHD needs to expand, Host1 can take care of it without any fuss.

[Image: scenario 1, with the CSV1 coordinator and the VM both on Host1]

In scenario 2 (below) things are a little different.  The CSV1 coordinator role is still on Host1, but the VM is now on Host3.  Now when the Dynamic VHD needs to expand, we see something different happen.

[Image: scenario 2, with the CSV1 coordinator on Host1 and the VM on Host3]

Redirected Mode/Access kicks in so the CSV coordinator (Host1) for CSV1 can expand the Dynamic VHD of the VM running on Host3.  That means all storage operations for that CSV on Hosts 2 and 3 must traverse the CSV network (maybe 1 Gbps) to Host1, and then go through its iSCSI or fibre channel link.  This may be a very brief operation, but it’s still something that has a cumulative effect on latency, with potential storage I/O bottlenecks in the CSV network, Host1 itself, Host1’s HBA, or Host1’s SAN connection.

[Image: scenario 2 in Redirected Mode, with the VM’s storage I/O crossing the CSV network from Host3 to Host1]
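
If you want to see where the coordinator role is sitting right now, the FailoverClusters PowerShell module in W2008 R2 makes that easy.  A quick sketch, nothing more:

Import-Module FailoverClusters

# Which node is the coordinator (owner) of each CSV right now?
Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State -AutoSize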

Now take a moment to think bigger:

  • Imagine lots of VMs, all with Dynamic VHDs, all growing at once.  Will the CSV ever not be in Redirected Mode? 
  • Now imagine there are lots of CSVs with lots of Dynamic VHDs on each.
  • When you’re done with that, now imagine that this is a multi-site cluster with a WAN connection adding bandwidth and latency limitations for Redirected Mode/Access storage I/O traffic from the cluster nodes to the CSV coordinator.
  • And then imagine that you’re using something like an HP P4000/LeftHand, where each host must write to each node in the storage cluster, and that redirected storage traffic is going back across that WAN link!

Is your mind boggled yet?  OK, now add in the usual backup operations, and defrag operations (to handle Dynamic VHD fragmentation) into that thought!

You could try to keep the VMs on CSV1 running on Host1.  That’ll eliminate the need for Redirected Mode.  But things like PRO and Dynamic Optimization in SCVMM 2012 will play havoc with that, moving VMs all over the place if they are enabled – and I’d argue that they should be enabled because they increase service uptime, reliability, and performance.

We need an alternative!

Sometimes Mentioned Solution

I’ve seen some say that they use Fixed VHD for data drives, where there will be the most impact.  That’s a good start, but I’d argue that you need to think about those System VHDs (the ones with the OS) too.  Those VMs will get patched.  Odds are that will happen at the same time, and you could have a sustained period of Redirected Mode while Dynamic VHDs expand to handle the new files.  And think of the fragmentation!  Applications will be installed/upgraded, often during production hours.  And what about Dynamic Memory?  The VM’s paging file will grow, expanding the size of the VHD: more Redirected I/O and fragmentation.  Fixed VHD seems to be the way to go for me.

My Experience

Not long after the release of Windows Server 2008 R2, a friend of mine deployed a Hyper-V cluster for a business here in Ireland.  They had a LOB application based on SQL Server.  The performance of that application went through the floor.  After some analysis, it was found that the W2008 R2 Dynamic VHDs were to blame.  They were converted to Fixed VHD and the problem went away.

I also went through a similar thing in a hosting environment.  A customer complained about poor performance of a SQL VM.  This was for read activity – fragmentation would cause the disk heads to bounce and increase latency.  I converted the VHDs to fixed and the run time for reports was immediately improved by 25%.

SCVMM Doesn’t Help

I love the role of the library in SCVMM. It makes life so much easier when it comes to deploying VMs, and SCVMM 2012 expands that exponentially with the deployment of a service.

If you are running a larger environment, or a public/private cloud, with SCVMM then you will need to maintain a large number of VM templates (VHDs in MSFT lingo but the rest of the world has been calling them templates for quite a long time). You may have Windows Server 2008 R2 with SP1 Datacenter, Enterprise, and Standard. You may have Windows Server 2008 R2 Datacenter, Enterprise, and Standard. You may have W2008 with SP1 x64 Datacenter, Enterprise, and Standard. You may have W2008 with SP1 x86 Datacenter, Enterprise, and Standard. You get the idea. Lots of VHDs.

Now you get that I prefer Fixed VHDs.  If I build a VM with Fixed VHD and then create a template from it, then I’m going to eat up disk space in the library.  Now it appears that some believe that disk is cheap.  Yes, I can get a 1 TB disk for €80.  But that’s a dumb, slow, USB 2.0 drive.  That’s not exactly the sort of thing I’d use for my SCVMM library, let alone put in a server or a datacenter.  Server/SAN storage is expensive, and it’s hard to justify 40 GB+ for each template that I’ll store in the library.

The alternative is to store Dynamic VHDs in the library.  But SCVMM does not convert them to Fixed VHD on deployment.  That’s a manual process – and one that is not suitable for the self-service nature of a cloud.  The same applies to storing a VM in the library; it seems pointless to store Fixed VHDs for an offline VM, but converting the stored VMs to Dynamic VHD is yet another manual process.

It seems to me that:

  • If you’re running a cloud then you realistically have to use Fixed VHDs for your library templates (library VHDs in Microsoft lingo)
  • If you’re a traditional IT-centric deploy/manage environment, then store Dynamic VHD templates, deploy the VM, and then convert from Dynamic VHD to Fixed VHD before you power up the VM (see the conversion sketch below).
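
For the second option, the conversion itself is simple enough.  Here’s a minimal sketch using Convert-VHD from the Hyper-V PowerShell module; that cmdlet only shipped with later versions of Windows Server, so on W2008 R2 you’d use the Edit Disk wizard in Hyper-V Manager or SCVMM’s convert option instead, and the paths are just examples.

# Convert the deployed dynamic disk to a fixed one before the VM is powered up
Convert-VHD -Path 'C:\ClusterStorage\Volume1\VM03\OSDisk.vhd' `
            -DestinationPath 'C:\ClusterStorage\Volume1\VM03\OSDisk-Fixed.vhd' `
            -VHDType Fixed

# Point the VM at the new file and delete the dynamic original once you're happy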

What Do The Microsoft Product Groups Say?

Exchange: “Virtual disks that dynamically expand are not supported by Exchange”.

Dynamics CRM: “Create separate fixed-size virtual disks for Microsoft Dynamics CRM databases and log files”.

SQL Server: “Dynamic VHDs are not recommended for performance reasons”.

That seems to cover most of the foundations for LOB applications in a Microsoft-centric network.

Recommendation

Don’t use Dynamic VHD in production environments.  Use Fixed VHD instead (and pass-through disks on those rare occasions where they are required).  Yes, you will use more disk for Fixed VHD because of all that white space, but you’ll get the best possible performance while using flexible and more manageable virtual disks.

If you have implemented Dynamic VHD:

  • Convert to Fixed VHD (this requires the VM to be shut down) if you can.  Then defrag the CSV, and set up a less frequent scheduled defrag job.
  • If you cannot convert, then figure out when you can run frequent defrag jobs.  Try to control VM placement relative to the CSV coordinator roles to minimize the impact of Redirected Mode.  A script will need to figure out the current coordinator for the relevant CSV (because it can fail over), and Live Migrate the VMs on that CSV to the coordinator node, assuming that there is sufficient resource and performance capacity on that host; a rough sketch of that script follows below.  Yes, the Fixed VHD option looks much more attractive!
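
For what it’s worth, here’s a very rough sketch of that script using the W2008 R2 FailoverClusters module.  The CSV name and the list of VMs are made-up examples (working out which VMs actually live on which CSV is left to your own inventory), and it makes no attempt to check capacity on the coordinator node.

Import-Module FailoverClusters

# Example only: the VMs (cluster group names) that have VHDs on this CSV
$csvName  = 'Cluster Disk 1'
$vmsOnCsv = 'VM01', 'VM02', 'VM07'

# Find the current coordinator - it can fail over, so never hard-code it
$coordinator = (Get-ClusterSharedVolume -Name $csvName).OwnerNode.Name

foreach ($vm in $vmsOnCsv) {
    $group = Get-ClusterGroup -Name $vm
    if ($group.OwnerNode.Name -ne $coordinator) {
        # Live Migrate the VM to the coordinator node
        Move-ClusterVirtualMachineRole -Name $vm -Node $coordinator
    }
}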

WPC11 Hyper-V Announcements and Some Brainfarts

WPC is Microsoft’s conference for partners.  The delegates tend to be executives or account managers from the Microsoft partner community, and the content is not at the usual technical level you would expect from MMS or TechEd.

Yesterday Microsoft announced some Hyper-V features from Windows 8 (Windows Server 2012?).  The first was that Hyper-V “3.0” will support “more than” 16 vCPUs per VM.  That’s a nice add-on for those larger VMs, giving us 16+ simultaneous threads of execution.  People are virtualising larger workloads as well as the usual/expected lighter ones, because virtualisation offers solutions to more than just power/rack consolidation, e.g. fault tolerance.  A bottleneck has been the ability to run larger multi-threaded workloads, and Windows Server “2012” Hyper-V will give us a potential solution for this.

One of the big reasons we adopt virtualisation is the ability to make DR (disaster recovery or business continuity) easier.  Mid-to-enterprise businesses can afford really expensive SAN/WAN solutions for this.  There are a number of storage or backup replication solutions that allow replication of virtual workloads over smaller lines for the small-medium enterprise (SME).  Some are good, and some are downright rubbish to the point of being dangerous.

Microsoft is stepping in with a new feature called Hyper-V Replica.  This will give us the ability to replicate VMs asynchronously.  This means it will work over longer distances, with lower capacity lines and higher latency, and it will be cheaper.  It also means that there is a slight delay in replication of VM data.  That’s unacceptable to a small set of the market who have regulatory/business needs for synchronous replication; they will have to continue to look at those third party or expensive SAN/WAN replication solutions.

Thinking about Hyper-V Replica makes me wonder if there are other new features/upgrades that we haven’t been told about yet.  This isn’t DFS-R as we know it.  DFS-R requires a file to be closed before it can analyse it and replicate the change blocks.  Maybe we have a new DFS-R but I’m sceptical of that.  Maybe we have a “new” transactional file system?  I say “new” because Microsoft has had a transactional file system for quite some time in the form of WinFS.  This would allow the file system to track changes, and replicate them, all while keeping VMs in a consistent state in source and destination locations.  Consistency is one of those things that has worried me in third party software based replication of VMs because they are unaware of things like in-VM database commits.  Maybe a new WinFS could be aware?  Potentially it could work in cluster-cluster replication (no mention of that in the reports I read this morning from WPC).

Good news: Hyper-V Replica will be a built-in feature with no extra charges, unlike something else we could mention ;-)

I think that’s enough hot air and methane blown into the atmosphere for today.

Exchange 2010 SP1 Virtualisation and Live Migration

A few weeks ago social media lit up with news of support for running Exchange 2010 DAG members on virtualised clusters, e.g. a vSphere farm or a Hyper-V failover cluster.  That much is true.  Some of the chatter implied that Live Migration was supported.

I’ve downloaded and started reading a Microsoft whitepaper called Best Practices for Virtualizing Exchange Server 2010 with Windows Server 2008 R2 Hyper-V.

On page 15, under the heading of Live Migration you can see:

Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline.

OK.  That says to me that Live Migration = good and Quick Migration = bad.  That’s fine with me.

Now move on to page 28 to the heading of Hyper-V Failover Clustering.

“All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration”.

That confirms it.  Quick Migration was a process where a VM’s state was written to disk, the VM resource was moved to a target node, and the state was read from disk to start the VM.  The Exchange product group do not like that.  One might be forgiven for thinking that Quick Migration is a thing of the past, but there are scenarios where one can build a Hyper-V failover cluster using software-based replication solutions over sub-1 Gbps WAN connections.  They still use Quick Migration.

If you are in that scenario then you need to be aware that a DAG member will be evicted if it is offline for 5 seconds.  See page 29 for some instructions and PowerShell cmdlets for that situation.
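
The whitepaper’s own instructions and cmdlets are on page 29, so treat the following purely as an illustration of the kind of knobs involved, using the W2008 R2 FailoverClusters module.  The cluster name and the values are examples of mine, not the whitepaper’s recommendations.

Import-Module FailoverClusters

$cluster = Get-Cluster -Name 'DAGCluster01'

# The defaults - 5 missed heartbeats x 1000 ms - are where that 5 second window comes from
$cluster.SameSubnetDelay       # milliseconds between heartbeats
$cluster.SameSubnetThreshold   # missed heartbeats before a node is considered down

# Example values only: widen the window for same-subnet and cross-subnet heartbeats
$cluster.SameSubnetDelay      = 2000
$cluster.SameSubnetThreshold  = 10
$cluster.CrossSubnetDelay     = 4000
$cluster.CrossSubnetThreshold = 10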

On the other hand, the above text confirms that Live Migration is fine for DAG members running Exchange 2010 SP1 (the text that follows in the document specifies that version).  Interestingly, the Exchange and Hyper-V groups found that CSV was much better than passthrough disks (page 29) in a Hyper-V failover cluster.  Page 29 also gives a bunch of guidance on things like bandwidth, jumbo frames, and receive-side buffers.

YES! Exchange 2010 SP1 DAG Supported on Highly Available Virtualisation

As of today:

“Combining Exchange 2010 high availability solutions (database availability groups (DAGs)) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers, is now supported”.

In other words, the Exchange team heard us, and they’ve added support for installing DAG members (on Exchange 2010 SP1) on a highly available virtualisation cluster.  That will simplify many virtualised Exchange installations.

Also:

“The Unified Messaging server role is supported in a virtualized environment”.

Slide deck – Private Cloud Academy: Backup and DPM 2010

Here’s the slide deck I presented at the Microsoft Ireland/System Dynamics Private Cloud Academy event on how to design Hyper-V Cluster Shared Volumes (CSV) for backup, and how to use System Center Data Protection Manager (DPM) 2010 to back up virtualised workloads.  Like the previous sessions, it was a very demo-centric 3 hour event.

Updated: How to Build a Hyper-V Cluster Using the Microsoft iSCSI Software Target v3.3 – MPIO

My whitepaper on How to Build a Hyper-V Cluster Using the Microsoft iSCSI Software Target v3.3 proved to be popular, getting over 500 downloads, thanks to many of you linking, retweeting, and so on.

At the time, a TechNet page stated that MPIO would not be supported with iSCSI initiators that were members of a failover cluster.  I quoted that page, and I excluded MPIO from the setup.  This revelation disappointed a lot of people. 

Hans Vredevoort (clustering MVP) contacted some of the storage folks in Microsoft to discuss the MPIO/cluster member initiators issue. It turns out that the Microsoft page in question was incorrect. It used to be true, but the v3.3 Software Target does support iSCSI initiators that are members of a cluster. The document has been updated with this note, but I have not added configuration steps for MPIO.

KB2531907: Validate SCSI Device Vital Product Data (VPD) test fails after you install W2008 R2 SP1

This one is a continuation on yesterday’s post.  Microsoft did post KB2531907 on the net – and that’s a good thing.  I’d recommend this patch becomes a part of your standard build for Windows Server 2008 R2 Service Pack 1 failover clusters.  Test before you deploy.

“Consider the following scenario:

  • You configure a failover cluster that has three or more nodes that are running Windows Server 2008 R2 Service Pack 1 (SP1).
  • You have cluster disks that are configured in groups other than the Available Storage group or that are used for Cluster Shared Volumes (CSV).
  • These disks are online when you run the Validate SCSI Device Vital Product Data (VPD) test or the List Potential Cluster Disks storage validation test.

In this scenario, the Validate SCSI Device Vital Product Data (VPD) test fails. Additionally, you receive an error message that resembles the following:

 

Failed to get SCSI page 83h VPD descriptors for cluster disk <number> from <node name> status 2

The List Potential Cluster Disks storage validation test may display a warning message that resembles the following:

Disk with identifier <value> has a Persistent Reservation on it. The disk might be part of some other cluster. Removing the disk from validation set.

The following hotfix resolves an issue in which the storage test incorrectly runs on disks that are online and not in the Available Storage group.

The error and warning messages that are mentioned in the “Symptoms” section may also occur because of other issues such as storage problems or an incorrect configuration. Therefore, you should investigate other events, check the storage configuration, or contact your storage vendor if this issue still occurs after you install the following hotfix”.

Having Cluster Validation Issues After Upgrading to W2008 R2 SP1?

It’s been a bit of a hot topic on TechNet: people who upgraded to Windows Server 2008 R2 Service Pack 1 on 3+ node clusters started having issues with cluster validation.  Before the upgrade there was no issue.  Didier Van Hoye (follow him!) pinged me to alert me to a new KB (KB2531907) that should be out today to fix the issue.  Elden Christensen (one of the seniors behind Failover Clustering) posted on the TechNet forums to alert us.

In the post, Elden says:

“A hotfix is now available that addresses the Win2008 R2 service pack 1 issue with Validate on a 3+ node cluster. This is KB 2531907. The KB article and download link will be published shortly, in the mean time you can obtain this hotfix immediately free of charge by calling Microsoft support and referencing KB 2531907”.

It’s a pity that you cannot just download it like other publicly available KBs.  This is the sort of issue that will cause support headaches if you call MS CSS with other clustering problems; remember that CSS supports clusters that pass the validation test.
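
Once the hotfix is on, it’s worth re-running validation to make sure you’re back in a supported state.  A quick sketch with the W2008 R2 FailoverClusters module (the node names are examples; -Include limits the run to the storage tests):

Import-Module FailoverClusters

# Re-run just the storage tests on the patched cluster
Test-Cluster -Node Host1, Host2, Host3 -Include 'Storage'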

BTW, this linked article by Didier includes some more fixes to be aware of for W2008 R2 SP1 clusters.

EDIT: Microsoft did post the hotfix online so you can download it.

Recent KB Articles Affecting Hyper-V, Etc

Here are a few recently released Microsoft KB articles that affect Hyper-V farms.

KB2004712: Unable to backup Live Virtual Machines in Server 2008 R2 Hyper-V

“When backing up online Virtual Machines (VMs) using Windows Server Backup or Data Protection Manager 2007 SP1, the backup of the individual Virtual Machine may fail with the following error in the hyperv_vmms Event Log:

No snapshots to revert were found for virtual machine ‘VMName’. (Virtual machine ID 1CA5637E-6922-44F7-B17A-B8772D87B4CF)”.

VM with GPT pass through disk on a Hyper-V cluster with SAS based storage array will cause VM to report “Unsupported Cluster Configuration.”

“When you attach a GPT pass-through disk provided from SAS storage (Serial attached SCSI) array to a highly available virtual machine by using the Hyper-V Manager or Failover Cluster Management Microsoft Management Console (MMC) snap-in, the System Center Virtual Machine Manager 2008 Admin Console lists the status of the virtual machine as "Unsupported Cluster Configuration."

Details on the High Availability section of the VMs Properties in SCVMM are:

Highly available virtual machine <Machinename> is not supported by VMM because the VM uses non-clustered storage. Ensure that all of the files and pass-through disks belonging to the VM reside on highly available storage”.

On a computer with more than 64 Logical processors, you may experience random crashes or hangs

“On a computer which has more than 64 logical processors, you may experience random memory corruption during boot processing. This may result in system instability such as random crashes or hangs.

This problem occurs due to a code defect in the NDIS driver (ndis.sys).

Microsoft is currently investigating this problem, and will post more details when a fix is available.

To work around this issue, reduce the number of processors so that the system has no more than 64 logical processors. For example, disable hyper-threading on the processors”.

The network connection of a running Hyper-V virtual machine may be lost under heavy outgoing network traffic on a computer that is running Windows Server 2008 R2 SP1

“Consider the following scenario:

  • You install the Hyper-V role on a computer that is running Windows Server 2008 R2 Service Pack 1 (SP1).
  • You run a virtual machine on the computer.
  • You use a network adapter on the virtual machine to access a network.
  • You establish many concurrent network connections. Or, there is heavy outgoing network traffic.

In this scenario, the network connection on the virtual machine may be lost. Additionally, the network adapter may be disabled”.

A hotfix is available to let you configure a cluster node that does not have quorum votes in Windows Server 2008 and in Windows Server 2008 R2

“Windows Server Failover Clustering (WSFC) uses a majority of votes to establish a quorum for determining cluster membership. Votes are assigned to nodes in the cluster or to a witness that is either a disk or a file share witness. You can use the Configure Cluster Quorum Wizard to configure the cluster’s quorum model. When you configure a Node Majority, Node and Disk Majority, or Node and File Share Majority quorum model, all nodes in the cluster are each assigned one vote. WSFC does not let you select the cluster nodes that vote for determining quorum.

After you apply the following hotfix, you can select which nodes vote. This functionality improves multi-site clusters.  For example, you may want one site to have more votes than other sites in a disaster recovery. Without the following hotfix, you have to plan the number of physical servers that are deployed to distribute the number of votes that you want for each site.”
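
As far as I know, the hotfix surfaces the vote as a NodeWeight property on each cluster node, so adjusting it looks something like this sketch (the node name is an example):

Import-Module FailoverClusters

# After the hotfix: remove the vote from the DR site node (1 = vote, 0 = no vote)
(Get-ClusterNode -Name 'DRSiteNode1').NodeWeight = 0

# Check how the votes are now spread across the cluster
Get-ClusterNode | Format-Table Name, NodeWeight -AutoSize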

Whitepaper: How to Build a Hyper-V Cluster Using the Microsoft iSCSI Software Target v3.3

I’ve just uploaded a step-by-step guide on how to build a Hyper-V cluster for a small production or lab environment, using the Microsoft iSCSI Software Target v3.3.  This target is a free add-on for Windows Server 2008 R2 and is included with Windows Storage Server 2008 R2.  It goes through all the steps:

  • Installing and configuring the storage
  • Building a standalone host to run System Center VMs
  • Building a 2 node Hyper-V cluster

“The Microsoft iSCSI Software Target is a free iSCSI storage solution. It is included as a part of Windows Storage Server 2008 R2, and it is a free download for Windows Server 2008 R2. This allows a Windows Server to become a shared storage solution for many computers. It also provides an economic way to provide an iSCSI “SAN” for a Failover Cluster, such as Hyper-V.

This document will detail how to build a 2 node Hyper-V cluster, using the Microsoft iSCSI Software Target for shared storage, which is managed by System Center running on virtual machines, hosted on another Hyper-V server and stored on the same shared storage.”
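
The document itself does everything through the wizards, but if you prefer PowerShell, the clustering steps boil down to something like this sketch (W2008 R2 FailoverClusters module; host names, cluster name, IP address, and disk name are all examples, and CSV still has to be enabled on the cluster before you add a volume):

Import-Module FailoverClusters

# Validate and then create the 2 node Hyper-V cluster
Test-Cluster -Node HV01, HV02
New-Cluster -Name HVC01 -Node HV01, HV02 -StaticAddress 192.168.1.50

# With CSV enabled on the cluster, add the iSCSI LUN as a Cluster Shared Volume
Add-ClusterSharedVolume -Name 'Cluster Disk 2'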

There is a possibility to get your company advertised in the document.  Contact me and we can work out terms.

EDIT #1:

Hans Vredevoort contacted some of the storage folks in Microsoft to discuss the MPIO/cluster member initiators issue.  It turns out that the Microsoft page in question was incorrect.  It used to be true, but the v3.3 Software Target does support iSCSI initiators that are members of a cluster.  The document has been updated with this note, but I have not added configuration steps for MPIO.