2012
11.05

Wh-at!?!?!? Isn’t Quick Migration dead? Nope. By default, pausing a WS2012 Hyper-V cluster host still uses Quick Migration for some VMs. Here’s why, and here’s how to change this default behaviour:

Live Migration uses an LM network to copy/synchronise a running VM’s changing memory from a source host to a destination host. WS2012 supports simultaneous LMs, and we can run lots of them over 10 GbE, RoCE, or InfiniBand to vacate a host for maintenance.
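How many simultaneous live migrations a host will perform is a per-host Hyper-V setting. A minimal sketch of checking and raising it, assuming the Hyper-V PowerShell module on a WS2012 host; the value 4 is only an example, not a recommendation:

```powershell
# Show how many simultaneous live migrations this host currently allows
Get-VMHost | Select-Object Name, MaximumVirtualMachineMigrations

# Example only: allow up to 4 simultaneous live migrations on this host
Set-VMHost -MaximumVirtualMachineMigrations 4
```

In a cluster you would normally make the same change on every node so that moves behave the same whichever host is being drained.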

  • What if you have to get VMs off a host really quickly? You might find that bandwidth becomes the bottleneck.
  • What if you have converged fabrics? Transferring VMs over them could hammer the network.

But you want to get the host offline as quickly as possible. What to do? In WS2012 Hyper-V we can prioritise VMs for failover ordering:

  • High: first
  • Medium: second
  • Low: last
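To see which priority each clustered VM currently has, a quick read-only sketch using the FailoverClusters module (the Priority column is the numeric value discussed further down):

```powershell
# List every clustered role (including VMs) with its owner node and numeric priority
Get-ClusterGroup | Format-Table Name, OwnerNode, Priority -AutoSize
```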

Microsoft heard from some people that they wanted to treat low priority VMs differently when they pause a host (and only when they pause a host). They wanted to move the low priority VMs without the network and system overhead of LM. They wanted to use Quick Migration. So here is what happens (by default) if you pause a clustered host with high, medium, and low priority VMs:

  1. High priority VMs will be live migrated first
  2. Medium priority VMs will be live migrated second
  3. Low priority VMs will be moved using Quick Migration, and this happens in parallel with the others
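Pausing a host and draining its roles can also be done from PowerShell. A minimal sketch, assuming a cluster node named "Host1" (a hypothetical name):

```powershell
# Pause the node and drain its roles; this is the operation that
# triggers the priority-ordered moves described above
Suspend-ClusterNode -Name "Host1" -Drain

# After maintenance, resume the node and fail its roles back immediately
Resume-ClusterNode -Name "Host1" -Failback Immediate
```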

Quick Migration will:

  • Put the VM into a saved state. Make sure the VM’s Virtual Machines folder has sufficient free space: the saved state needs as much space as the RAM currently allocated to the VM (with Dynamic Memory, that amount changes over time)
  • Transfer ownership of the VM’s files to the destination host
  • Start the VM from the saved state
  • Leave the VM offline for a number of seconds, depending on storage bandwidth, storage speed, and the amount of RAM assigned to the VM
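As a rough, back-of-envelope illustration of why RAM and storage throughput drive the outage, here is a minimal sketch. It assumes the save and the restore each move the VM’s assigned RAM at the storage’s sustained throughput, and it ignores the time taken to transfer ownership. The function name is hypothetical and the result is an estimate, not a measurement:

```powershell
function Get-QuickMigrationOutageEstimate {
    # Hypothetical helper: rough seconds offline for one Quick Migration
    param(
        [Parameter(Mandatory)] [double] $AssignedMemoryGB,  # RAM currently assigned to the VM
        [Parameter(Mandatory)] [double] $StorageMBps        # sustained storage throughput in MB/s
    )
    $memoryMB = $AssignedMemoryGB * 1024
    $saveSeconds = $memoryMB / $StorageMBps     # write the saved state on the source host
    $restoreSeconds = $memoryMB / $StorageMBps  # read it back on the destination host
    [math]::Round($saveSeconds + $restoreSeconds, 1)
}

# Example: an 8 GB VM on storage sustaining 400 MB/s
Get-QuickMigrationOutageEstimate -AssignedMemoryGB 8 -StorageMBps 400
```

For the example above this comes out at about 41 seconds; real numbers will vary with caching, contention, and how much RAM Dynamic Memory has assigned at that moment.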

The load of Quick Migration is placed on the storage rather than on the LM network. But this implies that Low priority means the VM has a lesser SLA and can be taken offline to move it when you put a host into maintenance mode (for example, during Cluster-Aware Updating or unplanned hardware maintenance).

I personally think this is not how most people will understand Low priority. Most of us think of Low priority as part of the ordering mechanism for Failover Cluster VM failover, not as an SLA bracket. I’ll be quite honest; I did not know about this default behaviour until Carsten Rachfahl (MVP) talked about it in a cluster presentation at TEC2012. We fired up my lab at work and confirmed it. I asked MSFT if we could change this default behaviour. We can.

Run this PowerShell to see why Quick Migration is used:

PS C:\Users\administrator.DEMO> Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter | fl *

….

ClusterObject : Virtual Machine
Name          : MoveTypeThreshold
IsReadOnly    : False
ParameterType : UInt32
Value         : 2000
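To read just this one parameter rather than scrolling through the full list, Get-ClusterParameter also accepts a parameter name:

```powershell
# Show only the MoveTypeThreshold setting of the Virtual Machine resource type
Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter -Name MoveTypeThreshold
```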

Priority is a numeric value that is assigned to VMs: High = 3000, Medium = 2000, and Low = 1000. Setting this VM attribute to a custom numeric value is not supported. Note the MoveTypeThreshold parameter in the results above: its value of 2000 is the threshold for using Quick Migration, and a VM whose priority is below it (i.e. a Low priority VM) is quick migrated when its host is paused. OK! While we cannot (for support reasons) customise the priority numeric value of the VM, we can change the threshold for doing Quick Migration:

Get-ClusterResourceType "Virtual Machine" | Set-ClusterParameter MoveTypeThreshold 1000

Now when you pause a host, a Low priority VM (1000) is no longer below the threshold for Quick Migration (1000), so it will be live migrated just like the High and Medium priority VMs. Problem solved.
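The decision being made can be modelled in a few lines. A minimal sketch, assuming the standard cluster group priority values (High = 3000, Medium = 2000, Low = 1000) and that a drained VM is quick migrated when its priority is below MoveTypeThreshold. The function is hypothetical and only mirrors the behaviour described in this post; it is not the cluster’s own code:

```powershell
function Get-DrainMoveType {
    # Hypothetical model of the move type chosen for one VM when its host is drained
    param(
        [Parameter(Mandatory)] [uint32] $Priority,           # 3000 = High, 2000 = Medium, 1000 = Low
        [Parameter(Mandatory)] [uint32] $MoveTypeThreshold   # cluster default is 2000
    )
    if ($Priority -lt $MoveTypeThreshold) { 'Quick' } else { 'Live' }
}

# Compare the default threshold with the changed one for each priority
foreach ($threshold in 2000, 1000) {
    foreach ($priority in 3000, 2000, 1000) {
        'threshold {0}, priority {1}: {2}' -f $threshold, $priority, (Get-DrainMoveType -Priority $priority -MoveTypeThreshold $threshold)
    }
}
```

With the default threshold only the Low (1000) VM comes out as Quick; with the threshold at 1000 all three come out as Live. To return to the default later, the same Set-ClusterParameter command with a value of 2000 should restore it.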

Note: If there is contention across the cluster after host failures, low priority VMs will be powered down to make room for higher priority VMs.

1 comment so far

  1. Thanks for the excellent post.

    In my case, having low priority VMs quick migrated is fine. However, I did notice that roles are “drained” not in order of priority. I have a number of VMs. Two are high priority, three medium and two low. My setup live migrates two VMs at a time. When I pause a node and drain the roles, I noticed one high priority VM and one medium priority VM were live migrated together, then the second high priority one, etc.

    Is there a reasoning behind this behavior? I would expect high priority to be migrated first. But then again, maybe migrating at the same time has other issues I am not considering (such as migration performance)?

    Thanks again for a great blog.
