2009
11.30

Normally when you move 2 VM’s from one host to another using Live Migration they move one at a time.  Yes, the VMM job pauses at 50% for the second machine for a while – that’s because it hasn’t started to replicate memory yet.  The live migrations are serial, not concurrent.  The memory of a running VM is being copied across a network so the network becomes a bottleneck.

I ran a little test across 3 Windows Server 2008 R2 Hyper-V cluster nodes to see what would happen.  I started moving a VM from Host A to Host C.  I also started moving a VM from Host B to Host C.  The first one ran straight through.  The second one paused at 50% until the first one was moved – just like moving 2 VM’s from one host to another.
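
Just to illustrate that scheduling behaviour, here is a toy Python model of my own (not how VMM or the cluster service actually implements it): migration requests effectively go through a single queue, so the second job waits until the first one has finished copying memory.

    from collections import deque

    def run_live_migrations(requests, seconds_per_gb=10):
        """Toy model only: the cluster works through live migrations one at a time.
        'requests' is a list of (vm_name, ram_gb); the timings are invented."""
        queue = deque(requests)
        clock = 0
        while queue:
            vm_name, ram_gb = queue.popleft()
            started = clock
            clock += ram_gb * seconds_per_gb   # the memory copy is the long, network-bound part
            print("%s: started at %ds, finished at %ds" % (vm_name, started, clock))

    run_live_migrations([("VM1", 4), ("VM2", 4)])
    # VM2 does not start copying memory until VM1 has finished - which is why
    # the second VMM job appears to sit at 50% for a while.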

2009
11.30

Excellently.  But why believe me?  I’ve just added a node to our cluster and moved a VM throughout the infrastructure in every combination I could think of, from host A to B, from B to A, from A to C, from C to A, from B to C … you get the idea.

While I was doing this I was RDP’d into the VM that was being moved using Live Migration.  I ran a continuous ping from that session to the physical default gateway, a Cisco firewall.  This is the result of the ping:

    Packets: Sent = 1174, Received = 1174, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 4ms, Average = 0ms

What was lost?  Zip!  Nada!  How many interruptions did I experience during my RDP session?  Zip!  Nada!

‘Nuff said.

2009
11.30

I’ve just gone through this process so I thought I’d document what I did:

  • Have a test VM ready and running on the cluster.  You’ll be moving it around to/from the new node.  Don’t use a production machine in case something doesn’t work.
  • Build the new node.  Set up hardware, drivers and patches, making sure the machine is identical to the other nodes in the cluster.  I mean identical.
  • Enable the Hyper-V role and the Failover Clustering feature.
  • Configure the virtual networks to be identical to those on the other nodes – VMM won’t do this in the “add” step and we know it messes up the configuration of External networks.
  • Use the SAN manager to present all cluster disks to the new node.
  • Put the cluster, the Hyper-V cluster nodes and the VMM server into maintenance mode in OpsMgr.
  • Add the new node to the cluster in Failover Clustering.  Modify the cluster quorum settings to the recommended configuration.
  • Refresh the cluster in VMM 2008 R2.  Wait for the new node to appear under the cluster in a pending state.
  • Right-click on the new pending node and select Add Node To Cluster.  Enter administrator credentials (good for all nodes in the cluster).  VMM runs a job to deploy the VMM agent.
  • If everything is good and matches up (watch out for virtual networks) then you won’t see the dreaded “Unsupported Cluster Configuration” error.
  • Move the test VM around from the new node to all the other nodes and back again using Live Migration.
  • Re-run the validation tests against your cluster ASAP.

All should be well at this point.  If so, deploy your OpsMgr agent and take the OpsMgr agents out of maintenance mode.

2009
11.29

Let’s recap the different types of migration that we can get with Windows Server Hyper-V and System Center Virtual Machine Manager:

  • Quick Migration: Leveraging Windows Failover Clustering, a VM is treated as a clustered resource.  To quick migrate, the running state is saved to disk (hibernating the VM), the disk is failed over to another node in the cluster, and the saved state is loaded (waking up the VM).
  • Offline Migration: This is when we use VMM to move a powered down VM from one un-clustered Hyper-V server to another or from one cluster to another.
  • Quick Storage Migration: This is a replacement for Offline Migration for Windows Server 2008 R2 Hyper-V servers when using VMM 2008 R2.  A running VM can be moved from one un-clustered host to another or from one cluster to another with only around 2 minutes of downtime.
  • Live Migration: This is the process of moving a virtual machine from one cluster node to another with no perceivable downtime to network applications or users.  VMware refer to this as VMotion.  It was added in Windows Server 2008 R2 Hyper-V and is supported by VMM 2008 R2.

Live Migration was the big stick that everyone used to beat Windows Server 2008 Hyper-V with.  A few seconds of downtime for a quick migration was often good enough for 75%-90% of VM’s but not for 100%.  But you can relax now; we have Live Migration.  I’m using it in production and it is good!  I can do host maintenance and enable completely automated PRO tips in VMM without worrying about any downtime, no matter how brief, for VM’s.  So how does Live Migration work?  Let’s take a look.

Above, we have a virtual machine running on host 1.  It has a configuration and a “state”.

When we initiate a live migration, the configuration of the VM is copied from host 1, where the VM is running, to host 2, the destination host.  This builds up a new VM on host 2.  The VM is still running on host 1.

While the VM remains running on host 1, the memory of the VM is broken down into pages and tracked using a bitmap.  Each page is initially marked as clean.  The pages are copied from the running VM on host 1 to the new VM sitting paused on host 2.  Users and network applications continue to use the VM on host 1.  If a RAM page changes in the running VM on host 1 after it has been copied to host 2 then Windows changes its state from clean to dirty.  This means that Windows needs to copy that page again during another copy cycle.  After the first RAM page copy cycle, only dirty pages are copied.  As memory is copied again it is marked as clean.  As it changes again, it is marked as dirty.  This continues …

So when does all this stop?  There are three possibilities (I’ve put a rough sketch of the copy loop after this list):

  1. The process will cease if all pages have been copied over from host 1 to host 2 and are clean.
  2. The process will cease if there is only a tiny, tiny amount of memory left to copy, i.e. the state.
  3. The process will cease if it has done 10 iterations of the memory copy.  In this scenario the VM is totally thrashing its RAM and it might never have a clean bitmap or a tiny state remaining.  It really is a worst case scenario.
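
Here is that rough sketch of the iterative copy loop in Python – purely an illustration of the idea described above, not Hyper-V’s actual algorithm or code.  The page count, dirty rate and thresholds are all invented for the example.

    import random

    def send_page(page_id, contents):
        """Stand-in for copying one page over the live migration network."""
        pass

    def live_migration_precopy(pages, dirty_rate=50, max_passes=10, tiny_threshold=16):
        """Illustrative sketch of the iterative pre-copy described above - NOT
        Hyper-V's real code.  pages: dict of page_id -> contents.  dirty_rate:
        how many pages the still-running VM touches between passes (simulated)."""
        dirty = set(pages)                        # first pass: every page must be copied
        for pass_number in range(1, max_passes + 1):
            for page_id in sorted(dirty):
                send_page(page_id, pages[page_id])          # copy the page; it is now clean
            # While we were copying, the running VM wrote to some pages again:
            dirty = set(random.sample(sorted(pages), min(dirty_rate, len(pages))))
            if not dirty:                         # stop condition 1: the bitmap is clean
                break
            if len(dirty) <= tiny_threshold:      # stop condition 2: only a tiny state remains
                break
            # stop condition 3: fall out after max_passes if the VM keeps
            # thrashing its RAM - the worst case scenario.
        # Blackout phase: pause the VM, copy the leftover dirty pages and device
        # state, fail the storage/files over, and un-pause the VM on the destination.
        return dirty

    leftover = live_migration_precopy({page: b"..." for page in range(1024)})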

Note: The memory is being copied over a gigabit network.  I talked about this recently when I discussed the network requirements for Live Migration and Windows Server 2008 R2 Hyper-V clusters.

Remember, the VM is still running on host 1 right now.  No users or network applications have seen any impact on uptime.

Start your stopwatch.  This next piece is very, very quick.  The VM is paused on host 1.  The remaining state is copied over to the VM on host 2 and the files/disk are failed over from host 1 to host 2.

That stopwatch is still ticking.  Once the state is copied from the VM on host 1 to host 2, Windows un-pauses it on host 2.  Stop your stopwatch.  The VM is removed from host 1 and it’s running away on host 2 just as it had been on host 1.

Just how long was the VM offline between being paused on host 1 and un-paused on host 2?  Microsoft claims the time is around 2 milliseconds on a correctly configured cluster.  No network application will time out and no user will notice.  I’ve done quite a bit of testing on this.  I’ve pinged, I’ve done file copies, I’ve used RDP sessions, I’ve run web servers, I’ve got OpsMgr agents running on them and not one of those applications has missed a beat.  It’s really impressive.

Now you should understand why there’s this "long" running progress bar when you initiate a live migration.  There’s a lot of leg work going on while the VM is running on the original host and then suddenly it’s running on the destination host.

VMware cluster admins might recognise the technique described above.  I think it’s pretty much how they accomplish VMotion.

Are there any support issues?  The two applications that come to mind for me are the two most memory intensive ones.  Microsoft has a support statement to say that SQL 2005 and SQL 2008 are supported on Live Migration clusters.  But what about Exchange?  I’ve asked and I’ve searched but I do not have a definitive answer on that one.  I’ll update this post if I find out anything either way.

Edit #1

Exchange MVPs Nathan Winters and Jetze Mellema both came back to me with a definitive answer for Exchange.  Jetze had a link (check under hardware virtualization).  The basic rule is that a DAG (Database Availability Group) does not support hardware virtualisation if the hosts are clustered, i.e. migration of an Exchange 2010 DAG member is not supported.

2009
11.29

Here’s the demonstration setup I’ll be using for the deployment session I’m presenting on Friday.  I’ll be talking about Windows 7 and Windows Server 2008 R2 deployment.  The technologies covered are WAIK, WDS and MDT 2010.

The demo machine is a Dell Latitude 6500.  It normally boots Windows 7 but I have attached an eSATA 7.2K 250GB hard drive.  That gives me decent speed on external storage; it’s also storage you can install Windows on to.  I boot the laptop up from that drive.  On there is Windows Server 2008 R2 with Hyper-V enabled.

On the parent partition is VMM 2008 R2 which I use to deploy new machines from templates stored in the library.  I’ve also installed Office 2007 so I can run PowerPoint and Office LiveMeeting 2007 so I can run the webcast.  I run LiveMeeting with the entire desktop shared and use a Polycom room microphone to pick up sound.  If I’m at a podium then I like to get up and walk a little bit.  I’ll also be using my laser pointer/clicker; it’s a decent sized thing – I don’t like little fiddly clickers.

There are 5 demo VM’s configured.  I have a domain controller running W2008 R2 with AD, DNS and DHCP enabled and configured.  There is a deployment server running W2008 R2 with WDS enabled and configured.  I’ve also installed WAIK and MDT 2010, both partially configured.  Some of the demos take too long for the session so I have some stuff pre-done.  There’s an XP SP3 VM, a blank VM and a Windows 7 VM.  The blank VM will be used to show the 3 types of deployment that I’ll be demonstrating, maybe even 4 given the time.  The Windows 7 VM is there in case I have time to demonstrate capturing an image.

All VM’s have a snapshot of their demo ready state.  I’ve defragged the disk to make the most of its speed.  When I run the session I’ll be sharing the entire desktop and expanding each VM to full screen (it appears like an RDP session).  This is because I’ll be plugged into a projector with a 1024*768 resolution and I need to be aware that viewers of the webcast will not be able to deal with huge resolutions.  I’m not RDP’ing into VM’s because a lot of the time I’m working with machines when there is no RDP available, e.g. BIOS, setup, etc.

And here’s a little something for Technorati: ZYRDJGJYCDG8

2009
11.29

Microsoft Ireland has posted the video of the Dublin community launch of Windows 7, Windows Server 2008 R2 and Exchange 2010.  I was lucky enough to be a part of the presentations, talking about the Microsoft Assessment and Planning Toolkit for Windows 7, the Application Compatibility Toolkit and Microsoft Deployment Toolkit 2010.  This was a demo intensive session and well worth checking out if you couldn’t make it on the day.  I’m in the “Windows 7 & Windows Server 2008 R2 Story Part I” video.

2009
11.29

I’ve been doing the last bits of preparing for my Windows User Group session on deploying Windows 7 and Windows Server 2008 R2 (details here and LiveMeeting webcast here) for this Friday (December 4th, 09:30 GMT – it’ll be recorded). 

I’ve been trying out a few of the features of Windows Server 2008 R2 Windows Deployment Services (WDS), a free OS image capture/deployment solution from Microsoft.  Some of the new features are:

  • Driver additions to the boot image are really easy.
  • Setting up multicast is really easy too.
  • Clients can join a multicast midway and then get the rest of the stream afterwards.
  • You can configure a multicast to only initiate when a session has enough computers or at a certain date/time.
  • You can allow no computers or all computers access to WDS. 
  • You can allow new computers to access WDS in two ways.  The first (old) one is to pre-build computer accounts in Active Directory with the GUID/MAC of the physical machine to be built.  Or, you can delay the boot up until the end user calls the helpdesk and gets an administrator to approve their session in the WDS console.

2009
11.29

I had to disable permalinks today in WordPress.  I found that scans of my site were failing because lots of URL’s could not be resolved.  This was because Permalinks was miscalculating what to do with punctuation in a title.  It’s a pity.  It also means every URL on the blog had to change, which is a royal pain in the backside.  Sorry if you’d linked but there was no alternative.  It appears to be a common issue.

2009
11.28

As more and more people start deploying Windows Server 2008 R2 Hyper-V, the most common question will be: “how many NIC’s or network cards do I need to implement Live Migration?”.  Here’s the answer for you.

Your minimum optimal configuration is:

  • NIC #1: Parent partition (normal network)
  • NIC #2: Cluster heartbeat (private network)
  • NIC #3: Live Migration (private network)
  • NIC #4: Virtual Switch (normal/trunked network)

You’ll need to add more NIC’s if you want NIC teaming or need to dedicate NIC’s to virtual switches or VM’s.  This does not account for iSCSI NIC’s which should obviously be dedicated to their role.

How does Windows know which NIC to use for Live Migration?  Failover Clustering picks a private network for the job.  You can see the results by launching the Failover Clustering MMC, opening up the properties of a VM, and going to the last tab.  Here you’ll see which network was chosen.  You can specify an alternative if you wish.

I’ve gone with a different layout.  We’re using HP Blade servers with Virtual Connects.  Adding NIC’s is an expensive operation because it means buying more pricey Virtual Connects.  I also need fault tolerance for the virtual machines so a balance had to be found.  Here’s the layout we have:

  • NIC #1: Parent partition (normal network)
  • NIC #2: Cluster heartbeat / Live Migration (private network)
  • NIC #3: Virtual Switch (trunked network)
  • NIC #4: Virtual Switch (trunked network)

I’ve tested this quite a bit and pairing live migration with the cluster heartbeat has had no ill effects.  But what happens if I need to live migrate all the VM’s on a host?  Won’t that flood the heartbeat network and cause failovers all over the place?

No.  Live Migration is serial.  That means only one VM is transferred at a time.  It’s designed not to flood a network.  Say you initiate maintenance mode in VMM on a cluster node.  Each VM is moved one at a time across the Live Migration network.

You can also see I’ve trunked the virtual switch NIC’s.  That allows us to place VM’s onto different VLAN’s or subnets, each firewalled from the others.  This barrier is controlled entirely by the firewalls.  I’ll blog about this later because it’s one that deserves some time and concentration.  It has totally wrecked the minds of very senior Cisco admins I’ve worked with in the past when doing Hyper-V and VMware deployments – eventually I just told them to treat virtualisation as a black box and to trust me :)

I just thought of another question.  “What if I had a configuration that was OK for Windows Server 2008 Hyper-V Quick Migration?”.  That’s exactly what I had and why I chose the last configuration.  Really, you could do that with 3 NIC’s instead of 4 (drop the last one if you can live without virtual switch fault tolerance).


2009
11.27

One of the features not being talked about too much in Windows Server 2008 R2 Hyper-V is the ability to add new storage.  What does this mean?  It means you can add new virtual hard disks (VHD’s) to a VM while it is running.   It does not mean you can resize a VHD while the VM is running.

Before we go forward, we need to cover some theory.  There are two types of controller in Hyper-V:

  • IDE: The VM must boot from an IDE controller.  You can have 2 virtual IDE controllers per VM and a total of 4 IDE devices attached per VM.
  • SCSI: You cannot boot from a SCSI controller.  You can have up to 4 SCSI controllers, each with 64 attached VHD’s for a total of 256 SCSI VHD’s per VM.

Now don’t panic!  Forget the VMware marketing often done by uninformed shills.  When you install your enlightenments or integration components (IC’s) you’ll get the same performance out of IDE as you will with SCSI.  The only time when SCSI is faster than IDE in Hyper-V is if you don’t or can’t install the enlightenments or IC’s.  That’s because IDE requires more context switches in that scenario.

I normally use a single IDE disk for the operating system and programs.  I then use at least 1 SCSI disk for data.  And here’s why.


With Windows Server 2008 R2 Hyper-V you can add additional SCSI VHD’s to a VM while it’s still running.  You can see the VM configuration above (from VMM 2008 R2).  Adding another disk is easy.  You can see on the top bar that the option to add all types of hardware is greyed out – except for disk.

I’ve clicked on disk to reveal the panel above on the right-hand side.  I can configure the disk, e.g. select the next available channel, choose a disk type (use existing, pass through, dynamic or fixed), set the size and name the VHD file.  Once I click on OK the disk is created and then made available to the VM.

From then on, all you have to do in the OS is what you would normally do when you hot-add a disk.

2009
11.27

I’m seeing the real world results of this.  We’re getting a little bit more out of the gigabytes of RAM in each of our hosts with Windows Server 2008 R2 Hyper-V than with its predecessor, even though our hardware does not have the very latest processors.

One of the main players in saving RAM on Hyper-V hosts with newer hardware will be SLAT or Second Level Address Translation.

In Windows Server 2008 Hyper-V the parent partition (host operating system) is responsible for mapping the physical memory in the host to the memory seen by the running virtual machine.  Windows Server 2008 R2 Hyper-V removes that middle layer of management by offloading the responsibility to dedicated functions in the CPU.

CPU’s that have Intel’s Extended Page Tables (EPT) or AMD’s Nested Page Tables (NPT), also known as Rapid Virtualization Indexing (RVI), can be delegated this responsibility.  This gets rid of the “shadow table” that the parent partition otherwise had to use … which was also consuming RAM.

It’s estimated that you will save about 1MB of RAM per VM (there is a RAM overhead for every VM and for every GB of RAM in that VM) and there is also a small saving in the CPU time required.

2009
11.27

Unlike VMware’s VMFS, we can extend a Cluster Shared Volume (CSV) without doing trickery that compromises performance.  And as has been documented by Hans Vredevoort (clustering MVP), it is very scalable.

How do you resize or expand a CSV?  It’s a pretty simple process:

  1. Use your storage management solution (I’m using HP EVA Command View for the EVA SAN) to expand the size of the LUN or disk.
  2. Use the Failover Clustering MMC to identify who is the CSV coordinator, i.e. the owner of the disk.
  3. Log into the CSV coordinator.
  4. Use either Computer Management->Storage Management or DISKPART.
  5. Remember to rescan the disks.
  6. Extend the volume to use all available space.

The steps for using DISKPART are as follows (there’s a scripted version after the list):

  • rescan
  • list volume – get the ID number for the CSV from here
  • select volume <ID number> – using the ID number from the previous step
  • extend
  • list volume – to see the results
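
If you find yourself doing this regularly, the same DISKPART steps can be scripted.  Below is a minimal sketch of my own (not an official tool) that writes the commands to a temporary script file and runs diskpart /s – run it from an elevated session on the CSV coordinator node, after the LUN has already been grown, and substitute your own volume number.

    import os
    import subprocess
    import tempfile

    def extend_csv_volume(volume_number):
        """Rescan and extend one volume via 'diskpart /s' (an illustrative sketch).
        Run it elevated on the CSV coordinator after the LUN has been expanded."""
        script = "rescan\nselect volume %d\nextend\nlist volume\n" % volume_number
        fd, path = tempfile.mkstemp(suffix=".txt", text=True)
        try:
            with os.fdopen(fd, "w") as script_file:
                script_file.write(script)
            subprocess.check_call(["diskpart", "/s", path])   # needs an elevated prompt
        finally:
            os.remove(path)

    # extend_csv_volume(3)   # 3 is a placeholder - take the number from 'list volume'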

It’s a painless operation and has no impact on running VM’s.

2009
11.27

Ease of administration.

To a sys admin, those 3 words mean a lot.  To a decision maker like a CIO or a CFO (often one and the same) they mean nothing.

It’s rare enough that I find myself working with physical boxes these days.  Most everyone is looking for a virtualised service, which is cool with me.  Over the last 2 weeks I’ve been doing some physical server builds with Windows Server 2008 R2.  I know the techniques for an automated installation; I just haven’t had time to deploy them for the few builds I needed to do.  Things like Offline Servicing for VM’s and MDT/WDS (upgrade) are in my plans but things had to be prioritised.  I’ve just kicked off a reboot of a blade server.  By the time it has finished its POST I’ll have made and half drunk a cup of coffee.  After working with VM’s almost exclusively for the last 18 months, working with a physical box seems slow.  These are fine machines but the setup time required drags.  Those reboots take forever!  VM reboots: well, there’s no POST and they reboot extremely quickly.

Let’s compare the process of deploying a VM with the process of deploying a physical box:

Deploy a VM

  • Deploy a VM.
  • Log in and tweak.
  • Handover the VM.

Notes on this:

  • The free Offline Servicing Tool can allow you to deploy VM’s that already have all the security updates.
  • This process can be done by a delegate “end user” using the VMM self servicing web interface.
  • The process was probably just an hour or two from end to end.

Deploy a Physical Server

  • Create a purchase request for a new server.
  • Wait 1-7 days for a PO number.
  • Order the server.
  • Wait for up to 7 days for the server to be delivered.
  • Rack, power and network the server.
  • We’ll assume you have all your ducks in a row here: Use MDT 2010 or ConfigMgr to deploy an operating system.
  • The OS installs and the task sequence deploys updates (reboots), then applications (reboots), then more updates (reboots) and then makes tweaks (more updates and a reboot).
  • You hand over the server.

Notes on this:

  • Most people don’t automate a server build.  Manual installs typically take 1 to 1.5 days.
  • There will probably be up to 1 day of a delay for networking.
  • The “end user” can’t do self service and must wait for IT, often getting frustrated.
  • The entire process will probably take 10.5 to 16.5 days.

Total Hardware Breakdown

Let’s assume the VM scenario used a cluster.  If a hardware failure crashes the host then the VM stops running.  The cluster moves the VM resource to another host (VMM will choose the most suitable one) and the VM starts up again.  Every VM on the cluster has hardware fault tolerance.  If the hardware failure was non-critical then you can use Live Migration to move all the VM’s to another host (VMM 2008 R2 maintenance mode) and then power down the host to work on it.  There’s no manual intervention at all in keeping things running.

What if you used standalone (un-clustered) hosts?  As long as you have an identical server chassis available you can swap the disks and network cables to get back up and running in a matter of minutes.

In the absolute worst case scenario with un-clustered hosts, you can take the data disks, slap them into another machine and do some manual work to get running again.  As long as the processor is from the same manufacturer you’re good to go in a few hours.

If a physical box dies then you can try something similar.  However, physical boxes tend to vary quite a lot, whereas a farm of virtualisation hosts usually doesn’t vary much at all.  If a DL380 dies you can’t just expect to put its disks into a DL160 and get a good result.  It might work, but it might not.

Most companies don’t purchase the “within 4 hours” response contracts.  And even if they do, some manufacturers will do their very best to avoid sending anyone out by asking for one diagnostic test after another and endless collections of logs.  It could be 1 to 3 days (and some angry phone calls) before an engineer comes out to fix the server.  In that time the hosted application has been offline, negatively affecting the business and potentially your customers.  If only a physical server was a portable container like a VM – see boot from VHD.

Summary

You’ve heard all those sales lines on virtualisation: carbon footprint, reduced rack space, lower power bills, etc.  Now you can see how easier administration can not only make your life easier but also positively impact the business.

My experience has been that when you translate techie-speak into Euros, Dollars, Pounds, Rubles, Yen or Yuan then that gets the budget owner’s attention.  The CFO will sit up and listen and probably decide in your favour.  And if you can explain how these technologies will have real, positive impacts on the business then you’ll have the attention of the other decision makers too.

2009
11.26

Last night we finished migrating the last of the virtual machines from our Windows Server 2008 Hyper-V cluster to the new Windows Server 2008 R2 Hyper-V cluster.  As before, all the work was done using System Center Virtual Machine Manager (VMM) 2008 R2.  The remaining host has been rebuilt and is half way to being a new member of the R2 Hyper-V cluster.

I also learned something new today.  There’s no supported way to remove a cluster from OpsMgr 2007.  Yuk!

2009
11.26

Happy Thanksgiving!

Happy Turducken day to our American friends.  You’re probably not reading this until at least next Monday but you’ll at least know I was with you in spirit.  I’m taking the day off as a Niners fan and cheering on the Lions and whoever is playing against the Cowboys.

Of course, being a Niners fan I am not doing the Turkey thing.  I’ve got some Lasagne and a fine bottle of wine :)

2009
11.26

I posted earlier today about my network transfer tests on HP ProLiant BL460C G5 blade servers with Windows Server 2008 R2 Hyper-V.  Hans Vredevoort also did some tests, this time using BL460C G6 blades.  This gave Hans the hardware to take advantage of some of the new technologies from Microsoft.  Check out his results.

2009
11.26

Windows Server 2008 R2 includes some enhancements to optimise how networking works in Hyper-V.  I’m going to have a look at some of these now.

Virtual Machine Queue

Here’s the way things worked in Windows Server 2008.  The NIC (bottom left) runs at the hardware level.  VM1 has a virtual NIC. 

When it communicates, memory is copied to/from that NIC by the parent partition.  All routing, filtering and data copying is done by the parent partition in Windows Server 2008.

Windows Server 2008 R2 takes advantage of Microsoft partnering with hardware manufacturers.

How it works now is that the NIC, i.e. the hardware, handles the workload on behalf of the parent partition.  Hardware performs more efficiently than software.  All that routing, filtering and data copying is handled by the network card in the physical host.  This does rely on hardware that’s capable of doing this.

The results:

  • Performance is better overall.  The CPU of the host is less involved and more available.  Data transfer is more efficient.
  • Live Migration can work with full TCP offload.
  • Anyone using 10GbE will notice huge improvements.

Jumbo Frames

TCP is pretty chatty.  Data is broken up and converted into packets that must be acknowledged by the recipient.  There’s an overhead to this, with the data being encapsulated with flow control and routing information.  It would be more efficient if we could send fewer packets that contained more data, and therefore send less encapsulation data overall.


Jumbo frames accomplish this.  Microsoft claims that you can get packets that contain 6 times more information with this turned on.  It will speed up large file transfers as well as reduce CPU utilisation.
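
A quick back-of-the-envelope calculation shows where a figure like that comes from, assuming a standard 1,500 byte MTU versus a typical 9,000 byte jumbo frame (the 40 byte header estimate below is an approximation of my own):

    def packets_needed(transfer_bytes, mtu_bytes):
        """Rough number of TCP/IP packets for a transfer (illustrative).
        Payload per packet = MTU minus roughly 40 bytes of IP + TCP headers."""
        payload = mtu_bytes - 40
        return -(-transfer_bytes // payload)              # ceiling division

    transfer = 3 * 1024 ** 3                              # a ~3 GB file copy
    standard = packets_needed(transfer, 1500)             # standard Ethernet MTU
    jumbo = packets_needed(transfer, 9000)                # a typical jumbo frame MTU
    print("standard MTU: %d packets, jumbo frames: %d packets, %.1fx fewer"
          % (standard, jumbo, standard / float(jumbo)))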

Chimney Offload

This one has been around for a while in Windows, but support for Hyper-V was added with Windows Server 2008 R2.

It’s similar to VMQ, requiring hardware support, and does a similar job.  The NIC is more involved in doing the work.  Instead of offloading from the parent partition, it’s offloading from the Virtual Machine’s virtual NIC.  The virtual NIC in the VM advertises connection offload capabilities.  The virtual switch in the parent partition offloads child partition TCP connections to the NIC.

Hardware Reliance

You need support from the hardware for these features.  During the RC release, the following NIC’s were included by MS in the media:

VM-Chimney Capable Drivers:

  • Broadcom Net-Xtreme II 1 Gb/s NICs (Models 5706, 5708, and 5709)
  • Broadcom 10Gb/s NICs (Models 57710, 57711)

VMQ Capable Drivers:

  • Intel Kawela (E1Q) 1 Gb/s NICs (also known as Pro/1000 ET NICs)
  • Intel Oplin NICs (IXE) 10Gb/s NICs (also known as 82598)

2009
11.26

Hans Vredevoort asked what sort of network speed comparisons I was getting with Windows Server 2008 R2 Hyper-V.  With W2008 R2 Hyper-V you get new features like Jumbo Frames and VMQ (Virtual Machine Queue) but these are reliant on hardware support.  Hans is running HP G6 ProLiant servers so he has that support.  Our current hardware is HP G5 ProLiant servers.  I decided this was worth a test.

I set up a test on our production systems.  It’s not a perfect test lab because there are VM’s doing their normal workload and things like continuous backup agents running.  This means other factors that are beyond my control have played their part in the test.

The hardware was a pair of HP BL460C “G5” blades in a C7000 enclosure with Ethernet Virtual Connects.  The operating system was Windows Server 2008 R2.  The 2 virtual machines were also running Windows Server 2008 R2.  I set them up with just 512MB RAM and a single virtual CPU.  Both VM’s had 1 virtual NIC, both in the same VLAN.  They had dynamic VHD’s. The test task would be to copy the W2008 R2 ISO file from one machine to the other.  The file is 2.79 GB (2,996,488 bytes) in size.

There were three tests.  In each one I would copy the file 3 times to get an average time required.

Scenario 1: Virtual to Virtual on the Same Host

I copied the ISO from VM1 to VM2 while both VM’s were running on host one.  After I ran this test I realised something.  The first iteration took slightly longer than all other tests.  The reason was simple enough – the dynamic VHD probably had to expand a bit.  I took this into account and reran the test.

With this test the data stream would never reach the physical Ethernet.  All data would stay within the physical host.  Traffic would route via the NIC in VM1 to the virtual switch via its VMBus and then back to the NIC in VM2 via its VMBus.

The times (seconds) taken were 51, 55 and 50 with an average of 52 seconds.

Scenario 2: Virtual to Virtual on Different Hosts

I used live migration to move VM2 to a second physical host in the cluster.  This means that data from VM1 would leave the virtual NIC in VM1, traverse VMBus and the Virtual Switch and physical NIC in host 1, the Ethernet (HP C7000 backplane/Virtual Connects) and then the physical NIC and virtual switch in physical host 2 to reach the virtual NIC of VM2 via its VMBus. 

I repeated the tests.  The times (seconds) taken were 52, 54 and 66 with an average of 57.333 seconds.  We appear to have added 5.333 seconds to the operation by introducing physical hardware transitions.

Scenario 3: Virtual to Virtual During Live Migration

With this test we would start with the scenario in the first set of tests.  We would introduce Live Migration to move VM2 from physical host 1 to physical host 2 during the copy.  This is why I used only 512MB RAM in the VMs; I wanted to be sure the live migration end-to-end task would complete during the file copy.  The resulting scenario would have VM2 on physical host 2, matching the second test scenario.  I wanted to see what impact Live Migration would have on getting from scenario 1 to scenario 2.

The times (seconds) taken were 59, 59 and 61 with an average of 59.666 seconds.  This is 7.666 seconds slower than scenario 1 and 2.333 seconds slower than scenario 2.

Note that Live Migration is routed via a different physical NIC than the virtual switch.

Scenario 4: Physical to Physical

This time I would copy the ISO file from one parent partition to another, i.e. from host 1 to host 2 via the parent partition NIC.  This removes the virtual NIC, virtual switch and the VMBus from the equation.

The times (seconds) taken were 34, 28 and 27 with an average of 29.666 seconds.  This makes the physical data transfer 22.334 seconds faster than the fastest of the virtual scenarios (scenario 1).

Comparison

    Scenario                                      Average Time Required (seconds)
    Virtual to Virtual on Same Host               52
    Virtual to Virtual on Different Hosts         57.333
    Virtual to Virtual During Live Migration      59.666
    Physical to Physical                          29.666
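
To put those averages in some perspective, here is the rough effective throughput for each scenario – simple arithmetic on the figures above, taking the file as 2.79 GB:

    file_gb = 2.79                                        # the W2008 R2 ISO used in the tests
    averages = {
        "Virtual to Virtual on Same Host": 52.0,
        "Virtual to Virtual on Different Hosts": 57.333,
        "Virtual to Virtual During Live Migration": 59.666,
        "Physical to Physical": 29.666,
    }
    for scenario, seconds in averages.items():
        print("%-42s %5.1f MB/s" % (scenario, file_gb * 1024 / seconds))
    # The physical copy works out at roughly 96 MB/s - getting close to what a single
    # gigabit link can deliver - while the virtual paths land in the 48-55 MB/s range.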

Waiver

As I mentioned, these tests were not done in lab conditions.  The parent partition NIC’s had no traffic to deal with other than an OpsMgr agent.  The Virtual Switch NIC’s had to deal with application, continuous backup, AV and OpsMgr agent traffic.

It should also be noted that this is not a comment on the new features of Windows Server 2008 R2 Hyper-V.  Using HP G5 hardware I cannot avail of the new hardware offloading improvements such as VMQ and Jumbo Frames.  I guess I have to wait until our next host purchase to see some of that in play!

This is just a test of how things compare on the hardware that I have in a production situation.  I’m actually pretty happy with it and I’ll be happier when we can add some G6 hardware.

2009
11.25

Microsoft released lots of updates for Operations Manager over the last couple of weeks.  There are lots of updates to management packs, too many for me to go posting them at this time of night.  Have a look on the catalogue and you’ll see them.  Or check your console if you’re using OpsMgr 2007 R2.

Most important is KB971541, the Update Rollup for Operations Manager 2007 Service Pack 1.

“The Update Rollup for Operations Manager 2007 Service Pack 1 (SP1) combines previous hotfix releases for SP1 with additional fixes and support of SP1 roles on Windows 7 and Windows Server 2008 R2. This update also provides database role and SQL Server Reporting Services upgrade support from SQL Server 2005 to SQL Server 2008.

The Update Rollup includes updates for the following Operations Manager Roles:

  • Root Management Server, Management Server, Gateway Server
  • Operations Console
  • Operations Management Web Console Server
  • Agent
  • Audit Collection Server (ACS Server)
  • Reporting Server

The following tools and updates are provided within this update which may be specific to a scenario:

  • Support Tools folder – Contains SRSUpgradeTool.exe and SRSUpgradeHelper.msi (Enables upgrade of a SQL Server 2005 Reporting Server used by Operations Manager Reporting to SQL Server 2008 Reporting Server)
  • Gateway folder – Contains a MSI transform and script to update MOMGateway.MSI for successful installation on Windows Server 2008 R2
  • ManagementPacks folder – Contains an updated Microsoft.SystemCenter.DataWarehouse.mp which requires manual import

For a list of fixes and tools addressed by this update rollup, see KB971541.

This update is supported for application on System Center Operations Manager 2007 Service Pack 1 only.

Feature Summary

The System Center Operations Manager 2007 SP1 Rollup 1 contains:

  • All binary hotfixes released since Service Pack 1 release
  • Support for Windows 7 and Windows Server 2008 R2
  • Operational and DataWarehouse database support on Windows Server 2008 R2
  • Additional stability hotfixes”

Requirements

  • Supported Operating Systems: Windows 7; Windows Server 2003; Windows Server 2008; Windows Server 2008 R2; Windows Vista; Windows XP
  • System Center Operations Manager 2007 Service Pack 1

Instructions

This update must be applied to each computer that meets the following criteria:

  • Hosts a Microsoft Operations Manager Root Management Server
  • Hosts a Microsoft Operations Manager Management Server
  • Hosts a Microsoft Operations Manager Operations Console
  • Hosts a Microsoft Operations Manager Web Console Server
  • Hosts a Microsoft Operations Manager Reporting Server
  • Hosts a Microsoft Operations Manager Manually installed Agent
  • Hosts a Microsoft Operations Manager ACS Server

Before applying this update it is strongly recommended that Operations Manager databases, Management Server, Report Server and Web Console roles be backed up.

To extract the files contained in this update and installation of the update on the Operations Manager roles above:

  1. Copy the file – SystemCenterOperationsManager2007-SP1-KB971541-X86-X64-IA64-locale.MSI – To either a local folder or accessible network shared folder.
  2. Run the file – SystemCenterOperationsManager2007-SP1-KB971541-X86-X64-IA64-locale.MSI – locally on each applicable computer that meets the predefined criteria.
    You can run SystemCenterOperationsManager2007-SP1-KB971541-X86-X64-IA64-locale.MSI from either Windows Explorer or from a command prompt.
  3. Select the appropriate role to update from the Operations Manager 2007 Software Update dialog.

NOTE: To run this file on Windows Server 2008 you must run this file from a command prompt which was executed with the Run as Administrator option. Failure to execute this Windows installer file under an elevated command prompt will not allow display of the System Center Operations Manager 2007 Software Update dialog to allow installation of the hotfix”.

2009
11.25

This guide explains the process for upgrading Active Directory domains to Windows Server 2008 and Windows Server 2008 R2, how to upgrade the operating system of domain controllers, and how to add domain controllers that run Windows Server 2008 or Windows Server 2008 R2 to an existing domain.

Upgrading your network operating system requires minimal network configuration and typically has a low impact on user operations. The upgrade process is straightforward, efficient, and allows your organization to take advantage of the improved security that is offered by the Windows Server 2008 and Windows Server 2008 R2 operating systems. This guide covers the process for upgrading domains and domain controllers, and how to add new domain controllers to existing Active Directory domains. It includes details about how to run Adprep.exe and resolve known issues and errors if they arise.

2009
11.25

One of the most common queries I used to get on my old blog was “how do I convert Hyper-V disks?”.  Converting a VHD is easy enough.  In the Hyper-V console you shut down the VM, edit the disk and select a location for the new VHD.  Once that’s done you can rename the old disk and grant its old name to the new disk.  Start up the VM and it’s using the new disk.  Remember to remove the old disk to save disk space.

Before you even think about this you need to be sure you have enough space for the new disk of the desired type to be created.  How much space will it need?  Check how much data is on that disk (in the VM’s OS) and allow for another GB or two to be safe.  This applies to both the Hyper-V console and VMM.
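
As a trivial pre-flight check, something like the following sketch (my own; the path and sizes are just placeholders) will tell you whether the volume that will hold the new VHD has enough room:

    import shutil

    def room_for_conversion(volume_path, data_in_vm_gb, safety_gb=2):
        """Is there enough free space on 'volume_path' to create the new VHD?
        Rule of thumb from above: data in the VM plus another GB or two to be safe."""
        free_gb = shutil.disk_usage(volume_path).free / float(1024 ** 3)
        needed_gb = data_in_vm_gb + safety_gb
        return free_gb >= needed_gb, free_gb, needed_gb

    ok, free_gb, needed_gb = room_for_conversion(r"D:\VMs", data_in_vm_gb=12.5)
    print("free %.1f GB, need about %.1f GB -> %s"
          % (free_gb, needed_gb, "go ahead" if ok else "make space first"))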

VMM is a bit more elegant.  You shut down your VM, hopefully from the OS itself or via the IC’s, rather than just turning it off.  Then you edit the properties of the VM and navigate to the disk in question.


Here I have opened up the VM that I wanted to work on and I’ve navigated to the disk in question.  It’s a fixed VHD and I want to replace it with a dynamic VHD without losing my data.  You can see in the right-hand side that there is a tick box to convert the disk.  I ticked this and clicked on OK.

Notice that there is also a tick box to expand the VHD?  That’s how we can grant more space to a VM.  Remember to follow that up by running DISKPART in the VM to extend the volume.  That has nothing to do with the convert task but I thought I’d mention it.


Once I click on OK the job runs.  How long it takes depends on the amount of data we’re dealing with.

VMM is pretty clever here.  It will convert the disk and then swap out the old disk with the new disk.  The old disk is removed.  This is a much less manual task than using the Hyper-V console.


Once the job is done you should check out your VM.  You can see above that the disk is now a dynamic disk.  And notice how much space I’ve saved?  I’ve gone from 20GB down to 12.5GB.  I’ve just saved my employer 40% of the cost to store that VM with a couple of mouse clicks while waiting for my dinner.  That goes to back up my recent blog post about simpler and more cost effective storage.  And like I said then, I’ve lost nothing in performance because I am running Windows Server 2008 R2.

2009
11.25

Cluster Shared Volume (CSV) is the shared storage system that can be used by multiple cluster members at once for storing and running virtual machines in a Hyper-V cluster.  CSV is specifically customised for Hyper-V and should not be used for anything else – you even have to agree to that to enable it.  That customisation means that some things change a bit.

VSS is the Volume Shadow Copy Service.  Using Hyper-V certified backup solutions like DPM you can back up the state of a virtual machine running in Hyper-V in a supported manner.  This is done at the host level.  That’s different to a file level backup that you would do with an agent installed in the VM.  That would be able to recover individual files.  The host level backup would be able to recover the entire VM back to the point in time that you did the backup.

There have been reports of issues.  For example, DPM 2007 R2 is not live migration aware.  You’ll have to wait until around April for DPM 2010 for a solution to that.  That only affects the snapshot backups.

A rollup package has been released by Microsoft for Windows Server 2008 R2 Hyper-V to resolve some issues with CSV and VSS.  Thanks to Hans Vredevoort (clustering MVP) for making me aware of this.  Article KB975354 fixes the following situations.

“This update rollup package resolves some issues that occur when you backup or restore Hyper-V virtual machines

Issue 1

Consider the following scenario:

  • Some Internet SCSI (iSCSI) connections are created in a virtual machine that is running Windows Server 2003.
  • You back up this virtual machine on the virtual machine host server.

In this scenario, the error code 0x800423f4 occurs when you back up the virtual machine. Additionally, the following event is logged into the Hyper-V Virtual Machine Management Service event log:

The number of reverted volumes does not match the number of volumes in the snapshot set for virtual machine "'virtual machine name' (Virtual machine ID <GUID>)".

Issue 2

Consider the following scenario:

  • Cluster shared volumes are enabled on a failover cluster for Hyper-V.
  • Some virtual machines are saved on the same volume. But they are running on different nodes.
  • These virtual machines are backed up in parallel.

In this scenario, the virtual machine backup operation fails.

Issue 3

Consider the following scenario:

  • A virtual machine is being backed up on a server that is running Hyper-V.
  • At the same time, an application backup operation is being performed in the same virtual machine.

In this scenario, some data is truncated from the application backup in the virtual machine. Therefore, this behaviour causes data loss.

Issue 4

Consider the following scenario:

  • A virtual machine that has some snapshots is backed up on a server that is running Hyper-V.
  • Then, this virtual machine is restored to another location.

In this scenario, the restore operation fails and the virtual machine may be corrupted”.

If you’re running a Windows Server 2008 R2 Hyper-V cluster and are still getting used to it then there’s some good news.  Here’s how I’ll approach this:

  • Put the OpsMgr 2007 agent for host 1 into maintenance mode.
  • Put host 1 into maintenance mode in VMM 2008 R2.  That kicks off live migration and moves VM’s from that host to another host.  You can do this manually in the failover cluster management console if you don’t have VMM 2008 R2.
  • Apply the update to host 1 and reboot.
  • Test host 1 with a test VM.
  • Repeat with all other hosts in the cluster.

That should work.  And it’ll probably be your first opportunity to use Live Migration and VMM 2008 R2 Maintenance Mode in anger.  Think about it, when do you normally get to do server patching during the work day?  Now you can!

2009
11.25

We know that VMware has a huge ecosystem of virtual appliances that run on their VMDK platform.  Microsoft has the VHD platform and thanks to the publication of integration components for Linux under GPLv2, we can expect something similar for Hyper-V.

Virtualboy posted about the launch of the Appliance Test Drive.  Microsoft has an existing library of trial products available.  Partners have joined in with trial and/or free products based on VHD:

Citrix EVA for XenApp
The Citrix EVA for XenApp enables customers and partners to evaluate both online and offline application virtualization with XenApp 5 for Windows Server 2008.

DataCore Virtual SAN Appliance
Free, Ready to Run, DataCore Virtual SAN Appliance. This Virtual SAN Appliance software places shared storage for virtual machines and physical servers at your fingertips.

Athena for System Center Configuration Manager 2007
Athena-enabled device management functions extend and complement the native capabilities of Configuration Manager 2007 for Windows Mobile and Windows CE Embedded devices.

EventTracker VHD
The VHD is a pre-configured, fully functional trial of EventTracker. This image comes with EventTracker including change monitoring and the EventTracker Event Portal, Event Log Central.

ThinPrint .print Server Engine 7.6
ThinPrint .print Server Engine 7.6 is installed on Windows 2003 R2 and allows for evaluation of V-Layer, universal printer driver, bandwidth control and compression.

Check it out if you’re looking to evaluate some technology.  If you’re an ISV then you should consider this as a way to provide pre-configured evaluations to customers.  Part of the complexity for potential customers getting a trial working is figuring out those configurations.  If you are an ISV then you can remove that pain from introducing your product to a new customer.

2009
11.25

Microsoft released a hotfix rollup package (KB976244) for System Center Virtual Machine Manager 2008 R2 (not VMM 2008).  It fixes three issues:

  1. If a virtual machine that was created from a template fails during the customization phase, the owner of the virtual machine cannot use the self-service portal to connect to the virtual machine.
  2. The Enable spoofing of MAC addresses option in the virtual machine properties is cleared when you configure the network adapter to connect to a different virtual network.
  3. The "Refresh host cluster" job can take more than 10 minutes to finish.
This update is available via Microsoft Update.  Make sure you approve this update if you’re using WSUS, ConfigMgr or something else to control your updates.

 

There is also another update (KB976246) available for VMM 2008 R2 (only).  It deals with this issue: “When you remove a virtual hard disk from a virtual machine in System Center Virtual Machine Manager 2008 R2, the .vhd file on the Hyper-V server is deleted without warning”.

Again, the update is available via Microsoft Update.  After you install the hotfix you will get this warning every time you attempt to delete a VHD in VMM 2008 R2:

“You have chosen to remove virtual hard disks from virtual machine VMName. This action will delete the .vhd files.

Do you want to continue?”

I’m approving the updates now on our WSUS server to deal with these issues.

2009
11.25

I’ve not been keeping up with my reading as of late.  I missed that this document from HP came out – I was distracted with actually deploying a Windows Server 2008 R2 Hyper-V cluster on HP ProLiant Blade Servers and HP EVA SAN storage instead of reading about it :)

This document appears to be essential reading for any engineer or consultant who is sizing, planning or deploying Windows Server 2008 R2 Hyper-V onto HP Blade servers and HP EVA, MSA or LeftHand storage.

It starts off with a sizing tool.  That’s probably the trickiest bit of the whole process.  Disk used to be easy because we normally would have used Fixed VHDs in production.  But now we can use Dynamic VHDs knowing that the performance is almost indistinguishable.  The best approach for disk sizing now is to base it on data, not the traditional approach of counting how many disks you need.  Allow some budget for purchasing more disk; you can quickly expand a LUN, then the CSV and then the VHD/file system.

Next comes the memory.  Basically, each GB of VM RAM costs a few MB in overhead charges.  You also need to allow 2GB for the host or parent partition.  What that means is that a host with 32GB of RAM realistically has about 29GB available for VM’s.

The HP tool is pretty cool because it will pull in information from Microsoft’s MAP.  The free Microsoft Assessment and Planning Toolkit for Hyper-V will scan your servers and identify potential virtualisation candidates.  This gives you a very structured approach to planning.
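
To make the memory arithmetic concrete, here is a rough sizing sketch based on those rules of thumb – a 2GB reserve for the parent partition plus a small per-VM overhead.  The overhead figures below are placeholder assumptions of my own; plug in whatever your own testing or the HP tool tells you.

    def ram_left_for_vms(host_gb, vm_sizes_gb, parent_reserve_gb=2.0,
                         base_overhead_mb=32, overhead_mb_per_gb=8):
        """Rough Hyper-V host RAM budget (illustrative only; the overhead figures
        are placeholder rules of thumb, not official numbers)."""
        budget_mb = (host_gb - parent_reserve_gb) * 1024
        for vm_gb in vm_sizes_gb:
            budget_mb -= vm_gb * 1024                                   # the VM's own RAM
            budget_mb -= base_overhead_mb + overhead_mb_per_gb * vm_gb  # per-VM overhead
        return budget_mb / 1024.0

    # A 32GB host (so roughly 30GB after the parent reserve) running
    # six 2GB VMs and three 4GB VMs:
    print("RAM still free for more VMs: %.1f GB" % ram_left_for_vms(32, [2] * 6 + [4] * 3))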

The document talks about the blade components and blade servers.  There are 3 types of blade from HP.

  • Full height: These are expensive but powerful.  You can get 8 of them in an enclosure.  Their size means you can get more into them.
  • Half height: You can get 16 of these into an enclosure, the same kind used by the full heights.  16 is coincidentally the maximum number of nodes you can put in a Windows cluster.  These are the ones we use at work.  Using Mezzanine cards you can add enough HBA’s and NIC’s to build a best practice W2008 R2 Hyper-V cluster.
  • Quarter height or Shorties: These machines are smaller and thus can have fewer components.  Using some of the clever 10Gig Ethernet stuff you can oversubscribe their NIC’s to create virtual NIC’s for iSCSI and Virtual Switches.  I’d say these are OK for deployments with limited requirements.  Their custom enclosure can be a nice all-in-one featuring storage and tape drives (note you can also do this with the other blades but you’ll never get the capacities to match the server numbers).

What is really cool is that HP then gives you reference architectures:

  • Small: A single C3000 enclosure with internalised storage.  MSA or JBOD (un-clustered hosts) storage is something I would also consider.
  • Medium: A single C7000 enclosure with LeftHand storage.  I’d also consider MSA or EVA storage here.  LeftHand is incredibly flexible and scalable but it is expensive.
  • Large: I’m drooling while looking at this.  Multiple (you can get 4 in a 42U rack, with 64 half height blades) C7000 enclosures and 2 racks of EVA 8400 storage.  Oooh Mama!

There’s even a bill of materials for all this!  It’s a great starting point.  Every environment is going to be different so make sure you don’t just order from the menu.

It’s not too long of a document.  The only thing really missing is a setup guide.  But hey, that’s all the more reason to read my blog ;)
