KB2799728–VM Enters Paused State Or CSV Goes Offline When Backup WS2012 Hyper-V Cluster

Please note the comments below: memory leak issues are being reported with this hotfix.

Please pay special attention to this hotfix. It’s the sort of issue I expect to see on forums and be asked about for the next 18 months. I recommend making this patch a standard part of your install of WS2012 Hyper-V clusters.

The issue: a virtual machine enters a paused state, or a CSV volume goes offline, when you try to back up the virtual machine on a Windows Server 2012-based failover cluster.

Consider the following scenario:

  • You enable the Cluster Shared Volumes (CSV) feature on a Windows Server 2012-based failover cluster.
  • You create a virtual machine on a CSV volume on a cluster node.
  • You start the virtual machine.
  • You try to create a backup of the virtual machine on the CSV volume by using Microsoft System Center Data Protection Manager (DPM) or any backup software that uses the Microsoft Software Shadow Copy Provider.

In this scenario, one of the following issues occurs:

  • The backup is created, and the virtual machine enters a paused state.
  • The CSV volume goes offline. Therefore, the virtual machine goes offline, and the backup is not created.

Additionally, the following events are logged in the Cluster log and System log respectively:

Software snapshot creation on Cluster Shared Volume(s) (‘volume location’) with snapshot set id ‘snapshot id’ failed with error ‘HrError(0x80042308)(2147754760)’. Please check the state of the CSV resources and the system events of the resource owner nodes.

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          Date and time
Event ID:      5120
Task Category: Cluster Shared Volume
Level:         Error
Keywords:
User:          SYSTEM
Computer:      Computer name
Description: Cluster Shared Volume ‘Volume1’ (‘name’) is no longer available on this node because of ‘STATUS_IO_TIMEOUT(c00000b5)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          Date and time
Event ID:      5142
Task Category: Cluster Shared Volume
Level:         Error
Keywords:
User:          SYSTEM
Computer:      Computer name
Description: Cluster Shared Volume ‘Volume3’ (‘Cluster Disk 4’) is no longer accessible from this cluster node because of error ‘ERROR_TIMEOUT(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity.
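
When triaging a node, it can help to pull the error codes out of these event descriptions and match them against the ones quoted above. The following is a minimal Python sketch, assuming the event text has already been exported as plain strings. The function name `extract_codes` and the `KNOWN_CODES` table are hypothetical, and the hex-versus-decimal guess is a heuristic rather than anything defined by the cluster service.

```python
import re

# Codes quoted in the events above; the descriptions paraphrase this post.
KNOWN_CODES = {
    0x80042308: "software snapshot creation on the CSV failed",
    0xC00000B5: "STATUS_IO_TIMEOUT, CSV I/O queued (Event ID 5120)",
    1460: "ERROR_TIMEOUT, CSV no longer accessible (Event ID 5142)",
}

# Matches a parenthesised number such as (0x80042308), (c00000b5) or (1460).
CODE_PATTERN = re.compile(r"\((0x[0-9A-Fa-f]+|[0-9A-Fa-f]+)\)")


def extract_codes(description: str) -> list[int]:
    """Return the distinct numeric codes found in an event description."""
    codes: list[int] = []
    for token in CODE_PATTERN.findall(description):
        if token.lower().startswith("0x"):
            value = int(token, 16)
        elif token.isdigit():
            # Heuristic: digits only is treated as decimal, e.g. (1460).
            value = int(token)
        else:
            # Heuristic: bare hex digits with letters, e.g. (c00000b5).
            value = int(token, 16)
        if value not in codes:
            codes.append(value)
    return codes


def describe(description: str) -> list[str]:
    """Map each extracted code to its meaning, if it is one quoted above."""
    return [KNOWN_CODES.get(code, "unknown code") for code in extract_codes(description)]
```

The snapshot error prints the same value twice, once in hex and once in decimal, so the sketch removes duplicates before looking anything up.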

The virtual machine enters a paused state because the Ntfs.sys driver incorrectly reports the available space on the CSV volume when the backup software tries to create a snapshot of the CSV volume. Additionally, the CSV volume goes offline because the CSV volume does not resume from a paused state after an I/O delay issue or an I/O error occurs.
Note: The CSV volume is resilient.

A supported hotfix is available from Microsoft.

There is more:

After you install the hotfix, CSV volumes do not enter paused states as frequently. Additionally, the hotfix improves the cluster’s ability to recover from expected paused states that occur when no CSV failover takes place.

To avoid CSV failovers, you may have to make additional changes to the computer after you install the hotfix. For example, you may be experiencing the issue described in this article because of the lack of hardware support for Offloaded Data Transfer (ODX). This causes delays when the operating system queries for the hardware support during I/O requests.

In this situation, disable ODX by changing the FilterSupportedFeaturesMode value for the storage device that does not support ODX to 1. For more information about how to disable ODX, go to the Microsoft website.
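
The registry change can be scripted. The following is a rough Python sketch and not an official procedure. It assumes the value lives under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`, which this post does not state, and that writing it there affects the whole server rather than a single storage device. Check both assumptions against Microsoft's ODX documentation before running anything on a cluster node. The function name `disable_odx` is hypothetical.

```python
# Assumed registry location (not given in this post); verify before use.
ODX_KEY_PATH = r"SYSTEM\CurrentControlSet\Control\FileSystem"
ODX_VALUE_NAME = "FilterSupportedFeaturesMode"
ODX_DISABLED = 1  # 1 disables ODX, per the text above


def disable_odx() -> None:
    """Set FilterSupportedFeaturesMode to 1. Windows only; run elevated."""
    import winreg  # standard library module, available on Windows only

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, ODX_KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, ODX_VALUE_NAME, 0, winreg.REG_DWORD, ODX_DISABLED)
```

Call `disable_odx()` from an elevated Python session on each affected Hyper-V node, and record the previous value first so the change can be reverted.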

12 thoughts on “KB2799728–VM Enters Paused State Or CSV Goes Offline When Backup WS2012 Hyper-V Cluster”

  1. Wow, I can’t believe Microsoft has admitted it has a fault with this. I was on the phone to both a senior engineer and the Technical Lead in the DPM department of Microsoft Support for three days without resolution back in October. I realise Server 2012 is new, but I still expected Microsoft to provide a backup solution that worked. In the end I had to buy Veeam, which in my opinion is a lot better than DPM has ever been. It worked out of the box, installed all prerequisites, and backed up my 3 node cluster in 2 hours rather than 12.

    DPM 2012 SP1 also created another problem: after a backup it would leave all cluster traffic in redirected mode, with all traffic going through the Cluster Shared Volume owner, even though this was not shown in the Cluster Manager. The only way to resolve this was to drain and restart each node. Since using Veeam we have none of the above issues and my cluster is now stable.

    1. When you had redirected mode, what version of Hyper-V were you using? WS2012 Hyper-V does not use redirected mode for backup … and DPM 2012 without SP1 did not support WS2012.

  2. Hi Guys
    having the same issue here with DPMSP1 RollUp3 and 2012 HF FO cluster;
    did MS get this fixed yet? (without memory leaks..etc)
    many thnx
    Rgds
    Willem

  3. Did anyone receive further info on this? I’m having a similar issue with two brand new 2012 servers, on esxi 5.5. Thanks in advance for any information.

  4. I had exactly the same problem and had to solve it with Veeam. Still amazed at how Microsoft can publish software like DPM with cluster backups untested.

  5. Hello Aidan,
    do you have any feedback about this issue on Windows 2012 R2?
    I’ve seen similar cases at our customers, and the backup software provider asks me to disable ODX on the Hyper-V nodes.
    regards,
