KB2842111 – “Delayed Write Failed” Error When An I/O Stress Test Runs Against A WS2012 Failover Cluster

Microsoft has released a hotfix for Windows Server 2012 that addresses "Delayed Write Failed" errors seen when you run an I/O stress test from Windows Server 2012 or Windows 8 clients against a WS2012 failover cluster.

Symptoms

Consider the following scenario:

  • You have a Windows Server 2012 failover cluster that is configured by using continuously available file shares.
  • An I/O stress test is running on a Windows 8 or Windows Server 2012-based client against the failover cluster. The stress test has a high ratio of open and close operations to data operations. For example, the test repeatedly opens a file on the file share, reads the file, and then closes the file.
    Note This scenario may be found in stress tests but does not map directly to customer-usage scenarios.
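
To make the access pattern concrete, here is a minimal Python sketch of the kind of loop described above: open a file, read it, close it, and repeat. It is not the stress tool used by Microsoft. The function names, the iteration count, and the file are all assumptions for illustration, and a local temporary file stands in for a file on the continuously available share.

```python
import os
import tempfile


def open_close_stress(path, iterations):
    """Repeatedly open, read, and close `path`; return (opens, bytes_read)."""
    opens = 0
    bytes_read = 0
    for _ in range(iterations):
        # Each pass is one open, one read, and one close, which gives a
        # high ratio of open/close operations to data operations.
        with open(path, "rb") as handle:
            bytes_read += len(handle.read())
        opens += 1
    return opens, bytes_read


def demo():
    # Hypothetical stand-in: a local temp file instead of a file on the share.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.bin")
        with open(path, "wb") as handle:
            handle.write(b"x" * 1024)
        return open_close_stress(path, 100)
```

In a real test, `path` would point at a file on the clustered file share, and the loop would run for long enough to span a failover.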

In this scenario, you may experience I/O errors during failover. Additionally, the following event may be logged in the System log:

Event ID: 50
Event Source: Mup
Description: {Delayed Write Failed} Windows was unable to save all the data from the file <file name>. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

Cause

When a file on the file share is opened, a file handle is created. After the file is closed, the Server Message Block (SMB) redirector will cache the file handle for a short time. However, there is a limit on the number of handles that can be cached in this manner. During the stress test, the SMB scavenger can fall behind in closing the cached handles. This may result in a large backlog of handles. Eventually, the backlog grows beyond the number of handles that can be failed over within the continuous availability time-out, and some I/O operations may fail. By default, the continuous availability time-out is 60 seconds.
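
The backlog reasoning can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: the KB article gives no rates, so `close_rate`, `scavenge_rate`, and `failover_rate` are hypothetical parameters, and the model ignores the cache limit and every other detail of the real SMB redirector. The only figure taken from the article is the default 60-second time-out.

```python
def seconds_until_backlog_exceeds(close_rate, scavenge_rate, failover_rate, timeout_s=60):
    """Return the first whole second at which the cached-handle backlog can no
    longer be failed over within timeout_s, or None if the backlog never grows.

    All three rates are hypothetical values, in handles per second:
      close_rate    - handles the test closes (and the redirector caches)
      scavenge_rate - cached handles the scavenger manages to close
      failover_rate - handles that can be failed over
    """
    limit = failover_rate * timeout_s   # most handles that fit in the time-out
    growth = close_rate - scavenge_rate # net backlog growth per second
    if growth <= 0:
        return None                     # the scavenger keeps up
    backlog = 0
    seconds = 0
    while backlog <= limit:
        backlog += growth
        seconds += 1
    return seconds
```

With made-up rates of 500 closes per second, 400 scavenged per second, and 100 handles failed over per second, the limit is 6,000 handles and the backlog grows by 100 per second, so the function returns 61. If the scavenger keeps pace with the closes, it returns `None`.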

A supported hotfix is available from Microsoft.