Designing Systems for Continuous Availability – Multi-Node with Remote File Storage

The speakers are Jim Pinkerton and Claus Jorgensen.

The topic is using SMB for remote storage of application files. Servers access their files over UNC file paths. Examples: VM VHDs, SQL Server database and log files. Shares are easier to provision and manage than LUNs. They are more flexible with dynamic server relocation. There is no need for specialised hardware/network knowledge or infrastructure. Lower cost.
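To make the UNC idea concrete, here is a minimal sketch of how a UNC path breaks down into file server, share, and file. The server, share, and file names are made up for illustration; only the path layout matters.

```python
from pathlib import PureWindowsPath

# Hypothetical file server, share, and file names, for illustration only.
vhd = PureWindowsPath(r"\\fs1\vmstore\vm01\disk.vhd")

# For a UNC path, pathlib treats \\server\share as the "drive".
server, share = vhd.drive.lstrip("\\").split("\\")

print(server)    # fs1
print(share)     # vmstore
print(vhd.name)  # disk.vhd
```

The application only ever sees the path, so whichever node is serving the share is not visible to it.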

Basic idea of the architecture: some shared storage (e.g. Storage Spaces), a file server cluster with shares, and Hyper-V cluster hosts, SQL Server, or other servers that store their files on those shares.

Transparent Failover
In W2008 R2 a failover is not transparent. There is brief downtime to take down, move over, and bring up the clustered service or role. 99% uptime at best.

Failover in W8 is transparent to the server application. It supports planned and unplanned failovers, e.g. maintenance, failures, and load balancing. It requires a Windows Failover Cluster, and both server and client must be running Windows Server 8. All operations, not just IO, must be continuous and transparent, and that includes file and directory operations.

This means we can have an application cluster that places data on a back end file server cluster. Both can scale independently.

Changes to Windows Server 8 to make transparent failover possible:
– New protocol: SMB 2.2
– SMB 2.2 Client (redirector): client operation replay, end-to-end for replay of idempotent and non-idempotent operations
– SMB 2.2 Server: support for network state persistence, a single share spans multiple nodes (active/active shares – wonder if this is made possible by CSV?), files are always opened write-through.
– Resume Key: used to resume handle state after a planned or unplanned failover, fence handle state information, and mask some NTFS issues. This fences file locks.
– Witness protocol: enables faster unplanned failover because clients do not wait for timeouts, enables dynamic reallocation of load (nice!). Witness tells the client that a node is offline and tells it to redirect.
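The client replay idea in the list above can be sketched with a toy model. This is not the SMB 2.2 wire protocol or any real API; the class names, the `op_id` field, and the shared `completed` table are all my own assumptions. It only shows why a non-idempotent operation (an append) can be replayed safely when the server side remembers what it already did.

```python
class SharedState:
    """Stands in for state that survives a node failure (toy model)."""

    def __init__(self):
        self.files = {}      # file name -> list of appended records
        self.completed = {}  # (client_id, op_id) -> cached result


class Node:
    """Hypothetical file server node; not a real SMB server."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.alive = True
        self.fail_after_apply = False  # simulate a crash before replying

    def append(self, client_id, op_id, file_name, record):
        if not self.alive:
            raise ConnectionError(f"{self.name} is offline")
        key = (client_id, op_id)
        # A replayed operation returns the cached result instead of running twice.
        if key in self.state.completed:
            return self.state.completed[key]
        records = self.state.files.setdefault(file_name, [])
        records.append(record)
        result = len(records)
        self.state.completed[key] = result
        if self.fail_after_apply:
            self.alive = False
            raise ConnectionError(f"{self.name} failed before replying")
        return result


def append_with_replay(nodes, client_id, op_id, file_name, record):
    """Try each node in turn, replaying the same op_id after a failure."""
    for node in nodes:
        try:
            return node.append(client_id, op_id, file_name, record)
        except ConnectionError:
            continue
    raise ConnectionError("no node available")


state = SharedState()
node_a, node_b = Node("node-a", state), Node("node-b", state)
node_a.fail_after_apply = True  # node-a applies the write, then dies

count = append_with_replay([node_a, node_b], "client-1", 1, "log.ldf", "rec-1")
print(count)                   # 1
print(state.files["log.ldf"])  # ['rec-1']
```

Without the `completed` table the replay on node-b would append the record a second time, which is exactly the kind of problem that makes non-idempotent replay hard.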

SMB2 Transparent Failover Semantics:
Server side: state persistence until the client reconnects. Example: deleting a file. The file is opened, a delete-on-close flag is set, and when you close the file it is deleted. Now you try to delete a file on a clustered file share and a planned failover happens. The node closes the file and it is deleted. After reconnecting, the client tries to close the file to delete it, but it’s already gone. This sort of circumstance is handled.
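A minimal sketch of the delete-on-close case, assuming the server keeps a record per handle (keyed here by a made-up resume key string) until the client reconnects. The class and method names are hypothetical and this is only a toy model of the behaviour described, not how the file server actually implements it.

```python
class FileServerState:
    """Toy model of handle state that outlives a planned failover."""

    def __init__(self, files):
        self.files = set(files)
        self.handles = {}  # resume_key -> handle record

    def open(self, resume_key, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        self.handles[resume_key] = {
            "file": name,
            "delete_on_close": False,
            "closed": False,
        }

    def set_delete_on_close(self, resume_key):
        self.handles[resume_key]["delete_on_close"] = True

    def _finish(self, handle):
        # Closing the file is what actually triggers the delete.
        handle["closed"] = True
        if handle["delete_on_close"]:
            self.files.discard(handle["file"])

    def planned_failover(self):
        # The old node closes its open files, but the handle records are kept
        # so that a reconnecting client can still be answered sensibly.
        for handle in self.handles.values():
            if not handle["closed"]:
                self._finish(handle)

    def close(self, resume_key):
        handle = self.handles.pop(resume_key)
        if not handle["closed"]:
            self._finish(handle)
        # If the handle was already closed during failover, the close still
        # succeeds instead of failing with "file not found".
        return "ok"


server = FileServerState(["report.tmp"])
server.open("rk-1", "report.tmp")
server.set_delete_on_close("rk-1")
server.planned_failover()       # file is closed and deleted here
result = server.close("rk-1")   # client replays its close after reconnect

print(result)        # ok
print(server.files)  # set()
```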

In the Hyper-V world, we have “surprise failover” where a faulty VM can be failed over. The files are locked on the file share by the original node with the fence. A new API takes care of this.

SMB2 Scale Out
In W2008 R2 we have active-passive clustered file shares. That means a share is only ever active on 1 node, so it’s not scalable. Windows Server 8 has scale out via active-active shares. The share can be active on all nodes. It is targeted at server/server applications like SQL Server and Hyper-V, not at client/server applications like Office. We also get fewer IP addresses and DNS names. We only need one logical file server with a single file system namespace (no drive letter limitations), and there are no cluster disk resources to manage.

We now have a new file server type called File Server For Scale-Out Application Data. That’s the active/active type. It does not support NFS or certain role services such as FSRM or DFS Replication. The File Server for General Use is the active/passive one for client/server, but it also supports transparent failover.

VSS for Windows Server 8 File Shares
Application consistent shadow copy of server application data that is stored on Windows Server 8 file shares. The backup agent on the application server triggers the backup. VSS on the app server works with the File Share Shadow Copy Provider. It hits the File Share Shadow Copy Agent on the file server via RPC, and that then triggers VSS on the file server to create the shadow copy. The backup server can read the snapshot directly from the file server, saving on needless data transfer.

Performance for Server Applications
SMB 2.2 makes big changes. Performance has gone from 25% to 97% of DAS performance. MSFT used the same DAS storage for local and file share storage with SQL Server to get these numbers. NIC teaming, TCP offloads and RDMA improved performance.

Perfmon counters are added to help admins troubleshoot and tune: IO size, IO latency, IO queue length, etc. You can tune separately for SQL data files or log files.

Demo:
Scale-out file server in the demo. 4 clients accessing 2 files, balanced across 2 nodes in the scale out file server cluster. A node in the cluster is killed. The witness service sees this, knows which clients were using it, and tells them to reconnect – no timeouts, etc. The clients do come back online on the remaining node.
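What the demo showed can be sketched roughly as follows. This is a toy model with made-up class and node names, not the real witness protocol; it assumes the witness simply knows which client is on which node and picks a surviving node for them.

```python
class Client:
    """Hypothetical client that remembers which node it is connected to."""

    def __init__(self, name, node):
        self.name = name
        self.node = node

    def on_node_offline(self, failed, target):
        # Reconnect straight away instead of waiting for a timeout.
        if self.node == failed:
            self.node = target


class Witness:
    """Toy witness: tracks nodes and registered clients (assumed design)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def node_failed(self, failed):
        self.nodes.discard(failed)
        if not self.nodes:
            raise RuntimeError("no surviving node")
        target = sorted(self.nodes)[0]  # simplest possible choice of target
        moved = [c.name for c in self.clients if c.node == failed]
        for client in self.clients:
            client.on_node_offline(failed, target)
        return moved


witness = Witness(["node-a", "node-b"])
clients = [
    Client("c1", "node-a"),
    Client("c2", "node-b"),
    Client("c3", "node-a"),
    Client("c4", "node-b"),
]
for c in clients:
    witness.register(c)

moved = witness.node_failed("node-a")  # kill one node, as in the demo
print(moved)                           # ['c1', 'c3']
print([c.node for c in clients])       # ['node-b', 'node-b', 'node-b', 'node-b']
```

The same notification path is presumably what allows load to be moved between nodes on purpose, not just after a failure.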

Platforms
– Networking: 2+ interfaces … 1 GbE, 10 GbE optionally with RDMA, or InfiniBand with RDMA
– Server: 2+ servers … “cluster in a box” (a self contained cluster appliance) or 2+ single node servers.
– Storage: Storage Spaces, Clustered PCI RAID (both on Shared JBOD SAS), FC/iSCSI/SAS fabric (on arrays)

Sample Configurations
– Lowest cost: cluster in a box with shared JBOD SAS using 1 GbE and SAS HBA. Or use the same with Cluster PCI RAID instead of the SAS HBA for better performance. An external port allows external storage to be added to scale out. Beyond that, look at 10 GbE.
– Discrete servers: 1/10 GbE with SAS HBA to Shared JBOD SAS. Or use advanced SANs.

Note: This new storage solution could radically shake up how we do HA for VMs or server applications in the small/mid enterprise. It’s going to be cheaper and more flexible. Even the corporations might look at this for low/mid tier services. MSFT did a lot of work on this and it shows IMO; I am impressed.
