2009
03.31

I’ve done a little speaking about this subject over the last while and after listening to a recent radio conversation I thought I’d post something too.  The story was about a revolutionary online gaming system, the idea being that instead of buying a DVD with the game, you’d play it online and it would stream to your PC or console.  One of the expert commentators finally made the point I was waiting to hear.  The service provider has their servers in the USA.  This means that the games players in North America would be close to the servers.  Until there was a European presence, games players here should probably steer clear because the interaction would be slow.

Why not just get a “faster” Internet connection?  That’s our usual answer to these problems.  Think about a business that has a head office with a SharePoint server and a branch office where users use that server over the WAN.  When enough users call into the helpdesk to complain about slow downloads, what do we do?  We usually go out and buy a bigger WAN link.  That is the wrong thing to do without considering what’s really going on.

Definitions

There are two things to measure when it comes to a network link.

  • Bandwidth: This is how much data we can transfer in a given amount of time from the source to the destination, i.e. the capacity of the link.  We measure this in Kbps, Mbps or even Gbps if you have lots of money.
  • Latency: This is how long a packet takes to travel from the source to the destination.  This is limited by the laws of physics.  An electrical signal on a copper wire travels at a fixed speed, and a photon on a fibre link cannot travel faster than the speed of light (in practice light in glass fibre moves at roughly two thirds of that).  Einstein tells us we can’t go any faster.  Increasing bandwidth has no effect on this.  Finally, devices that process the data at various hops, e.g. routers, switches and firewalls, also add to latency.  So, the further apart a client and server are, the longer a transmission takes.  Adding in more network devices, e.g. transmissions between different ISPs, worsens this.
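
To put a rough number on the physics, here is a minimal sketch that estimates the best-case one-way delay of a link from its length.  The function name, the 0.67 velocity factor for fibre and the example distances are my assumptions for illustration.  Real paths are longer than a straight line and every hop adds processing time on top.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum


def propagation_delay_ms(distance_km, velocity_factor=0.67):
    """Best-case one-way delay in milliseconds for a signal over distance_km.

    velocity_factor is the assumed fraction of the speed of light at which
    the signal travels in the medium (roughly two thirds for glass fibre).
    """
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * velocity_factor
    return distance_km / speed_km_per_s * 1000


# The delay depends only on distance, not on how much bandwidth was bought.
print(round(propagation_delay_ms(5_000), 1))   # hypothetical 5,000 km link
print(round(propagation_delay_ms(10_000), 1))  # twice the distance, twice the delay
```

Note that bandwidth does not appear anywhere in the calculation, which is the whole point.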

There is a conversation between the server and the client when any data or a file is transferred over the network.  The file is broken up into packets.  Headers and control flags wrap each of those packets, which adds to the amount of data sent.  Then a conversation takes place.  At a very high level, here’s how it goes:

  • Server: Here’s packet 1
  • Client: Acknowledged
  • Server: Here’s packet 2
  • Client: Acknowledged
  • Server: Here’s packet 3
  • Client: Not Acknowledged
  • Server: Here’s packet 3 again
  • Client: Acknowledged
  • Server: Here’s packet 4
  • Client: Acknowledged
  • Server: Here’s packet 5
  • Client: Acknowledged
  • Server: Here’s packet 6
  • Client: Acknowledged

And so it goes until all of the packets that make up the entire file are transmitted.  Bandwidth affects the time for transmission by limiting how much data we can put on the link in a given time.  Note that the TCP stack in the operating system can also limit this.  Bandwidth also causes problems when we try to put too many simultaneous conversations onto a pipe.  We can monitor bandwidth by measuring link utilisation.

Latency is best explained as follows.  If it takes 1 millisecond to transmit a packet between the client and server then the above file copy, with its 14 messages, would take 14 milliseconds.  If we move the client to a remote location then latency goes up, perhaps to 100 milliseconds.  Now the file copy takes 100 times longer: 1400 milliseconds.  Realistically, a file transfer requires far more packets than this.  An intercontinental latency measurement (use ping) might be 300 milliseconds or more!
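
The arithmetic above can be sketched in a few lines.  This is only the simplified model used in this post (every message costs one trip across the link and every packet waits for its reply), and the function name is made up for the example.

```python
def stop_and_wait_ms(packets, retransmissions, one_way_latency_ms):
    """Time for a transfer where every packet waits for a reply.

    Each packet sent (including resends) is answered by one reply, and each
    message takes one_way_latency_ms to cross the link.
    """
    messages = 2 * (packets + retransmissions)
    return messages * one_way_latency_ms


print(stop_and_wait_ms(6, 1, 1))    # 14   (1 ms link)
print(stop_and_wait_ms(6, 1, 100))  # 1400 (100 ms link)
```

Six packets with one resend gives the 14 messages in the conversation, so the total scales directly with latency.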

Let’s go back to the above examples and see how latency and bandwidth affected them:

  • The European online games player: Games playing requires timely interaction with the game.  The slower the game player’s responses, the worse they play.  If a European player’s responses take 30 times longer than those of an American, how can they have the same gaming experience?  The European is hit by latency.
  • The branch office SharePoint users: By throwing bandwidth at the SharePoint server, we allow many more users in the branch office to have the same slow experience.  Latency causes the packets that make up the file transmission to be slow.
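
Here is a rough sketch of why the bigger WAN link disappoints.  It assumes the simplified conversation from earlier, where each packet waits for its acknowledgement, and the file size, packet size, link speeds and round trip time are all figures I picked for the example.

```python
import math


def transfer_time_s(file_bytes, bandwidth_bps, round_trip_s, packet_bytes=1460):
    """Rough transfer time when each packet waits for its acknowledgement.

    Two costs are added: the time to push the bits onto the link
    (bandwidth) and one round trip per packet (latency).
    """
    packets = math.ceil(file_bytes / packet_bytes)
    time_on_wire = file_bytes * 8 / bandwidth_bps
    time_waiting = packets * round_trip_s
    return time_on_wire + time_waiting


slow_link = transfer_time_s(1_000_000, 2_000_000, 0.2)   # 2 Mbps WAN
fast_link = transfer_time_s(1_000_000, 20_000_000, 0.2)  # 20 Mbps WAN
print(f"2 Mbps: {slow_link:.1f} s, 20 Mbps: {fast_link:.1f} s")
```

In this model ten times the bandwidth takes the copy from about 141 seconds to about 137 seconds, because almost all of the time is spent waiting on round trips.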

Basic Networking Solutions

Networking wise, there are a few solutions we can look at:

  • Place the servers closer to the clients.  For a “cloud computing” service provider, that’s possible by having the service closer to the consumers.  For a corporate, that might mean having servers in the branch office, something we want to steer clear of doing if at all possible to reduce costs and complexity.
  • Reduce the hops between the servers and clients: For a company, this means strategically subscribing to ISP services.  For an online service provider you can only do this so much, e.g. use a major service provider.  But there are always going to be clients who are many hops away.

We still have issues here.  So we want to get more data on the pipe at once and send fewer packets so that latency plays less of a role.

Advanced Networking Solutions

Both Riverbed (Steelhead) and Citrix (WanScaler) have appliances that can be placed in both the head office and the branch office.  A PC in the branch office will look to copy a file from the HQ server.  All the usual file locking and security stuff takes place (as it will throughout this process).  The server breaks the file up into packets and starts the transfer.  The appliance in the HQ sits silently between the server and the WAN connection.  It listens to the stream and uses a hashing algorithm to break down the data transmission into blocks which are stored on the appliance according to a set of predefined rules.

The data travels over the WAN to the branch office.  The branch office appliance also listens to the new data and does an identical breakdown, using the same hashing algorithm to ID the blocks before caching them.  The data stream continues to the client.  At this point, no speed increase has taken place.

The process will go as follows if this client or any other in the same branch office goes to transfer this file again.  The second client does the usual file lock and security stuff.  The server believes it is talking to the client.  Instead it talks to the HQ appliance.  The appliance breaks the data down into blocks and IDs them using the hashing algorithm.  Any previously cached blocks don’t need to be transmitted.  Instead, the HQ appliance works with the branch office appliance.  The branch office appliance receives the block’s hash ID and then sends its cached copy of that block to the client over the LAN.

The effect?  Previously transmitted blocks are not sent over the WAN.  This reduces bandwidth utilisation.  By removing the need to send data at all, we remove latency from the equation.  Other than some security and file locking procedures, a file transfer can be local only at the branch office, i.e. between the appliance and the client.

Because the system works by using blocks the optimisation can even work for files that haven’t even been requested over the WAN before, as long as they are made up of blocks similar to previously transmitted files.
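
A toy sketch of the idea follows.  The real appliances use their own algorithms and rules, which I’m not reproducing here: the fixed 4096 byte blocks, the use of SHA-256 and all of the names are assumptions made purely for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for the example


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def send_over_wan(data, hq_cache, branch_cache):
    """Send data from HQ to the branch, returning (wan_bytes, rebuilt_data).

    hq_cache and branch_cache are dicts mapping a block's hash ID to the
    block. A block both sides already hold is replaced by its hash ID.
    """
    wan_bytes = 0
    rebuilt = []
    for block in split_blocks(data):
        block_id = hashlib.sha256(block).digest()
        if block_id in hq_cache and block_id in branch_cache:
            wan_bytes += len(block_id)          # only the 32-byte ID crosses
            rebuilt.append(branch_cache[block_id])
        else:
            wan_bytes += len(block)             # first sighting: send it all
            hq_cache[block_id] = block
            branch_cache[block_id] = block
            rebuilt.append(block)
    return wan_bytes, b"".join(rebuilt)


hq_cache, branch_cache = {}, {}
report = b"".join(bytes([n]) * BLOCK_SIZE for n in range(4))  # 4 distinct blocks
revised = report[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE         # last block changed

first, _ = send_over_wan(report, hq_cache, branch_cache)
second, copy = send_over_wan(report, hq_cache, branch_cache)
similar, _ = send_over_wan(revised, hq_cache, branch_cache)
print(first, second, similar)  # 16384 128 4192
```

The first copy crosses the WAN in full, the repeat copy sends only four hash IDs, and the revised file (never requested before) only pays for the one block that changed.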

The process I’ve talked about here has been simplified.

The appliances work at a TCP level.  This means that WAN optimisation can improve way more than just file copies, e.g. Exchange, Oracle, SQL, Lotus Notes, etc.  The basic requirements are that the data is not signed and not encrypted.  You also need to turn off SMB data signing in Group Policy.  That’s because the appliances are, in a way, performing a man-in-the-middle attack.

These appliances are very expensive so they are not widespread.  I did some work with low spec devices from Riverbed back in 2006 and they really did work very well.

The Next Generation TCP Stack

Microsoft included a new TCP stack in Windows Vista and Windows Server 2008.  It is also in Windows 7 and Windows Server 2008 R2.  The Next Generation TCP Stack isn’t the complete WAN solution but it does improve things.

Compound TCP aims to reduce the effect of latency.  The server in the previous example of the file transfer will send many packets before waiting for an acknowledgement:

  • Server: Here’s packet 1
  • Server: Here’s packet 2
  • Server: Here’s packet 3
  • Server: Here’s packet 4
  • Server: Here’s packet 5
  • Server: Here’s packet 6
  • Client: Give me packet 3 again
  • Server: Here’s packet 3
  • Client: Acknowledged

As I said before, this is a tiny example of what is going on under the covers.  If our latency was 100 milliseconds before then the first example took 1400 milliseconds.  With Compound TCP, the copy needs only 9 messages and will take 900 milliseconds.
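
The same counting model can be written down for this second conversation.  As before, this is only the simplified model from this post, where every message costs one trip across the link, and the function name is hypothetical.  It is not a description of how the real TCP stack schedules packets.

```python
def pipelined_ms(packets, retransmissions, one_way_latency_ms):
    """Time for a transfer where packets are sent without waiting in between.

    Counts every packet, plus a resend request and a resend for each lost
    packet, plus one final acknowledgement.
    """
    messages = packets + 2 * retransmissions + 1
    return messages * one_way_latency_ms


print(pipelined_ms(6, 1, 100))  # 900
```

Six packets, one resend request, one resend and one acknowledgement make the 9 messages in the list above.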

Vista and Windows Server 2008 also gave us an Auto-Scaling Receive Side Window (Microsoft calls this Receive Window Auto-Tuning).  The client and server work together to calculate how much data can be placed in the pipe at one time.  In legacy operating systems such as XP and Windows Server 2003, this is a static definition for both LAN and WAN transfers and usually shouldn’t be manually altered.  With this auto scaling receive side window, our file copy can put more data in the pipe at once, which in our simplified model looks like bigger packets, and may look like this:

  • Server: Here’s packet 1
  • Server: Here’s packet 2
  • Server: Here’s packet 3
  • Client: Give me packet 3 again
  • Server: Here’s packet 3
  • Client: Acknowledged

    We’re using Compound TCP as well, meaning we’re sending fewer packets and using as much bandwidth as possible by sending more at once.  Now our time to transfer the file on the 100 millisecond link is 600 milliseconds.  Remember this started out at 1400 milliseconds.

    The limits to the optimisation offered by Auto Scaling Receive Side Windows are (a) the ability for the application protocol to buffer data and (b) the bandwidth available.
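
To see why the size of the receive window matters so much on a high latency link, here is a small sketch of the usual rule of thumb: one connection can deliver at most one window of data per round trip.  The function name and the round trip times are my own example figures, and the 1 MiB window is a hypothetical auto-scaled value rather than anything Windows is guaranteed to pick.

```python
def max_throughput_mbps(window_bytes, round_trip_ms):
    """Upper bound on one TCP connection's throughput in megabits per second.

    At most one receive window of data can be unacknowledged at a time, so
    at most one window can be delivered per round trip.
    """
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / round_trip_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


# A fixed 65,535-byte window (the most TCP allows without window scaling)
print(round(max_throughput_mbps(65_535, 100), 2))    # 100 ms WAN round trip
print(round(max_throughput_mbps(65_535, 1), 2))      # 1 ms LAN round trip
# A hypothetical auto-scaled 1 MiB window on the same WAN
print(round(max_throughput_mbps(1_048_576, 100), 2))
```

With the fixed window the WAN copy tops out at a little over 5 Mbps no matter how big the link is, while the larger window raises that ceiling to around 84 Mbps, at which point the bandwidth available becomes the limit.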

    Microsoft came up with SMBv2 so that the file and print sharing protocol could handle the huge data streams that can now be transferred over large links.

    The risk with this auto scaling receive side window is that one file copy over the WAN could consume the entire WAN link and effectively shut down business traffic like RDP, ICA, etc.  Using Group Policy (GPO), we can tag traffic between selected sources, selected destinations, certain protocols (TCP or UDP) or ports (80, 443, 3389, etc).  An example might be that all web traffic between 10.0.0.0/24 and 10.195.34.0/24 on TCP 80 should be tagged.  Network administrators can then use those tags to put QoS (Quality of Service) rules in place for traffic prioritisation, e.g. TCP 80 traffic with the Internet or a proxy server might be of a lower priority than HTTP traffic with a SharePoint Server, and RDP traffic with a Terminal Server might be higher again.  This would prioritise critical business traffic over lesser valued traffic at the network level.
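
The matching logic behind that kind of tagging can be sketched as below.  This is not Group Policy and it does not set any real QoS markings: the rule table, the tag names and the function are all hypothetical, and I’ve written the first subnet as the network address 10.0.0.0/24.

```python
import ipaddress

# Hypothetical rules in priority order: (tag, network A, network B, TCP port).
RULES = [
    ("high", "10.0.0.0/24", "10.195.34.0/24", 3389),  # RDP to a Terminal Server
    ("medium", "10.0.0.0/24", "10.195.34.0/24", 80),  # web traffic to SharePoint
]
DEFAULT_TAG = "low"  # everything else, e.g. web traffic with the Internet


def tag_traffic(source_ip, destination_ip, port):
    """Return the tag of the first rule matching this TCP flow."""
    src = ipaddress.ip_address(source_ip)
    dst = ipaddress.ip_address(destination_ip)
    for tag, net_a, net_b, rule_port in RULES:
        a = ipaddress.ip_network(net_a)
        b = ipaddress.ip_network(net_b)
        between = (src in a and dst in b) or (src in b and dst in a)
        if between and port == rule_port:
            return tag
    return DEFAULT_TAG


print(tag_traffic("10.0.0.15", "10.195.34.7", 3389))  # high
print(tag_traffic("10.0.0.15", "10.195.34.7", 80))    # medium
print(tag_traffic("10.0.0.15", "203.0.113.10", 80))   # low
```

The network team would then map each tag to a priority on the routers, which is the part that actually protects the business traffic.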

    This does improve things but data still has to go over the WAN and latency is still going to cause noticeable delays.

    Note that this huge improvement of data transmission really is best seen on dedicated local area networks between servers, e.g. application servers and data servers.

    Windows 7 and Windows Server 2008 R2 – Better Together

    Microsoft is introducing BranchCache in Windows 7 (Enterprise and Ultimate editions only at the time of writing) and Windows Server 2008 R2.  This will allow a Windows 7 client to access a branch office cache of whole files that are stored on a Windows Server 2008 R2 content server.  The protocols being optimised are SMB (file sharing), HTTP and HTTPS (and logically, BITS).  There are 2 architectures:

    • Distributed: Clients in the branch office have a peer-to-peer network of sorts.  A client starts to download a file from the HQ web or file server.  The usual security stuff is done (as throughout this process).  The client broadcasts on the LAN to see if other clients have the file cached.  It uses a hash ID for the file that is obtained from the content server.  If no other clients have it, the file is downloaded from the HQ.  Another client now starts to download the file after the usual security stuff.  It again broadcasts using the hash ID.  The first client responds and the second client transfers the file over the LAN – not the WAN.  This uses a broadcast model and is thus limited to a broadcast domain or single VLAN.  There’s also the problem that a PC might hibernate or be turned off, thus disabling its cache content in the branch office.
    • Hosted: A Windows Server 2008 R2 server is placed in the branch office.  The branch office clients use unicasts to communicate with it rather than using the broadcast based peer-to-peer model.  This tidies up the network, allows for multiple VLANs and means the cache is always on.

    The initial version of BranchCache only supports file based, not block level, caching.  It also only caches the download.  Uploads (saves) must be transferred over the WAN to the central server and are not optimised.

    All the settings of BranchCache are controllable using GPO.  Content administrators can control it at the share and site level.  Caches are secure and users can only access what the content share permissions allow for.

    Move The User Interface

    We’re used to having the client (PC) in the branch office or out roaming on the Internet and the server in the data centre.  We’ll always have some kind of latency when data has to travel between the central server and the remote client even if we use any of the above advanced solutions.  What if the user "logged in" using a client that was close to the servers?  Maybe that central client would be accessible from anywhere, no matter where the user was, e.g. in a branch office, hotel or at home.  Terminal Services is a mature way of doing this.  Citrix has built upon it for companies with larger Terminal Services server farm requirements.  With these products, the user logs in using physical equipment but their session runs in the central data centre.  Application data travels only over the LAN in the data centre, not over the WAN.  Only the screen updates, keystrokes and mouse movements cross the WAN.

    Windows Server 2008 Terminal Services solved the biggest problem with this type of solution: printers.  Terminal Services administrators were sick of printer driver issues on the servers.  Thanks to EasyPrint you don’t have to deal with drivers any more – if the clients are running Windows 7, Vista SP1 or XP SP3.  And users don’t have to wait half an hour for the print job to download.  It’s near instant thanks to Microsoft’s XPS technology.  Microsoft also added application publication, an SSL interface and the ability to securely access those from anywhere using the TS Gateway.

    Windows Server 2008 R2 rebrands this as Remote Desktop Services.  This is because they’re adding a VDI broker to access virtualised desktops running on a central Hyper-V (machine virtualisation) farm.  At the time of writing this is still a beta.  You can access RTM solutions from Provision Networks and Citrix.  I like the look of the Citrix solution because it looks pretty complete.  The idea of VDI is that users access a familiar desktop environment, existing administrative systems can be reused, application issues are minimal (Terminal Services can require "application silos" of application specific servers) and Helpdesk doesn’t need to do change control (like on Terminal Services) to fix user application issues.

    Both of these solutions drastically change the user system but they largely remove the effect of latency and bandwidth restrictions on cross-WAN or Internet application usage, because the application’s data never leaves the data centre.  I’ve used them in the past with great success.

    Summary

    So that’s a basic look at bandwidth versus latency and how they impact Internet and WAN based services.  We saw how dedicated appliances, the Next Generation TCP Stack, and Windows 7 paired with Windows Server 2008 R2 can work to reduce bandwidth limitations as well as geography caused latency.  The basic lesson is: look at more than just bandwidth.  Without optimisation, latency will continue to negatively impact interactive services no matter how much expensive bandwidth you throw at a problem, e.g. you cannot make Sydney, Australia move any closer to Dublin, Ireland.

    EDIT #1: I added a section on Terminal Services and VDI.
