Windows Network Load Balancing and NIC Teaming

I’m going to do this all using HP Proliants …

Here’s the scenario.  There’s going to be a number of web servers running Windows Server 2003.  They’ll work cooperatively and share files somehow.  They must be load balanced using Windows NLB.  This means using the Unicast method with 2 NIC’s: in Unicast mode every NLB NIC in the cluster shares the same MAC, so the servers need a second NIC to talk to each other within the cluster.  HP Proliant servers come with a pair of built-in NIC’s so you’d think you’re sorted.  Nope!  You must allow for NIC failure so that means putting in 4 NIC’s and creating two NIC teams, each consisting of a pair of physical NIC’s.

A NIC team is created using at least 2 NIC’s in the HP Network Configuration Utility (NCU).  The newly created virtual NIC has a virtual MAC address or Locally Administered Address (LAA).
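If you want to check whether a given MAC really is an LAA, here’s a minimal sketch.  It relies on the standard Ethernet convention that a locally administered address has the 0x02 bit set in its first octet.  The function name and the sample addresses are made up for illustration; they don’t come from the NCU.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the MAC has the locally administered bit set."""
    # Accept either "-" or ":" as the separator between octets.
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    # Bit 0x02 of the first octet marks a locally administered address.
    return bool(first_octet & 0x02)


# Hypothetical sample addresses, not taken from a real server.
print(is_locally_administered("02-BF-0A-00-00-32"))
print(is_locally_administered("00:1A:4B:12:34:56"))
```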

Here’s the problem.  When you associate a NIC with an NLB cluster, you are applying a virtual MAC to it.  This MAC is applied identically to all of the NLB NIC’s on every server in the cluster.  Now think … your NLB NIC is actually a virtual NIC made from two physical NIC’s and already has a virtual MAC or LAA.  So which LAA should be applied?  The correct answer is the LAA of the NLB cluster.  This is because the IP address of the NLB cluster is associated with the LAA that should be assigned to the NLB NIC (the NIC team).  Without the right LAA, the team won’t pick up Ethernet frames sent to the cluster’s MAC.
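To know which LAA the team should end up with, you can work it out from the cluster IP.  As far as I know, NLB in Unicast mode builds the cluster MAC by putting 02-BF in front of the four octets of the cluster IP address.  Here’s a rough sketch of that calculation.  The function name and the example IP are my own, so compare the result against what NLB Manager actually shows for your cluster.

```python
import ipaddress


def nlb_unicast_mac(cluster_ip: str) -> str:
    """Derive the Unicast-mode NLB cluster MAC from the cluster IPv4 address."""
    # .packed gives the four octets of the address as bytes.
    octets = ipaddress.IPv4Address(cluster_ip).packed
    # Assumption: Unicast mode uses the 02-BF prefix followed by the IP octets.
    return "-".join(f"{b:02X}" for b in (0x02, 0xBF, *octets))


# Hypothetical cluster IP for illustration.
print(nlb_unicast_mac("192.168.1.50"))  # 02-BF-C0-A8-01-32
```

The value it prints is the LAA that the NIC team should carry on every node once the NLB binding is in place.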

Normally you’d go into the properties of the NIC and configure the driver to set the LAA.  You can’t do this with an HP NIC team.  Instead, once you’ve associated a server’s NIC team with the NLB cluster, just open the HP NCU.  It warns you that the team in question should have a different LAA.  That’s cool.  Just click on OK to save the new configuration and you’re sorted.  Do not click on Cancel to exit the NCU because it won’t save the NLB LAA for you.

Just repeat this process on each of the nodes in the NLB cluster and you’re sorted.

EDIT:

In practice, I found that the HP NCU in the HP PSP V8.0 is buggy.  I tested this thing endlessly yesterday and it was fine.  Then all of a sudden, without any change, it broke overnight.  Node1 could not see the network (or Node2) but the network could see it.  Removing Node1 from the cluster repaired the network.  Adding it back in broke things again.  Doing the LAA dance in NCU fixed it for about 1 second (as seen on a continuous ping).  The logic of it didn’t make sense … LAA issues should affect inbound connectivity to the NLB cluster IP, not outbound connectivity.  In the end I disabled teaming of the NLB NIC’s on both of the nodes.
