2014
01.22

I’ve heard several times in various presentations about a whitepaper by Microsoft that discusses how the Windows build team in Microsoft HQ replaced traditional SAN storage (from a certain big name storage company) with Scale-Out File Server architecture based on:

  • Windows Server 2012 R2
  • JBOD
  • Storage Spaces

I searched for this whitepaper time and time again and never found it. Then today I was searching for a different storage paper (which I have yet to find) but I did stumble on the whitepaper with the build team details.

The paper reveals that:

  • The Windows Build Team were using traditional SAN storage
  • They needed 2 petabytes of storage to do 40,000 Windows installations per day
  • 2 PB was enough space for just 5 days of data!
  • A disk failure could affect dozens of teams in Microsoft

They switched to a WS2012 R2 SOFS architecture:

  • 20 x WS2012 R2 clustered file servers provide the SOFS HA architecture with easy manageability.
  • 20 x JBODs (60 x 3.5″ disk slots) were selected. Do the maths; that’s 20 x 60 x 4 TB = 4800 TB, or > 4.6 petabytes! Yes, the graphic says they are 3 TB drives, but the text in the paper says the disks are 4 TB.
  • There is an aggregate of 80 Gbps of networking to the servers. This is accomplished with 10 Gbps networking – I would guess it is iWARP.
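As a quick sanity check, the capacity maths above can be sketched in a few lines of Python. Dividing by 1024 to convert TB to PB is my assumption about how the post arrives at its "> 4.6 petabytes" figure; the drive size of 4 TB follows the paper's text rather than its graphic.

```python
# Raw capacity of the described SOFS deployment:
# 20 JBODs x 60 disk slots x 4 TB drives.
jbods = 20
slots_per_jbod = 60
drive_tb = 4  # per the paper's text (the graphic shows 3 TB)

raw_tb = jbods * slots_per_jbod * drive_tb  # 4800 TB
raw_pb = raw_tb / 1024                      # ~4.69 PB, i.e. "> 4.6 petabytes"

print(raw_tb, round(raw_pb, 2))
```

With 3 TB drives (the graphic's figure), the same maths gives 3600 TB, which is why the drive-size discrepancy matters.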

The result of the switch was:

  • Doubling of the storage throughput via SMB 3.0 networking
  • Tripling of the raw storage capacity
  • Lower overall cost – reduced the cost/TB by 33%
  • In conjunction with Windows Server dedupe, they achieved a 5x increase in capacity with a 45-75% de-duplication rate.
  • This led to data retention going from 5 days to nearly a month.
  • 8 full racks of gear were culled. They reduced the server count by 6x.
  • Each week, 720 petabytes of data flow across this network to/from the storage.


Check out the whitepaper to learn more about how Windows Server 2012 R2 storage made all this possible. And then read my content on SMB 3.0 and SOFS here (use the above search control) and on The Petri IT Knowledgebase.

6 comments so far

  1. I couldn’t see any mention of how they had their Storage Spaces configured (parity/mirroring, etc.). Don’t suppose you know?

  2. Excellent post Aidan! I have been curious how far invested Microsoft has become internally on using its own tech, in this case Storage Spaces, and this article gave me that answer. Keep up the great writing!

  3. What I really want to know is how did they connect that shared storage. 20 JBODs and 20 Servers with Shared SAS, how did that work?

    • Probably 5 clusters with 4 JBODs and 4 servers per cluster. If you need more than that: Google or hit search above.

      • Thanks for the response. The reason I asked is that we have been working with DataOn on a storage project to try and set up a 3-host Hyper-V cluster with direct-attached storage of 4 JBODs. Initially there were some challenges with wiring that up, but they seem to have figured out a good solution. Also, they did not recommend using SAS switches with 2012 Storage Spaces. So a single cluster with 20 hosts and 20 JBODs seemed impossible without SAS switches. Splitting it up would be a good solution if that is what they did.
