I’ve been asked about the resource requirements of the dedupe optimization job before, but until now I didn’t have an answer.


The CPU side is … not clear. The dedupe subsystem schedules one single-threaded job per volume. That means a machine with 8 logical processors is only 1/8th utilized by dedupe if there is a single data volume. Microsoft says:

To achieve optimal throughput, consider configuring multiple deduplication volumes, up to the number of CPU cores on the file server.

That seems pretty dumb to me. “Go ahead and complicate volume management to optimize the dedupe processing”. Uhhhhh, no thanks.
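To put numbers on that, here is a quick sketch of the best-case dedupe CPU utilization (the function is my own illustration of the one-job-per-volume rule, not part of any real API):

```python
# Rough best-case CPU utilization of dedupe optimization, assuming
# one single-threaded job per volume (per Microsoft's guidance above).
def dedupe_cpu_utilization(volumes: int, logical_processors: int) -> float:
    """Fraction of logical processors kept busy by dedupe jobs."""
    active_jobs = min(volumes, logical_processors)
    return active_jobs / logical_processors

print(dedupe_cpu_utilization(1, 8))  # single volume on 8 LPs -> 0.125
print(dedupe_cpu_utilization(8, 8))  # one volume per core -> 1.0
```

Which is exactly why Microsoft suggests matching volume count to core count — and why that advice is a volume-management headache.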


Microsoft tells us that 1-2 GB RAM is used per 1 TB of data per volume.  They clarify this with an example:

Volume                  Volume size   Memory used
Volume 1                1 TB          1-2 GB
Volume 2                1 TB          1-2 GB
Volume 3                2 TB          2-4 GB
Total for all volumes   4 TB          4-8 GB
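The arithmetic behind that table is simple enough to sketch (the function name is mine, purely for illustration):

```python
# Estimate the dedupe optimization job's RAM needs from Microsoft's
# rule of 1-2 GB of RAM per 1 TB of data, per volume.
def dedupe_ram_estimate(volume_sizes_tb):
    """Return the (low, high) RAM estimate in GB for a list of volume sizes in TB."""
    total_tb = sum(volume_sizes_tb)
    return total_tb * 1, total_tb * 2

print(dedupe_ram_estimate([1, 1, 2]))  # -> (4, 8), matching the table above
```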

By default, a server will limit the RAM used by the optimization job to 50% of the total RAM in the server. So if the above server had just 4 GB of RAM, only 2 GB would be available to the optimization job. You can manually override this:

Start-DedupJob <volume> -Type Optimization -Memory <50 to 80>
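The effect of that -Memory percentage on the 4 GB example can be sketched like so (again, my own illustrative function, not a real cmdlet or API):

```python
# How much RAM the optimization job may use: -Memory caps it at a
# percentage of total server RAM (50-80 accepted; default is 50).
def dedupe_memory_cap_gb(total_ram_gb: float, memory_percent: int = 50) -> float:
    """RAM in GB available to the optimization job."""
    if not 50 <= memory_percent <= 80:
        raise ValueError("-Memory accepts values from 50 to 80")
    return total_ram_gb * memory_percent / 100

print(dedupe_memory_cap_gb(4))      # default 50% of 4 GB -> 2.0
print(dedupe_memory_cap_gb(4, 80))  # overridden to 80%   -> 3.2
```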

There is an additional note from Microsoft:

Machines where a very large amount of data change between optimization jobs is expected may require up to 3 GB of RAM per 1 TB of disk space.

So you might see RAM become a bottleneck or increase pressure (in a VM with Dynamic Memory) if the optimization job hasn’t run in a while or if lots of data is dumped into a deduped volume.  Example: you have deployed lots of new personal (dedicated) VMs for new users on a deduped volume.

4 comments so far

  1. Have you got an overall view on Deduplication yet?
    We’re looking at whether or not to implement it on our Shared Staff Storage and User Redirected Folders.
    Don’t know if I’m brave enough to do it for the Virtual Machines yet.

    • It works great on at-rest files, such as user folders.

      Note that it is only supported for dedicated (non-pooled) VDI VMs at the moment.

  2. Is Dedup supported for server VMs stored on a SoFS CSV yet?

    • No. Right now, you can only use dedupe in 2 VM scenarios:
      1) Non-pooled VDI
      2) Virtual DPM servers where the data VHDX files (and only those files) are stored on an SOFS (via SMB 3.0) to achieve backup dedupe.
