How Much RAM & CPU Does Windows Server Deduplication Optimization Require?

I’ve been asked about the resource requirements of the dedupe optimization job before, but until now I didn’t have an answer.

Processor

The CPU side is … not clear.  The dedupe subsystem will schedule one single-threaded job per volume. That means a machine with 8 logical processors is only 1/8th utilized if there is a single data volume. Microsoft says:

To achieve optimal throughput, consider configuring multiple deduplication volumes, up to the number of CPU cores on the file server.

That seems pretty dumb to me. “Go ahead and complicate volume management to optimize the dedupe processing”. Uhhhhh, no thanks.
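
If you do have more than one data volume, you can at least start their optimization jobs together so that each volume gets its own thread. A minimal PowerShell sketch using the built-in dedupe cmdlets (assumes the volumes are already enabled for deduplication):

# Kick off a manual optimization job on every dedupe-enabled volume.
# Each job is single-threaded, so concurrency only scales with the number of volumes.
Get-DedupVolume | ForEach-Object {
    Start-DedupJob -Volume $_.Volume -Type Optimization
}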

Memory

Microsoft tells us that 1-2 GB RAM is used per 1 TB of data per volume.  They clarify this with an example:

Volume                  Volume size   Memory used
Volume 1                1 TB          1-2 GB
Volume 2                1 TB          1-2 GB
Volume 3                2 TB          2-4 GB
Total for all volumes   4 TB          (1 + 1 + 2) x (1-2 GB) = 4-8 GB RAM
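
You can roughly estimate the same figure for your own server from the used space on the dedupe-enabled volumes. A quick sketch based on the 1-2 GB per TB guideline (using the UsedSpace property reported by Get-DedupVolume):

# Estimate optimization-job RAM at 1-2 GB per 1 TB of data, summed across all dedupe volumes.
$usedTB = (Get-DedupVolume | Measure-Object -Property UsedSpace -Sum).Sum / 1TB
"Low estimate : {0:N1} GB" -f ($usedTB * 1)
"High estimate: {0:N1} GB" -f ($usedTB * 2)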

By default a server will limit the RAM used by the optimization job to 50% of total RAM in the server.  So if the above server had just 4 GB RAM, then only 2 GB would be available for the optimization job.  You can manually override this:

Start-DedupJob -Volume <volume> -Type Optimization -Memory <50 to 80>
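
For example, to run a manual optimization job against a (hypothetical) D: volume and let it use up to 70% of the server’s RAM:

# Run a manual optimization job on D:, allowing it up to 70% of system memory.
Start-DedupJob -Volume "D:" -Type Optimization -Memory 70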

There is an additional note from Microsoft:

Machines where a very large amount of data change is expected between optimization jobs may require up to 3 GB of RAM per 1 TB of disk space.

So RAM might become a bottleneck, or memory pressure might increase (in a VM with Dynamic Memory), if the optimization job hasn’t run in a while or if lots of data is dumped into a deduped volume. Example: you have deployed lots of new personal (dedicated) VMs for new users onto a deduped volume.
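
If you suspect that is happening, it’s easy to check on a running job and on each volume’s last results with a couple of read-only cmdlets (exact output columns vary by Windows Server version):

# List any dedupe jobs that are currently queued or running, with their progress.
Get-DedupJob

# Show per-volume savings and the result of the last optimization run.
Get-DedupStatus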

6 thoughts on “How Much RAM & CPU Does Windows Server Deduplication Optimization Require?”

  1. Have you got an overall view on Deduplication yet?
    We’re looking at whether or not to implement it on our Shared Staff Storage and User Redirected Folders.
    Don’t know if I’m brave enough to do it for the Virtual Machines yet.

    1. It works great on at-rest files, such as user folders.

      Note that it is only supported for dedicated (non-pooled) VDI VMs at the moment.

    1. No. Right now, you can only use dedupe in 2 VM scenarios:
      1) Non-pooled VDI
      2) Virtual DPM servers where the data VHDX files (and only those files) are stored on an SOFS (via SMB 3.0) to achieve backup dedupe.

  2. Hello there,

    ” To achieve optimal throughput, consider configuring multiple deduplication volumes, up to the number of CPU cores on the file server.

    That seems pretty dumb to me. “Go ahead and complicate volume management to optimize the dedupe processing”

    Yeah you are right and I feel the same.
    I see you have a trick to increase memory consumption; do you have any idea of how I could increase CPU usage too?
    I have a dual-CPU server (2*14 cores) with HT, so I have a total of 56 logical CPUs, and I feel sad that only one will be used for dedupe.

    Any idea other than creating 56 different volumes (and killing the deduplication ratio at the same time)?

    Thanks
