MMS 2012: Deep Dive Into ConfigMgr 2012 DRS and SEDO

Speaker: Saud Al-Mishari, MSFT PFE – think he’s based in the UK

The session is on the new replication model: RCM, DRS, and SEDO.

Key Concepts

SQL replication in ConfigMgr 2012 is nothing do do with SQL Server Transaction Replication
Data Replication Service (DRS)

Terminology

Stored procedure: sproc
SSB: SQL Service Broker
Change Tracking: SQL Server Change Tracking

RCM: Replication Configuration Management/Monitoring
Replication Pattern: a set of rules on what will replicate
Replication group: a set of tables that are monitored and replicated together
Replication Link: a replication connection between two SQL servers for a particular RG
Backlog: Unable to write data t the SQL Server DB after being received in the SSB Queue (usually SQL Server write performance)

New Replication Model

Global data is anything an admin creates and is replicated everywhere, e.g. collection rules
Site data is stuff like status, collection membership results, replicated up to parent site.

Client generates XML file and copies to management point. MP copies MIF to the site server. Site server process it. DRS replicates the changed data to the parent CAS contains the discovery data.

SQL Server Change Tracking

Change tracking allows application to keep a record of rows in a table that have been changed: insert/update/deete
Does not track changed data – obtained directly each sync
Added in SQL Server 2008 … not to be confused with Change Data Capture
Is enabled at the DB level and at the table level.

DO NOT ALTER THIS SETTING ON A SITE DATABASE

SQL Service Broker

Messaging service:

Asynchronous queue based service
Guaranteed delivery (not infrastructural guarantee – developer guarantee)
Allows messages to be grouped into a conversation … messages processed in order, allows for multiple threads to process queue

Elasticity:

Allows scalability

Replication Patterns

Global data flows in both directions. CAS and primaries all have the same data, e.g. collections and package meta data.
Site data flows up.
Global-proxy is admin and control data for secondary sites. A primary and secondary sites all have the same data. Subset of global data that secondary sites needs. Leverages SQL 2008 R2 Express at the secondary site with 10 GB limit.

Select * from vReplicationData to find all RGs and their sync schedules

ID is the key field in here.

Provider Access

SMS_ReplicationGroup is a new WMI class that supports replication. 1 instance per RG. Status propert allow you to determine the sttus of the RG.

What’s in an RG?

Select * from vArticleData where ReplicationID = XX …. using ID from above query

How big is the RG?

EXEC spDiagGetSpaceUsed

If a site goes down for a week or two, how much data must you send across? Use the above query to figure out how much data must be replicated by the RG.

Demo

In the SQL Management Studio. Select * from vReplicationData. Can see all the patterns for global, site and global-proxy. SyncInterval is the number of minutes between replications. DRS runs every 5 minutes .. no control over that.

Select * from vArticleData where ReplictionID = 7. Looks like Endpoint Protection data being replicated here.

Runs spDiagGetSpaceUsed .. takes a while. Returns the size of the tables. Replication Pattern shows the amount of data to replicate if you lose a site for the 3 patterns (global, site, global_proxy).

DRS Architecture

RCM handles replication link setup, maintenance and monitoring – command and control. It’s a thread of SMSEXEC.
SSB is the transmission engine of replication
The Sender still lives and is used for bulk copy for initialization and re-init.
5 day limit on DRS for outages – Due to the need to retain changes. It retains 5 days of data. Try to expand this for a 30 day outage and ConfigMgr needs to maintain 30 days of data. It’s 5 days to handle a long weekend apparently – site breaks at start of holiday, come back 4 days later and fix it.

Initialisation:

BCP: to extract table data
Sender: SMS EXEC sender thread
SMB/CIFS: copy data to the destination

On-going replication

SQL Server Change Tracking
DRS sprocs and SQLCLR
SQL Server Service Broker
XML

Demo – Break replication

SQL DBA has a bad day and disables dbo.ConfMgrDRSQueue. CMTrace is started from DVD. Opens rcmctrl log on site server. See that the queue not running causes and error. We can see that ConfigMgr actually reached out into SQL and re-enabled the queue.

In CMconsole , we have send demo. The link is degraded in one direction but not the other under Database Replication. Looks like TCP 1433 connectivity issue.

Site Initialisation

Setup start
Setup asks CAS for site number. If you have more than 50,000 clients, then you need SQL Enterprise Edition to chunk up data in the DB and partition it.
Setup finished and waits for replication to initialise.
The replication configuration data is requested. This group tells RCM as the primary how replication should be setup
CSA receives request and BCPS out the data and sends it via sender back to the primary
Primary now request remaining Global Replication Groups. CAS creates the BC packages and send them back to the primary. Primary then applies the new data from the CAS.
Primary site receives BCP fles and inserts all the data from the CAS> The primary can now switch to normal replication.

DRS Message Replication

Provider executes query that modifies table
SQL Server writes entries into change tracking table
On DRS sync: changes are packages up and inserted into SQL Server message queue sing a stored proc.
Message Broker transmits the message to the receiving site.
RCM monitors the queue launching activation stored procs to process
And more on receiving side to insert modifications on receiving side

WARNING: When A CAS Goes Offline

When the CAS goes offline for more than 5 days, don’t make changes on the Primary as a substitute as the CAS. The CAS will re-initialise the primaries after more than 5 days outage, thus wiping the Primary’s changes.

DRS Troubleshooting

The Replication Link Analyser RLS should be yur first stop. It’s predictable and can do some fairly complex remediation
RCM Log should be the follow up. But this is just a summary of what has happend.
For transmissions layer errors, the SSB queue is sometimes the most immediate source for error messages (of this type)

Views for Detailed Info

The main logging view: vLogs. They log into the DB. Select top 1000 * from vLogs order by LogTime desc. Limit that number. DO not select everything. Will hammer prod environment and compund the issue.
SMS_Replication_Configuration_Monitor registry key to configure logging

DRS Troubleshooting

Ensure that TCP 1433 exception is there for SQL Service and 4022 for SQL Broker.
SSB keys transmitted through setup – monitoring with Hman.
spDiagDRS will give you an overview of the state of DRS replication at the site. SiteStatus (coded), Replication Group Initialization Status, DRSQueueStates, QueueLenghts (ideally 0 and 0 or you have a backlog), Replication Group Status deltails the last time messages sent

Demo: View Queues

Click on the queues in SQL under service broker under CM database.

Procedural troubleshooting of DRS DEMO

Turns of SQL Broker. Makes a change to Client Policy.

Run spDiagDRS: EXEC spDiagDRS in SQL MS. We see messages jammed in the outbound queue.
SSB transmission_queue:
Service broker queues: We see connection failed errors. Telnet to the port and we see it fails.
vLogs: select * from vLogs ORDER BY LogTime DESC (beware * in real world … too much data)
RCM_ReplicationLinkStatus

The Database Replication link in CM console will flip to degraded and then flip to fail after about 25 minutes. Can run Replication Link Analyzer (RLA). In the demo it shows that there’s a network connectivity issue.

Invoke-WmiMethod –namespace rootrootsmssite_CAS –path SMS_ReplicatinGroup –Name InitializeData = arguementlist “20”, “CAS”, “PR1” to reinitialize a RG. RLA should do this for you if required.

SEDO – Why do we need a way of controlling changes?

As global data is replicated everywhere, a user on a primary site culd change an object at the same time as a user on the CAS or another primary.
This is an unavoidable consequence of multi-master replicated data model – ask AD.
SEDO is the solution to this.

What is SEDO?

SEDO = Serialized Editing of Data/Distributed Objects
Provides a way to enforece a single user editing of an object at any one time.
A lock request round trip can take less than 200ms from Primary to CAS to Primary
Default Timeout is 5 minutes.
Only SEDO enabled objects require users to get a lock
Supports explicit and implicit lock handling.
This is all transparent to admins. Important for devs building extensions to CM.

Technorati Tags: Event Notes,Events,System Center,Configuration Manager

MMS 2012: Deep Dive Into ConfigMgr 2012 DRS and SEDO

One thought on “MMS 2012: Deep Dive Into ConfigMgr 2012 DRS and SEDO”

Leave a Reply Cancel reply