Large DPM databases

Had an on-going issue recently where one of our DPM servers seems to crash over night. Eventually I found an error in the log stating that the DPM server had lost connectivity to the DPMDB database in the SQL instance. I thought this was a little odd because we have three DPM servers which all connect to instances on the same SQL server, and the other two servers had been fine.

A cursory check of the SQL server revealed nothing much, so I started looking at what DPM might have been doing. DPM has a number of housekeeping tasks that it runs to keep its database tables in order, and these run between midnight and 3am depending on the server load and how much each task has to do. It was between these times that we were seeing the crashes, so I started to check the jobs and the tables they look at.

It was quickly apparent that the table tbl_TE_TaskTrail was enormous, occupying nearly 16GB of the database. This seemed strange as one of those overnight processes is meant to clear out any records from this table that have a StoppedDateTime value older than 33 days ago, essentially meaning the table should only hold a little over a month’s worth of data. A quick check of the table running the SQL statement below showed we had a problem with this task:

SELECT TOP 100 * FROM dbo.tbl_TE_TaskTrail ORDER BY CreatedDateTime;

QueryForOldRecords-Cropped

If the clear-up jobs had been running successfully, we shouldn’t have records of jobs going back to 2015… So after discovering this, I ran the following to find out how many rows were older than the prescribed 33 days:

SELECT COUNT(*) FROM dbo.tbl_TE_TaskTrail WHERE StoppedDateTime < (GETDATE() - 33);

It returned a count of nearly 5 million rows… Hurp. No wonder the query was timing out!

So, how to clear this table down and allow DPM to continue on without impacting performance during the day or relying on the scheduled maintenance tasks, which demonstrably hadn’t been running correctly. Well, you can run the following SQL block against your DPMDB which will process the table in blocks of 50,000 at a time. It will delete rows associated with the tasks in the TaskTrail database first, before finally clearing the records from TaskTrail. Some of the tables might not have associated records in them, so don’t worry if some of the delete statements return results to the effect that they haven’t removed any rows, this is normal.

USE DPMDB_MY_DPM_DB -- Change this to the name of your DPM database
GO

DECLARE @GCTill DATETIME = GETUTCDATE() - 33
DECLARE @TempTable TABLE (TID GUID)
INSERT INTO @TempTable (TID) (SELECT TOP 50000 TaskID FROM dbo.tbl_TE_TaskTrail WHERE StoppedDateTime < @GCTill AND dbo.tbl_TE_TaskTrail.TaskID NOT IN (SELECT taskID FROM tbl_AM_AgentTask_Alerts) AND dbo.tbl_TE_TaskTrail.TaskId NOT IN (SELECT taskID FROM tbl_MM_MediaRequiredAlert))

DELETE FROM dbo.tbl_RM_RecoveryTrail_RecoverableObjects WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_AM_AgentDeploymentTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_ARM_TaskTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_CM_InquiryResult WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_MM_MediaRequiredAlert WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_MM_Task WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_MM_TaskTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_PRM_CloudRecoveryPointTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_PRM_ReferencedTaskTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_RM_CandidateDatasetsForSCAssociation WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_RM_RecoveryTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_RM_ReplicaTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_RM_ShadowCopyTrail WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_TE_TaskError WHERE TaskID IN (SELECT TID FROM @TempTable)
DELETE FROM dbo.tbl_TE_TaskTrail WHERE TaskID IN (SELECT TID FROM @TempTable)

So, how long does this take to run? I find on my SQL server (which also hosts two other DPM instances) I was processing 1000 records a minute, so a 50,000 record block would take five minutes to process. If your server is beefier, you can increase the number in the INSERT INTO statement to process more records at a time.

Hope this helps everyone out there.

Share and Enjoy:
  • Digg
  • StumbleUpon
  • Technorati
  • del.icio.us
  • Twitter
  • blogmarks
  • HackerNews
  • Tumblr
  • Posterous
  • email
Bookmark the permalink. Follow any comments here with the RSS feed for this post.
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Your email address will not be published. Required fields are marked *

This blog is kept spam free by WP-SpamFree.