Thursday, December 27, 2012

CDC FEATURE – SQL SERVER 2008


This feature introduced in SQL Server 2008. With this feature we can capture the changes that are being done to any specific table. The changes made are stored in new tables that will get created automatically after enabling this feature. Till before SQL Server 2008 we can capture the changes made to tables with the help of AFTER TRIGGERS.

In order to use this feature first we have to enable this at database level and afterwards we can enable it for respective tables. Once we enable CDC, 2 new jobs gets created (explained later) these jobs serves as monitors of CDC process. They keep monitoring the data changes of respective tables and will update the repository tables which gets created automatically. The repository tables will have information of all DML operations performed on particular tables for which CDC has been enabled along with old data.

In order to use this feature first we need to enable this feature at database level and then we need to enable for specific table\tables for which we want to capture the data changes.

To enable this feature we need to run below command:

EXEC sys.sp_cdc_enable_db


This procedure must be executed for a database before any tables can be enabled for change data capture in that database. Change data capture records all insert, update, and delete (DML)activity

applied to enabled tables, making the details of the changes available in an easily consumed relational format. Change data capture feature is available only in SQL Server 2008 Enterprise, Developer and Evaluation editions.

Note: Change data capture cannot be enabled on system databases or distribution databases.

sys.sp_cdc_enable_db creates the change data capture objects that have database wide scope, including meta data tables and DDL triggers. It also creates the cdc schema and cdc database user and sets the is_cdc_enabled column for the database entry in the sys.databases catalog view to 1.

sys.sp_cdc_disable_db  this SP disables change data capture for the current database. This SP disables change data capture for all tables in the database currently enabled. All system objects related to change data capture, such as change tables, jobs, stored procedures and functions are dropped. The is_cdc_enabled column for the database entry in the sys.databases catalog view is set to 0.

 
We can use below query to confirm whether CDC is enabled or not:

    USE master
    GO
    SELECT [name], database_id, is_cdc_enabled
    FROM sys.databases

sys.sp_cdc_enable_table this SP Enables change data capture for the specified source table in the current database. When a table is enabled for change data capture, a record of each data manipulation language (DML) operation applied to the table is written to the transaction log. The change data capture process retrieves information from the log and writes it to change tables that are accessed by using a set of functions.

To enable CDC for specific table we have to use below command:

EXEC sys.sp_cdc_enable_table
@source_schema = N'Administrators',
@source_name = N'Shift',
@role_name = NULL
GO

ü  Administrators           Is the schema to which the table belongs.

ü  Shift                     Is the table name.

ü  @role_name                   Is the database role used to gate access to change data. The purpose of the named role is to control access to the change data. If explicitly set to NULL, no gating role is used to limit access to the change data.

When change data capture is enabled for a table, a change table and one or two query functions are generated. The change table serves as a repository for the source table changes extracted from the transaction log by the capture process. The query functions are used to extract data from the change table. The names of these functions are derived from the capture_instance parameter in the following ways:
 
All changes function: cdc.fn_cdc_get_all_changes_<capture_instance>
Net changes function: cdc.fn_cdc_get_net_changes_<capture_instance>

cdc.fn_cdc_get_all_changes_<capture_instance> Returns one row for each change applied to the source table within the specified log sequence number (LSN) range. If a source row had multiple changes during the interval, each change is represented in the returned result set.

cdc.fn_cdc_get_net_changes_<capture_instance>  Returns one net change row for each source row changed within the specified LSN range.

sys.sp_cdc_enable_table also creates the capture and cleanup jobs for the database if the source table is the first table in the database to be enabled for CDC.


To verify for which tables CDC is enabled we can run below query:

SELECT [name], is_tracked_by_cdc
FROM sys.tables 

sys.sp_cdc_disable_table drops the change data capture change table and system functions associated with the specified source table. It deletes any rows associated with the specified capture instance from the change data capture system tables. Metadata that describes the configuration details of the capture instance is retained in the change data capture metadata tables cdc.change_tables, cdc.index_columns, and cdc.captured_columns.


EXEC sys.sp_cdc_disable_table
@source_schema = N'HumanResources',
@source_name = N'Shift',
@role_name = NULL

Note: If change data capture is enabled on a table with an existing primary key, and the @index_name parameter is not used to identify an alternative unique index, the change data capture feature will use the primary key. Subsequent changes to the primary key will not be allowed without first disabling change data capture for the table.

After enabling CDC we will notice new table for the schema ‘cdc’ got created automatically. Name resembles like cdc.HumanResources_Shift_CT.

This table contains all the data changes along with the old values. This table will have 5 additional columns:

ü      $start_lsn

ü      $end_lsn

ü      $seqval

ü      $operation

ü      $update_mask

 

Each of these new columns will have information like

$start_lsn – This will have the starting of log sequence number.

$end_lsn  -- This is the ending log sequence number.

$seqval  -- This will have sequence of values to order the row changes within the transaction.

$operation – This contains value as per the DML operation performed.

Like 1 – for Delete statement

        2 – for Insert statement

        3 – value before Update statement

        4 – value after Update statement

$update_mask -- is a bit mask where every column that was changed is set to 1.

Below 2 DMV’s can be used to monitor the CDC process.

sys.dm_cdc_log_scan_sessions:

The sys.dm_cdc_log_scan_sessions management view contains one row for each log scan session in the current database. The last row represents the current session. The view also contains a row with a session ID of 0, which has aggregate information about all the sessions since the instance of SQL Server was last started.

sys.dm_cdc_errors:

The sys.dm_cdc_errors management view contains one row for each error that is encountered during the change data capture log scan session.

Note:

       Computed Columns are not tracked in CDC. The column will appear in the change table with the appropriate type, but will have a value of NULL.
             Changes to individual XML elements are not tracked.
             Timestamp data type in the change table is converted to binary.

Example:

Enabling CDC for the database:

USE TestDB
GO
EXEC sys.sp_cdc_enable_db

Verifying:

USE master
GO
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases

Enabling CDC for table:

USE TestDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'HotelDetail',
@role_name = NULL

Verifying:

USE TestDB
GO
SELECT [name], is_tracked_by_cdc
FROM sys.tables
where is_tracked_by_cdc = 1

Querying DMV’s:

USE TestDB
GO
select * from sys.dm_cdc_log_scan_sessions
Go
select * from sys.dm_cdc_errors

Disabling CDC for a table:

EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'HotelDetail',
@capture_instance = N'dbo_HotelDetail‘

Disabling CDC for a database:

EXEC sys.sp_cdc_disable_db

 

Thanks!!!