Monday, May 14, 2012

Partitioning of Tables

Partitioning means dividing. The concept of dividing a huge table into small pieces and storing the pieces in different location is called partitioning.

As a matter of fact it takes less time to search for a particular row in a table having 1000 rows than searching for a row in table having 10 lakh rows. When a table is really huge with millions of records and table is used frequently then we can make use of sql feature called PARTITIONing. It increases the performance of the queries on huge tables. A RANGE PARTITION will helps in splitting the table or making the table in small parts based on range of values.

For example we have Employee table as shown in below picture and it has millions of records.

 In order to increase the performance of queries that are being run against the Employee table we can partition it as shown in below diagram:


We can partition a table based on values of any one column. If we look at the above picture you can notice the employee table has been partitioned into three partitions P1, P2 and P3 based on the values of ‘Empno’ column. The first partition will have employee details with EmpNo form 1 to 2500, second partition will have employee details with empno’s from 2501 to 5000 and the 3rd partition will have employee details whose empno is more than 5000. So now if we query details of employee whose  empno is 2566, SQL will directly look for it in partition 2 and returns the required details instead of searching entire table. It is the same Employee table but after partitioning acts as 3 different tables. This obviously saves a lot of time searching small tables instead of one huge table.

Note: We can partition an already existing table or we can specify partitioning while creation of new table.

In order to partition any table we first need to create a partition FUNCTION. In partition function we specify the ranges for our partitions. Means while creation of partition function we specify what sort of values should fall under which partition.

Syntax:

Create Partition Function <Function Name> (DataType)

AS RANGE LEFT FOR

Values (value);


Funcation Name : User defined.

DataType         : This data type value depends on the type of column we are selecting for partitioning the table.

RANGE            : It will be either LEFT or RIGHT

                            LEFT – The first value is the maximum value of the first partition.
                            RIGHT – The first value is the minimum value of the second partition.

Value              : Is the limit of values.

 Example:

Create Partition Function Emp_Part1 (INT)

AS RANGE LEFT FOR

VALUES (2500);

With above command I am creating a function named ‘Emp_Part1’. As I am partitioning the Employee table based on values of Empno column values so I  specified the datatype as ‘INT’ as Empno will hold employee numbers, value specified is ‘2500’ with RANGE – LEFT this mean the values from 1 to 2500 will be in first partition. As LEFT is specified, the first value that is 2500 here is the maximum value of first partition.

Next step in partitioning is we need to create SCHEME for the partition function. This SCHEME will help in specifying which partition to be stored in which file group.

Syntax for SCHEME creation:

Create Partition Scheme <Scheme Name>

AS Partition <Partition Name>

TO (Filegroup1,Filegroup2,..);

Example:

Create Partition Scheme Part1_Scheme

AS Partition Emp_Part1

TO ([PRIMARY]);

Here I created SCHEME with name ‘Part1_Scheme’ for the partition ‘Emp_Part1’ (which we created earlier). Here I specified ‘([PRIMARY])’ for the file groups this indicates my both partitions to be stored in Primary file group itself. But after executing above command we will receive below error:

Msg 7707, Level 16, State 1, Line 1

The associated partition function 'Emp_Part1' generates more partitions than there are file groups mentioned in the scheme 'Part1_Scheme'.

This is because we are planning to partition the table into 2 i.e.., a partition will have values from 1-2500 and the other one will have values above 2500. So we need to explicitly specify where the 2 partitions to be stored that is in which file group. so the correct command will be like below

Create Partition Scheme Part1_Scheme

AS Partition Emp_Part1

TO ([PRIMARY],[PRIMARY]);

As here I want to store both my partitions in Primary file group I specified PRIMARY for both, in case we need to store in another file group we need to specify the file group name. Also thing to remember here is how many partitions we have for each of the partition we need to specify the file group explicitly.

Now as I already have Employee table I will try to partition an already existing ‘Employee’ table. In order to partition an existing table we need to re-create CLUSTERED index with command below:

Create CLUSTERED INDEX Emp_Clu

ON dbo.Employee(EmpNo) ON Part1_Scheme (EmpNo);

Here I am creating a clustered index named ‘Emp_Clu’ for ‘Employee’ table on ‘Empno’ column. And at same time I am specifying to partition the ‘Empno’ column based on ‘Part1_Scheme’ value ranges which we created earlier.

I have inserted some rows and wanted to verify how rows are stored in partitioned table. In order to find how many rows are there in each partition we can use below query:

SELECT $PARTITION.Emp_Part1(Empno) AS Partition,

COUNT(*) AS [COUNT] FROM dbo.Employee

GROUP BY $PARTITION.Emp_Part1(Empno)

ORDER BY Partition ;

One of the query I like is what if suppose I want to see the data of first partition? The below query helps in viewing the data of any particular partition:

 SELECT * FROM dbo.Employee

WHERE $PARTITION.Emp_Part1(Empno) = 1;

Here ‘1’ specifies to retrieve data of first partition. Likewise if we need data of fifth partition then it will be ‘5’.

Now another important query is what if I want to check my partition value ranges. Below query will give value ranges of all partitions of a table:

SELECT * FROM SYS.PARTITION_RANGE_VALUES;





Partitioning is nothing but splitting of huge tables into smaller pieces for easier access that’s it. Partitions are not new tables but subsets of main table.



Thanks!!



If any mistakes please let me know it helps me to learn.

No comments:

Post a Comment