Partitioning
means dividing. The concept of dividing a huge table into small pieces and
storing the pieces in different location is called partitioning.
As a matter of fact it takes less time to
search for a particular row in a table having 1000 rows than searching for a
row in table having 10 lakh rows. When a table is really huge with millions of
records and table is used frequently then we can make use of sql feature called
PARTITIONing. It increases the performance of the queries on huge tables. A
RANGE PARTITION will helps in splitting the table or making the table in small
parts based on range of values.
For
example we have Employee table as shown in below picture and it has millions of
records.
In
order to increase the performance of queries that are being run against the
Employee table we can partition it as shown in below diagram:
We can partition a table based on values of
any one column. If we look at the above picture you can notice the employee
table has been partitioned into three partitions P1, P2 and P3 based on the
values of ‘Empno’ column. The first
partition will have employee details with EmpNo form 1 to 2500, second
partition will have employee details with empno’s from 2501 to 5000 and the 3rd
partition will have employee details whose empno is more than 5000. So now if
we query details of employee whose empno
is 2566, SQL will directly look for it in partition 2 and returns the required
details instead of searching entire table. It is the same Employee table but
after partitioning acts as 3 different tables. This obviously saves a lot of
time searching small tables instead of one huge table.
Note:
We can partition an already existing table or we can specify partitioning while
creation of new table.
In
order to partition any table we first need to create a partition FUNCTION. In partition
function we specify the ranges for our partitions. Means while creation of
partition function we specify what sort of values should fall under which
partition.
Syntax:
Create
Partition Function <Function Name> (DataType)
AS
RANGE LEFT FOR
Values
(value);
Funcation
Name : User defined.
DataType : This data type value depends on the type of
column we are selecting for partitioning the table.
RANGE : It
will be either LEFT or RIGHT
LEFT – The first value is the maximum value of
the first partition.
RIGHT – The first value is the minimum value
of the second partition.
Value
: Is the limit of values.
Example:
Create Partition
Function Emp_Part1 (INT)
AS RANGE LEFT FOR
VALUES (2500);
With
above command I am creating a function named ‘Emp_Part1’. As I am partitioning
the Employee table based on values of Empno column values so I specified the datatype as ‘INT’ as Empno will
hold employee numbers, value specified is ‘2500’ with RANGE – LEFT this mean
the values from 1 to 2500 will be in first partition. As LEFT is specified, the
first value that is 2500 here is the maximum value of first partition.
Next
step in partitioning is we need to create SCHEME for the partition function. This
SCHEME will help in specifying which partition to be stored in which file group.
Syntax
for SCHEME creation:
Create
Partition Scheme <Scheme Name>
AS
Partition <Partition Name>
TO
(Filegroup1,Filegroup2,..);
Example:
Create
Partition Scheme Part1_Scheme
AS
Partition Emp_Part1
TO
([PRIMARY]);
Here
I created SCHEME with name ‘Part1_Scheme’ for the partition ‘Emp_Part1’ (which
we created earlier). Here I specified ‘([PRIMARY])’ for the file groups this
indicates my both partitions to be stored in Primary file group itself. But after
executing above command we will receive below error:
Msg 7707,
Level 16, State 1, Line 1
The
associated partition function 'Emp_Part1' generates more partitions than there
are file groups mentioned in the scheme 'Part1_Scheme'.
This
is because we are planning to partition the table into 2 i.e.., a partition
will have values from 1-2500 and the other one will have values above 2500. So we
need to explicitly specify where the 2 partitions to be stored that is in which
file group. so the correct command will be like below
Create Partition
Scheme Part1_Scheme
AS Partition
Emp_Part1
TO
([PRIMARY],[PRIMARY]);
As
here I want to store both my partitions in Primary file group I specified
PRIMARY for both, in case we need to store in another file group we need to
specify the file group name. Also thing to remember here is how many partitions
we have for each of the partition we need to specify the file group explicitly.
Now
as I already have Employee table I will try to partition an already existing ‘Employee’
table. In order to partition an existing table we need to re-create CLUSTERED
index with command below:
Create CLUSTERED
INDEX Emp_Clu
ON
dbo.Employee(EmpNo) ON Part1_Scheme (EmpNo);
Here
I am creating a clustered index named ‘Emp_Clu’ for ‘Employee’ table on ‘Empno’
column. And at same time I am specifying to partition the ‘Empno’ column based
on ‘Part1_Scheme’ value ranges which we created earlier.
I
have inserted some rows and wanted to verify how rows are stored in partitioned
table. In order to find how many rows are there in each partition we can use
below query:
SELECT
$PARTITION.Emp_Part1(Empno) AS Partition,
COUNT(*) AS [COUNT]
FROM dbo.Employee
GROUP BY
$PARTITION.Emp_Part1(Empno)
ORDER BY Partition
;
One
of the query I like is what if suppose I want to see the data of first
partition? The below query helps in viewing the data of any particular
partition:
SELECT * FROM
dbo.Employee
WHERE
$PARTITION.Emp_Part1(Empno) = 1;
Here
‘1’ specifies to retrieve data of first partition. Likewise if we need data of
fifth partition then it will be ‘5’.
Now
another important query is what if I want to check my partition value ranges. Below
query will give value ranges of all partitions of a table:
SELECT * FROM
SYS.PARTITION_RANGE_VALUES;
Partitioning
is nothing but splitting of huge tables into smaller pieces for easier access that’s
it. Partitions are not new tables but subsets of main table.
Thanks!!
If
any mistakes please let me know it helps me to learn.