Thursday, February 7, 2019

Partitions in Cosmos DB.

As discussed in the previous article, a container can hold different types of items depending on the API selected. The items of a container are grouped into subsets, called logical partitions, based on a partition key, and this helps in managing the performance of the container. Each item has a partition key value, which determines the logical partition the item belongs to. For example, consider a container that holds the details of the departments of a company, where each department has a DeptID. If there are 100 departments and DeptID is selected as the partition key, the container will have 100 logical partitions.
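The department example can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not how Cosmos DB stores data internally; the function name and the sample documents are made up for the example.

```python
from collections import defaultdict

def group_by_partition_key(items, key_field):
    """Group items into logical partitions keyed by the value of key_field."""
    partitions = defaultdict(list)
    for item in items:
        partitions[item[key_field]].append(item)
    return dict(partitions)

# Hypothetical sample data: 100 departments, each with a distinct DeptID.
departments = [
    {"id": str(n), "DeptID": n, "DeptName": f"Dept {n}"} for n in range(1, 101)
]

logical_partitions = group_by_partition_key(departments, "DeptID")
print(len(logical_partitions))  # 100
```

If a field with fewer distinct values were chosen instead, the same items would fall into fewer and larger logical partitions.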

The data added to the container and the throughput provisioned on it are distributed automatically across the logical partitions based on the partition key. As the data in the container grows, new logical partitions are created automatically. The maximum size of a logical partition is 10 GB at the time of writing. The storage and throughput we select during the creation of the container affect the number of physical and logical partitions in the container.

In the pictorial example below, C1 is the container, and its data has been logically partitioned as LP1, LP2, LP3 and so on.

The logical partitions of a container are mapped to physical partitions. Each physical partition consists of a set of replicas, also called a replica set. Each replica in the set hosts an instance of the Cosmos DB database engine.

The picture below shows how the logical partitions (LP1, LP2, LP3 and so on) of the above container C1 are mapped to the physical partitions P1, P2 and P3. As we can see, the physical partition P1 holds two logical partitions, LP1 and LP2. In the same way, P2 holds three logical partitions, LP3, LP4 and LP5. The partition P3 holds the logical partition LP6.
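The mapping described above can also be written down as a small lookup table. The sketch below only mirrors the example of container C1; in Cosmos DB the placement is decided by the service, and the function name here is hypothetical.

```python
# Logical-to-physical mapping as described for container C1 above.
PHYSICAL_PARTITIONS = {
    "P1": ["LP1", "LP2"],
    "P2": ["LP3", "LP4", "LP5"],
    "P3": ["LP6"],
}

def physical_partition_of(logical_partition, mapping=PHYSICAL_PARTITIONS):
    """Return the physical partition that hosts the given logical partition."""
    for physical, logicals in mapping.items():
        if logical_partition in logicals:
            return physical
    raise KeyError(f"unknown logical partition: {logical_partition}")

print(physical_partition_of("LP4"))  # P2
```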

These physical partitions are controlled by the system and we cannot manage them. The logical partitions, however, are determined by the partition key we select. The efficient use of the throughput provisioned on a container also depends on the partition key. If the selected partition key fails to spread the requests evenly across the partitions of the container, the provisioned throughput will not be fully used. In such scenarios only a few partitions are utilized heavily, and these are called hot partitions. Each item within a partition also has a unique item ID. The index of the item is formed by the combination of the partition key and the item ID.
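To get a feel for hot partitions, we can count how the requests are spread over the partition key values. The sketch below is a rough illustration; the 50% threshold and the sample request lists are assumptions made for the example, not Cosmos DB settings.

```python
from collections import Counter

def partition_load_share(request_keys):
    """Return each partition key's share of the total requests (0.0 to 1.0)."""
    counts = Counter(request_keys)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

def hot_partitions(request_keys, threshold=0.5):
    """Return the keys that receive more than `threshold` of all requests."""
    shares = partition_load_share(request_keys)
    return sorted(key for key, share in shares.items() if share > threshold)

# Hypothetical request logs: one partition key value per request.
skewed = ["Delhi"] * 8 + ["Hyd"] + ["Pune"]
balanced = ["Delhi", "Hyd", "Pune"] * 3

print(hot_partitions(skewed))    # ['Delhi']
print(hot_partitions(balanced))  # []
```

In the skewed case, the partition for "Delhi" serves 80% of the requests while the other two stay mostly idle.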

Synthetic Partition Key:

As mentioned earlier, choosing an appropriate partition key plays a major role in distributing the data across partitions, which in turn affects the throughput of the container. There might be situations where no single field of a document makes a good partition key. In such cases multiple fields can be concatenated and used as the partition key. This is similar to the concept of a composite index in SQL Server.

For example, in the department document example discussed at the beginning, let’s say we have multiple fields like DeptID, DeptName, DeptLocation and so on. Here we can select DeptID as the partition key field, or we can use both DeptID and DeptLocation as the partition key.


If we select DeptID as the partition key, the partition key values would be 1, 2, 3, 4 and so on. If we build a synthetic partition key by concatenating DeptID and DeptLocation, the synthetic partition key values would be “1-Delhi”, “2-Hyd”, “3-Pune” and so on.
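A minimal sketch of building such a synthetic key on the client side before the item is written. The property name partitionKey, the helper function and the DeptName value are assumptions made for illustration.

```python
def synthetic_partition_key(item, fields, separator="-"):
    """Concatenate the given fields of an item into one partition key value."""
    return separator.join(str(item[field]) for field in fields)

# Hypothetical department document.
dept = {"DeptID": 1, "DeptName": "Sales", "DeptLocation": "Delhi"}

# Store the synthetic key in its own property, which would then be
# configured as the partition key path of the container.
dept["partitionKey"] = synthetic_partition_key(dept, ["DeptID", "DeptLocation"])
print(dept["partitionKey"])  # 1-Delhi
```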

Another option is to add a random suffix to the partition key. A random number is appended to the selected field and the result is used as the partition key. For example, if DeptLocation is suffixed with random numbers, the partition key values might be “Delhi100”, “Hyd101”, “Pune102” and so on. A further option is to suffix the partition key with a pre-calculated value: a hash of a field is calculated and appended to the partition key field. Because the suffix can be recalculated from the item itself, this approach distributes the data evenly across multiple partitions and also allows faster reads. This is one of the ways hot partitions can be reduced.
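Both suffixing strategies can be sketched as follows. The function names, the bucket count and the choice of SHA-256 are assumptions made for this example; any stable hash would serve the same purpose.

```python
import hashlib
import random

def random_suffix_key(base, buckets=10, rng=random):
    """Append a random bucket number to spread writes across partitions."""
    return f"{base}{rng.randrange(buckets)}"

def hashed_suffix_key(base, source_value, buckets=10):
    """Append a bucket number derived from a hash of another field.

    The suffix is deterministic, so a reader who knows source_value can
    rebuild the full partition key and read from one partition only.
    """
    digest = hashlib.sha256(str(source_value).encode("utf-8")).hexdigest()
    return f"{base}{int(digest, 16) % buckets}"

# Random suffix: the same location lands in one of `buckets` partitions.
print(random_suffix_key("Delhi"))

# Pre-calculated suffix: same inputs always give the same key.
print(hashed_suffix_key("Delhi", source_value=42))
```

With a random suffix, a read for one location has to query every possible suffix. With the pre-calculated suffix, the reader can recompute the key, which is why that option also helps reads.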


Thanks VV!!

#CosmosDB, #containers, #partitions, #partitionkey, #nosql, #cloud

Wednesday, January 23, 2019

Entity Hierarchy in Cosmos DB.

In this article, we will discuss the different entities of Cosmos DB and give an overview of each. The picture below illustrates the entity hierarchy of Cosmos DB.

We need to create a Cosmos DB account under an Azure subscription. Once we have a Cosmos DB account, we can start creating databases under it. There can be one or more databases under one account. A database in Cosmos DB is comparable to a namespace: it is a logical grouping of containers and helps in managing them. The type of entities in the database differs based on the type of API (Application Programming Interface) we select.

Each database can have one or more containers. A container is the unit for managing the throughput and storage of its items. That is, we select the throughput and storage capacity when creating the container, and these values can be altered later as required. The data entered in the container is logically partitioned automatically based on the partition key, so as new data gets added, new logical partitions are created automatically. These logical partitions are mapped to physical partitions. Items within a single logical partition can be updated in a transaction with snapshot isolation.

The throughput of the container is also distributed across its partitions. The throughput of a container can be configured in 2 modes:

  • Dedicated: The throughput provisioned on the container is reserved exclusively for that container.
  • Shared: The container shares the throughput provisioned on the database with the other containers of that database that are set to shared mode.

Containers are schema-agnostic, which means it is not mandatory to define a schema before using them. By default, all the items of a container are indexed automatically, without requiring explicit schema or index management. We can manage the indexes through the indexing policy of the container.

A container can hold different types of items. For example, it can have an item representing an employee and another item representing a vehicle. The kind of item differs based on the API we select. For example, we can use the SQL API if we want to build a non-relational database and query it using SQL syntax, the Gremlin API if we want to build a graph database, and the Table API if we are planning to migrate from Azure Table storage to Cosmos DB.

A unique key constraint can be created on a container to enforce the uniqueness of one or more values within each logical partition. This prevents duplicates of the values covered by the unique key constraint. The properties of the container vary based on the type of API we choose. We can maintain the operations log of a container by using the change feed option. Through the change feed, we can maintain before/after images of the items of a container.
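The scope of a unique key constraint is the logical partition, which the following toy model tries to show. ToyContainer and UniqueKeyViolation are hypothetical names for this sketch and are not part of any Cosmos DB SDK.

```python
class UniqueKeyViolation(Exception):
    """Raised when an insert would duplicate a unique key within a logical partition."""

class ToyContainer:
    def __init__(self, partition_key_field, unique_fields):
        self.partition_key_field = partition_key_field
        self.unique_fields = tuple(unique_fields)
        self.items = []
        self._seen = set()  # (partition key value, unique field values...)

    def insert(self, item):
        # Uniqueness is checked per logical partition, not across the container.
        scope = (item[self.partition_key_field],
                 *(item[field] for field in self.unique_fields))
        if scope in self._seen:
            raise UniqueKeyViolation(f"duplicate unique key in partition: {scope}")
        self._seen.add(scope)
        self.items.append(item)

employees = ToyContainer(partition_key_field="DeptID", unique_fields=["email"])
employees.insert({"DeptID": 1, "email": "a@example.com"})
employees.insert({"DeptID": 2, "email": "a@example.com"})  # allowed: different logical partition
try:
    employees.insert({"DeptID": 1, "email": "a@example.com"})
except UniqueKeyViolation as error:
    print("rejected:", error)
```

The same email is accepted twice because the two items belong to different departments, and rejected only when it repeats inside one department.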

The life span of the items of a container can also be managed by using the Time To Live (TTL) option. This option can be set on particular items to delete them after a certain period of time, or it can be set on the container itself. An item with a TTL value is deleted from the container once that period has elapsed.
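The expiry rule can be sketched as a small function. It reflects our understanding of the documented behaviour: TTL values are in seconds counted from the last modification of the item, -1 means "never expire", an item-level value overrides the container default, and item values are ignored when TTL is not enabled on the container. The function and parameter names are made up for the example.

```python
def is_expired(last_modified, now, container_default_ttl=None, item_ttl=None):
    """Decide whether an item should be treated as expired.

    last_modified and now are timestamps in seconds.
    container_default_ttl: None (TTL off), -1 (on, no default expiry) or seconds.
    item_ttl: None (inherit), -1 (never expire) or seconds.
    """
    if container_default_ttl is None:
        return False  # TTL is not enabled on the container
    effective_ttl = item_ttl if item_ttl is not None else container_default_ttl
    if effective_ttl == -1:
        return False  # explicitly set to never expire
    return now - last_modified >= effective_ttl

# Container default of 60 seconds, item last modified 100 seconds ago.
print(is_expired(last_modified=0, now=100, container_default_ttl=60))  # True
# Same item, but it overrides the default with "never expire".
print(is_expired(last_modified=0, now=100, container_default_ttl=60, item_ttl=-1))  # False
```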

Thanks VV!!

#cosmosdb, #containers, #items, #EntityHierarchy

Tuesday, December 25, 2018

Consistency levels in Azure Cosmos DB.

There are 5 consistency models supported by Cosmos DB:

Strong Consistency: In this model, read operations always see the most recent committed write. A client never sees uncommitted data. This is the strongest of the 5 consistency levels. This consistency level cannot be used on an Azure Cosmos DB account that is configured with multiple write regions.

Bounded Staleness Consistency: In this model, reads are allowed to lag behind writes, so a client can see stale (but never uncommitted) data. A threshold controls how stale the reads can be. It can be set either as a time interval or as a number of versions. Let’s say the time interval threshold is set to one hour: reads may then return data that is up to one hour old, but never older than that.

Let’s take the example of counting the exam marks of a student. In this example, the teacher is correcting the exam paper online and giving marks, while the student and the parent are watching the marks online. Let’s assume the teacher, the student and the parent are in different regions. Server A is in the region near the teacher’s location, Server B is in the region near the student’s location and Server C is in the region near the parent’s location.

As the correction has not yet started, the servers of all regions show ‘0’ marks.

Suppose bounded-staleness consistency is used and a threshold of 3 hours is set at 8 AM. From 8 AM until the 3-hour threshold is reached, reads from the other regions may be stale. Before 8 AM the teacher has already granted 10 marks, so all the servers show the value 10. At 9 AM the teacher increases the marks to 15, which is reflected on Server A, but as the threshold has not yet been crossed, Servers B and C still show the value 10. In the same way, at 10 AM the teacher increases the marks to 20, but that is still not reflected on the other 2 servers. At 11 AM the teacher grants 30 marks and the 3-hour threshold is also crossed, so the other 2 servers now see the recent write as well.

Session Consistency: When this model is used, a client always sees its own recent writes. All the other sessions see a write only once it has been replicated to them, which happens eventually, as in eventual consistency.

In our example of counting the exam marks of a student, if session consistency is used, the teacher will always see her latest write, but the student and the parent will see it only after the data eventually becomes consistent. So at 9 AM, when the teacher changes the value to 15 marks, it is written to Server A and the teacher immediately sees 15 marks, but the other 2 servers still show 10 marks. In the same way, at 10 AM Server A shows 20 marks and the other 2 servers still show 10 marks. The other 2 servers show the most recent write, which is 30, once the changes have been replicated.
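The read-your-own-writes behaviour can be modelled with a toy session token. This is only a sketch of the idea: the real session tokens in Cosmos DB are opaque values handled by the SDK, and the class and function names below are hypothetical.

```python
class Replica:
    """A toy regional server holding one value and the version it has applied."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self.version = 0

def write(replica, value):
    """Apply a write locally and return a session token (here: the new version)."""
    replica.version += 1
    replica.value = value
    return replica.version

def replicate(source, target):
    """Copy the source's state to a lagging replica."""
    target.value, target.version = source.value, source.version

def session_read(replicas, session_token=0):
    """Read from the first replica that has caught up to the session token."""
    for replica in replicas:
        if replica.version >= session_token:
            return replica.value
    raise RuntimeError("no replica has caught up to this session yet")

server_a, server_b = Replica("A", 10), Replica("B", 10)
teacher_token = write(server_a, 15)  # the teacher writes through Server A

print(session_read([server_b, server_a], teacher_token))  # 15: the teacher sees her own write
print(session_read([server_b, server_a]))                 # 10: the student may still read Server B
replicate(server_a, server_b)
print(session_read([server_b, server_a]))                 # 15: after replication
```

The teacher's read skips Server B because it has not yet applied the version recorded in her token, while a session without that token is happy to read the older value.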

Consistent Prefix Consistency: In this model, reads never see writes out of order. That is, if the value is changed from 10 to 15 and then to 30, successive reads on the other servers will show 10, 10, 15 or 10, 15, 15 or 10, 15, 30, but never an out-of-order sequence such as 10, 30, 15 or 30, 10, 15, as that is not the sequence in which the value was updated. The reads always show the data in the same sequence in which it was written.

In our example, as the teacher updates the marks to 10, then to 15, then to 20 and finally to 30, the student and the parent also see the marks in the same sequence: 10, 15, 20 and 30.
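The ordering rule described above can be checked with a short function. It is a simplified sketch that assumes every written value is distinct, as in the marks example; the function name is made up.

```python
def respects_write_order(writes, reads):
    """Return True if the reads never go backwards relative to the write order.

    Assumes all written values are distinct.
    """
    position = {value: index for index, value in enumerate(writes)}
    last_seen = -1
    for value in reads:
        if value not in position:
            return False  # a value that was never written
        if position[value] < last_seen:
            return False  # an older write observed after a newer one
        last_seen = position[value]
    return True

writes = [10, 15, 30]
print(respects_write_order(writes, [10, 10, 15]))  # True
print(respects_write_order(writes, [10, 15, 30]))  # True
print(respects_write_order(writes, [10, 30, 15]))  # False
print(respects_write_order(writes, [30, 10, 15]))  # False
```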

Eventual Consistency: This is the weakest of all the consistency models. There is no guarantee of consistent reads while using this model. As there is no threshold, we can never be sure at a given moment that the data we read is consistent. The Probabilistically Bounded Staleness (PBS) metric can be monitored to estimate how often reads are in fact strongly consistent.

In our example, let’s say at 8 AM the teacher updates the marks to 5, but the other servers still show 0 marks. In the same way, at 9 AM, even though the marks have been updated to 15, the other servers still show 0 marks. Once the data eventually becomes consistent, all the servers show the same value.

Most real-time applications use the Session consistency level. If strong consistency is required along with multiple regions, Bounded Staleness consistency is used. When the highest availability and the lowest latency are required, Eventual consistency is used.

Please share the type of consistency used in your environment and the reason for choosing it.

Thanks VV!!