I like CosmosDB, I really do.
However I am not a fan of the Core CosmosDB SDK that Microsoft provides. It’s like the vanilla SQL connector. It’s cool and it can do a lot of things but I don’t necessarily want to use everything.
Here’s my experience developing an entity store for CosmosDB in C#.
The name of the project? Cosmonaut
I should mention I’m also a big fan of Entity Framework (the core version).
It gives me everything I need to get my POCO objects, stick them in a database without having to write any SQL and get them back to manipulate them or use them.
The CosmosDB SDK isn’t like that. It’s more like a bad version of Dapper. It’s nice, fast and it does the job amazingly well, but you’ll still write a lot of code to get it up and running.
I want to use CosmosDB as my main persistence store for prototype projects and some small applications I have up and running.
That’s the problem I wanted to solve. I will make a simple, tidy and easy to use entity store package for .NET Standard 2.0.
As I already mentioned, I like how Entity Framework goes straight to the point. You have a DbContext (or more) and you also have DbSets.
DbSets are essentially the tables in a SQL database. CosmosDB, just like many other NoSQL databases, has collections. A DbContext can be several things, but it is associated with the database itself and it contains the DbSets.
All I need to do to get this up and running is an interface and an implementation of that interface performing CRUD operations.
Disclaimer: This is the first time that I sat down and said that I will create something that I can see the community using. It’s my first NuGet package.
You cannot please everyone. It’s just impossible. So everything you read below is what I thought would be “acceptable” to the wider audience.
First and foremost, to get this out of the way: Cosmonaut has to make use of the Microsoft.Azure.DocumentDB.Core package. The plan is to fix all its flaws and make its usage straightforward.
“Can I see an ID please?”
CosmosDB internally uses a property called id, which means every document will always need to have one. I can do nothing about it other than embrace it. Microsoft’s SDK allows you to insert an object in the database and retrieve it in an ORM fashion. It is essentially a dynamic which the library will try to cast to the object provided. What’s bizarre is that if your object has a property named Id (notice the capital I?) then your document after creation will have one property named Id and another property named id, which is the automatically generated internal one.
However, if you try to get the document as the POCO object, it will (even more bizarrely) return the object with the internal id property mapped to Id. This was the first of many limitations and issues I encountered. With some (not so elegant) code, the entities that the user is trying to add are validated in the following way.
The consumer’s entities will need to have one of the following:
1. A string unique identifier named Id (case doesn’t matter)
2. A string unique identifier with any name the consumer wants but with the attribute [JsonProperty("id")] decorating it.
3. An implementation of the ICosmosEntity interface, which adds the automatically generated CosmosId property.
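To make the three rules above concrete, here is a minimal reflection-based sketch of such a check. It is not Cosmonaut’s actual validation code: EntityIdValidator and the sample entities are hypothetical, the JsonPropertyAttribute below is a local stand-in for the one in Newtonsoft.Json so the snippet has no dependencies, and the shape of ICosmosEntity is my assumption based on the description.

```csharp
using System;
using System.Linq;
using System.Reflection;

namespace IdValidationSketch
{
    // Local stand-in for Newtonsoft.Json's JsonPropertyAttribute (assumption: only the name matters here).
    [AttributeUsage(AttributeTargets.Property)]
    public sealed class JsonPropertyAttribute : Attribute
    {
        public JsonPropertyAttribute(string propertyName) => PropertyName = propertyName;
        public string PropertyName { get; }
    }

    // The interface is named in the post; its exact members are assumed.
    public interface ICosmosEntity
    {
        string CosmosId { get; set; }
    }

    // Hypothetical helper, not part of the library.
    public static class EntityIdValidator
    {
        // Returns true when the type satisfies at least one of the three rules.
        public static bool HasValidId(Type entityType)
        {
            // Rule 3: the entity implements ICosmosEntity.
            if (typeof(ICosmosEntity).IsAssignableFrom(entityType))
                return true;

            var stringProperties = entityType
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Where(p => p.PropertyType == typeof(string))
                .ToList();

            // Rule 1: a string property named Id, ignoring case.
            if (stringProperties.Any(p => p.Name.Equals("id", StringComparison.OrdinalIgnoreCase)))
                return true;

            // Rule 2: a string property decorated with [JsonProperty("id")].
            return stringProperties.Any(p =>
                p.GetCustomAttribute<JsonPropertyAttribute>()?.PropertyName == "id");
        }
    }

    // Sample entities: the first three pass, the last one fails (its Id is not a string).
    public class User { public string Id { get; set; } }
    public class Book { [JsonProperty("id")] public string Isbn { get; set; } }
    public class Car : ICosmosEntity { public string CosmosId { get; set; } }
    public class Orphan { public int Id { get; set; } }
}
```

Calling EntityIdValidator.HasValidId(typeof(Orphan)) returns false because the Id property is an int, while the other three sample types each pass through a different rule.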
All that the consumer has to do is use the ICosmosStore
The implementation of the store needs just some cosmos options to be ready to use.
These options are:
ConnectionPolicy (Optional, defaults to Microsoft’s default)
Collection Throughput (Optional, defaults to 400, which is the minimum, because I don’t want to accidentally overcharge you)
I really like it when the packages I use provide an extension method to set up whatever the package provides. So I did that as well!
Setting up an entity store is as easy as adding this line to your ServiceCollection setup.
Now the consumers can just use .NET’s DI to get the service. Internally the service is registered as a Singleton, which is Microsoft’s recommended way.
Once that is set up, the user can use this store to do the following:
This was my vision for what the initial release would have in terms of functionality. It looks very much like Entity Framework and surprisingly, it works as such.
Instead of going through the pain of finding self links and collection paths to update an object, now you can simply provide that object and Cosmonaut will do the rest. Same goes for any other operation. You can query the objects in the store with predicates and easily retrieve the ones matching the criteria.
Note here that I am still exposing the internal DocumentClient in case the consumer wants to do something that Cosmonaut cannot do yet.
The extra mile
What I admire in great developers is that they never settle. They always ask “How can I make this better?”. I don’t think I’m one of them, but I really want to be one day. So I asked this very question.
The answer is simple. Up until now, whenever you registered an object of some type, Cosmonaut would use this type’s name to create a collection. If your entity class was called User then the collection created would be named users (the name of the class, pluralised and lowercased).
However, this isn’t everyone’s cuppa. For that reason I created an attribute assignable to classes. It looks like this:
As you might have guessed already, the string mycollection will be the name of the collection that Cosmonaut will create for this entity. I really like this one.
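For illustration, the naming rule can be sketched roughly like this. This is not the library’s real code: CollectionNameResolver is a hypothetical helper, the CosmosCollectionAttribute below is a simplified local stand-in for the real attribute, and the pluralisation is deliberately naive (it only appends an “s”).

```csharp
using System;
using System.Reflection;

namespace CollectionNamingSketch
{
    // Simplified stand-in for the attribute described in the post (assumption: a single name argument).
    [AttributeUsage(AttributeTargets.Class)]
    public sealed class CosmosCollectionAttribute : Attribute
    {
        public CosmosCollectionAttribute(string name) => Name = name;
        public string Name { get; }
    }

    // Hypothetical helper, not part of the library.
    public static class CollectionNameResolver
    {
        public static string GetCollectionName(Type entityType)
        {
            // An explicit name on the attribute always wins.
            var attribute = entityType.GetCustomAttribute<CosmosCollectionAttribute>();
            if (attribute != null && !string.IsNullOrWhiteSpace(attribute.Name))
                return attribute.Name;

            // Otherwise fall back to the class name, lowercased and naively pluralised.
            var name = entityType.Name.ToLowerInvariant();
            return name.EndsWith("s", StringComparison.Ordinal) ? name : name + "s";
        }
    }

    // No attribute: resolves to "users".
    public class User { }

    // Attribute present: resolves to "mycollection".
    [CosmosCollection("mycollection")]
    public class Invoice { }
}
```

A real implementation would need a proper pluraliser (think Category or Person), which is one more reason the explicit attribute is handy.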
Then I asked myself again.
You see, the way Entity Framework works is that it will not execute any actions when you use its DbSet methods. Instead, it alters the objects and changes their states, so when the SaveChanges method is called a transaction is created and Entity Framework tries to run the generated queries.
Unfortunately, even though there could be ways to go about something similar, I rejected the idea (after having a chat with some people smarter than me). The reason was that so many things can go wrong in both Cosmos and the connection in between that I would do more damage than good.
There would be no point in making something like this if I didn’t consider the possibility of someone using this in production (of some sort).
The way CosmosDB works is that you have the option to limit your RU/s (which is the throughput) to save money, because in Cosmos you pay based on many different factors, one of them being the usage itself. As you might have guessed, performance depends on that to a huge degree.
What I didn’t know is that this is configurable at the collection level, which I find pretty interesting. I took advantage of that feature.
Remember that CosmosCollection attribute? Well, I added a throughput option there as well. You can now configure the default throughput for all the collections at the settings level, but you can also override it per entity with [CosmosCollection(Throughput = 1500)]. Cosmonaut will do all the setup for this on EntityStore registration.
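The override order described here (attribute first, settings default second) could look something like the sketch below. Again, this is an assumption-laden illustration rather than Cosmonaut’s code: ThroughputResolver is hypothetical, the attribute is a local stand-in with only the Throughput property, and using -1 as the “not set” marker is my own choice.

```csharp
using System;
using System.Reflection;

namespace ThroughputSketch
{
    // Local stand-in for the real attribute; only the Throughput option is modelled.
    [AttributeUsage(AttributeTargets.Class)]
    public sealed class CosmosCollectionAttribute : Attribute
    {
        // Attribute arguments cannot be nullable, so -1 means "not set" in this sketch.
        public int Throughput { get; set; } = -1;
    }

    // Hypothetical helper, not part of the library.
    public static class ThroughputResolver
    {
        // 400 RU/s is the default (and minimum) mentioned in the post.
        public const int MinimumThroughput = 400;

        public static int GetThroughput(Type entityType, int defaultThroughput = MinimumThroughput)
        {
            var attribute = entityType.GetCustomAttribute<CosmosCollectionAttribute>();

            // A per-entity value overrides the settings-level default.
            return attribute != null && attribute.Throughput > 0
                ? attribute.Throughput
                : defaultThroughput;
        }
    }

    // Falls back to whatever default is passed in.
    public class User { }

    // Always resolves to 1500, regardless of the default.
    [CosmosCollection(Throughput = 1500)]
    public class Order { }
}
```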
Let’s talk numbers.
I performed several tests with the default connection configuration for CosmosDB and Cosmonaut.
With the collection set to 400 RU/s (which is the minimum)
Added 1000 documents in 22310ms
Retrieved 1000 documents in 1335ms
Updated 1000 documents in 39363ms
Removed 1000 documents in 19843ms
With the collection set to 5000 RU/s (which is a reasonable one)
Added 1000 documents in 978ms
Retrieved 1000 documents in 58ms
Updated 1000 documents in 3932ms
Removed 1000 documents in 888ms
With the collection set to 10000 RU/s (which is the max)
Added 1000 documents in 885ms
Retrieved 1000 documents in 47ms
Updated 1000 documents in 3841ms
Removed 1000 documents in 722ms
What do these results tell us?
Well, it’s pretty obvious. Throughput makes an insanely huge difference to performance going from 400 to 5k RU/s, but not much of one going from 5k to 10k, even though that doubles it.
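To put rough numbers on that gap, dividing the timings above gives a speedup of about 10x to 23x going from 400 to 5000 RU/s, but only about 1.02x to 1.23x going from 5000 to 10000 RU/s. The small sketch below does that arithmetic; the Speedup class is just a throwaway helper for this post and the only real data in it are the timings copied from the results.

```csharp
using System;
using System.Collections.Generic;

namespace BenchmarkSketch
{
    // Throwaway helper for the arithmetic; not part of Cosmonaut.
    public static class Speedup
    {
        // Timings in milliseconds copied from the results above,
        // in the order { 400 RU/s, 5000 RU/s, 10000 RU/s }.
        public static readonly IReadOnlyDictionary<string, double[]> TimingsMs =
            new Dictionary<string, double[]>
            {
                ["Added"] = new[] { 22310d, 978d, 885d },
                ["Retrieved"] = new[] { 1335d, 58d, 47d },
                ["Updated"] = new[] { 39363d, 3932d, 3841d },
                ["Removed"] = new[] { 19843d, 888d, 722d },
            };

        // How many times faster the second timing is compared to the first.
        public static double Ratio(double slowerMs, double fasterMs) => slowerMs / fasterMs;

        // One summary line per operation, comparing each throughput step.
        public static IEnumerable<string> Describe()
        {
            foreach (var entry in TimingsMs)
            {
                var t = entry.Value;
                yield return $"{entry.Key}: {Ratio(t[0], t[1]):F1}x from 400 to 5000 RU/s, " +
                             $"{Ratio(t[1], t[2]):F2}x from 5000 to 10000 RU/s";
            }
        }
    }
}
```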
Let’s take a closer look and see why.
First and foremost, keep in mind that this is the worst case scenario. The tests are using the “Range” methods for these operations, except for the retrieval one. These range operations perform the tasks in parallel to provide the best performance possible. In your real life application you won’t be doing that, or if you do, you (probably) won’t be adding 1000 objects.
I should take some of the blame for the numbers at the lower RU/s. You see, when the max RU/s is met, some of the documents used in the Range operations will fail to be added because of that. I have coded internal retry logic for those documents, because I think it’s important to make sure that everything that can be processed is processed. This approach performed way better than having a foreach loop and awaiting individual operations.
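The general shape of “run everything in parallel and retry what gets throttled” can be sketched as follows. This is a simplified illustration under my own assumptions and not the code that ships in Cosmonaut: RangeOperations and ThrottledException are hypothetical names, and ThrottledException stands in for the throttling error the SDK raises when the provisioned RU/s are exceeded (HTTP 429, which comes with a suggested retry delay).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace RetrySketch
{
    // Hypothetical stand-in for the SDK's "request rate too large" (HTTP 429) error.
    public sealed class ThrottledException : Exception
    {
        public ThrottledException(TimeSpan retryAfter) : base("Request rate too large")
            => RetryAfter = retryAfter;

        // How long the caller is asked to wait before trying again.
        public TimeSpan RetryAfter { get; }
    }

    // Hypothetical helper, not part of the library.
    public static class RangeOperations
    {
        // Starts one operation per item without awaiting each in turn,
        // then waits for all of them to finish.
        public static async Task ExecuteRangeAsync<T>(
            IEnumerable<T> items, Func<T, Task> operation, int maxRetries = 5)
        {
            var tasks = items.Select(item => ExecuteWithRetryAsync(item, operation, maxRetries));
            await Task.WhenAll(tasks);
        }

        // Retries a single item while it keeps getting throttled, up to maxRetries times.
        // Once the retries are exhausted the exception propagates to the caller.
        private static async Task ExecuteWithRetryAsync<T>(
            T item, Func<T, Task> operation, int maxRetries)
        {
            for (var attempt = 0; ; attempt++)
            {
                try
                {
                    await operation(item);
                    return;
                }
                catch (ThrottledException ex) when (attempt < maxRetries)
                {
                    // Back off for as long as the service asked, then try again.
                    await Task.Delay(ex.RetryAfter);
                }
            }
        }
    }
}
```

The trade-off is that at low RU/s a lot of wall-clock time goes into waiting and retrying, which is part of why the 400 RU/s numbers look so much worse.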
These benchmarks were run 3 weeks after this post was first published, following a lot of performance improvements.
The project is open source under the MIT license and published on GitHub.
You can find it here: https://github.com/Elfocrash/Cosmonaut
It is also on NuGet. You can download it here: https://www.nuget.org/packages/Cosmonaut
I would love some feedback, suggestions and issues reported.