Disclaimer: This feature will probably be removed from Cosmonaut when CosmosDB launches its built-in auto-scaling logic.

I know, I know. If misused, this can potentially be a really bad idea. It is also experimental. For that reason I want to put up an extra disclaimer: use this at your own discretion. I will, of course, go through everything you need to know about this feature, but please treat it more like an experiment that shows what can be done while we're waiting for proper auto-scaling to be added to CosmosDB.

Update: Since I wrote this blog post, Microsoft has updated its Offer object (which we will talk about later) to also include some auto-scaling related properties, like offerIsAutoScaleEnabled, so we know server-side auto-scaling is coming soon.

With great power comes great responsibility - Uncle Ben Winston Churchill, 1906

Understanding the CosmosDB resource hierarchy

So let me explain something real quick. From the document to the database, everything in CosmosDB inherits from the same type: the Resource. Think of a resource as just another JSON document.

Here is an image that shows the resource hierarchy in CosmosDB

Something that this picture doesn't show however is the Offer resource.

The offer resource is the resource that contains the collection throughput information.

Its content looks like this:

  "offerThroughput": 400,
  "offerIsRUPerMinuteThroughputEnabled": false

As you can probably already tell, the offerThroughput is the property representing the provisioned RU/s.
What's interesting is that you can actually change this value and update the offer resource from either Cosmonaut or the CosmosDB SDK.

How CosmosDB charges you

Before we can dive into how autoscaling works I have to explain how CosmosDB pricing works.

You see, CosmosDB charges you hourly, based on the provisioned throughput per collection (or per database, but we won't talk about that one). This means that if you have 1 collection provisioned at 400 RU/s for 24 hours, you will pay for 1 hour's worth of 400 RU/s multiplied by 24. However, if you scale your collection up to, let's say, 1000 RU/s for even 1 second and then instantly scale it back down, you will pay the 1000 RU/s rate for that whole hour, even though you only scaled up for 1 second.
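To make the hourly billing rule concrete, here is a minimal sketch of it in Python. It is only an illustration of the rule as described above, not anything from Cosmonaut or the CosmosDB SDK; the function name and the way throughput changes are represented are made up for this example.

```python
def billed_throughput_per_hour(baseline, changes, hours=24):
    """Return the RU/s value each hour is billed at.

    baseline: RU/s provisioned when the period starts.
    changes:  (second_offset, new_ru_per_sec) throughput updates.
    Assumption (from the text above): an hour is billed at the highest
    throughput that was provisioned at any point during that hour.
    """
    pending = sorted(changes)
    billed = []
    current = baseline
    idx = 0
    for hour in range(hours):
        end = (hour + 1) * 3600
        peak = current  # throughput carried over into this hour
        # Apply every change that happens before this hour ends.
        while idx < len(pending) and pending[idx][0] < end:
            current = pending[idx][1]
            peak = max(peak, current)
            idx += 1
        billed.append(peak)
    return billed


# Flat 400 RU/s for a day: 24 hours billed at 400.
flat = billed_throughput_per_hour(400, [])

# Scale to 1000 RU/s for one second during the third hour, then back down.
spike = billed_throughput_per_hour(400, [(7210, 1000), (7211, 400)])

print(sum(flat))   # 24 hours at 400
print(sum(spike))  # 23 hours at 400 plus one hour at 1000
```

The one-second spike turns a single 400 RU/s hour into a 1000 RU/s hour, which is why short bursts of scaling are not free.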

Now that we are on the same page, let's move on to the implementation.


This is actually one of the really early features I added to Cosmonaut, mainly because I wanted it for several services I'm running. I have never talked about this feature until now because it could potentially go wrong, so please understand what you're doing. It is recommended for apps that have only a few services, or ideally one, targeting the same collection, rather than many. That's because the scaling code running in each service could overlap and cause unwanted issues. For that same reason the feature is disabled by default.

Cosmonaut has support for Range methods.

These methods are AddRangeAsync, UpdateRangeAsync, UpsertRangeAsync and RemoveRangeAsync, and each of them accepts an IEnumerable of some type. This means that many write operations will be launched at the same time. If the enumerable has 100 documents in it and each operation costs, say, 10 RUs, your method call will need 1000 RUs to handle the request. If your collection is provisioned at 400 RU/s, you are almost certainly going to get some 429s and you will have to retry the failed documents. If you don't need your app to be lightning fast and you can afford some retrying and potentially some failed requests, that's fine. However, if you do need it to be fast but you also wanna save some money, then let's take a look at the CosmosStoreSettings class.
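As a rough back-of-the-envelope check of the numbers above, this small Python sketch works out how long a batch needs at a given provisioned throughput if nothing is scaled. The function name is hypothetical, and the result is only a lower bound: it ignores the backoff delays that 429 retries add on top.

```python
def min_seconds_without_scaling(doc_count, ru_per_op, provisioned_ru_per_sec):
    """Lower bound on how long a batch takes at a fixed throughput.

    The collection can only serve provisioned_ru_per_sec request units
    each second, so a batch that needs more than that has to spill over
    into later seconds (in practice through 429s and retries).
    """
    total_ru = doc_count * ru_per_op
    return total_ru / provisioned_ru_per_sec


# 100 documents at 10 RUs each against a 400 RU/s collection.
print(min_seconds_without_scaling(100, 10, 400))
```

Anything above 1 second here means the batch cannot fit inside the per-second budget, so some of the operations will be throttled.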

CosmosStoreSettings has two interesting properties. A boolean called ScaleCollectionRUsAutomatically and an int called MaximumUpscaleRequestUnits. Let's see what they do.

  • ScaleCollectionRUsAutomatically default value is false. If set to true the auto scaling mechanism will kick in.
  • MaximumUpscaleRequestUnits default is 10000. This is the maximum allowed value that the Cosmonaut auto-scaling will scale up to but you can configure it to whatever you want.

How the scaling logic works

This feature works only for the Range methods. Here is the sequence of events.

  1. Cosmonaut checks that ScaleCollectionRUsAutomatically is set to true.
  2. It gets a single operation from the IEnumerable and it executes it.
  3. It gets the request units cost for this single operation and then multiplies it by the count of the remaining operations.
  4. If the result is more than the MaximumUpscaleRequestUnits then it will set it to the MaximumUpscaleRequestUnits value. If it is less but it is not a multiple of 100 then it will round it up until it finds a multiple of 100. (This is done because CosmosDB allows only multiples of 100 to be set as RU/s).
  5. It will execute the rest of the operations, and because the document sizes won't vary too much (they are all of the same type) you shouldn't get any 429s.
  6. Once done the collection will be downscaled to whatever it was before.
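The calculation in steps 3 and 4 can be sketched in a few lines of Python. This is my reading of the steps above and not the actual Cosmonaut scaler code; the function and parameter names are hypothetical, and the rule that the target never drops below the current throughput is an assumption of this sketch.

```python
import math


def target_throughput(sample_cost_ru, remaining_ops, max_upscale_ru, current_ru):
    """Pick the RU/s to scale to before running the remaining operations."""
    # Step 3: cost of the sampled operation times the remaining operations.
    estimated = sample_cost_ru * remaining_ops
    # Step 4: round up to a multiple of 100, capped at the configured maximum.
    rounded = math.ceil(estimated / 100) * 100
    target = min(rounded, max_upscale_ru)
    # Assumption: only ever scale up, never below the current throughput.
    return max(target, current_ru)


print(target_throughput(10, 99, 10000, 400))    # estimate is within the limit
print(target_throughput(10, 999, 1000, 400))    # estimate is capped by the limit
print(target_throughput(2.0, 50, 10000, 400))   # estimate is below the current RU/s
```

Note that rounding up as described in step 4 turns 5.33 RUs times 999 remaining documents into 5400, while the example further down reports 5300, so the real scaler may round to the nearest 100 instead.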

An example

Let's look at the code snippet below.
All this will do is create 1000 documents in a collection. This collection is provisioned with 400 RU/s.

With the scaleCollectionRUsAutomatically flag set to false, this operation takes 29295ms to complete with infinite retrying on.

If we change the scaleCollectionRUsAutomatically flag to true then the same request will take 2888ms to complete.

Each document creation costs 5.33 RUs, so the code will automatically scale the collection to 5300 RU/s to complete the operation. This is only the case if you don't add a max scaling limit.

Setting the maximumUpscaleRequestUnits value to 1000 makes the operation complete in 6298ms.

As you can see, it takes a lot of trial and error to fine-tune your limits.
The code for the scaler can be found here.

Use cases

You have to keep in mind that this is not a one-size-fits-all solution. In fact, it has only one limited use case, which is the low end, with few or only one service targeting the collection, but I think it is strong enough to justify this feature's existence, at least until proper autoscaling for CosmosDB is implemented. Here's how I see it.

I'm OK with paying for up to 800 RU/s per month for a collection, but I also understand that I don't need these RU/s all the time. That's why I create my collection with the minimum of 400 RU/s and then set the MaximumUpscaleRequestUnits value to 800.

This means that my service keeps running smoothly with no 429s and I still pay for exactly what I need. My metrics say that I pay for an average of 580 RU/s per month, BUT thanks to the auto-scaling logic inside my SDK my users don't notice the performance issues that retries would normally cause. If my provisioned throughput were a flat 580 RU/s, I'd be charged for many hours in which I never used that many RUs, and there would also be many hours in which my users had a worse experience because a bulk update had to be retried.

I guess it's just food for thought and I'd really like to know your opinion on this one.