If you've read any of my blog posts, you might have noticed that I'm hungry for knowledge. What you might also have noticed is that I like CosmosDB.

One of my findings when I was looking into the Bulk Executor library was that it was calling, what appears to be, system defined stored procedures. These stored procedures could not be seen in the CosmosDB explorer and if you try to directly access them in the explorer using their path then you get an error.

I really wanted to know what's in there in order to see what the code for these stored procedures looks like, potentially to use it in Cosmonaut.

The stored procedures used for the bulk executor are:

  • __.sys.commonBulkInsert
  • __.sys.bulkPatch
  • __.sys.commonDelete

This was one of the two things I knew about these stored procedures at this point. Their name. The second thing I knew was that I can actually create a Uri using the CosmosDB SDK UriFactory and generate a link to them. They are accessed like any other stored procedure. The problem is, I don't quite know what the parameters for them should look like so I was unable to actually use them.

Time to start working with what I have and solve one problem at a time.
I can't get the code of these stored procedures from the explorer but I do know they exist and that they are written in Javascript. They must be provisioned every time you create a CosmosDB database or collection.

What else do I have?

I have the CosmosDB Emulator!
The emulator is only supposed to emulate the behaviour of the actual service so its code is probably different. What can't be different however is the interface for these stored procedures. If the SDKs are using them (which they do) then, they should be called in the exact same way in both the emulator and the service even though the code in them might be different.

After some digging around I figured out that the code for anything JavaScript related, including the system defined stored procedures is in DocumentDB.JavaScriptEngine.dll. This is a library used by the emulator written in C/C++/idontknowbutitsnotcsharp.

The code in there revealed a JS function called docDbSetup(). This function contains everything that takes place when the db is set up. Part of it is defining the system stored procedures, and their code is there in its entirety.

The stored procedures

Turns out there are actually 7 of them:

  • commonUpdate
  • commonUpdate_BsonSchema
  • commonDelete
  • commonCount
  • commonBulkInsert
  • bulkPatch

(There are some others as well, but nothing relevant to what we are talking about today)

Their names briefly explain what they do.

Note here that I will not be posting the full code of each stored procedure but rather signatures and function summaries (if any). It's meant to help us understand how we can call these stored procedures ourselves. I myself don't fully understand how all of them work, but someone smarter than me can dig into this further and figure it out.

DISCLAIMER: These stored procedures are internal CosmosDB functions. The CosmosDB team can change their signatures or even remove them whenever they want (even though they are also referenced by stored procedures and they would have to update them too if they do any changes). Do not rely on them.

commonUpdate

commonUpdate_BsonSchema

This one is the same as the commonUpdate but it's meant to be used from the MongoDB API.

What they do?

They are using a combination of configurations to perform bulk updates and upserts to documents that match an expression.

commonDelete

What it does?

It gets a query spec and it converts it into a SQL query. This query then is used to match some documents and delete them in bulk.

This stored procedure is used by the Bulk Executor library.

commonCount

What it does?

It gets a filter expression or no filter expression and it uses it to do a select value 1 from c in order to return the count of the documents in a collection.

commonBulkInsert

What it does?

Pretty self explanatory. In fact, the function summary says it all. It bulk-imports documents in one batch.

This stored procedure is used by the Bulk Executor library.

bulkPatch

What it does?

It gets an array of patches for documents and it applies them using the document id and the document's partition key definition and value.

The query that will query the document you are trying to patch will be generated like this:

var query = sprintf('select * from root r where r.id = \'%s\' and r.%s = \'%s\'', patchItem.id, pkDef, patchItem.pk);  

This stored procedure is used by the Bulk Executor library.

What is a query spec?

A query spec is just some object that expresses some options that define how you can call a stored procedure.

An example query spec for the bulkDelete that deletes everything in a collection would look like this:

{
    "rootSymbol": "root",
    "filterExpr": null,
    "orderByExpr": null,
    "limit": null
}

Calling these stored procedures

In order to call any of these, you need to add a __.sys. prefix on their name. An example would be that commonCount needs to be changed to __.sys.commonCount.

  • Calling the commonCount

  • Calling the commonDelete

Deleting with filter is as easy as setting just the where clause of your query like this:

new { rootSymbol = "root", filterExpr = "root[\"name\"] = 'Nick'", orderByExpr = (string)null, limit = (int?)null }  

This will only delete whatever matches this filter.

Pretty much every other stored procedure can be called the same way.

Interesting observations

It appears that, system stored procedures also have the ability to be run across multiple partitions. We know this because this is how the Bulk Executor library works. It gets the internal partition key range id cache and then it uses this partition key range id per stored proc execution.

Also, while I was collecting data I did a test. I created 25 documents and I deleted them using a loop that deletes a document by id to delete everything in a collection from the SDK.

Then I also used the commonDelete stored procedure to delete everything in the collection.

The results were jawdropping.

  • Each delete costed 5.71 * 25 so 142.75 RUs and the total execution time was 1025ms. (Keep in mind this is serial deletion. I could spin up some tasks to make it faster).
  • While using the stored procedure the same thing costed 97 RUs and was fully executed in 72ms!

Conclusion

I guess, I just find interesting the power that the usage of a stored procedure has. Again, I not do recommend using these because they can potentially change at any time. However, I can't stop you. I hope you find some of this useful and I'd like to know what you think in the comments below.