This blog series is pitched at about level 300, so it assumes that most of you already know what Microsoft Azure Storage is, and I will only skim through the introduction. I will then cover the best practices, broken down into several sections, and after that show the patterns and practices that disparate applications use today, and which enable them to scale.
Quick Introduction of Microsoft Azure Storage
It’s a cloud storage system built so that you can store your data in it and access it from anywhere you have an internet connection, at any time.
It scales massively because the system automatically scales out as your traffic patterns change. It’s highly durable because Microsoft Azure stores multiple copies of your data, and it’s highly available because of the way those copies are stored. These properties make it very simple for you to build internet-scale applications. Xbox, Skype, Bing, and many ISVs all use Microsoft Azure Storage at the back end, which is what enables them to scale.
It also has a pay-for-what-you-use model.
Microsoft Azure Storage is exposed via simple, open, well-documented REST APIs, so anybody can write code against these REST interfaces directly. In addition, Microsoft knows that a lot of people prefer to use a native API, so it ships client libraries in various languages: C++, .NET/C#, Java, you name it. The cloud is language agnostic, and these libraries make it easy to talk to storage from whichever language you use.
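To illustrate how directly addressable the REST surface is, the URI for any blob can be composed from its account, container, and blob names. A minimal sketch (the endpoint format is the public Azure Blob endpoint; the helper function name is my own):

```python
def blob_uri(account: str, container: str, blob: str) -> str:
    """Compose the REST URI for a blob in Azure Storage."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob}"

# An HTTP GET against this URI (plus an Authorization header, or no
# header at all for a public container) reads the blob.
print(blob_uri("myaccount", "photos", "cat.png"))
# → https://myaccount.blob.core.windows.net/photos/cat.png
```

The client libraries build and sign exactly these kinds of requests for you.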
Best Practices – Scalability Targets
For a single storage account, all accounts in Microsoft Azure have been upgraded to the latest scalability targets:
- Capacity – up to 200 TB per storage account
- Transactions – up to 20,000 entities or messages per second (at 1KB each)
- Bandwidth for a geo-redundant storage account (GRS)
  - Ingress – up to 5 Gbps
  - Egress – up to 10 Gbps
- Bandwidth for a locally redundant storage account (LRS)
  - Ingress – up to 10 Gbps
  - Egress – up to 15 Gbps
With respect to a single partition: Azure Storage is a range-based partitioning system (there are some good papers you can read to delve deeper into how this works), and a partition is the unit of scale. The per-partition targets are:
- Single Queue – Account Name + Queue Name
  - Up to 2,000 (1KB) messages per second
- Single Table Partition – Account Name + Table Name + PartitionKey value
  - Up to 2,000 (1KB) entities per second
- Single Blob – Account Name + Container Name + Blob Name
  - Up to 60 MB per second
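Because each partition has a fixed target, capacity planning is simple arithmetic: to sustain a workload above one partition’s limit, spread it over multiple partitions (multiple queues, or multiple PartitionKey values). A sketch using the 2,000 messages-per-second queue target above:

```python
import math

QUEUE_MSGS_PER_SEC = 2000  # per-queue scalability target from the list above

def queues_needed(target_msgs_per_sec: int) -> int:
    """How many queues are needed to sustain a given aggregate message rate."""
    return math.ceil(target_msgs_per_sec / QUEUE_MSGS_PER_SEC)

print(queues_needed(10_000))  # a 10k msgs/sec workload needs 5 queues
```

The same calculation applies to table partitions: a workload of 10,000 entities per second needs at least five distinct PartitionKey values with the traffic spread evenly across them.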
Best Practices for Accounts, Traffic Patterns & Misc.
In terms of best practices for accounts and traffic patterns: place your storage close to your compute, within the same region. This gives you low latency, and you don’t pay egress charges for traffic that stays within the same region.
Understand the account scalability targets. In a lot of cases you want more than what a single account can give, so you’ll be creating multiple accounts. If your users are spread across regions, you can also place storage accounts in those different regions to scale out.
Distribute the load over many partitions and avoid spikes. Microsoft Azure Storage is a range-based partitioning system that auto-scales based on your traffic pattern, and the pattern it handles best is uniformly distributed traffic. An append pattern is one where, if you look at your traffic, the requests arrive sorted by key: if the keys are ascending, it’s an append pattern; if descending, it’s a prepend pattern. These patterns don’t work well with any range-based partitioning system, because all new traffic lands on the last (or first) partition of the range. So strive to make sure your traffic is distributed across different partitions. You control the PartitionKey in almost all cases; the exception is a queue, but there you can always use multiple queues.
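A common way to break up an append pattern is to prepend a hash-derived bucket to an otherwise monotonically increasing key, so that consecutive writes land on different partitions. A sketch (the bucket count and key format are assumptions for illustration):

```python
import hashlib

BUCKETS = 16  # number of partitions to spread writes across (assumed)

def partition_key(entity_id: str) -> str:
    """Prefix a monotonic id with a hash bucket to avoid an append pattern.

    Timestamp-based ids sort into one hot partition; the two-digit
    bucket prefix scatters them across up to BUCKETS partition ranges.
    """
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    bucket = digest[0] % BUCKETS
    return f"{bucket:02d}_{entity_id}"

# Consecutive ids no longer sort into a single hot partition range:
print(partition_key("2024-01-01T00:00:00_event1"))
print(partition_key("2024-01-01T00:00:01_event2"))
```

The trade-off is that range queries by time now have to fan out across all buckets, so choose the bucket count based on the write rate you need.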
Always use the latest client SDK, because the Microsoft teams keep optimizing it. Every release either fixes minor bugs or includes performance improvements, and Microsoft strives to make sure these are always improving.
Best Practices for .NET Settings for Cloud Services
Disable Nagle for small messages (< 1400 bytes) – if your payload size is less than about 1,400 bytes, Nagle’s algorithm will delay your sends, so turning it off is great if you want low latency.
ServicePointManager.UseNagleAlgorithm = false;
Disable Expect 100-Continue if you’re sure you’re not exceeding your scalability targets and that your authentication is set up correctly. If you’re unsure of either, Expect 100-Continue might help you. What it does: the client first sends the request URI and headers to the server; the server authenticates the request and checks whether it can go through, then returns 100 Continue to the client, and only after that does the client send the payload. So this is great for large blob uploads when you think you may hit the ingress/egress limits, and for big-data scenarios such as processing logs you would normally leave Expect 100-Continue enabled. For small, latency-sensitive requests, disable it:
ServicePointManager.Expect100Continue = false;
Increase the default connection limit (the default is only 2 concurrent connections per host):
ServicePointManager.DefaultConnectionLimit = 100; // Or more
Take advantage of the .NET GC. GC performance has improved greatly over the years, and .NET 4.5 introduced background GC for the server flavor of the garbage collector.
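For a cloud service role, server GC is opted into through configuration rather than code. A minimal sketch of the relevant app.config fragment (standard .NET runtime configuration, not something specific to this talk):

```xml
<configuration>
  <runtime>
    <!-- Server GC uses one heap per core and, on .NET 4.5+,
         collects in the background instead of pausing threads. -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```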