So no need to try to write down some of the settings that Stanley mentions. We'll give you the URL at the end of the presentation. The hope here is that for those of you who are Cloud SQL users today, you learn something about scaling Cloud SQL. And for those of you who are running MySQL yourself, you at least can crib off of our InnoDB tuning parameters as a way to get started more rapidly with MySQL on GCP. And then we'll get into some next steps, and kind of wrap up from there. But first, I want to ask a few questions to get to know you a little bit. By show of hands, how many of you are developers? We'll call that, what, 2/3 of the room? How many of you are database administrators, DBAs? Maybe a quarter of the room. How many of you are reluctant DBAs? A few more hands there. OK. And how many managers do we have in the audience? We've got a fair percentage there. OK, thanks for that. A couple more questions for you. How many of you use MySQL frequently?
How many of you are using it almost day to day? OK, much of the room. How about PostgreSQL? Oh, we've got a few hands there. OK. And how many of you are current Cloud SQL users? That's pretty good. All right, thank you. OK. So format-wise, we've got mics at the front of the room. Stanley and I are going to leave time at the end of the presentation for your questions. So if things come to mind as we go along, please jot them down, and make sure you get to ask them at the end of the presentation. We will also hang around after the presentation if you have additional questions. And we have members of the Cloud SQL engineering team here, the people who actually built the product, who will be able to answer your questions as well. So let's jump in. Cloud SQL is a managed database service. Our goal is to take mundane database administration tasks and make them much easier. Cloud SQL is not a DBA. For developers, we hope that Cloud SQL makes it easier for you to focus a bit more on development, and a little less on database administration.
And for DBAs, we hope you get to do something more interesting than install and patch databases. I will note that historically, Cloud SQL has offered MySQL. And when I say we offer MySQL, we don't just offer a MySQL compatibility layer, we offer standard MySQL binaries. Which means, with few exceptions, if your application works with MySQL, it will work with Cloud SQL. I'm very happy to note that as of yesterday, Cloud SQL is launching PostgreSQL support. And again, this doesn't mean a compatibility layer. This means a standard PostgreSQL database on Cloud SQL. So a couple of notes on that. Because we did have a fair number of you in the audience who raised your hands to my PostgreSQL question, and I see a few familiar faces out there from the presentation yesterday: PostgreSQL is launching as of yesterday, and when I say launching, I truly mean launching. If some of you pull up your cloud console, some in the room will see PostgreSQL right now, while others will not. As the day progresses, and certainly into the weekend, more and more of you will see PostgreSQL in your console.
If you are familiar with our CLI and our APIs, all of you can create a PostgreSQL instance right now. So we hope you get started there. A few notes on PostgreSQL. We support up to 32-core and 200-gig-plus instances on PostgreSQL. The same goes for MySQL; our large instance type is 32 cores and 208 gigs of RAM. We also offer inexpensive development instances, starting at a little less than $10. Again, both for PostgreSQL and for MySQL. On the PostgreSQL side, looking at connectivity, Cloud SQL for PostgreSQL today supports connectivity from Compute Engine, from App Engine flexible environment, which just went GA at this conference, from Google Container Engine, and generally from any client with a standard PostgreSQL connector. Two caveats here. One is that the Cloud SQL for PostgreSQL product does not yet support connectivity from the App Engine standard environment, which our MySQL product does. And generally, Cloud SQL does not yet connect with Cloud Functions, although we're working on that.
Cloud Functions was announced here at this event as well. I've listed the extension support we've come out with on the PostgreSQL product at beta. Extensibility is likely one of the most powerful features that PostgreSQL offers. We do offer the very popular PostGIS extension, which builds very powerful geospatial support into PostgreSQL. And you see some others here. That said, you've got us here, you've got the Cloud SQL team here. And we want your feedback on extensions. If you're using PostgreSQL today and you use an extension you don't see on this slide, let us know. We also have some forums online where you can get that kind of information to us. We're all ears when it comes to extensions. OK. So let's get into Google's database storage portfolio. The database portfolio at Google is broad and growing. So I think one relevant question is, where does Cloud SQL fit into all of this? If you start rightmost on this slide, BigQuery is our data warehousing product.
We store objects in Google Cloud Storage. We have two non-relational or NoSQL databases, in Datastore and Bigtable. And then you get to the relational column, where Cloud SQL and Cloud Spanner live. If you haven't had a chance to go to the Spanner talks, I encourage you to watch the YouTube recordings of those, or just get to know Spanner with some of the online documentation. It's a product we just recently announced. And leftmost on the slide is our App Engine Memcache product, which is our in-memory database product. So this kind of leads to the question, which of these products do I pick? You may have seen this flowchart before in other presentations. It's just meant to, at a high level, kind of guide you to the right Google database product. In this session, we are talking about structured transactional relational data. That gets you to the bottom left of this slide, where Cloud SQL and Spanner live as our relational products. And then let me just ask, by a show of hands, how many of you are comfortable saying, I know the difference between Cloud SQL and Spanner?
So about a third of the room. Just very quickly, and there's much more depth than I'm going to touch on here: Cloud SQL is about compatibility. Again, we're not just a compatibility layer for MySQL and PostgreSQL. We are offering standard MySQL and PostgreSQL databases. Cloud SQL, like many relational database services, is able to scale out reads, and we'll talk about this in this presentation, but we are not able to scale out writes past a single node. So one difference between Cloud SQL, from a performance perspective, and Spanner, is that Spanner does horizontally scale writes. And that's the horizontal scalability question you see in this flowchart. The other note here is around that compatibility. If you have a MySQL application or a PostgreSQL application, with very few exceptions it will work with Cloud SQL. Spanner has its own compatibility model, which means if you're coming with a MySQL or PostgreSQL or an Oracle app, you likely need to take a look at changes to that application so that it works with Spanner.
Spanner comes with a myriad of benefits in terms of scale and global availability of data. I encourage you to get to know it a bit more. But for the purposes of this talk today, we're going to be focused squarely on Cloud SQL from here forward. OK. So I've talked about Cloud SQL being a managed database service, and I think it's worthwhile to take a look at what it takes to run a database. It's just a way to illustrate differences between self-managing a database like MySQL and using something like Cloud SQL. And in the end, it all comes down to trade-offs. I, obviously, have a bias as the product manager for Cloud SQL. I love managed database services. But there are merits to both self-managing and managed database services. So let's start with the hardware. Some of you are today running on-prem. You are thinking about things like power and cooling, racking and stacking of servers, maybe configuring servers on some vendor's website. Whether you are doing that or you've offloaded some of that to another group in your company, a colo, or a cloud provider like Google, I would encourage all of you to still think about server maintenance.
Here at Google, part of our server maintenance strategy is live migration. You can read a number of blogs that the Compute Engine team has put out on live migration. I think it's a really cool feature with respect to server maintenance. And server maintenance should be on all of our radars, given that it has obvious implications for our application stack. Cloud SQL, by the way, benefits from live migration as well. Let's take a look at the OS. And this is where we really start to see a fork in the road between self-managing MySQL, for example, on Compute Engine here in Google Cloud, and using something like Cloud SQL. On Compute Engine, you will be installing the OS or picking an image with an OS installed, then patching and maintaining that OS. On Cloud SQL, we mask the OS from you. You don't have access to the OS. We take care of the installation and the patching, keeping it secure. From a database perspective, in a self-managed world, obviously you're installing MySQL. You are patching it on some regular interval, I hope.
And you're setting up backups. Cloud SQL, again, does the install, does the patching, and makes backups automatic, and also gives you APIs to call them on demand. And then for advanced features, when you're looking at high availability, those of you self-managing MySQL today know you're thinking about things like replication. You're looking at tooling to discover, is my master database healthy? Do I need to failover? If so, you're either using alerts to manually failover, having someone paged, or you've written scripts that load balance and failover to another MySQL instance. Cloud SQL does this on your behalf. It's a check-box in Cloud SQL to set up high availability. We take care of the replication we'll be talking a fair amount about today, as well as all the tooling to health monitor and do your failover. The other thing you get with a managed database service is that you get the backing of Google's team. When you're self-managing, your team is responsible for keeping that database system up.
When you move to Cloud SQL, you lean on Google's team to do that for you. The last note, in terms of this stack, is scaling. In a self-managed world, you are deciding when to scale, and how to scale. How do I scale up? How do I scale out? In Cloud SQL, we make scaling easy for you. But unlike other database products in the portfolio I showed you earlier, Cloud SQL does not auto scale. So you, as an administrator, still need to decide it's time for me to scale. Cloud SQL makes scaling up and out fairly easy, but you are the one who triggers that action. And last, but not least, is monitoring. This is not only things like application monitoring with Stackdriver here in Google Cloud, but also answering questions like, can I get slow query logs sent to Pub/Sub? Or to Cloud Logging, for easy consumption by third-party products, like Splunk, for example. Cloud SQL and managed services do a lot of that plumbing for you. Whereas in the self-managed world, you're doing that of your own accord.
The bottom line, when I look at self-managed versus managed, is that it's a trade-off. You all are making trade-offs with your time day to day. Certain parts of your technology stack, you want to customize, you want to spend more time there. And that means you're willing to maintain them. Other parts, you may say, I don't want to maintain this, and I'm willing to give up some of that customization to avoid the maintenance. OK. So looking at scaling Cloud SQL itself, we're going to first focus on infrastructure. And then we're going to get into some real details around MySQL. Scaling up, I think, is pretty intuitive when it comes to databases. We encourage it because it works. It works as high as you can scale up, that is, and it's relatively straightforward. In Cloud SQL and generally in Google Cloud, it's easy to add CPU cores and memory to your instances. When you add them and you click Apply, we do take a reboot, in Cloud SQL and on GCE, as you scale up CPU and RAM.
I think those are pretty intuitive scale-up levers. One less intuitive way to scale up is scaling up storage performance. In Google Cloud Platform, we scale storage performance with capacity. So as you add capacity to PD-SSD, you get more IOPS, a measure of storage performance. The same goes for Cloud SQL. You see some numbers here on the right-hand side of the slide. You also see a screenshot on the left-hand part of the slide. In Cloud SQL, we show you a performance calculator, so that as you adjust the size of your instance (the number of CPU cores, for example) and the capacity of your, in this case, persistent disk SSD drive, we show you what you can expect in terms of storage performance and overall machine performance. In the example I've shown here, I selected a 32-core machine, our largest instance type in Cloud SQL, and you can see it can deliver a maximum of 25,000 IOPS. That limit has actually been raised as of this conference, as Persistent Disk has raised its limits. But it's a maximum in this example of 25,000 IOPS.
But given that I've only selected 600 gigabytes for my storage capacity, I will only get 18,000 IOPS. So one action I can take now is to increase the capacity of my storage to max out those IOPS. I'll note here that when we increase the capacity of storage, and thus performance, this is an online operation. You need not take down your database. In fact, Cloud SQL has an auto storage grow feature that, if you check that box, will watch your capacity for you. When you need more capacity, we'll grow your storage device, and we do so online. The database doesn't go down. Scaling up, making sense? Easy enough. OK. At some point, you say, I'm punishing my single MySQL instance so much that I need to consider moving some load off of it. As I mentioned at the beginning, Cloud SQL makes scaling out reads easy. We allow you to very simply create read replicas. And just by show of hands here, how many of you in the room are familiar with the differences between Cloud SQL's semi-synchronous replication and its asynchronous replication?
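The calculator's arithmetic can be sketched as follows. This is a simplified model assuming PD-SSD's published rate of roughly 30 IOPS per GB of capacity, capped by a per-instance maximum; it is not the calculator's actual implementation.

```python
# Simplified sketch of the storage performance calculation described in the
# talk. Assumption: PD-SSD delivers ~30 IOPS per GB of provisioned capacity,
# capped by the instance's maximum (25,000 IOPS for the 32-core example).

IOPS_PER_GB = 30  # published PD-SSD rate at the time (assumption)

def effective_iops(capacity_gb: int, instance_max_iops: int = 25_000) -> int:
    """IOPS you actually get: capacity-driven, but never above the instance cap."""
    return min(capacity_gb * IOPS_PER_GB, instance_max_iops)

# The example from the talk: only 600 GB provisioned, so capacity-limited.
print(effective_iops(600))
# Provision more capacity and you hit the instance cap instead.
print(effective_iops(1000))
```

Under this model, 600 GB yields 18,000 IOPS, matching the slide, and growing the disk (an online operation in Cloud SQL) is what raises IOPS toward the cap.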
So a number of you are fairly comfortable with that. Let me give you the one-minute overview. Semi-synchronous replication is what Cloud SQL uses for availability. In this chart, we see three total database instances. This is a very common configuration. I have my master instance leftmost. I have a failover replica, the place I will failover to if some problem happens to my master instance. What I'm optimizing for, therefore, is data durability. When I failover, I want to be very sure that the last transaction my application sent to the master that was committed is indeed on the failover replica. That when I failover, I have my data intact. To do that, MySQL incurs performance overhead. The master will wait for the failover replica not to commit the transaction, but to acknowledge that that transaction is in its relay log. So if you imagine the transaction coming from the application: without a replica, it goes from the application to the master instance, the master instance commits it, tells the application I've got it, and the application is now free to send another transaction.
When we add this failover replica, the master takes the application's transaction and waits for the failover replica to say, I've got it in my relay log. Once it does that and commits the transaction locally, it gets back to the application. So you can see the performance overhead that is incurred here. The benefit, of course, is you get data in two places. In Cloud SQL, we always make sure your failover replica is in a separate zone, effectively a failure domain, or you can think of it as a separate data center, so that if a problem occurs in one data center, zone A, we can failover to zone B. This differs from Cloud SQL's asynchronous replication in that asynchronous replication does not require the master to wait for the replica to acknowledge the transaction. So my application can send a transaction, and the master says, I'm going to commit it, I'm going to make sure that my replica is aware that it needs to do something, but I'm not going to wait for that replica to actually take action before I get back to the application.
In this way, you incur less overhead with an asynchronous replica. So Cloud SQL does not use asynchronous replicas for failover, because there is a chance one does not have the same dataset as the master. We do use them for read scale-out. And in this depiction, what you're seeing is that an application is able to write to the master instance, and read from any of the replicas, and of course, the master instance itself. So from a scale-out perspective, what we've been able to accomplish here in our application is we've moved expensive reads, for example, ad hoc queries or reports that get generated on Friday afternoon, to a replica, so that our master instance can focus more of its cycles on writes. Make sense to everyone? And we'll get deeper as we go along. So I've mentioned a few times now, Cloud SQL makes scaling easy, but it doesn't auto scale. You have to make a decision as to when to scale. And so that raises this point: we strongly recommend you know your numbers.
When you look at the Cloud SQL console, you see a basic set of stats that are available to you. We're showing one of those stats in this chart. We also, as Cloud SQL, turn on the InnoDB metrics table that you can query for really deep MySQL stats. And we encourage you to refer to both of these, in addition to Stackdriver, as you make decisions about: is it time to scale up? Or, if I've scaled up as far as I can, is it now time to scale my reads out? With that, we're going to get into a bit of discussion on the relationship between availability and performance. And especially how availability and raw performance are very much intertwined. And I'll hand things over to Stanley. Thanks, Stanley. STANLEY FENG: Thanks, Brett. Can you guys hear me? All right. Cool. So I have two topics today. One is about availability and performance, as shown on this slide. The other topic is about the InnoDB tuning we have done to make Cloud SQL's MySQL work best on the GCE platform.
So let's start with availability and performance. In this section, I'm going to talk about how we best balance between replication lag and raw performance. Let's start with a brief introduction and overview of the Cloud SQL HA architecture. HA here stands for high availability. Brett already gave us a brief overview of what semi-synchronous replication is. So basically, yes, our HA architecture is currently built on top of the semi-synchronous replication feature, which is built in and available in MySQL. And as you can see in this picture, there is a master and there is a slave. And both are live and hot online instances. The master supports reads and writes. The slave only supports reads. But if the master crashes, the slave will take over and become the master. The replication between these two instances is done by semi-synchronous replication, in the sense that each and every write on the master has to be durably committed to the relay log of the slave before the master can consider that transaction successful.
That part guarantees durability, in the sense that a transaction is persisted in two copies on two different hosts. In this particular example, as you can see, even persisted in two different zones. However, here is where the semi part comes in. When the slave picks up this transaction and applies that particular transaction to the database of the slave, it does that asynchronously in the background. And generally, it only has one thread that's picking up those transactions from the relay log and applying them to the database. It has only one thread in MySQL 5.6. There is an option to enable multiple threads in MySQL 5.7, but we haven't enabled that yet because there have been some concerns about reliability and data integrity issues. So bottom line: if the incoming write rate to the master exceeds the rate at which that single thread on the slave can apply those transactions, then you have replication lag. Those transactions will be saved in the relay log. They are safe, but the slave is lagging behind the master.
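For reference, in stock MySQL the semi-synchronous machinery Stanley describes is switched on via the semisync plugins. A self-managed equivalent might look roughly like this; these are standard MySQL 5.6/5.7 options, not Cloud SQL's actual internal configuration, which is managed for you.

```ini
# my.cnf sketch: semi-synchronous replication on a self-managed pair.
[mysqld]
# Master side (requires the rpl_semi_sync_master plugin to be installed).
rpl_semi_sync_master_enabled = 1
# How long the master waits for the replica's ACK before degrading to
# plain asynchronous replication (milliseconds).
rpl_semi_sync_master_timeout = 10000

# Replica side (requires the rpl_semi_sync_slave plugin).
rpl_semi_sync_slave_enabled = 1
```

Note the timeout: if the replica stops acknowledging, stock MySQL falls back to asynchronous replication rather than blocking writes indefinitely.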
So let's take a look at replication lag, and why it matters. Why? Because when a failover happens, the slave has to apply all those transactions in the relay log to its database before it can open up for business. And remember, in this HA architecture, the slave is actually hot and online. If you don't have any replication lag, then the slave should just be able to take over and flip over really fast. Now, this connects back to the concept called RTO, which is Recovery Time Objective. For every serious production use of MySQL, the business has a need to support a particular recovery time objective. In this case, if the master dies, the slave has to be up and running within x seconds. That's your recovery time objective. It's therefore important that you be able to monitor and control your replication lag if you want to make sure that your system can meet your RTO. One way to do it is to throttle your application's write rate.
Another way to do it, probably a harder way, is to partition and shard the database across different instances. So let's take a look at how you monitor and see replication lag. I've shown an example here, which is a SQL command you can run on the slave side. Basically, on the slave side, MySQL maintains a status variable called Seconds_Behind_Master, so you can do a SHOW SLAVE STATUS and then grep for Seconds_Behind_Master. Replication lag is measured in seconds, not in milliseconds, not in hours, but in seconds. And here, the Seconds_Behind_Master is 1,586 seconds. That means the slave is still applying a transaction that happened on the master about 1,500 seconds ago. So it's a pretty bad situation. Well, bad or good, depending on your RTO. So here's another view of replication lag. The chart is a little bit blurry on those lines, but I think you can still see the numbers and the lines. This is a sysbench OLTP run on a MySQL 5.6 instance.
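The monitoring step just described, running SHOW SLAVE STATUS and grepping for Seconds_Behind_Master, can be scripted. Here is a rough sketch that parses the command's text output and compares it to an RTO budget; the sample output and the 500-second threshold are made up for illustration.

```python
# Sketch: extract Seconds_Behind_Master from `SHOW SLAVE STATUS\G` text
# output and flag it against an RTO. Real output can also report NULL when
# replication is broken; that case is omitted here for brevity.

def seconds_behind_master(show_slave_status_output: str) -> int:
    """Pull out the Seconds_Behind_Master value, like the grep in the talk."""
    for line in show_slave_status_output.splitlines():
        if "Seconds_Behind_Master" in line:
            return int(line.split(":", 1)[1].strip())
    raise ValueError("Seconds_Behind_Master not found; is this a replica?")

# Fabricated sample resembling the slide's output.
sample = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 1586
"""

RTO_SECONDS = 500  # hypothetical recovery time objective
lag = seconds_behind_master(sample)
print(lag, "seconds behind; RTO violated?", lag > RTO_SECONDS)
```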
And the way we ran this experiment is we doubled the test load at each step of the way. Each step here runs for 20 minutes. So here, on the x-axis, is the test thread count; we run from 1, 2, 4, 8, all the way to 256 threads. On the y-axis is the slave-reported seconds behind master. You can see on this graph that when the test runs from 1 to 8 threads, we are OK. We pretty much don't have any replication lag; it's pretty much close to 0. Now, at a 16 test thread count, the incoming write rate starts to exceed the rate the slave can apply those transactions, and you start to have non-zero replication lag, all the way to 256 threads, where you are a couple of thousand seconds behind. One key thing to remember on this chart, though: are you OK to run at a 16 thread count? Probably not. Because at that level, your incoming write rate has already exceeded the rate the slave can apply those transactions.
In the sense that if you keep running at this level, this line is going to keep increasing and increasing. So that's an example of replication lag. On the next slide, I'm going to talk about how we monitor and control it, and how we model it. So this one shows the relationship between replication lag and raw transaction performance. The previous slide showed examples of replication lag; this one brings TPS into the picture. The way to read this picture: the x-axis, again, is the test load. This is, again, a sysbench OLTP run, and each step runs for about 20 minutes. And we run from 1 thread all the way to 256 threads. On this chart there are lines and there are columns. Columns are for replication lag, and they are on the right y-axis, in seconds. Lines are for TPS, transactions per second, on the left y-axis. Also, you can note there are two lines here. One is the red, which is from a non-replicated test.
And then the green one is for a replicated test. One interesting thing to note is that because we're using semi-synchronous replication, the performance gap between non-replicated and replicated isn't that big. If we were doing fully synchronous replication, especially across zones, you would expect to see a much larger gap between these two lines. But because we are using semi-synchronous replication, the performance hit versus the non-replicated case isn't that big. However, the price we are paying here is the replication lag. The way to read the chart is: for this 32 test thread count, we are able to achieve about 200 TPS. And those are the p50 TPS. But the replication lag, you can see at the end of these 20 minutes, is 467 seconds. So what does that mean? How does that help you? Let's look at the next slide. On this slide, it's the same picture as the previous one. I've marked two distinct TPS ranges. The first one is what I call the sustained TPS range.
The sustained TPS range goes from 0 to 200 TPS, and in it you can run from 1 test thread to 8 test threads. In this range, your replication lag is virtually zero. So you can run your application at this TPS for a very, very long time without ever needing to worry about violating your RTO. However, when you start to exceed this range, you will start to accrue replication lag. This is what we call peak TPS. From time to time, your application would like to burst, would like to exceed the sustained TPS range. That's what peak TPS is. So the way you want to model your application is that you run your workload and create a chart similar to this one. Then you also consider how long you would allow that burst to go, and you can see whether the replication lag at the end of your burst is below your RTO. For example, if your RTO is 500 seconds, in this case, that means your peak TPS can be as high as 200 for up to 20 minutes.
So on the lower right corner of the slide, you can see a little formula: the duration you want to burst your TPS for, times the replication lag growth rate you estimate at that peak TPS, should be controlled under your RTO. So why do we do this? How does this help you? The most important thing this can give you is help with capacity planning. Imagine you have your application, and your application has some kind of write workload. You can run this write workload, varying its load, and collect replication lag and TPS numbers. And then you can get a good estimate of how much burst you want to give to the application, and for how long, during your business day. Once you have all this information, you can plan properly. First, you can consider adding a throttle to your application to make sure that when it bursts, it doesn't burst above a certain TPS level. Second, maybe you really, really want that many TPS, but you also want to stick to your RTO.
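Stanley's back-of-the-envelope budget can be written out as a small check. This sketch assumes you have already measured, at your peak TPS, a lag growth rate in lag-seconds accrued per second of burst, as in the 20-minute sysbench steps shown earlier; the numbers below are derived from the talk's example.

```python
# Sketch of the capacity-planning check from the slide: the replication lag
# accrued during a burst (burst duration x measured lag growth rate at that
# peak TPS) must stay under the RTO.

def burst_is_safe(burst_seconds: float, lag_growth_rate: float,
                  rto_seconds: float) -> bool:
    """True if the lag accrued by the burst stays within the RTO."""
    accrued_lag = burst_seconds * lag_growth_rate
    return accrued_lag <= rto_seconds

# Roughly the talk's example: a 20-minute burst at ~200 TPS ends 467 seconds
# behind, implying a growth rate of about 467 / 1200 lag-seconds per second.
rate = 467 / 1200

print(burst_is_safe(20 * 60, rate, rto_seconds=500))   # the talk's 20-minute burst
print(burst_is_safe(40 * 60, rate, rto_seconds=500))   # twice as long
```

So with a 500-second RTO, a 20-minute burst is acceptable, but doubling the burst duration at the same rate would blow the budget.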
Then the next natural choice is probably to shard and partition your database, and have another instance to serve and divide your write traffic. So, OK. That was the section on availability and performance. In the next section, I'm going to go over some of the Cloud SQL-specific InnoDB tuning we have done to make sure MySQL works better on the GCE platform. So for Cloud SQL, we have tried to tune various InnoDB parameters. InnoDB, and MySQL, is a very established and mature product. It was designed for bare-metal hardware, and the default set of InnoDB parameters was chosen in such a way that the database can run safely on all sorts of platforms. We wanted to make InnoDB and MySQL work best on GCE, which, as a cloud environment, has its own unique characteristics. For example, for Persistent Disk, write performance and write latency are actually better than reads. And with the new Persistent Disk IOPS and bandwidth rates, reads actually have more bandwidth than writes on some of the machine types.
So there are all kinds of cloud-unique properties that make it almost required to tune InnoDB in a way that works best on the Google Cloud platform. But I would like to start with one quick note: all tunings and settings described here are subject to change. So view this section as for your information, for your reference, in case it helps you, for example if you want to run your own MySQL instance on our GCE platform. We are constantly working on this, and we will constantly be making improvements based on new information and new developments happening in the underlying operating system. So we will change those parameters as we see fit, and sometimes we may even decide to revert some of them to the defaults if new information warrants it. So keep that in mind. And I don't think there's any reason to take a hard dependency on these particular values. So let's start with the first one: the InnoDB buffer pool size, which is one almost everyone tunes when they run InnoDB on their own hardware.
It's critical for performance, mostly for reads, and maybe also for writes. It needs to be set according to the available RAM on the VM, and the standard recommendation is about 80% of the total RAM of the VM. What we set it to is here. There's a wall of text, but just remember two numbers: 92% and 80%. We take 92% of the total RAM of the VM and give it to MySQL. And then we take 80% of MySQL's RAM for the InnoDB buffer pool. Now, there are a bunch of caveats here, because on smaller VMs, n1-standard-4 and below, and even the micro and small instances, we have to tune down those numbers to make sure they can run safely and reliably and don't crash. For those of you who are interested in doing performance work, if you are on n1-standard-8 and above, these two numbers apply. And we actually highly recommend you run 16 cores or above for heavy, performance-intensive workloads.
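The two numbers Stanley calls out reduce to simple arithmetic. This is a simplification: as he notes, the real service applies further caveats on smaller machine types, which this sketch ignores.

```python
# Sketch of the buffer pool sizing rule from the talk: MySQL gets 92% of the
# VM's RAM, and the InnoDB buffer pool gets 80% of MySQL's share. Cloud SQL
# tunes these numbers down on small VMs; this sketch ignores that.

MYSQL_RAM_FRACTION = 0.92
BUFFER_POOL_FRACTION = 0.80

def innodb_buffer_pool_gb(vm_ram_gb: float) -> float:
    """Buffer pool size implied by the 92% / 80% rule."""
    return vm_ram_gb * MYSQL_RAM_FRACTION * BUFFER_POOL_FRACTION

# For example, the 32-core, 208 GB instance type mentioned earlier:
print(round(innodb_buffer_pool_gb(208), 1))
```

Note that 0.92 times 0.80 works out to about 74% of total VM RAM going to the buffer pool, a bit below the classic 80%-of-RAM rule of thumb, leaving headroom for MySQL's other buffers and the OS.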
It's not an upsell. It's really an honest suggestion, for many, many reasons. For example, just recently Persistent Disk announced higher IOPS and bandwidth offerings. And the number of cores actually plays a role in the underlying disk capability you can get, not just the disk size. For 16-core machines, you can get 25,000 IOPS, and 480 megabytes per second of reads and 240 megabytes per second of writes. Those are not available on lower-core machines, even if you buy a very, very large disk. So 16 cores and above, we highly recommend for performance-heavy workloads. Now, the next set of parameters is buffer pool initialization. We enable buffer pool persistence at shutdown and buffer pool preload at startup. This allows us to pre-warm the cache and reduce performance variability across DB reboots. A pre-warmed cache is better than randomly warming it up during your production workload. Next one: redo log size. This is one of the most important parameters for write-heavy workloads.
The default is 48 megabytes, which is really too small. 512 megabytes is what we set, and we set two log files, each of which is 512 megabytes. So in total, you get 1 gigabyte worth of redo logs. It helps absorb incoming write spikes. I will talk a lot more about what I mean here when we get to dirty page flushing. Next one is networking. For networking, we tuned two settings. One is max_allowed_packet, to 32 megabytes from 4 megabytes, and the other one is max_connections, to 4,000. max_allowed_packet allows your client to send large queries, in case you want to insert a big BLOB column, or you have very long strings in your query; that will be supported. max_connections is self-explanatory. Next, InnoDB metrics monitoring. There is a set of InnoDB metrics in the information schema table. Those are actually pretty useful. By default, they are not enabled. So we enabled all of those metrics in that particular table. Now, there's a difference between that table and the performance schema.
The performance schema is pretty heavy, and comes with not so ideal overhead. But for this particular table, the overhead is trivial, and the information it provides is valuable. So we turn it on, and it turns out to be very useful for our investigation and debugging purposes. OK. So for the next set of slides, I'm going to talk a little bit about dirty page flushes. For InnoDB write tuning, tuning the dirty page flush properly, so that it adjusts and adapts to your incoming write rate, is really, really important. So let's first start with a little bit of an overview of what a dirty page is. And it has to start with the way InnoDB implements writes. A write has to happen in two places concurrently in InnoDB. First, it has to be persisted in the redo log. Second, the write will be made into an in-memory data page, which is in the buffer pool. And once the write is made to the data page, that page is called a dirty page. Then in the background, InnoDB has a mechanism called Adaptive Flushing that will [INAUDIBLE] flush those dirty pages into the on-disk data file.
Only after the dirty pages are flushed to the data file can the content in the redo log be reclaimed, and the space in the redo log be freed for subsequent writes. So that's the way InnoDB handles writes. Here are the two settings we tuned to make the flush a little bit faster and more capable. One is called innodb_flush_method; we changed that to O_DIRECT, which makes InnoDB open the data file in O_DIRECT mode, so each and every write will hit the disk directly without depending on a subsequent fsync. So it makes the fsync really fast. The second one is innodb_io_capacity. The default is only 200. This is a number that feeds into the Adaptive Flushing algorithm and tells it how many dirty pages it can flush per iteration. There are many, many other parameters, but this one is one of the most important ones. It has to tie into the underlying IO capacity of the block device, and on our Persistent Disk, we can support up to 15,000 to 20,000 IOPS.
So we changed that to 5,000, and it turns out to work pretty well for us. The next set of parameters that we tuned is currently for MySQL 5.7 only, and it's still in the process of being rolled out to production; it will be available everywhere probably in a few weeks. These are really tuned for a problem called the sync flush point. Remember, I just described how InnoDB writes to the redo log and to the dirty page, and then eventually the dirty page gets flushed. Once it's flushed, the redo log space gets reclaimed. Now think about this. If the incoming write rate far exceeds the rate at which InnoDB can flush dirty pages, then your redo log usage is going to grow, grow, grow to a point, usually about 80% of your limit, where InnoDB gets into a panic mode. This so-called panic mode is called sync flush. It means that InnoDB will not accept your next write until it can find the oldest dirty page, flush it to the data file on disk, and reclaim some space from the redo log. Only then will it accept your new writes.
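The two flush settings just described, with the values mentioned in the talk, would look like this in my.cnf. Again a sketch: the innodb_io_capacity value should be matched to what your block device can actually deliver.

```ini
[mysqld]
# Bypass the OS page cache so writes hit the disk directly,
# which makes the subsequent fsync very cheap
innodb_flush_method = O_DIRECT

# How much IO the Adaptive Flushing algorithm may use per
# iteration (default is 200; raised to match Persistent Disk)
innodb_io_capacity = 5000
```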
From your application level, you will see on your TPS chart something called a stop-and-go pattern. Your TPS will drop, and then it will go back up because the redo log has more space, and then drop again. So you will see a huge fluctuation in your TPS. It will hurt your capacity planning, and it will hurt your front-end application, because it doesn't know how to set timeouts when this happens, and things will just randomly time out and fail. So here is a set of four settings we tuned to help tackle this problem. And they attack this problem from different angles. The first one makes the flush smarter: it always tries to flush a contiguous region of 1 megabyte of data rather than doing a bunch of random writes. The second one makes the flush do more work per iteration; it will scan all pages in the [INAUDIBLE] tree. The third one makes the flush respond faster to the incoming load. And the fourth one increases the parallelism of the flush. So here's the sync flush problem [INAUDIBLE] described earlier.
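The talk doesn't name the four variables (they're listed in the Gist the speakers reference), but as a purely hypothetical mapping, plausible MySQL 5.7 candidates for each role would be:

```ini
[mysqld]
# Hypothetical sketch -- the actual variables and values are
# in the speakers' GitHub Gist, not stated in the talk.
# 1) flush contiguous neighbor pages rather than random pages
innodb_flush_neighbors = 1
# 2) do more work per iteration when scanning for dirty pages
innodb_lru_scan_depth = 2048
# 3) start adaptive flushing earlier as redo log usage grows
innodb_adaptive_flushing_lwm = 10
# 4) more page cleaner threads, i.e. more flush parallelism
innodb_page_cleaners = 4
```

All four are real MySQL 5.7 system variables; the values shown are illustrative only.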
It was [AUDIO OUT]. This is a sysbench step load function; each step is 2x the load of the previous one. And what's plotted here on the y-axis is the flush lag in bytes. Remember, we have 1 gigabyte worth of redo log space; 80% of that is about 800 megabytes, and here is about 800 megabytes. As you can see, when we run the test at 128 threads of sysbench, the flush lag quickly hits the ceiling, the point where InnoDB gets into the sync flush state. InnoDB starts the stop-and-go mode, and you cannot go higher anymore. I can show you the TPS chart in just two slides. The TPS looks horrible; it just goes up and down, up and down. After the tuning, you see the after picture, where we successfully suppressed the growth rate of the flush lag, so that at 128 threads and 256 threads, it is kept away from the sync flush state. At the highest step, it still hits the red line occasionally, and there's still more tuning to do. So here's the end effect of tuned versus untuned at 128 threads.
If you remember the previous flush lag chart at 128 threads, shortly after the test starts we hit the sync flush state, and then you can see on this TPS chart that it went as low as 400, then back up to 2,400, and then up and down and up and down. With the tuning applied, at least it gives you a pretty stable lower bound of about 1,200. The upper bound still shoots up, and you can still get better performance, but the key point here is that there is a predictable and reliable lower bound on performance. Now with that, I'm going to turn the mic back to Brett to talk about next steps. BRETT HESTERBERG: Thank you very much, Stanley. We had a number of developers raise their hands at the beginning of the presentation, some DBAs, and even more reluctant DBAs. And I think the takeaway for me from Stanley's InnoDB tuning work is that we've gone from a place where, by default, MySQL on Compute Engine, or on GCP generally, shows pretty wide variability in performance.
You see that on the right hand portion of the graph. And then we move to the left, where we have a much more consistent performance experience. Whether you're a developer or certainly a DBA, unwinding variability in performance is not an easy problem, and the work that Stanley's done has gone a long way to help us there. For those of you using Cloud SQL, what you've seen from Stanley today is in your database instance, and we'll continue to optimize a number of settings relevant, we think, to MySQL on Google Cloud, and more specifically in Cloud SQL. For folks who are self-managing, who are thinking about running a database on Compute Engine or are already doing so, as I mentioned, we have the InnoDB tuning parameters that Stanley was showing listed online in a GitHub Gist. So you can take a look at this URL. We'll continue to update that document online as we make further progress with our tuning. Stanley and I joke that we're on an endless mission here to performance tune databases on GCP.
And hopefully you'll see us share this kind of information going forward. We plan to, in that we want to make Cloud SQL very open and build it on products that you, yourself, can use in Google Cloud, so that you have a real choice between self-managing and a managed database service. A couple of points to wrap up on. Scaling Cloud SQL: think back to the infrastructure scaling discussion. Scaling up CPU and memory, I think, is very straightforward and intuitive. Remember that scaling IOPS means scaling the number of CPU cores and the capacity of storage. Whether or not you're using Cloud SQL, you can use our create form to calculate what kind of expected performance levels you'll get based on the number of cores and the size of your persistent disk storage device. Look forward, as always, to future announcements, faster storage, larger instance types. I don't think we're quite done just yet. When we talk about optimizing performance and availability, there are some key points to remember. Cloud SQL uses MySQL's semi-synchronous replication technology for high availability.
What this means is: consider your workload, and how it might influence the replication lag that Stanley talked about, where replication lag can affect your failover time. It can affect your availability. You will not lose data; the goal of semi-synchronous replication is to ensure data durability across zones. But there is that RTO implication that Stanley pointed out. The gist here for you is: if you have a very bursty workload, you are pretty unconstrained even when using semi-synchronous replication. Stanley showed you the two lines: a MySQL database instance without replication, which is essentially as fast as MySQL can go, since it's just doing its work locally, versus a replication pair, a master instance with a failover replica. And the lines tracked pretty closely. What that means is that for a bursty workload, you're going to get a lot of performance out of your replication pair, and you'll give your failover replica some time to catch up if it gets behind the master.
If instead your workload is not bursty, if it is consistent and you are driving inserts and updates on a consistent basis, take a look at what Stanley showed with respect to a sustainable TPS. If a replica starts to get behind, and you continue to drive inserts and updates, the replica can fall further and further behind, and you may have to throttle from the application side. I think those were the two takeaways, in my mind, from those charts. The numbers will surely change in the future. Stanley mentioned we showed TPS on the y-axis. I'm sure the numbers will change over time, but the general implications of semi-synchronous replication in MySQL won't change nearly as fast. So hopefully you're familiar with those now. Again, visit our GitHub Gist page for all of the InnoDB tuning parameters. That way, if you are self-managing on GCP, you can get started more quickly. A few presentation picks for you. We are on day three of the conference. I'm sure all of you are feeling at least a touch of fatigue at this point.
That said, we've got more sessions this afternoon. Here are three that grabbed my attention and Stanley's attention. 10 common causes of downtime and how to avoid them; that's 8302. Building high performance microservices with Kubernetes, with some really cool stuff in Go and gRPC. And lastly, a talk on Bigtable, our NoSQL product, focusing on serving personalized content, serving content to millions of users at really low latency. With that, I want to remind everyone that you can get started not just with Cloud SQL for MySQL, but now Cloud SQL for PostgreSQL as well, as of today. And on behalf of Stanley, I want to thank you all for spending a bit of your afternoon with us.
Get the most from Google Cloud SQL: Google’s fully-managed database service that makes it easy to set up and maintain your relational databases in the cloud. In this video, Stanley Feng and Brett Hesterberg show you how to scale performance and securely connect to Cloud SQL from anywhere in the world. They dive into Cloud SQL’s architecture to understand tradeoffs between performance and availability and discuss important considerations for instance sizing. You’ll come away knowing how to get the most from your database, while letting Cloud SQL handle the mundane, but necessary and often time consuming tasks — like applying patches and updates, managing backups and configuring replications — so you can focus on developing your next great apps.
Missed the conference? Watch all the talks here: https://goo.gl/c1Vs3h
Watch more talks about Infrastructure & Operations here: https://goo.gl/k2LOYG