What this means is that you get a relational database with ACID semantics and transactional consistency with horizontal scalability. So you get the horizontal scalability of a NoSQL system with the transactional semantics and sequence semantics that you come to expect from your traditional RDBMS. Cloud Spanner is a fully managed database service with global scale, traditional relational semantics. So you get schemas, ACID transactions, SQL. We do automatic synchronous replication within and across regions. And it's been battle tested inside of Google. In the 101 session I went through how our last manually restart in MySQL took several years and led to us to build Cloud Spanner where it's been in production at Google behind most of our mission critical applications for the last nine plus years. Schema design. So one of the things that's really interesting about Cloud Spanner is when we did the demo at the keynote today and you saw the application you don't really have many levers.
When you create a Cloud Spanner instance you give us a name, you tell us what the topology is– do you want regional and later this year we'll have multi-regional– and you tell us the number of nodes that you want. You don't pre-provision storage. You don't tell us anything about your data. You just tell us the number of nodes. And so 100% of your levers for optimizing your performance on Spanner are your schema and your queries. So similar to Cloud Bigtable your schema design is incredibly important in determining the performance that your application is going to get when you deploy it. So let's start with the basics. So people ask me, I've got a MySQL database, I've got a postscript database, how do I migrate? So the first thing is your schema. Cloud Spanner supports the most common SQL data types. So booleans, integers, floats, strings, bytes, arrays, dates and timestamps. I've put it up against MySQL so you can see we support all of the similar ones except MySQL doesn't support arrays.
And then obviously MySQL and postscripts have lots of other data types things like JSON and other features that we haven't implemented yet in Cloud Spanner. The key thing here is that anything that you see– anything that you have in your MySQL or your postscript schema should be directly transferable into Cloud Spanner. The first thing you have to be very aware of in Cloud Spanner when building your schemas is interleave tables. So MySQL doesn't have a concept of child tables. Postscripts does have child tables, but they're for a different purpose. Interleave tables in Spanner is the way that you can tell Cloud Spanner the logical relationship of your data. And this allows us to serve your data more effectively and efficiently. So the way to think of an interleave table is if you have a row– in this case singers– and all of your other data is related to that– in this case albums and songs– you want them all joined together. So think of an interleave table as a pre-computed join on your data to allow your queries to be really efficient.
What happens here is that when we are querying your data– so if you say give me all the songs for a specific singer– if you have those interleaved then we can very quickly and efficiently return those results to you because we know that all the data is related. And so we'll make best efforts to make sure that all of that data is located on the same node and very physically close to its related data. So we can return it as quickly as possible and hopefully avoid two phase commits when we do transactions. So by telling us that this thing is related to this other thing, which we call data locality, we can actually start to do a much better job of serving your data. The other thing that's important about interleave tables is that Cloud Spanner does not support the concept of foreign key constraints. Our version of foreign key constraints are these interleave tables. So if you need cascade on delete between items then interleave tables is the way that you achieve this. So if you want to do cascade on delete for a object or for a row it's a pretty good signal to you that you should probably interleave it with its parents.
So what does this look like in a schema? Again, Cloud Spanner uses standard SQL for schemas and for queries. So as you're doing your DDLs in defining your schema this is what it would look like. So create table singers. I have an integer field for the singer ID. I've got a string for first name, a string for last name, and a byte field for singer info where the primary key is singer ID. Now albums always are tied to a singer so therefore I can put a album ID and an album title, set the primary keys. So on an interleave table piled tables have to include the primary keys of the parent. This is how we literally do the physical interleaving of the data in the database. So my primary key has to include singer ID because of the parent. And then I can say interleave on parent singers on delete cascade. So this is how I implement a foreign key constraint in terms of cascade on delete. But it's also how I'd do a pre-computed join between singers and albums. Now to go another layer songs always belong to an album.
So now I can say create table songs. We can add a track ID and a song name. The primary key has to include the primary keys of the two previous tables– so Singer ID and album ID– and now we can add track ID as a primary key. So I interleave and paired albums for on delete. So as you're building your schemas this is how you tell Spanner what is the logical relationship of your data in the schema. I talked a little bit about primary keys. Spanner is different than some of the relational databases. We don't have auto-increment IDs. It's very important that we have distribution of primary keys. So it's very important that you don't do sequential keys, specifically not timestamps. We recommend that you use real world identifiers because this generally gives us sufficient amount of randomization to shard your data between the nodes. And, as we talked about before, it defines the physical relationship of the data. So the primary keys are the way that we know that one thing relates to another.
The next concept that every good RDBMS has is secondary indexes. So I built my schema, I've run some queries, and I see that they're slow. How can I speed up those queries? One way that I can speed up a query is by adding indexes. So Spanner allows you to build secondary indexes. So in this case we've got a table that has marketing budget, which is an integer. We've defined the primary keys in the parent table. And now we've added a function in our DDL that says create index albums by album title on albums where the key is album title. So this allows us to– or on the column album title. So what it does is it allows us to more efficiently query the table albums without having to do a table scan of the entire primary table. Pretty straightforward. Interleave indexes allow you to co-locate your index with your data. So again, if you are always doing queries inside of a certain key space– so in this particular time if you're always searching for all the things for a singer– I want to know every song that Prince ever did.
I want to know every song that's on an album. I want to know if we have number of records sold I want to aggregate all of the records sold for Prince over a period of time. I can co-locate the index with the physical data to further increase performance by interleaving that table in the parent table. So I can say create index songs by singer album songs on this table, provide the columns that I want, and interleave it in tables. So now when I do that query there is no chance that my index, which is just a normal table in Spanner, doesn't get located on another node. So the way to think about this is if I build a secondary index and I don't tell Cloud Spanner that it's interleave, what can happen is the primary table could be on one node and the index table could be on another node. And now every time I want to update the primary table I have to coordinate between the two of them and do a two phase commit. Similarly if I want to do a consistent read I have to access both nodes to get the data.
But if I tell Cloud Spanner that this data is related by interleaving it, then I can always go to a single node and I don't have to incur the overhead of going to multiple nodes. Let's talk about some anti-patterns. So I talked a little bit about this earlier, but I wanted to reiterate the point because it's a common problem that we see. As you're loading data into Cloud Spanner there's a natural inclination to use timestamps or values that will incrementally increase as you go through the system. And what happens here is because you're constantly adding keys that are further along you're forcing all of the writes onto a single Cloud Spanner server. And so what happens is that you can hot spot the server because you're always trying to add another value at the end of the row. If you can add a shard ID– either by taking a couple of values off the hash or just adding a random shard– what that allows us to do– and then putting that in the primary key first– what that allows us to do is to evenly distribute the writes across all the Spanner servers.
So by giving us a more diverse key space you allow us to better shard the data across the nodes and that allows us to more effectively write data into the system. One of the things that's in here is split points. So we do our very best never to split between primary keys and the root table, which is the first table at the top of the interleave. And so the more values you give us of primary keys at the root table the more places we have to put splits. And because you've told us that this data isn't interleaved we know that we can split it up across multiple nodes and that will be fine. If you do interleave it then we know that we need to keep it together. And that gives us, again, a better hint of how to logically lay out your data in the database. And so where this is most important is data loading. So one of the challenges is making sure that you load in your data in an efficient way. So one is partition your data by primary key. So using the best practices we talked about either invert your timestamps or put a hash or create a shard ID in front of your timestamps or moving the timestamp to the second or third primary key is a way to make sure that we can partition your data efficiently and load your data the most efficient way.
We also recommend that you batch your commits into 1 to 5 megabyte mutations at a time so you can do multiple writes in a single transaction. We recommend about 1 to 5 megabytes of data on each transaction. And the last thing is create your table and your schema, load your data, and then add your indexes. It's much faster to load your data and then allow us to back fill a secondary index than it is for us to do multiple writes on every insert by making sure that we've inserted data into the primary table and the index on every write. So always add your indexes at the end if you're doing a large data load job. As you saw in the keynote demo that we did today we did an 80 terabyte database of ticket information. And as we were loading the data, we were able to dramatically increase the load time by batching up the mutations and making sure that we had a very diverse key space. And it really does impact your ability to load data into the system if you break any of these best practices.
All right. Query optimization. So you have all the basic pieces of schema designed, you understand what the gotchas are, so now you can start to look at your queries. So first of all Cloud SQL speaks SQL for its schema definition and for its querying. When you're doing mutations we provide a separate API for doing mutations so that you can be more explicit to us about what you want us to do with your data. But as you do your queries there's a number of ways that you can have slow queries or want to speed up your queries. So the first one is missing an index. So I want to select last name first name from singers where the last name equals Smith. Now the issue here is that I'm not searching on index data. So I'm doing a full table scan. So if I want to avoid doing that I can create an index singers by last name on singers and index last name. So this will allow me to more efficiently query the table and pull back that information. Now I can actually make this faster. If I go ahead and do this first query– select last name first name where last name equals Smith– I will find the row based on last name but I still have to go back to the primary table to get the first name.
So I still have to join back to the primary table after I find the key. So the way that you can optimize this in Cloud Spanner is using the storing function. So you can say create index singers by last name on singer's last name storing first name. So what I've done is I've added additional information to the index so that I can retrieve data by directly reading the index table and never having to go back to the primary table. So again, another way to alleviate the need to go back and do effectively a join back to the primary table. Missing indexes for ordered results. So if you were doing lots of selects and you are doing order by you want to make sure that you have an index that's properly built for that. So if you're saying select first name last name from singers order by last name first name descending, then you want to make sure that you have created an index. So singers by last name on singer's last name first name and descending. And the way that you figure all this out we very quickly showed in the keynote today how to look at your queries.
And so when you run a query in the Cloud console there's a button there that says explain query and it will show you what we're actually doing in the query. And so these things will become apparent to you as you look at the query plan to understand whether or not you're effectively using your indexes. So what if I'm not using my indexes? Or what if I'm trying to do different kinds of joins? So a number of people have come up to me today at the Spanner booth and said do you support joins? And I say yes. And they say which joins? And so we do support a number of joins. So inter, cross, full, left and write we do– these are all again additional vectors for you to help optimize how you're querying your data– but we do provide a number of options for you that you would expect from a traditional RDBMS. The other thing that we do is join directives. So we often talk about how Cloud Spanner uses ANSI SQL 2011 with extensions. The two main extensions that we have are interleave– so for interleave tables– and our annotations.
And one of the most common annotations is a join directive. So you can see in this first query select star from singers join and then this at bracket annotation is what we use across SQL in Spanner to do annotations. Instead of doing a SQL keyword for this we wanted to have a single semantic for doing these hints, not just for joins but across the data. And so this is the annotation extension that we've added to SQL. One of the things that's interesting about Spanner and is different than a traditional RDBMS is that Spanner doesn't try to assume too much about your data. It doesn't try to figure out the best possible way to run your query. We have a query optimizer and if there's an obvious index for us to use then we'll use it. But in many scenarios Spanner will not choose to use the index that you might have intended. And so Spanner is designed to be very explicit about what you want to do. And so one of these is indexes. So when you run this query if you look on the query explain and you notice that you aren't hitting your join then what you can do is use the second query.
So select singer first name last name singer info from singers at force index equals and you can tell us the index. So if you look at your query explain plan and you realize that you aren't hitting the index you can add this annotation to your join to force the query to use your index. And the last one is we also have additional hash join types. So here we've got a hash join. There are loop joins as well. And all of this is in the SQL documentation. But we do provide all of these as ways for you to optimize. Again, Cloud Spanner the only knob you get is nodes. So all of these tools that we have in the schema and in the querier are your tools to optimize your queries and optimize your performance. All write, with that I'm going to bring up Peter Bakkum. Enough of the information about Spanner. I'm sure all of you want to get some practical information about what it really means to use Spanner, so I bring up Peter. PETER BAKKUM: Hey everybody, I'm Peter. I work on a platform at Quizlet and I'm going to be talking to you today about what it's like to use a Spanner from a customer's perspective and some of the results of our testing.
We were really, really fortunate to get early access to a Spanner and we've been playing around with it. So my goal here is to give you a really good picture of what we've learned and hopefully come away from this talk with a better idea of how you might use Spanner yourself for your own workloads. As I said, we've been testing for a while. I'm going to show you some of the test results. We hope to run Spanner in production for one of our tables very, very soon. And we'll have more information then. But I'm going to give you a complete picture of what we know. So I want to start by giving you just a brief introduction to us at Quizlet. Quizlet is an edtech company. We're a study tool. The product is you plug in terms and definitions and it gives you different ways to study them. So you can think of it as like a digital flashcard, that kind of thing. Quizlet we've been really lucky to have strong organic growth and so Quizlet over time has grown into a top 25 website in the US.
We have about 25 million users a month and a bunch of different study modes, web clients, mobile native clients, et cetera. Our growth curve over time has looked like this. As I said, we've had really strong organic growth and this is an interesting challenge for databases for several reasons. So one is that you'll note is that it's very, very seasonal, right? It turns out no one really studies in the summer. So we get this big drop in the summer. It's pretty quiet. And then around August, September we hit back to school. And so there we get this big, big rush. And as we go up that's also where a lot of our growth hits. So in the summer we have to do is preparation. We need to prepare for this traffic that we've never really seen before. So very, very seasonal but also the traffic is gradually increasing. It's always going up. And so what this means is that we've had to continually re-architect and re-figure out how to run our application. Quizlet runs MySQL.
Like many applications, we started just with a simple web app using MySQL. And over time we've had to optimize this and grow this into a more complex architecture. Our architecture now looks a little bit like this. So let me explain this a little bit. We started with just a simple database. We added replicas so that you could fail-over from the master to a replica if you needed to you. Can also swap back and forth for maintenance. We've added replica for backups. And then one of the ways we've scaled is by taking some of our tables and splitting them out into their own pods of maters and replicas. So this is basically vertical sharding, right? We've taken a table, we're running it separately, and we scale up this way. So you know this has grown. We have about eight pods of this in production write now. You can see on this diagram some of the pods have multiple replicas. We also have a tier of caching to help scale. So I think this is a fairly standard architecture and I'm sure that many people in this room have something similar, maybe even with a greater scale.
Several problems with this. So one is just complexity. There's a lot of lines up here. It's a complex architecture, we need to run it, we need to handle replication, backups, et cetera fail-overs et cetera. So you know maintenance costs, difficulty. But the other problem is just scalability, right? So we haven't done any horizontal sharding. This means that a single table is only running accepting writes at most on a single machine. So this is a pretty fundamental barrier for us, right? We can only handle so many writes on a single machine this means we're limited in how big we can let a table grow. So what's the next phase? This is really, really critical for Quizlet. This is the primary datastore so everything depends on this. And in fact, if this goes down we literally we have students complaining to us, we have people tweeting at us, people in classrooms come to a halt. This is what this looks like. So if Quizlet's database goes down we get tweets like this.
This is literally a classroom of students. It's hard to see but they're all holding up iPads that say Quizlet is down on them. And so I show this just to underline the point that database stability and scalability is like really, really critical for us. It's a study product, it's very high engagement when you use it, and so it's really important that the database be stable in performance and it grows over time. So we want to solve this problem. We've been examining Spanner to do that. And I want to talk to you next about how we looked at this and how we evaluated whether Spanner would be a good match for this workload. So how did do we do this? Well, the first thing we did is we picked a table that we were having trouble scaling. So first table we picked was called the Terms table. It's about 700 gigabytes 6 billion rows. And this is the core content of Quizlet so it's a really important table for us SQL to scale. Receives it's about 70% reads 30% writes.
So fairly significant right load. So we took this table, we observed how it looked in production. So the queries coming in, did a capture of those. Looked in-depth at the patterns et cetera. And then we created a synthetic workload based on that capture. So what I mean by that is we wrote a script to generate queries in a similar pattern as the captured queries. So same ratio of reads and writes, same query patterns, same locality of IDs et cetera as close as we can make it look. So we had this synthetic workload and then were able to play this against both Spanner in our test environment and against a separate test MySQL. And this is interesting because it gives us a really clean environment for comparing these two systems and doing results. So let me caveat this by saying this is obviously very, very specific to our use case and the results you are going to see are totally totally specific to your workload and may look different in other cases. But I want to show you what this looks like for us.
First question I have is how do these databases scale? So how did we answer this? Well what we've done is we've taken our synthetic workload, we've played against both systems, and we've observed query latency as we changed the throughput. So x-axis here is throughput y-axis is latency. As you can see MySQL latency initially lower than Spanner. So this is around 1 millisecond during normal conditions. We then as you can expect you eventually hit your performance ceiling and so we hit a threshold at which we really can't handle more queries. We see a spike in latency for MySQL here. Spanner we have a higher base latency and obviously you also cap out at a certain point. But the key point here is it's really, really easy to just change the number of nodes and scale the Spanner cluster and move your performance ceiling higher. So this is the whole point right? This is why this is a powerful system. So this is median latency. And the other thing we want you to look at is tail latency.
So what you're looking at here is 99th percentile latency on this workload that we ran. So again we're comparing these two systems where we're running Spanner at different sizes. And you can see as you approach your thresholds tail latency goes up in this curve, right? An interesting thing here also is that the Spanner tail latency act– or the MySQL tail latency is actually worse than Spanner for the low query thresholds. So again this is totally specific to us and specific to the MySQL instance we were running, but our goal here was to test the scalability of Spanner and how a synthetic workload would look on the two systems. And I think it was a successful strategy. So I want to dig a little bit more into this latency because I think it's really important if you're using Spanner and implementing it to an application. So what I'm going to show you now is a histogram of Spanner latency in this workload. So this is Spanner latency for one of the easier queries we were running.
The axis is buckets of latency and the y-axis is count. So this query on Spanner we saw 5 milliseconds median latency. And this is the curve we saw. So we see an initial jump in the curve and then we see a longer tail latency. But as I noted before, the long tail is fairly thin. So to us Spanner was surprisingly consistent in its performance. So this is great. We know about the latency. We know a little bit of a scalability. I want to talk more about what are the implications of Spanner for application? And one of the other things we want to know about– well, there's several things. So one as I noted with Spanner compared to MySQL just running on a vanilla VM probably your base latency is going to be a little bit higher. You tail latency is probably going to be lower. For us a higher base latency needs we need to be more thoughtful about how we use it in the application. So asynchronous queries become a little more important. And the other thing is that it's very, very scalable.
So again, this is the thing we really like about Spanner. This is the power. That we can just change the number of nodes and adjust the performance ceiling of the database. So what this means is that for your application auto scaling becomes really, really simple. You're basically just changing a variable. The other thing we noted– Spanner is a distributed system. As you may know, it partitions your data into splits. And this has implications on your schema, on your IDs, et cetera as Don mentioned. The other interesting thing we observed– so as we were running the synthetic workload we ran it on both systems. Both, as you can imagine, you initially start sending the queries you see warming, right? Both systems warm up. You see the initial spike and then they're a bit flatter. Cloud Spanner interestingly also optimizes the splits and so over time, we saw a more gradual warming on Spanner. So as we ran a certain cluster size the latency would actually get better over time. So we have a good idea now of Spanner performance, scalability, some of the implications of integrating with our app.
And I want to talk a little bit more about what this looks like for us. In other words, how we integrate it. So several things here. First thing is we need a way to talk to Spanner so we need to pick a client. Quizlet was originally written in PHP. We now run on HHVM, which is Facebook's PHP compiler. And our code is a mix of PHP and Hack. PHP is the favorite language of few people. And so we've optimized and we really like HHVM and Hack and it's been a good way to carry our application forward. So there's not yet a Spanner PHP client though I understand one is coming soon. So initial options for the client are one. There's the REST API– this is just totally generic sending HTTP and getting responses back. Two– one of the things that I really like about Spanner is that it sits behind a gRCP interface. So because it's gRPC the interface is defined totally programmatically and it's actually possible to generate clients in any language that gRPC supports. So we've actually use both of these strategies for our application.
Next thing for us is how do we take our application and actually create queries to send to Spanner and send them? So like many applications we use an ORM. Ours is custom to us. I put an example query here. The point is that the application you're composing a query using this fluent interface, you're setting some parameters, maybe some query predicates, et cetera. Then you're going and generating SQL. So integration for us looks like taking this interface and reimplementing the back end specifically for Spanner. So the SQL dialect is a little bit different than MySQL and Spanner doesn't offer DML statements. So that's a little bit different. But in general it's very similar. And this is the whole point, right? We want something relational we can move to. So we've taken this ORM interface and we've reimplemented it for Spanner and so within our app we're now able to communicate with both systems using the same fluent interface. This is our strategy for handling this.
So I want to talk a little bit more about schema for our case. As Dom, mentioned there's some definite schema implications for Spanner. You know the nice part is that much of our schema maps directly over. Again, this is the whole point. We want to move to a system where we have the relational model, our data types carry over, our queries carry over, and it's more scalable. So most things map directly over. There's a few implications for the ID format. I'm going to dive into that. And then the other important thing to remember is just you want your schema to be thoughtful around data locality. So you want to use your schema to limit the number of queries that touch multiple splits. In other words, that touch a bunch of areas of your data. And I'll show you how we do this. So I'm going to show you our schema on MySQL and then our schema on Spanner. There's a lot of details. I'm going to highlight the important parts of this. It's mostly pretty similar, right?
We've taken our columns, we've mapped them over, we've mapped them into the Spanner's data types. The things I'm going to call out here are one– we've added new ID columns. So we've added columns specific to Spanner's ID. Sorry, specific to Spanner's favorable primary key characteristics. We've changed the primary key. And we've intentionally used a primary key that gives us a locality on that column Spanner ID. So the Spanner ID column there's locality within the Spanner set ID. And we've designed the primary key that way. We've also leveraged a secondary index here and so we're creating that in the scheme as well. So let me explain a little bit more about how we're handling IDs. So, as Don mentioned, Spanner does not have auto incrementing IDs and even if it did you probably wouldn't want to use them. The reason is that you want to distribute your writes over multiple splits and so it's much more favorable to use something with a lot of entropy in the first few bits.
So there's different ways to slice this. Our choice was v4 uuid and this is basically just randomly generated uuid. There's other uuid formats v1 that aren't favorable for this and the reason is that they can include a time component in the very beginning so you're not even redistributing your rights. So we don't store any meaning in the IDs. So we're just using randomly generated IDs. The other interesting thing in our case is that backwards compatibility with previous IDs is very important to us. So as I mentioned, Quizlet has native mobile clients. All these native mobile clients they depend on a certain version of the API. They access models with certain assumptions like they have integer IDs. And they actually cache data as well. So they're storing data and saving them. And so migrating to a new system means supporting those clients because they're mobile clients we can't just upgrade them all immediately, right? They're out there in the wild.
So we very explicitly left in our previous IDs and we have a secondary index in our schema for those IDs. This is a little bit– this is a diagram of what this looks like. We have multiple versions of our API. We've planned a new version to support a new ID scheme and we're going to be migrating mobile clients over to this as we go. One more thing I want to talk about and that's how we move data into Spanner. So, as I mentioned, uptime is very important to us. Stability is important to us. And so we can't just shut the site down for hours as we prepare the new database and copy data in. So we want to be able to transition over, flip a switch, and have it be done. So the way we think about this is replicating data by establishing writes to both systems. So we take our application, we're using MySQL, we're reading and writing for MySQL, MySQL is a source of truth. We're also duplicating writes to Spanner. So we send writes there as well. That means mutations to our models go to both systems.
While that's happening we then run a copy client in the background that copies data over into Spanner and eventually you end up with a very consistent data set. This is a little tricky because only one system can be the source of truth, right? Only one system can be the source of truth and so if a write fails for the other one you need to know that. You then have a data inconsistency. So for us this just means tracking those failed writes and ensuring that both systems are consistent. So the whole point of this is to transition over easily, so you begin writing in both systems, you do the copy in the background, you reach a point where they're similar, you stop writes, and then you can just flip over to the new system. This is our plan for switching over a table to Spanner. So that's all I've prepared. Thanks for listening. I hope this was useful. [MUSIC PLAYING]
As goes with most application databases, there are many tips and tricks for database schema and SQL queries that best optimize performance of your Cloud Spanner database. In this video, Dominic Preuss and Peter Bakkum share these tips and tricks at a high level so you can create your Spanner database schemas with optimal performance in mind.
Missed the conference? Watch all the talks here: https://goo.gl/c1Vs3h
Watch more talks about Infrastructure & Operations here: https://goo.gl/k2LOYG