Google Cloud NEXT '17 - News and Updates

Cloud Spanner 101: Google’s mission-critical relational database (Google Cloud Next ’17)

(Video Transcript)
[MUSIC PLAYING] DOMINIC PREUSS: Good afternoon, everyone. Welcome to Google Cloud Next 2017. My name is Dominic Preuss and I'm here with Robert Kubis from our developer advocate team. And we're going to spend some time talking about Cloud Spanner. So what are the goals of this session? First of all, we want to take you through a brief history of Spanner and why Spanner exists. Then we're going to give you an explanation of what Spanner is today and what is available to you in the public beta. And then any good session has a demo, and Robert will be taking you through a demo. The goal is to talk for about 30 minutes, do a 10-minute demo, and then hopefully have 20 minutes for people to ask questions at the end. So hold your questions until the end, and then please come up to the mic and ask your question so we can make sure that we get it on the recording. So why does Spanner exist? Let's go back to 2005. Every dollar that Google generated from its advertising system, AdWords, was on the back of a manually sharded MySQL database.

That database was sharded according to customer IDs, and occasionally we'd have to reshard the database. We did a reshard, and that reshard actually took multiple years to complete. It was at that point we decided that there was probably a better way. So we started development of Spanner. So as we were looking at the requirements for the AdWords system, what were the things that Google needed? Google needed a horizontally scalable database. We had lots of customers. We were growing quickly. We had gone from a $10 million business to over a billion-dollar-a-year business, and we needed to support that scale. So we needed the horizontal scalability. The second thing is we were dealing with people's money. We had budgets. We had customers. We had partners. We had agencies. We had to have ACID transactions with global consistency. We couldn't accept eventual consistency or any loss of data. So we had to make sure that it was always consistent and we always had ACID transactions.

The third thing was every minute that the AdWords system was down for maintenance was millions of dollars of lost revenue for Google. And so we had to have a system that made sure that we didn't have any downtime. So these are the requirements. And if you look at this today, prior to Cloud Spanner entering the market there was no good solution that met all of these requirements. That brings us to Cloud Spanner. [VIDEO PLAYBACK] – We demand a lot of our databases. And our databases demand a lot of us. They're fragile, expensive, and tough to maintain. When things break it can be hard to put the pieces back together at the speed of your business. And you often lose data in the process. We've all come to expect that distributed databases can't be globally consistent and scalable. But what if you didn't have to make trade-offs? What if you could have a fully managed database service that's consistent, scales horizontally across data centers, and speaks SQL? This is Cloud Spanner, a mission-critical relational database service built from the ground up and battle tested at Google for strong consistency and high availability at global scale.

At Google we use Spanner for services that billions of people access every day. It's the only cloud service designed to be a massively distributed relational database with transactions and SQL semantics. And we're talking massive. Spanner can scale up to millions of machines across hundreds of data centers and trillions of database rows stretching across the globe, while behaving like it's all in one place, where you want it. We built Cloud Spanner to be fully managed and secure. Automatic sharding and synchronous replication with low latency and schema updates without downtime mean your data is highly available and reliable. Instead of endless provisioning and worrying about maintenance you can focus on growing your business. No matter your needs, you don't have to hit a wall with your database service. Get started with Cloud Spanner. And build what's next. [END PLAYBACK] DOMINIC PREUSS: Great, so that is our quick introduction to what Cloud Spanner is. Cloud Spanner is Google's mission-critical, scalable relational database service.

So almost 10 years after the internal Spanner project went into production at Google, we've been able to bring that same level of performance and consistency and scalability to our Google Cloud Platform customers. So what does that really mean? First of all, Cloud Spanner is a fully managed database service. So what that means is you don't have to worry about upgrades. You don't have to worry about security patches. You don't have to worry about adding storage to your storage array for the next set of data that's going to come into your database system. All of that is handled for you by our Site Reliability Engineering team, who are making sure 24/7 that the service is up and running and that we're meeting your needs. Now we get fairly pedantic inside of Google around consistency. And so when we say ACID transactions and we mean global consistency, we mean external serializability. It's the highest standard of consistency. There's a lot of back and forth in the industry about what is eventual consistency, or what is global consistency.

We take a very high bar on this. And Spanner meets that need. So this means that you have ACID transactions across many rows. You have relational semantics. So if you're coming off of a manually sharded MySQL database or a Postgres cluster or some other SQL solution, everything looks familiar. Your schemas are defined in DDL, so standard SQL with SQL data types. You are able to do queries with the joins and everything else you're used to having in SQL. In fact, Cloud Spanner uses the same SQL dialect as what we've rolled out in BigQuery, which is based on ANSI SQL 2011 with some extensions for the additional features that Google offers. So it's standard SQL, and it's the same SQL that you get across the Google Cloud Platform on our data warehousing product, BigQuery. It's automatic synchronous replication. So today in the beta, we've rolled out regional versions of Cloud Spanner. What that means is we have multiple replicas across multiple zones inside of a single region.

And this gives you four nines of availability. It also means that when you write data into Cloud Spanner, you know that it has been written. If we've given you an acknowledgment that that transaction has committed, then you know that data is there, and any reads that you do after that timestamp will include that data. We give these guarantees so that you always know that you're getting a consistent view of your data at all times. Later in the year we'll be rolling out multi-regional. So multi-regional means that you can actually span your database across multiple Google Cloud Platform regions to achieve an even higher level of availability. Multi-regional Cloud Spanner instances will offer five nines of availability. What that means is you're talking about roughly 26 seconds of downtime a month with five nines. And this has been battle tested. And the reason that I can get up on stage and make these types of claims is because we know that we've done this internally.
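As a rough sanity check on that figure: five nines means 99.999% uptime, or at most 0.001% downtime, so over a 30-day month that works out to

    0.00001 × 30 days × 86,400 s/day ≈ 26 seconds of downtime per month.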

So we've been doing this at Google for almost 10 years with the internal Spanner system. And so we know that when we make these guarantees of availability and consistency and durability, we can stand behind them, because we have almost 10 years of battle-tested experience running it internally. Spanner has been used internally for many of our mission-critical applications. We've talked about Google AdWords; there's the Google Play Store; and in fact Google Cloud Platform itself is built on top of Spanner. The control plane that manages all of the VMs and all of the services across Google Cloud Platform is actually based on Spanner. So you can have confidence that when you put your data and build your application on top of Cloud Spanner, you're getting the best of what Google has to offer. So, Dominic, you just said a bunch of words. What does that mean? How does that compare to other solutions? If you compare it to a traditional relational database, a traditional RDBMS, we have strong schemas.

We have SQL schemas and querying. We have strong consistency. We have high availability. Quite often I get asked, how does Cloud Spanner compare to the RDBMS that I'm using either on premise or in another public cloud? And invariably they're trying to compare a single-node solution to Spanner, which is a distributed system. When you have a single-node master-slave configuration, you're dependent on the master node. That means that you can only do as many writes as you can put through the master node. And if the master node needs to be updated, whether to get a new version of software, get a security patch, or do any other maintenance, you either have to take the master node down and take down the whole system, or fail over your master node to your slaves so you can do maintenance on the master. What this means is that your application will incur downtime. Now that downtime might be scheduled. But it's still downtime. For Google it doesn't matter. We're selling ads around the world.

We can't schedule downtime. We're making money 24/7 and the business relies on that system being up. So we can't have downtime, either scheduled or unscheduled. Cloud Spanner, because of its distributed nature, means that we can do updates, security patches, any maintenance that we need to do in the system by moving your data around among the multiple nodes. And you don't see any downtime. In fact, you see nothing on the performance side either. You'll see a consistent experience even when we're doing things like upgrades and security patches. So in that sense, it's horizontally scalable. The way that you achieve additional scale is you add nodes, which is what you would normally see with a NoSQL database, which we'll talk about in a second. That's instead of relying on vertical scalability, which means that if I need more throughput through my master-slave setup, I just put it on a bigger box. I give it more CPU. I give it more memory. That's the vertical scaling strategy.

And so with Spanner, we give you horizontal scalability. So you don't have to worry about being on the biggest VM and really hoping this provider makes a bigger VM because you're going to run out of headroom soon. You know that we can always add additional capacity so that you can grow your application. Also replication: with traditional RDBMSs, one of the ways that you can scale out horizontally is that you can shard your read and write workloads. You can add additional read replicas to your system, and then you can do your reads off of those read replicas, thereby giving additional capacity back to your master node. Now the problem there is that those reads off the read replicas are stale. There is a replication lag for the data making it to that read replica. And so it's up to you and your application to know that you may or may not have the most consistent version of the data. If you need a consistent read, you have to go back to the master node, thereby using up some of those resources and taking away some of your scalability.

So that's how it relates to a traditional RDBMS. How does it relate to NoSQL? The pattern that we would see over and over and over again with our customers, both our enterprises and our start-ups and smaller companies, is that they would build an application using a traditional RDBMS. It might be MySQL, might be Postgres, might be name-your-commercial-database. And then what happens is they just grow. And we all know that data is growing exponentially. It is growing at a faster rate than it's ever grown in our history. And so the requirements that we have for our databases continue to grow. And so often what happens is a company or project has this relational database that backs their application, and they hit the upper bound of what they can do for scale. Their only immutable requirement is horizontal scalability. So they then go and choose a NoSQL solution, Cassandra, MongoDB, whatever it happens to be. And they basically trade global consistency for horizontal scalability.

Because that is their immutable requirement. And then they write a bunch of application logic, or use other tools, to try to get around the transactional semantics of having consistency only at a shard level, or not across the entire database. So what Spanner allows you to do is to not have to make this trade. Now, if you've outgrown your RDBMS, Cloud Spanner is not a drop-in replacement. It's not a MySQL-compatible database. It's not a Postgres-compatible database. So there is going to be work required for you to migrate your traditional RDBMS application onto Cloud Spanner. But you get to know that when you make that transition you don't have to make a trade-off between horizontal scalability and global consistency. I'm often asked, what was the hardest thing about taking the internal Spanner application and exposing it to our Google Cloud Platform customers? And the answer is making it open. Google is committed to being the open cloud. And so we've done everything we can to adopt open standards and existing protocols and APIs that exist in the outside world.

We aren't trying to move everyone onto Google proprietary technology. We're trying to solve customers' needs where they are, by supporting those standards. And so the Cloud Spanner team spent a tremendous amount of time trying to make sure that this was true. So we started with standard SQL. We've had a number of different SQL dialects across the Google Cloud Platform, and so we decided that we needed to come to a standard SQL dialect across all Google products. We took ANSI SQL 2011 as a spec and then extended that. And so what this means is when you look at SQL in Spanner, you see normal SQL data types and the types of queries and joins and things that you would expect. So we make it as easy as possible for customers by reducing the friction and the impedance mismatch with their existing solution. The second thing is we spent a lot of time on enterprise requirements. Cloud Spanner actually launched with the best integration across all of our various cross-platform initiatives for identity and access management, encryption, and audit logging.

It came out of the gate in beta with all of the features that a traditional enterprise with high security needs would need to be able to adopt this product. And so we spent a lot of time investing in that, and making sure that the product had those features from the very beginning. We also spent a lot of time on client libraries. We'd like to talk about being a very developer-friendly platform. And being developer friendly means two things: you've got great documentation, and you have really strong client libraries. So we spent a ton of time working with our early access customers. Cloud Spanner was in early access for over 12 months. And what we were doing in that time was onboarding customers, showing them the docs, finding the holes and the places where we forgot to document things, adding those docs, and really doing lots of iterations on the documentation, so that when we launched we actually had all of that ready for our developers. The second thing that was really important was our client libraries.

So we spent a ton of time on our Java, Python, Go and Node.js client libraries to make sure that they were fully baked. They were optimized for performance. We hid the common gotchas that you would hit if you were trying to use the API directly, in a way that's idiomatic for each language. So we spent a ton of time working with the various teams to make sure that we had first-class developer client libraries in these languages. The last thing is JDBC. In case you haven't heard this yet, the partner ecosystem is incredibly important to Google. And so one of the things that we did to make it as easy as possible for both customers and partners to adopt is that we launched with a JDBC driver. The JDBC driver is read-only. And what that means is that we can hand that to a number of our partners or your internal customers, and they have a fully understood client library that they can often plug in with very little work into their frameworks. And this allowed us to launch integrations with a number of partners early, because we were able to deliver them a JDBC driver.

And that was able to decrease the time necessary for them to implement access to Cloud Spanner. So what types of workloads are appropriate for Cloud Spanner? As we were going through the early access program, we saw a number of patterns emerge from the types of use cases that people found for Cloud Spanner. The first one was transactional. If you have an application that requires transactional consistency or complex transactions, then that is an ideal use case for Spanner. So if you're dealing with things like inventory management, where you can't sell more products than you have, or if you are trying to manage financial transactions, you have to make sure that you always have the right amounts. And you have to have really strong transactional semantics to be able to support that. What we saw is that a number of the companies we've worked with had a manually sharded system, or had built a transactional layer on top of their manually sharded or eventually consistent datastore, to make sure that they were guaranteeing transactional consistency.

So what Spanner allows you to do is push all of the transactional code and all of the responsibility for maintaining transactional consistency into the database. What this means is that you have much simpler architectures, much higher velocity for your developers, and less code that can go wrong and introduce bugs, because you can rely on the database itself for maintaining and providing all of your transactional consistency. The second one was scale-out. People often assume that unless they have tons of data they can't use Spanner. And the answer is, if you have a mission-critical system that requires high availability, then Spanner can be a great fit for you. Whether you have 10 gigabytes of data or a petabyte of data, it doesn't matter. You have a mission-critical application. And so Cloud Spanner gives you a cloud-native, mission-critical database that also provides scale. So maybe you have hit the upper bounds of what you can do, and we continue to see other cloud providers and database providers rolling out incremental changes in the amount of data that they can store.

What Spanner does is allow you to make a non-incremental change. You go from being able to store single-digit terabytes of data to as much data as you have. So if you happen to have a very large customer set, or a large number of transactions, or a lot of data for which you have to have transactional consistency, then Spanner allows you to meet that need without having to make any trade-offs. Another common use case is the global data plane. This is very common inside of Google, and it's actually the Google Cloud Platform use case. Let's say that you have user bases around the world: a set of users in North America, a set of users in Europe, and a set of users in Asia. And you want to keep all of the user data close to those customers. You can do that: you can have a regional instance or an RDBMS in the US, one in Europe, and one in Asia. But what you need is a single global system that tells you where that data lives. So for this user, user ABC, I know that that user's data is homed in Europe.

So I'm going to go to the European instance to go and get that data. And so this global control plane, where it might be a very small amount of data, but you have to be able to query it from everywhere with low latency and know that you have a consistent view of the data, is a very common use case we saw both inside of Google and are starting to see more commonly in the outside world. The last one is database consolidation. Everyone knows the story, right? You keep your user data in MySQL or Postgres. You keep all your event and log data in a Cassandra cluster or in BigQuery. And then you ETL the data around: every night you copy your user table into BigQuery so you can do joins against it. What we're seeing with Cloud Spanner is that it allows you to consolidate your databases into a single source of truth. So you could take your user data, which might be in Postgres or MySQL, and merge it with all the event data that you have in Cassandra or MongoDB or BigQuery.

And you can combine them into a single database. You no longer have to ETL data around. You don't have to worry about staleness of data. You can do joins across all the data, and you know that you have a single consistent source of truth for all of that data. Again, it is a tremendous gain in productivity for your development teams if they don't have to worry about multiple data stores. If you can put all of your data in one place, Cloud Spanner makes sense for that. It removes tremendous complexity from your stack. That removes bugs, makes it easier to maintain code, and increases developer velocity. And this is again something we've seen a lot inside of Google, and we're starting to see it with our Google Cloud Platform customers. So how does Spanner do this? I say that it's horizontally scalable. What does that mean? For Cloud Spanner you start with a Cloud Spanner instance. And Robert is going to demo how to actually create one of these, and you can see how easy it is.

You create a Cloud Spanner instance, and when you create that instance you basically tell us what you want your replication topology to look like. So if you're using a regional Spanner instance, we have multiple replicas in multiple zones inside of a single region. And then each of those replicas has its own set of compute resources and storage. And then you put your database inside of that instance. An instance can have 100 databases, each database can have 2,000 tables, and each of those tables can have as many rows as you want. And so what this means is that your data is actually replicated across each of the replicas. We've got full copies of all the data in all the places, and there are multiple replicas running at any given time. So what happens if somebody comes in and wants to make an update? Let's say that you want to update a row in a database. Or let's say you want to update 100 rows. That update comes into the system and goes into one of the replicas.

This is where the Spanner magic starts to happen. That update comes into the system. Each of the different replicas has all the data, but we know that certain parts of the data are owned by different nodes. So when that update comes in, we notify the appropriate nodes, and then that data gets replicated across all of the replicas. We use a custom Paxos algorithm that allows us to do consensus-based writes in the system. And this is what allows us to have our global consistency. So we have a custom Paxos algorithm, and we have two-phase commit, which allows us to have global consistency across the database at all times. We try to do everything without the two-phase commit. But if you are trying to update data across multiple nodes, then we fall back to two-phase commit to make sure that we don't lose consistency of the database. This is all well and good, and Spanner sounds like magic. But there have to be some differences.

It can't just be a drop-in replacement. And the answer is, you're right. To get the advantages of a distributed system we impose certain design constraints, or we impose certain concepts. The first one is the way that we lay out the data. In a regular relational database, if you had two sets of data that were related, you would have two tables, and you may or may not have a foreign key constraint, which would mean that if you delete one row, it deletes all of the child rows. That's the way you would lay it out in a normal relational database. PostgreSQL does support child tables, which implement some of this, but for different reasons. So what we've done in Spanner, to make it performant and to help customers tell us what data should be grouped together, is the concept of interleaved tables. What this means is that when you have tables that are dependent on each other, you literally interleave them together into the same table. So you can see our top-level row is the Beatles, where the artist ID is number 1.

And then we have albums under it that are from that artist, and they have IDs 1 and 2. So we have a primary key for The Beatles, which is 1, and then 1 and 2, which are the primary keys for the child table. What this tells us is that all of the information that is under the Beatles is related. And so what that allows us to do is group all of that data together on a single node. So if you are commonly doing queries where you say, show me all of the songs for the Beatles, or show me all the albums for the Beatles, and you know that your queries are going to be within a subset of the data, then by using interleaved tables you can tell us the logical grouping of your data. And this allows us to do a better job of splitting the data across multiple nodes, which allows us to respond to queries more quickly, because we don't have to rely on two-phase commit. We can serve all the data off of that one section of the data in a very quick way. So what does this look like from a schema point of view?

So when you define your schema, as we talked about, we use DDL, standard SQL: CREATE TABLE Singers, with SingerId as an INT64 NOT NULL, SingerName as a STRING with a max length, and the primary key is SingerId. We need this primary key to know how to split your data up and shard your data across nodes. Then, since albums are a subset of all that data and dependent on the parent table, we can say CREATE TABLE Albums, with SingerId INT64. Child tables have to have all of the primary keys of their parent, so it has to have SingerId. And then we've added AlbumId as a second primary key. We also have an album name, which is a STRING. And then when we define the table, we say INTERLEAVE IN PARENT Singers. And so this is the kind of foreign key constraint that Spanner supports. It's not arbitrary foreign key constraints, because that doesn't work in a distributed system. But by telling us how to group all the data together, you can start to do things like cascade on delete and some of the other benefits that you get from a foreign key constraint.
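A hedged sketch of what that schema could look like when issued through the Python client library; the table and column names follow the talk's example, while the STRING lengths and the instance and database IDs are assumptions:

    # Sketch: the interleaved schema from the talk, issued via the Python client.
    from google.cloud import spanner

    client = spanner.Client()
    instance = client.instance('my-instance')    # placeholder instance ID
    database = instance.database('my-database')  # placeholder database ID

    operation = database.update_ddl([
        """CREATE TABLE Singers (
               SingerId   INT64 NOT NULL,
               SingerName STRING(1024)
           ) PRIMARY KEY (SingerId)""",
        """CREATE TABLE Albums (
               SingerId  INT64 NOT NULL,
               AlbumId   INT64 NOT NULL,
               AlbumName STRING(1024)
           ) PRIMARY KEY (SingerId, AlbumId),
             INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
    ])
    operation.result()  # DDL changes run as a long-running operation

The INTERLEAVE IN PARENT clause is what expresses the parent-child grouping Dominic describes, and ON DELETE CASCADE is the cascading-delete behavior he mentions.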

But we only do this on an interleaved-table basis. The other thing that's great, that I love telling customers about, is schema migrations. This design also gives us no-downtime schema migrations. You can actually alter your table and add a column, in this case age. We apply that to the database, and it updates the schema on the fly, online, with no downtime. And you know that once that schema update commits, all subsequent reads after that transaction see the new schema. Again, getting back to the problem we talked about with AdWords, we couldn't handle either locking the database or shutting down the database to do schema migrations. We needed to be able to do schema migrations on the fly. So alterations to the schema are online.
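And a minimal sketch of that kind of online schema change through the same Python client, assuming the Singers table above (instance and database IDs are again placeholders):

    # Sketch: an online schema change (ALTER TABLE) issued through the client.
    from google.cloud import spanner

    database = spanner.Client().instance('my-instance').database('my-database')
    operation = database.update_ddl(['ALTER TABLE Singers ADD COLUMN Age INT64'])
    operation.result()  # applied online; reads and writes keep running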

So what does this look like in code? In any of these sessions we try to give you actionable information. I'm a Python guy, so I've gone ahead and pulled up the Python code. We've rolled out open source Google Cloud client libraries in all the primary languages. So if I install the Python Google Cloud library with pip, so pip install google-cloud, I can then import Spanner from Google Cloud. I instantiate a client; that client allows me to talk to the service. I then say I need my instance, so please give me my instance ID. And then from the instance I can grab my database: I say what my database ID is, and then I ask my instance for the database. I now have a handle to my database where I can start to send queries. So I can get result sets by calling execute SQL on the database and passing it SQL queries. And these 10 lines of code are all that is necessary to be up and running with your Cloud Spanner instance. We really tried to make it as easy as possible. There are lots of intermediate steps happening here that we've been able to hide by having really strong client libraries that do a lot of the heavy lifting under the hood for you, things like connection pooling and other things that could be gotchas if you aren't familiar with how to use the service.
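Here is a rough reconstruction of those steps with the Python client library. The instance and database IDs and the query are placeholders, and the read goes through a snapshot, which is the pattern the client library documents for queries:

    # Minimal sketch: connect to a Cloud Spanner database and run a query.
    from google.cloud import spanner

    spanner_client = spanner.Client()                  # talks to the Cloud Spanner service
    instance = spanner_client.instance('my-instance')  # placeholder instance ID
    database = instance.database('my-database')        # placeholder database ID

    # Reads go through a snapshot, which gives a consistent view of the data.
    with database.snapshot() as snapshot:
        results = snapshot.execute_sql(
            'SELECT SingerId, SingerName FROM Singers LIMIT 10'
        )
        for row in results:
            print(row)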

So right before we get into the demo, how does this fit into the broader picture? Cloud Spanner sounds fantastic. Is it the solution for everything? The answer is no. Cloud Spanner is one of a portfolio of solutions that allow you to build your application on top of Google Cloud Platform. Starting with Cloud SQL: if you have a framework, whether it be Django or WordPress or Drupal, that requires MySQL, then you can use Cloud SQL to manage that MySQL instance for you. So it's a traditional RDBMS that's being managed for you by Google. If you have a document database workload, commonly in mobile and web, then Cloud Datastore provides you that document database. It's fully scalable, it's shared resources, and you can literally start from zero dollars a month and go all the way up to Snapchat scale, and they just went public. So you can scale all the way up on Cloud Datastore if your data is non-relational and fits a document database workload. Cloud Bigtable is our analytics database. It allows you to do low-latency, single-digit-millisecond reads and writes and really fast table scans for your IoT time series, ad tech, and martech data. It's a fantastic solution for that.

And then BigQuery is our data warehouse that allows you to write all your data in, mostly append, with some mutation so you can update and clean up your data. But it's really our data warehouse. So what Cloud Spanner does in the portfolio is give you a path: if you've gone to MySQL on Cloud SQL and you're using that as your solution, and you need to be able to scale past it, you know that Google provides you an option. So it really gives you a future-proof path to move from your existing application on either MySQL or Postgres or whatever you happen to be using, and continue to grow out on a scale-out relational database. And with that I'm going to bring up Robert to get started on the demo. One other thing: I just used a lot of words, and flowcharts are usually helpful. So what we did here is put together a decision tree that allows you to decide which database service to use for which workload. So first of all, is your data structured or unstructured?

If your data is unstructured, you're going to put it in Google Cloud Storage. That way you can store it and access it there. If you need mobile SDKs, then Firebase Storage will provide you a mobile SDK on top of Google Cloud Storage. If your data is structured, then the question is whether you have an analytics workload, in which case you'll come over to the analytics branch: if you need low latency and lots of mutations, use Cloud Bigtable; if you don't, use BigQuery. Coming back down the middle, the question is whether your data is relational. If the answer is no, then you have Cloud Datastore. And if you have mobile applications, you have the Firebase Realtime Database, which provides mobile SDKs for iOS, Android, and web applications. If your data is relational, then the question is, do you need horizontal scalability? If you don't need horizontal scalability, then Cloud SQL is a great solution for you and is going to be your easiest path to building an application. If you do need horizontal scalability, Cloud Spanner is there to meet your needs.

So this is a quick way for you to navigate which of the various database technologies will serve the needs of your application. And lastly, before we get to the demo: as I said before, partner integration is very important to us. It's why we spent a lot of time on the JDBC driver. So we're very excited to have Xplenty, Zoomdata, Looker, MicroStrategy, and iCharts; they are all in the process of launching, or have already launched, support for Cloud Spanner. So some of your favorite BI tools are already integrated. And we're working with Informatica, Alooma, and Tableau; they've started work and they'll be rolling out their solutions later in the year. So again, if you are already using Tableau or Looker or one of these solutions in your stack, we're going to make sure that we're integrated from the very beginning to make it as easy as possible to migrate data in and out of Cloud Spanner. So with that, I give it to Robert. ROBERT KUBIS: All right, thank you, Dominic.

Hello everyone. So Dominic presented to you a lot of features and arguments for why to use Spanner. I want to show you how you can get started with it today. Your first stop should be cloud.google.com/spanner, where you find a wealth of information about our Spanner product. You find white papers, best practices, and Eric Brewer's recap on the CAP theorem and Spanner. There you also find the developer docs. And I want to stop here and ask the question, how many of you are developers here in the room? Awesome, so you're the right audience for me. As Dominic mentioned, we have a couple of client libraries: Go, Java, Node.js, and Python. I'm going to concentrate on Go if I get to it today. So to get started with an instance, you go to the Cloud Console, console.cloud.google.com. Make sure that you are logged in. And then you have on the left side a nice drawer where you can type in Spanner, you find it, and you can go there. So to create an instance, you click Create Instance.

You choose a name, awesomespanner for instance, and then select a regional configuration. As Dominic mentioned, multi-regional is coming later this year; right now you can choose between three regions: Asia, Europe, and the US. I'm going to choose Europe. I'm personally from Europe, so I prefer Europe. I want my Spanner instance to give me all the answers, so I select 42 nodes. And we create. And as you can see, with a snap of the fingers you have your instance with 42 nodes. So now we can get started and create a database. Here I choose OK, I create a demo database, and I create a schema. So I add a table. You can also do all of this from the command line with gcloud, by downloading the gcloud SDK, or from code; a rough sketch of these creation steps in code follows after this walkthrough. So for instance, we have the singer. We add an ID, and then, say, a name, which is a string here. OK, done. And then you select your primary key. You add the table. And now you have your first table. Now we can add another table, and as we mentioned before, I think we had albums here.

We can interleave this one as well. So I can go through all of this and create it. Let me choose this: I want to interleave the table in my Singers table, and I want cascade on delete. So I already get the singer ID as part of my primary key for the album, and then, for instance, I say there's an album name here, which is a string. You're done. You have to choose a primary key as well, and we add the table here. Say album name, OK. Cool, and we create. That's done in the background, and as we mentioned before, it's an online update, so you can see it's already done. Cool. Now I want to switch to a different instance where I have some preloaded data. In this preloaded data I have a scenario, think of Ticketmaster or something, where you can buy tickets. And to explain that to you a little bit, I have a crappy schema here; sorry about my drawing skills. So what I'm going to show you here: I want to show you some best practices, how we came up with this schema, and what you have to pay attention to if you want to scale horizontally with your database.
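As referenced above, here is a hedged sketch of roughly those creation steps done programmatically with the Python client rather than the console; the regional configuration name, node count, and IDs mirror the demo but are assumptions:

    # Sketch: create a regional instance and a database with the Python client.
    from google.cloud import spanner

    client = spanner.Client()

    # 'regional-europe-west1' is an assumed regional configuration name.
    config_name = client.project_name + '/instanceConfigs/regional-europe-west1'
    instance = client.instance(
        'awesomespanner',
        configuration_name=config_name,
        node_count=42,
        display_name='awesomespanner',
    )
    instance.create().result()  # long-running operation

    # Create a database, passing the schema DDL up front.
    database = instance.database('demo', ddl_statements=[
        """CREATE TABLE Singers (
               SingerId   INT64 NOT NULL,
               SingerName STRING(1024)
           ) PRIMARY KEY (SingerId)""",
    ])
    database.create().result()

    # Scaling later, as Robert does near the end of the demo, is just a
    # node_count change followed by an update:
    # instance.node_count = 8
    # instance.update().result()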

So here you have a multi-event. Think of a tour, for instance, a Coldplay tour which is in many, many countries. The Coldplay tour would be a multi-event. Each of the concerts in each country is an event. Then you have different venues, which have different seating configurations; think of a stadium, which can host sports events or a concert. And then we have a ticket category, which basically attaches a price to a ticket for a certain event. So for the Coldplay concert in a stadium, we have a certain ticket price for each seat. And then you see in the upper right corner an account. That would basically be somebody who comes to our website and signs up and wants to buy a ticket; that's what the account represents. So now, with this data preloaded, you can also see here in the console a multitude of things that you can check. We see already that my database is a mere seven gigabytes big. I can go to the monitoring tab and see things like how many nodes do I have?

How much CPU utilization do I have? We recommend not going over 75%, so if you see that you're approaching that utilization you can easily add nodes. You see operations and throughput and things like that. So this little spike that you see here was me loading the database. OK, so I want to show you an example query that I have here, which is based on the schema that I just showed you. One capability that you have here in the console is that you can run your queries directly out of the console. So I'm going to run this query; it should be rather quick. What does this query do? It basically gives you back the events for the next week where you still have a lot of seats available. So you see here we have a couple of events with 10,000 available seats, and then the seat availability, basically how many tickets we have not sold so far. So this is, for instance, a query that you could use to promote events that are not going well in terms of ticket sales.

Now, if I go to the explanation tab, you get a query explanation, and you can see what this query is doing. I see here a couple of table scans, distributed unions, and things like that. And I looked at this query before, and I saw, OK, this is not performing optimally. So what can I do if a query is not performing optimally? As with any bigger relational database out there, you have the possibility of giving the optimizer some hints. And that's exactly what I'm going to do here. As Dominic mentioned, we have the ANSI SQL 2011 standard with extensions, and one of these extensions is that I can force an index, for instance. So I have an index defined on my seating category, and I add a hint that forces the index SeatingCategoryById. And I run this whole query again. If you paid attention, you saw this query run before in 750 milliseconds, so fingers crossed this is better. Now you see here, with an optimizer hint my query runs three times faster and scans a lot fewer rows.
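The index-hint syntax Robert is describing looks roughly like the following when run through the Python client; the table name, index name, columns, and parameter are assumptions based on the demo schema:

    # Sketch: forcing a secondary index with a Cloud Spanner query hint.
    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance('my-instance').database('my-database')  # placeholders

    with database.snapshot() as snapshot:
        results = snapshot.execute_sql(
            # The @{FORCE_INDEX=...} hint tells the optimizer which index to use
            # for this table; SeatingCategoryById is a hypothetical index name.
            """SELECT TicketCategoryId, Price
               FROM TicketCategories@{FORCE_INDEX=SeatingCategoryById}
               WHERE EventId = @event_id""",
            params={'event_id': 1},
            param_types={'event_id': spanner.param_types.INT64},
        )
        for row in results:
            print(row)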

All right, so another thing: all of the Cloud products are integrated with each other, and one of the products for monitoring is our Stackdriver solution. In Stackdriver you can monitor your Cloud Spanner instance. I have a pre-created dashboard here, and I want to add some more things. Right now I'm monitoring nodes and storage. If you want to add more things, you basically select from the charts: you have here a resource type, Cloud Spanner, and then you can select the metric that you want. So for instance, I want to chart CPU utilization. OK, and then you can select and say, OK, I only want that monitoring on one of my databases. And the nice thing about this is you can also set up alerts. So if your Spanner instance is approaching a certain level of utilization or storage, you can get alerts to your phone and can easily increase the node count. So node count: you can add nodes, or take nodes away if you want to save money.

Or if your utilization is lower, you can always go to your instance, edit the nodes, and for instance go from 8 to 5, and then say Save, and we will reduce the nodes. So it's a fully managed solution where it's very easy to set it up, get started, load data, and scale in and scale out as you wish. So how are we doing on time? So one more thing. Dominic mentioned he is a Python guy; I'm a Go guy, so I'm usually doing Go. How does it look in Go? It's very easy as well. What you do is basically you have the database path, where you take the project, the instance, and the database that you want to talk to, and you set up a client. We have that here. And then you can basically run your queries. So what I'm doing here is a very simple query: I want all the multi-events that are happening today. And because I don't care if I'm a little bit out of date with this, I can actually define time bounds for this query. So I can say to Spanner, hey, I don't want a strongly consistent read in this case, because I don't think we update the multi-events that often, so I'm OK with a little bit of staleness. In this case I'm saying I'm OK with 15 seconds of staleness. And that allows you, for instance in the multi-regional setup, to read from a local copy in a different region, even if it's not the primary copy.
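Robert's demo code is in Go, but the same bounded-staleness idea looks roughly like this with the Python client used earlier in the talk; the table and column names from the demo schema are assumptions:

    # Sketch: a stale read that tolerates up to 15 seconds of replication lag.
    import datetime
    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance('my-instance').database('my-database')  # placeholders

    # exact_staleness asks Spanner to read the data as of 15 seconds ago, which
    # can often be served by a nearby replica instead of requiring a strong read.
    with database.snapshot(exact_staleness=datetime.timedelta(seconds=15)) as snapshot:
        results = snapshot.execute_sql(
            'SELECT MultiEventId, Name FROM MultiEvents WHERE StartDate <= CURRENT_DATE()'
        )
        for row in results:
            print(row)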

All right, so with that I want to lead into Q&A, and I want to ask Dominic to come back on stage. We are here to answer any of your questions. DOMINIC PREUSS: We've got about 15 minutes for Q&A, so if you have questions please come up to the mic. The last thing is I want to put in a quick plug for some of our other sessions. If you're interested in Spanner and databases in general, we have the Spanner 201 session tomorrow. We have a great session on optimizing Cloud SQL performance, and a good class on other storage solutions, since I didn't spend time on Google Cloud Storage or some of the other ones.

And also Zero to App: live coding an app with Firebase and GCP, which is one of my favorite sessions that I've seen. [MUSIC PLAYING]

 


In this video, Dominic Preuss shares an overview of Google Cloud Spanner, Google’s mission-critical, relational and scalable application database, which is now publicly available as a managed service on Google Cloud. You’ll learn how Cloud Spanner evolved, how Google uses it internally, and how you can use it for your next project.

Missed the conference? Watch all the talks here: https://goo.gl/c1Vs3h
Watch more talks about Infrastructure & Operations here: https://goo.gl/k2LOYG


