Google Cloud NEXT '17 - News and Updates

How Descartes Labs uses PostgreSQL on GCP (Google Cloud Next ’17)

(Video Transcript)
BRETT HESTERBERG: First off, thank you for walking over from Moscone, many of you. I'm very happy to be talking today about Google's newest managed database offering. Cloud SQL, as of today, is announcing support for Postgres, which has been one of our most requested enhancements. So it's nice to be able to answer yes to that question starting today. What we're going to talk about, of course, is Cloud SQL for Postgres. I'll do some focusing on certain features that are in the beta product. And then we'll also look ahead. Cloud SQL for Postgres is a beta product, which means we're just getting started. You're going to see us deliver new features during the beta program. And I'll talk about a few of those today. After that, we'll invite Tim Kelton, who is co-founder and cloud architect at Descartes Labs. Tim has been an early adopter of Cloud SQL for Postgres, and generally has used Postgres for many years to build some really stunning geospatial applications.

Tim will walk us through how Descartes is using Google Cloud to build these applications, and hopefully show off one as well. And then we'll get into some next steps and things that hopefully you can take away. Before we jump into all of that, I have a few questions for you. By show of hands, how many of you in the audience are developers? So we got– call it 2/3. How about database administrators? How about reluctant database administrators? There's a few more hands. And lastly, any managers in the audience? One brave soul, maybe two. Thank you. How many of you are using Postgres today, or use it frequently? That's about half the room. And how many of you in the audience– last question here for now– are Cloud SQL users? Not nearly as many. OK, great. Well, I think we're going to have a little something for everyone in the audience. And we'll jump right in. Let me start by saying that Cloud SQL is launching Postgres. And when I say that, if you're all on your laptops right now and you go to Cloud Console, click on Cloud SQL.

You won't see Postgres yet in the Console. We're launching it as we speak, which means some of you, today and tomorrow, will begin to see that in your Console. Many of you, over this coming weekend, will see it in your Console. And all of you, by early next week, will see Postgres in your Console, and be able to create instances. That said, all of you today, with the CLI or an API, can create a Cloud SQL for Postgres instance right now. So, based on the response from the room, I don't have to go into the merits of Postgres. But I think many of you using Postgres right now see that it's a powerful open source relational database. Really, a vibrant open source community backing it, with many years of active development. And in addition to strong standards compliance, I think one of the most distinctive features of Postgres is the fact that it is extensible. And we'll be talking today specifically about one extension, PostGIS, which adds powerful geospatial object support to Postgres.

So where does Cloud SQL fit in Google's fairly broad and growing database portfolio? On this slide, rightmost, you see Google's data warehousing product, BigQuery– again, rightmost column. We store objects in Cloud Storage or GCS. We have non-relational NoSQL databases in Cloud Datastore and BigTable. Finally, you get to the relational space, where Cloud SQL lives along with Spanner. And if you haven't taken a look at Spanner yet, or been to some of the talks about Spanner going on at the conference, I would encourage you to do so. And then furthest over on the left, we have our in memory products in Memcache for App Engine. So you get kind of a feel for where Cloud SQL is, broadly, in the database portfolio. The thing I'll note about Cloud SQL is that Cloud SQL delivers standard MySQL and now Postgres databases. Which means if you have an application running with Postgres today, with few exceptions your application will run immediately with Cloud SQL. We're not offering Postgres compatibility layers, we're offering standard Postgres and standard MySQL.

The other thing I'll say about Cloud SQL is it is not a DBA. We do our best to offload the more mundane database administration tasks– installing databases, patching them, setting up backups– but we are not a DBA in the sense that Cloud SQL won't optimize queries, et cetera. As we look at running databases in Google Cloud, or on-prem, or elsewhere, I think the thing you'll hear me repeat in this presentation is Cloud SQL is a managed database service. It's important, I think, to distinguish what exactly that means. So let's do this by just taking a look at what it takes to run a database. And we'll start with the hardware layer. Folks in the audience who are running on-prem are thinking today about power and cooling. Even if you're not, even if you've gone to a co-lo or a hosting provider or a cloud vendor, I would encourage you to think about server maintenance. Here in Google Cloud, part of our server maintenance is live migration with Compute Engine.

Cloud SQL benefits from that feature. And if you aren't aware of it, there have been a few great blog posts put out by the Compute Engine team on live migration. Obviously, server maintenance, whether you are maintaining your servers or not, has major implications for your application stack. So then we look at the operating system on top of the servers. And this is where we reach a fork in the road. This is the first place where, if I'm deciding whether to self-host a database on Google Cloud, for example, or go to a managed database product like Cloud SQL, we start to see some real differences. If you're self-managing a database, you're going to install your operating system, you're going to maintain it with patches, et cetera. With Cloud SQL, you won't see the operating system. We maintain it for you. It's not even exposed. Of course, when you're self-managing a database, you're going to do the database install. You're going to patch it, keep it up to date.

You're going to set up database backups. In the Cloud SQL or the managed world, we're doing that, again, for you. We're turning on automatic backups if you'd like us to. We're pruning those as we go along. And we've also taken care of the software patches, et cetera. Lastly, we'll look at the advanced features, high availability. If you're self-managing, to get high availability, you are setting up replication between Postgres databases, for example. You've built tooling that monitors the health of these databases. And you are alerting so that you can manually failover in the case of some outage, or you've built additional scripts, et cetera, to failover automatically. In the managed database space, with Cloud SQL, Cloud SQL takes care of high availability for you in that it sets up the replication. We do the monitoring and actually trigger the failovers. The other thing here is, in the managed world, you are backed by Google's team. In the self-managed world, you are backing the database availability itself.

Last item on the slide is scaling. In a self-managed world, you decide when to scale up, when to scale out, and you implement those. In the managed world– want to make special note here. Cloud SQL helps you scale more easily. We help you scale up easily, help you scale out easily. But unlike other database products in our portfolio, Cloud SQL does not auto scale. So it's up to you still, as an administrator, to decide it's time to scale up, or it's time to scale out. One thing that we shouldn't forget is monitoring. And monitoring goes beyond just application-level kind of monitoring, Stackdriver for example. And this can be something like, I want to make sure that my slow query logs get to Pub/Sub or to Cloud Logging, for easy ingestion by third party apps like Splunk, for example. With a managed service, a lot of this plumbing is set up for you. Self-managed, it's not. Now, I'm the Cloud SQL product manager, so obviously I have some bias here in terms of managed databases.

My point here is not to say that one is necessarily better than the other. You all are making tradeoffs during your day. You're deciding, I'm going to spend more time in this piece of my technology stack, to make sure it is customized to exactly what I want. And as a trade-off, I'm going to spend more time managing that piece of the stack. I think the decision between self-managing databases and a managed database service like Cloud SQL is very much that. You will lose customization or flexibility when you go to the managed world, but you gain all this offloading of what we call mundane tasks. In the self-managed world, you can tune Postgres to your heart's content, but you are managing that database, and managing its uptime. So hopefully that gives you a good sense for how we break down the difference between self-managing and managing databases. And again, I'm happy to say that Cloud SQL now gives you the option to do this with Postgres. So let me highlight just a few features for you.

As a starting point, Cloud SQL for Postgres offers large instance sizes, up to 32 cores and more than 200 gigabytes of RAM. You can see in the screenshot here we are showing the fact that we're using custom VMs as the underlying mechanism to Cloud SQL, which means you get granularity with respect to number of CPU cores, size of RAM on your instance type. The other thing I'll note is, we have inexpensive development instances. Our smallest instances start at less than $10 a month. So you can get going with Postgres very affordably, for example. Next feature I'll talk about is storage. I think it's intuitive for us to scale a database in terms of CPUs and RAM. A little less intuitive is scaling performance in terms of storage. And the thing I'll note here is that at Google, we scale storage performance by scaling capacity. So if you add capacity, you get more performance. What you're seeing here, built into the Cloud SQL instance creation form that will be coming live to your console soon, is– we see the performance calculator on the bottom.

And what it's showing you is this is the performance level you can expect based on the number of CPU cores you've selected, and the size of your storage. In this case, you can see I am not maxing out the IOPS I could get out of my instance. If I were to add capacity, you would see that slider bar ratchet up all the way to, in this case, 25,000 IOPS. There's a small checkbox next to a one-liner in our Create form that I think is really important. And it's our auto storage increase feature. So when you check this box, you can see it enable the auto storage increase. That tells Google, you deal with my storage capacity. As I get near my 10-gigabyte limit, or 400-gigabyte limit in this case, increase the storage capacity for me automatically. And the nice thing about Google persistent storage is that we can increase the storage– whether capacity, performance, both– as a hot operation. You don't take your database down. So if you check that box, you leave the storage capacity management to us.
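
The performance calculator described above can be pictured as a small function: expected IOPS grow with storage capacity until they hit a ceiling set by the instance. The sketch below is only an illustration of that idea. The 25,000 IOPS ceiling comes from the talk; the per-gigabyte rate and the per-core limit are hypothetical placeholder values, not Cloud SQL's published numbers, and the function name is invented for this example.

```python
def expected_iops(storage_gb: int, cpu_cores: int,
                  iops_per_gb: float = 30.0,      # hypothetical rate
                  iops_per_core: int = 3_000,     # hypothetical per-core limit
                  instance_cap: int = 25_000) -> int:
    """Estimate IOPS as the smallest of three limits.

    Capacity-based performance scales linearly with storage size,
    while the core count and the instance ceiling bound it from above.
    """
    by_capacity = storage_gb * iops_per_gb
    by_cores = cpu_cores * iops_per_core
    return int(min(by_capacity, by_cores, instance_cap))


# Adding capacity raises the estimate until the ceiling is reached.
print(expected_iops(storage_gb=400, cpu_cores=16))
print(expected_iops(storage_gb=1000, cpu_cores=16))
```

With these placeholder constants, a 400-gigabyte instance is limited by its capacity, while a 1,000-gigabyte instance reaches the 25,000 IOPS ceiling, which matches the slider behavior described in the talk.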

We will automatically grow your storage as you need it, while your database is running. Next feature is backups that are included with Cloud SQL for Postgres. You can enable automatic backups, which get taken daily. You tell us what time of day you want those backups taken. We automatically prune those. After seven days, we'll get rid of your oldest backup and take the newest. So you get a rolling seven-day automatic backup. You can also take on-demand backups. These are manual backups that you control yourself. You decide when to take them. And you decide how to retain them. If you want to retain one for a year, or three years, or seven years, you can do so. We keep them as long as you want. With respect to maintenance– I talked about maintenance in that decision between self-managing and managing database. With respect to maintenance, we offer a maintenance window. You tell us day of week and time of day, and we'll make sure that any maintenance we do is performed in that window.
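
The rolling seven-day retention described above behaves like a fixed-size queue: each new daily backup pushes out the oldest one once seven are held. The following is a minimal sketch of that behavior only; the function name and the dates are made up for illustration, and it says nothing about how Cloud SQL implements pruning internally.

```python
from collections import deque
from datetime import date, timedelta

RETENTION_DAYS = 7  # rolling window mentioned in the talk


def run_daily_backups(start: date, days: int) -> list[date]:
    """Simulate one automatic backup per day and return the ones still kept."""
    # A deque with maxlen drops the oldest entry when a new one is appended.
    kept: deque[date] = deque(maxlen=RETENTION_DAYS)
    for offset in range(days):
        kept.append(start + timedelta(days=offset))
    return list(kept)


backups = run_daily_backups(date(2017, 3, 1), days=10)
print(backups[0], backups[-1], len(backups))
```

On-demand backups, by contrast, would sit outside this queue, since the talk says they are kept for as long as the user wants.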

You also tell us maintenance timing. So if you have two Postgres database instances in a given project, you might say, I would like my development instance to be maintained or updated before my production instance. And you can use the maintenance timing field here, which is earlier and later, to decide that my development instance should be updated before my production instance. And when you do that, you get yourself about a week of difference between these two things. So something on the earlier track will get updated one week before something on the later track, is our general guidance on maintenance timing. The last point I'll mention here is extensions. And I think part of Postgres' uniqueness and its power is in its extension ecosystem. We've included a handful of extensions at the beta launch. We expect this list to grow over time, and we'll look for your feedback. The extension we wanted to make sure we got into the product for launch is PostGIS. It's one of the most frequently used extensions for Postgres, and it's widely used by folks like yourself who are interested in geospatial applications.

So what's next? I told you that this is a beta product and we're just getting started. It means you're going to see us land large features during the beta period. So what's not included today? Things around replication are not included today. So for example, we don't yet offer high availability, nor do we offer read scale out with read replicas. Those things are coming. We offer backups and recovery, but we don't yet offer point in time recovery. Again, that's another one that's coming. And from a connectivity point of view, today Cloud SQL for Postgres supports connections from Compute Engine, from App Engine flexible environment, which just went GA, from Container Engine, or generally from any client that supports a standard Postgres connector. That said, we don't yet support connectivity from Cloud Functions, which was announced at this conference. Nor do we yet support connectivity from App Engine Standard. So you've got a few things to look forward to as part of this beta period, and to keep in mind as you start to use the product.

So I have another set of questions for you. And I will ask you all, by show of hands– and we'll be on the honor system, you get one vote. If you're telling us what to do next, which of these five features is most important to you? So again, honor system, you get one vote. How many of you say, I want you, Google, to do high availability next? Got about– call it 1/3 of the room. All right, how about read replicas? This is read scale out? I'm counting Tim's vote as well. Another 1/3 of the room. Connectivity from App Engine Standard? One brave soul. Cloud Functions connectivity? A couple of folks. A couple of folks are excited about that one. And lastly, point in time recovery? I'm going to assume all of you are worried about recovery, but are just not voting it above these other features. OK, so it looks like the replication features were our hottest commodity. I appreciate that feedback. And if you do have questions as we go along, keep them in mind. Tim and I are going to leave some time at the end, and we've got mics so that we'll be able to answer your questions.

So, it's one thing to talk just about a product. I think where this gets exciting is how you put a product like this together with other technologies to build a great application. And I'm excited to say that Tim Kelton, who is the co-founder of Descartes Labs, is here with us today. Tim and Descartes Labs have been an early adopter of Cloud SQL for Postgres. They participated in a private alpha program. Tim and the team at Descartes have put together Postgres with PostGIS and a number of products in Google Cloud to build some outstanding geospatial applications. And I ask you all to join me in welcoming Tim. TIM KELTON: Thanks, Brett. Today, I'd like to share with you how some of the geospatial features that Brett just mentioned are helping us at Descartes Labs focus on the really hard problems we're trying to solve, and less on the day to day managing and scaling of infrastructure. So, what are those hard problems? Descartes Labs applies machine learning and AI to global satellite and aerial imagery to see how the Earth changes every day.

And our goal is to help customers quantify and predict how those changes will affect them. And our first products looked at things like, how much food are we producing in North America? And we built a platform on Google Cloud, initially to answer questions like those in agriculture. But we've since found a range of applications in areas such as security and trading, but also in supply chain and logistics. So, Descartes Labs was founded in December of 2014. We are Silicon Valley backed, and we have Silicon Valley veteran leadership. And that's where the story changes. We're not the stereotypical Silicon Valley startup. Myself and the rest of the founding team were researchers, engineers, and scientists high up in the mountains of northern New Mexico at Los Alamos National Laboratory. And there we focused on areas such as deep learning, remote sensing, and large-scale high performance computing. And today we have nearly 30 employees, and relevant to this talk, we have no DBAs on staff.

So, how did we find Google Cloud? It's a slightly different story than is typical. As an early-stage startup, we can't just go spend millions and millions of dollars buying satellite imagery, and the Earth Engine team at Google actually provided a public data set from NASA called Landsat. And Landsat was launched in 1973, and has been continuously observing the Earth since then. And so that is available natively on Google Cloud Storage, in a Cloud Storage bucket. You can just simply run gsutil ls and see all of the imagery there. So, in the early months of the company, we started pulling imagery down to VMs on Google Cloud, and building small-scale models over very limited geographical regions and over very small time windows. Trying to do something like forecast food production in the United States. It was quickly apparent that wouldn't work. So we needed to scale up our processing. Coming from places like Los Alamos, where you have high performance supercomputing and access to a lot of resources, you're not necessarily instantly confident that the Cloud will actually scale to the levels you need it to.

And very quickly, we knew we would need to use models that would cover large geographic regions such as the entire United States, and would see many, many years of crops growing. You can think, here in California, like two years ago, what was the story? It was all droughts and bleak water levels. And this year we have reservoirs dramatically overflowing. So to build accuracy in these models, we needed a large window. And so just a few months into the company, we were able to scale up on Google Cloud our code to process that entire petabyte Landsat and MODIS imagery to process all of that in just over 15 hours on 30,000 virtual cores. And that's the power of having this data natively on the Cloud right next to Compute. And, like I said, coming from places with high performance computing, what we are so pleasantly surprised with was not just that we were able to get lots of cores, but also that storage was able to scale with our needs and networking. All of those need to scale uniformly to actually scale up and process that amount of data really quickly.
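
To put the figures quoted above in perspective, here is a back-of-the-envelope calculation of the throughput they imply. It assumes exactly one decimal petabyte and exactly 15 hours, both of which are roundings of what was said ("that entire petabyte", "just over 15 hours"), so the results are rough estimates rather than measured numbers.

```python
PETABYTE_BYTES = 10**15   # assumption: one decimal petabyte
HOURS = 15                # assumption: rounded down from "just over 15 hours"
CORES = 30_000            # virtual cores quoted in the talk


def implied_throughput(total_bytes: int, hours: float, cores: int) -> tuple[float, float]:
    """Return (aggregate bytes/s, per-core bytes/s) for a batch job."""
    aggregate = total_bytes / (hours * 3600)
    return aggregate, aggregate / cores


aggregate, per_core = implied_throughput(PETABYTE_BYTES, HOURS, CORES)
print(f"aggregate: {aggregate / 1e9:.1f} GB/s")
print(f"per core:  {per_core / 1e3:.0f} kB/s")
```

Under those assumptions the job sustains roughly 18.5 GB/s in aggregate, or about 617 kB/s per core, which illustrates why storage and networking had to scale together with compute.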

So, once we had processed all of this data, we were able to build models over much larger geographic regions such as the whole United States, and see daily changes as new satellite imagery comes in every single day. In 2015, we had our first models. And when they got announced on Bloomberg, they actually moved the market a few percent. And we were all kind of surprised and excited. And then later in 2016, as most people that work in machine learning are constantly trying to improve the accuracy of the models, we improved our models. We also started working with customers on larger geographic regions around the Earth, and on adding multiple new crops. Additionally, we now actually release our models, what we're predicting as far as food supply, on a mobile app. So later in 2016, on Google Cloud, the Earth Engine team made a second dataset publicly available, again natively on Cloud Storage. So this is just there. You can go on your command line and run gsutil ls on this directory.

And that's from the European Space Agency. That's the Sentinel 2 data set. So this is petabytes of imagery that are actually stored in Google Cloud, and Google is paying the bill for hosting this imagery. And so you can quickly just start working on the actual problem you're trying to solve. Today, I want to share with you a little more information about the platform we built on top of Google Cloud. And I want to show you a product that we just launched on Tuesday called Geovisual Search that we've built on top of our platform. So, just a high-level diagram of what our platform consists of. As I already mentioned, we have various data sets from NASA, the European Space Agency, and a number of other public data sets. And it's great when they just live in the cloud. But a lot of these we actually need to pull in. And the sensors on satellites that have been launched over basically a decade or more vary tremendously. They vary in the number of light bands that they capture.

They vary in the sensor technology that's used. And our end goal is to make a really common imagery API that we can quickly have our machine learning teams have a nice Python API around. And no matter which set of imagery, or which layers of light, that it will be a really uniform experience. And so we have– so you can think of things more like a Python API or a Jupyter Notebook kind of similar to Datalab. So our processing– we start with ingesting all of this imagery. And we have to do things like co register the imagery for really accurate alignment. We have to convert it to standard projections. Things like removing clouds. Here in San Francisco, and I'm not talking about compute clouds, I'm talking about overhead clouds which in San Francisco happen quite a bit. Correct for shadows, adjust for camera angles, angles of the sun, calibrate the sensors against their neighbors, atmospheric reflectance. All of these things have to happen in our initial processing pipeline. And that happens on Compute Engine, using primarily managed instance groups.

And we heavily leverage the preemptible instances to cut down on costs in that case, because it's a very embarrassingly parallel workload. So then, we have two directions from our processing pipeline where data goes. Our imagery, or what we call raster data, those are the pixels. That gets stored natively. I think, as Brett had a slide, that gets stored on Cloud Storage. We know that with Cloud Storage we will not run out of space. We're expecting to have nearly 15 petabytes of imagery by the end of this year. We know that it will scale. We know that we can throw thousands of cores processing against it, and it will handle that. And it's nothing we have to manage. So all of our compressed imagery gets stored. And this is on the left side of the chart. And then, those services are available via microservices sitting on top of Container Engine, where we can query and pull down those images really quickly. The second part of our image processing pipeline is more of the vector data.

As we see each scene, we capture pieces of information about that scene and we use Pub/Sub to persist that information into a PostGIS database. These are things we need to spatially query. And so PostGIS gives us a mechanism to do that, and Cloud SQL supporting PostGIS starts giving us more of the freedom to not do things like install GEOS and GDAL and Proj4, and libxml2, just to get things into a geospatial database. And so those are more of what we consider vector data. Things like polygons and points, or given points of data that we might use to train our machine learning models. Say, for this given geographic region, what values do we see at that given time? And then try and train the models to correlate with that. So we have many sources of data that we store in PostGIS. Things like public data from maybe the US Department of Agriculture. But also private data sets sometimes from customers as well. So then, both of those come together. We host many, many microservices. Again, on top of either managed instance groups or on top of GKE.
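
To make "things we need to spatially query" concrete, the sketch below shows the kind of question a geospatial database answers for vector data: does a point fall inside a polygon? PostGIS exposes this through functions such as ST_Contains, backed by spatial indexes. The pure-Python ray-casting version here is only an illustration of the underlying idea, with made-up coordinates; it is not how Descartes Labs or PostGIS implements it.

```python
Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: count how many polygon edges a rightward ray crosses.

    An odd number of crossings means the point lies inside the polygon.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical field boundary and two sample observation points.
field = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon((2.0, 2.0), field))
print(point_in_polygon((5.0, 2.0), field))
```

A database adds what this sketch lacks: indexes that avoid testing every polygon, and correct handling of projections and geographic coordinates.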

And then that's where we are able to do image searching, image rasters, location based searches, and build models on top of that. And finally, one last theme. In the last six months, I would say, we've started adopting Stackdriver. Stackdriver provides a nice, uniform way to do logging. And we're gradually using the alerting features more and more. And we can use that on things like preemptible instances, which come and go, and for which it is quite challenging to figure out good logging solutions. But we can also tie that into Cloud Pub/Sub, to Cloud SQL, and to GKE, and we have a really common way of querying logs to look for errors or latency problems, or just diagnose code. So what types of models do we actually produce? This is a little bit more information on the agricultural models I mentioned earlier. And we started doing this in 2015, just a few months into our company. How is this done traditionally? The US Department of Agriculture will send surveys. And they send thousands of surveys out to individual farmers a few times during the growing season to get progress on the actual growth of crops.

And then they aggregate the surveys back, and finally release a number and then the market moves. How do we do it? We have a quite different approach. We pull in about four quadrillion pixels. So you can think of billions and billions of iPhone photos. And we see every field, everywhere in the Earth, every single day. And those are inputs into our models to basically predict how much food will be produced at the end of the year. And so as I said, we've iterated on those models many times. And you can see in this graph, we can show that we not only are more accurate than the US Department of Agriculture, but we get that information much earlier. And we can see how things like a big weather event, like a hailstorm in Iowa– what was the actual effect of that? I should say, in 2017, our models will now be backtested over the last 13 years. So we can go back 12 years ago, or five years ago, and say, how did the model perform given this weather parameter? One really exciting advance is, we're now extending the use of these models to do things like try and detect early indications of a famine outbreak.

So this is something in North Africa, you can think of things like political instability that occur from food shortage, getting aid to places faster, and the number of lives you can save. It's really exciting to be a part of that. So next, I'd like to show you what we just released on Tuesday, something called Geovisual Search. And Geovisual Search extends on top of the machine learning platform that I was showing you, being able to do geospatial queries over windows of time and pull together multiple different bands of light. Not just always human-visible bands, such as red, green, blue, but also things like infrared. And build models, and iterate on models, to see things that you could never actually humanly hire enough people to see. As part of that, we need training data. And we have many sources of training data, like I mentioned before. One of those that's open source is OpenStreetMap. We can use a source such as OpenStreetMap to train in a given geographic location. What are we actually seeing in these images?

And then we use TensorFlow to build models so that, as we process our imagery through them, we determine various signatures of each scene all across the globe. And then we try and see where else in trillions and trillions of pixels across the entire Earth, where else do we see similar items? So, again, we use Pub/Sub in this scenario to persist that data. But this time, we're persisting the model results out into BigTable. And then finally using Container Engine to serve the APIs and the front end interface. So, as part of this talk, I thought I would just quickly show you one problem I would have traditionally had using this approach. Again, things like OpenStreetMap. I'm just trying to train on a data set. It's a public open source data set. Sometimes it's accurate, sometimes it's not. Sometimes the models that we train from it aren't going to produce very good results, and we'll need to try a different approach. So the end goal is not always to build a long-lived production database.

It's to get a geospatial data set in there so we can use it for training, and then move on to the next models. So, this is– I don't know if anyone here has had fun with OpenStreetMap and using the tool osm2pgsql. It's an importer tool that lets you import all of OpenStreetMap into a PostGIS compatible database. So, as you might be able to scroll down and see all these fun messages, after a few hours of importing I took a 50-gig compressed OpenStreetMap file, and it's a 50-gig compressed XML file. So it's really hard to guess exactly how big that's going to be in PostGIS. And all of a sudden, I get to the spot where I have no disk space left on device and I've wasted my time. And I need to now make a bigger VM with a larger disk and try this again. And maybe I've guessed right the second time. But that's not that different than getting requirements. I don't know if any of you have asked for requirements early on in a project, and you've received input.

And either you never, ever, used all the space that was asked for, or you basically started way too small, and your project became wildly successful, and you needed to import all these other data sets into the database. So, one of the features Brett mentioned was the auto growth. So here's an example of– by clicking the Auto Growth, or selecting that in the API, we could start with a 10-gig disk and instantly grow. And we grew, over a few hours, to 380– it looks like 387 gigs. And the awesome part about this was I just am paying for the amount of storage I actually need at that given time. I'm not overestimating, I'm not underestimating, and I know it can grow. And we're really excited about things like read replicas and high availability as well. Especially applying that to micro services, which sometimes start really small, but then get really, really useful as you add features. So at this point, I would like to switch over to the demo. And it looks like I need to type my password.
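
The auto growth behavior described above, starting at 10 gigabytes and ending near 387 as the import proceeds, can be sketched as a loop that adds capacity whenever free space gets too low. The headroom threshold and growth step below are hypothetical values chosen for illustration; the talk does not describe the actual policy Cloud SQL uses.

```python
def simulate_auto_growth(usage_gb: list[float], start_gb: int = 10,
                         headroom_gb: int = 5, step_gb: int = 10) -> list[int]:
    """Return the provisioned capacity after each usage sample.

    Capacity only ever grows: whenever free space drops below
    `headroom_gb`, it is increased in `step_gb` increments.
    """
    capacity = start_gb
    history = []
    for used in usage_gb:
        while capacity - used < headroom_gb:
            capacity += step_gb
        history.append(capacity)
    return history


# Hypothetical usage samples during an import that ends at 387 GB.
print(simulate_auto_growth([2, 8, 40, 387]))
```

The point of the design is the one made in the talk: the disk never has to be sized up front, so an import of unknown final size cannot fail on a wrong guess, and provisioned storage tracks what is actually used.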

And I would like to show you what we built on top of Cloud SQL, PostGIS, BigTable, Pub/Sub. This is San Francisco. This is using the national aerial imagery. We are storing all of this data. We are not attaching persistent disks to VMs. Instead, we're using managed instance groups which are natively reading Cloud Storage with a super fast layer. So this is all of the US, and this is all native Cloud Storage, reading that imagery. So, as I mentioned before– and by the way, if you're really interested in how we did all of that, our CTO will be giving a talk this afternoon at 5:20, and that's IO242, and we'll talk all about the file system and Cloud Storage as a scalable file system. So in this example, I can look for a number of different signatures that we've trained TensorFlow models on. Some of the examples, we have center pivot irrigation. Here's floating top oil storage tanks, orchards, marinas. And in this example, I'm going to look at wind turbines. And what we do is, we slice this entire layer.

This layer is specifically aerial imagery. And this is only of the United States. But then we have other layers at different resolutions and different bands of light, such as the Landsat I mentioned earlier. And also Planet, who will be giving part of the keynote tomorrow. And we can search over all of those layers, which we slice up into tiny squares, and for each square we write a signature. And this is all done through TensorFlow models. And then, if we click on this area, I can see all of the different places on this map where we've found a visually similar object. So here, I can scroll in and I can see, and it looks like we've found visually similar wind turbines. I can click over to somewhere in California, and again, I can find wind turbines. A pretty interesting use case is center pivot irrigation. And again, I'll look over all of the Earth. And you think, how many people? I could hire 10,000 people and have them just looking at scenes every day as they come in. They would probably commit suicide or something else, but in this case, we can easily use things like machine learning to scale up to levels that you really never could reach humanly.
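The "slice into tiny squares, write a signature, find visually similar squares" idea can be sketched in a few lines. This is not Descartes Labs' implementation: in the talk the signatures come from trained TensorFlow models, whereas the `signature` function below is a stand-in that simply normalises raw pixel values, and the 4x4 "image" is made-up data. All function names are hypothetical.

```python
import math


def slice_into_tiles(image, size):
    """Split a 2-D grid (list of rows) into non-overlapping size x size squares.

    Returns {(tile_row, tile_col): rows}; partial tiles at the edges are dropped.
    """
    tiles = {}
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            tiles[(r // size, c // size)] = [row[c:c + size] for row in image[r:r + size]]
    return tiles


def signature(tile_rows):
    """Stand-in for a learned embedding: flatten the tile and L2-normalise it."""
    flat = [float(v) for row in tile_rows for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def most_similar(query_key, index, k=3):
    """Rank every other tile by cosine similarity to the query tile's signature."""
    query = index[query_key]
    scored = [
        (sum(a * b for a, b in zip(query, sig)), key)
        for key, sig in index.items()
        if key != query_key
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Made-up 4x4 single-band "image", cut into 2x2 tiles.
image = [
    [9, 0, 1, 1],
    [0, 9, 1, 1],
    [9, 0, 0, 0],
    [0, 8, 0, 5],
]
index = {key: signature(rows) for key, rows in slice_into_tiles(image, 2).items()}
print(most_similar((0, 0), index, k=1))
```

Clicking a square in the demo corresponds to choosing `query_key`; the ranked list corresponds to the highlighted matches. At the scale described in the talk, the brute-force scan would be replaced by an indexed nearest-neighbour search.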

So here, we've found more crop circles. And finally, just an example on Planet. Planet's a really neat data set. Planet is not launching large satellites costing hundreds of millions of dollars the way NASA would have traditionally launched satellites. They're instead using really small, commodity-based CubeSats. And they'll see the whole Earth once a day at three meters for each pixel. And so with Planet, we have a number of things in this China dataset that we've trained on. And here's one example of solar farms in China. And I can look everywhere in China, and I can see different places in China that are solar farms. And here are all the other results that we found, the top 500 most similar matches. So if I could switch back to the presentation. So, where we're going with this is things like eventually being able to see, in 2011 I didn't see a wind turbine here. And now as we've brought in new imagery, we all of a sudden see things like, oh, there's a wind turbine.
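The "not there in 2011, there now" comparison amounts to tracking when each location was first detected. A minimal sketch, assuming detections are already available as sets of tile coordinates keyed by ISO date strings; the dates, coordinates, and function names below are invented for illustration and are not from the talk.

```python
def first_seen(detections_by_date):
    """Map each detected location to the earliest date on which it appears."""
    seen = {}
    for date in sorted(detections_by_date):  # ISO dates sort chronologically as strings
        for location in detections_by_date[date]:
            seen.setdefault(location, date)
    return seen


def appeared_after(detections_by_date, cutoff):
    """Return, sorted, the locations whose first detection is later than `cutoff`."""
    return sorted(
        loc for loc, date in first_seen(detections_by_date).items() if date > cutoff
    )


# Hypothetical wind-turbine detections for two imagery dates.
detections = {
    "2011-07-01": {(12, 40), (13, 41)},
    "2016-07-01": {(12, 40), (13, 41), (20, 7)},
}
print(appeared_after(detections, "2011-12-31"))
```

Keeping the first-seen date per location, rather than only diffing two snapshots, means each new batch of imagery can be folded in as it streams in.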

This is a really neat type of capability. So as we're streaming in data from all of these different satellites, looking down at the Earth, we're constantly feeding those back into models and aggregating results that help our customers really answer hard problems. Just two other real brief examples of the types of scenarios our platform has to adjust for, and how there's not just one solution. In this example, with red, green, and blue imagery, it's quite hard to actually pick out where the wind turbines are. However, the European Space Agency's Sentinel-1 satellite carries something called synthetic aperture radar. And we collect this globally, and– excuse me– synthetic aperture radar emits and detects radar signals. Those radar signals can pass through clouds. They can even work at night. And it gives you almost like a height measurement on the ground. And you can see these vertical lines up and down, showing all of those wind turbines in that field.

More on the agriculture side. Here it's quite challenging to actually see the Ob River in Russia. You can see it's kind of dark, and there's definitely an area where you think the river and vegetation are. However, if we switch over to red edge bands that are more like infrared, you can see the river and the vegetation highlighted in really, really bright yellow. And then you can see water channels. So there's not one solution. Our platform needed to be able to handle a lot of different types of machine learning questions that we were asking of it. And not every problem needs the same bands of light or the same types of temporal cadences. And finally, I'd just like to wrap up. Where we're going at Descartes is taking this platform that we've built for these internal use cases, and making this commercially available. So we have all of this imagery available that's been cleaned and co-registered and has a really nice API to do analysis across many different sources just with a basic Python API.

And we hope that, from showing you these types of things, you have ideas for scenarios I haven't shown you. And you would like to maybe build models on your own, without having to go through all of these hassles. So we're hoping to make that– right now, this is in private beta. But we're hoping to eventually open that up to the public later this year. So email us at hello@descartes if you're interested. And I think with that, I'll turn it back over to Brett.

BRETT HESTERBERG: Really cool stuff. I loved the demo. It's truly finding needles in a haystack. OK, so let's talk about next steps and wrap up a little bit. As I mentioned at the top, we have beta availability of Postgres on Cloud SQL starting very soon. All users will have access early next week. Watch for more features coming. I got your feedback today. You want HA, you want replication. Watch out for big features coming over the beta period. A couple of points on our partner ecosystem. Cloud SQL has a partner ecosystem established.

We've added new partners for the Postgres launch today. You see a number of business intelligence tools that have already been certified with Cloud SQL for Postgres. And if you're thinking to yourselves, I've got a data transformation problem, or I want to understand how I can get my data from where it is today into Cloud SQL for Postgres, the list of ETL partners is a great place to start. All of these partners are on the Cloud SQL partner ecosystem website. And they're ready to help you with Cloud SQL for Postgres. A few presentation picks. Tim mentioned IO242, which is happening later today. I encourage you all to attend that this afternoon. We're on day two of Next. You've got day three tomorrow. First pick on this slide is a bit self-serving. We've got another Cloud SQL presentation. This one is focused on optimizing performance and availability in our MySQL product. So if you want to go deep into database tuning, see what we do as Cloud SQL, see where you can get the configuration that we actually use in Cloud SQL to tune our performance, come to that talk tomorrow morning at 11:20.

I'm also excited about building high performance microservices with Kubernetes. And a look at Bigtable, specifically low latency applications with respect to personalized content. Bigtable, again, is our NoSQL database product. So a few presentation picks if you're still finalizing what you're going to go see this afternoon and into tomorrow. All of you should take a look at Cloud SQL for Postgres. By attending this conference, you received Cloud Credits. You can get additional credits by signing up for the free trial link that you see on the slide. Use those credits. Get familiar with Cloud SQL for Postgres. Give us feedback. And like Tim, hopefully offload a little bit of your work so you can build a great application as well. With that, thank you all for your time this afternoon.



Explore PostgreSQL offerings on Google Cloud and learn how Descartes Labs uses PostgreSQL on GCP as part of their application stack to forecast production.
