Google Cloud NEXT '17 - News and Updates

Accelerating Marketing Insight with Google Cloud (Google Cloud Next ’17)

(Video Transcript)
[MUSIC PLAYING] MATT TAI: Welcome everyone. Thank you for coming. My name is Matt Tai. I'm a product manager with Google Cloud Platform. And today I'd like to talk to you about accelerating marketing insight with Google Cloud. So I do want to level-set with everyone: you likely need to run your business with facts and not feelings. To do that, everyone knows you have data you need to get insights from, to ultimately make money. I mean, that's why we're all here. So as a business leader, what are some things I need to do? I want to get a single view of my business. I want to invest in my most promising opportunities. And I want to understand my customer. So wouldn't it be great if I could get a dashboard on my business, understand how all my different market segments are doing, and see how my advertisements are performing across all the different channels I'm spending money on? Hello. Invest in the most promising opportunities. If you're in the advertising space, there's an old joke:

I know 50% of my ad spend is wasted. I just don't know which 50%. So how do you identify which 50% is doing well and which isn't? And finally, how do you understand your customer? How do you understand and profile your ideal customer? How do you improve the targeting for that? And so you want to find– sorry. I'm just going to skip this. So these are the things you want to do, but these are the challenges I face. Data lives in silos. If you're advertising with us on Google, you already know that AdWords, DoubleClick, DFP, and YouTube are all individual silos you need to connect to and pull into one place to get visibility into your customer. Now that's just for marketing. What about your own first-party data? What about your customers, your CRM data, purchasing, invoicing? How do you get that all into one place? It's really hard to get insight into your customer if you can't even get everything in a single place to look at it. And earlier I said, well, you have to make a decision on identifying which 50% is performing poorly or which one is doing well.
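
As a rough illustration of the silo problem, here is a minimal Python sketch that left-joins ad-platform rows to CRM records. It assumes both data sets share a hypothetical `user_id` key, which in practice is exactly what is usually missing; all data and field names are made up. The match rate it reports is the share of ad rows that could be linked to a customer record.

```python
from collections import defaultdict

# Hypothetical ad-platform export: one row per ad interaction.
ad_events = [
    {"user_id": "u-001", "channel": "search", "cost": 1.20},
    {"user_id": "u-002", "channel": "video", "cost": 0.80},
    {"user_id": "u-001", "channel": "display", "cost": 0.30},
    {"user_id": "u-999", "channel": "search", "cost": 1.10},
]

# Hypothetical first-party CRM records, keyed by the same (assumed) user_id.
crm = {
    "u-001": {"segment": "loyal", "lifetime_value": 5400.0},
    "u-002": {"segment": "new", "lifetime_value": 0.0},
}


def link(ad_rows, crm_by_id):
    """Left-join ad rows to CRM profiles; return (joined_rows, match_rate)."""
    joined = []
    matched = 0
    for row in ad_rows:
        profile = crm_by_id.get(row["user_id"])
        if profile is not None:
            matched += 1
        joined.append({**row, **(profile or {})})
    rate = matched / len(ad_rows) if ad_rows else 0.0
    return joined, rate


joined, match_rate = link(ad_events, crm)

# Once linked, ad spend can be viewed by customer segment instead of in isolation.
spend_by_segment = defaultdict(float)
for row in joined:
    spend_by_segment[row.get("segment", "unlinked")] += row["cost"]

print(f"match rate: {match_rate:.0%}")
print(dict(spend_by_segment))
```

If the two systems share no key at all, the match rate drops to 0% and no CRM attribute ever reaches the ad data, which is the "I can't link the data sets" situation.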

Well, in the marketing space there are 3,000 tools for you to choose from. So even before you decide which 50% is good or bad, you need to decide which tools you want to use to help solve that problem. In terms of understanding your customer, we spoke with a large bank and asked them this question: how much customer data do I use in my advertising tools? Their answer: none, I can't link the data sets. So, what do you want to spend your time on? Do you want to be a business owner or a data plumber? Do you want to spend your time running your business or moving all the data into one place? Really, it shouldn't be this hard. And this is how Cloud would like to help. Earlier today we announced the BigQuery Data Transfer Service. It's a managed data movement service that loads data from our largest advertising platforms into BigQuery on a schedule. If you're a customer of AdWords, DoubleClick Bid Manager or Campaign Manager, DoubleClick for Publishers, or YouTube, today your analysts can move that data into BigQuery without any development effort.

Now, BigQuery is just where you load all your data. That's great in isolation, but hey, what if you want to get your web traffic data in there? What if you want to get your mobile traffic in there? Well, Google Analytics 360 customers also have direct connectivity to BigQuery, and Analytics 360 will export their data to BigQuery for them as well. And finally, if you're a CMO or a marketer, you need to do more than just Google properties. So how do you get data from Salesforce, Facebook, Twitter, Marketo, MailChimp, or Eloqua into BigQuery? We have an ecosystem of partners for you to work with to set up that automated data movement into BigQuery. So we've identified three steps to insight. The first step is collecting all the data in one place. With the BigQuery Data Transfer Service, you can get AdWords, DoubleClick for Publishers, DoubleClick Bid Manager and Campaign Manager, and YouTube data into BigQuery. Google Analytics has direct connectivity into BigQuery as well. And for the non-Google data sources that aren't mentioned here, you can work with our partners such as Informatica, Segment, and Funnel to get your Salesforce, Facebook, and Marketo data into BigQuery as well.
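
Collecting everything in one place usually also means mapping each source's fields onto one shared schema before or during loading. The Python sketch below shows that idea; the source names and the field names in `FIELD_MAPS` are hypothetical placeholders, not the actual export schemas of AdWords, Analytics 360, or any partner connector.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MarketingRow:
    """One shared schema for rows coming from any source."""
    source: str
    day: date
    campaign: str
    cost: float
    conversions: int


# Hypothetical per-source field mappings; real export schemas differ.
FIELD_MAPS = {
    "ads": {"day": "Date", "campaign": "CampaignName", "cost": "Cost", "conversions": "Conversions"},
    "social": {"day": "date_start", "campaign": "campaign", "cost": "spend", "conversions": "results"},
}


def normalize(source, raw):
    """Convert one raw record from `source` into a MarketingRow."""
    m = FIELD_MAPS[source]
    return MarketingRow(
        source=source,
        day=date.fromisoformat(raw[m["day"]]),
        campaign=raw[m["campaign"]],
        cost=float(raw[m["cost"]]),
        conversions=int(raw[m["conversions"]]),
    )


rows = [
    normalize("ads", {"Date": "2017-03-08", "CampaignName": "spring", "Cost": "120.5", "Conversions": "4"}),
    normalize("social", {"date_start": "2017-03-08", "campaign": "spring", "spend": "80", "results": "2"}),
]
total_cost = sum(r.cost for r in rows)
print(total_cost)
```

Adding a source then means adding one mapping rather than writing a new pipeline, which is roughly the work the managed connectors and partners take off your hands.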

The second step is figuring out how to make sense of it all. Once you get the data into BigQuery, it's pretty raw. Those of you who are advertising professionals know that when you inject your custom IDs into, say, DoubleClick, they show up as "other data." How do you extract them from those strings and make use of them? Also earlier today we announced a product called Cloud Dataprep, a visual ETL tool that lets you clean up that data, again without having to do any programming. So your analysts can use Cloud Dataprep to clean up that data, extract those custom user IDs, and load the results into BigQuery. If you have a data engineering team, we have Apache Beam via Cloud Dataflow. If you want to do something more sophisticated than a simple extraction, you can use Dataflow to pull the raw data from BigQuery and write it back, cleansed and ready to go. And the third piece is analysis and visualization. What I want to show here is that BigQuery is not its own island.
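
The custom-ID extraction described above is essentially string parsing. This Python sketch assumes the "other data" field holds semicolon-separated `key=value` pairs such as `u1=CRM-123;u2=gold`, and that the custom ID lives under a hypothetical `u1` key; check the real export format before relying on either assumption. Dataprep or Dataflow would apply the same kind of logic at scale.

```python
import re

# Assumed format: semicolon-separated key=value pairs. Adjust to the real export.
PAIR = re.compile(r"([^;=\s]+)=([^;]*)")


def parse_other_data(raw):
    """Return the key/value pairs found in an 'other data' string as a dict."""
    if not raw:
        return {}
    return {key: value.strip() for key, value in PAIR.findall(raw)}


def extract_custom_id(raw, key="u1"):
    """Return the custom ID stored under `key` (hypothetical), or None if absent or empty."""
    value = parse_other_data(raw).get(key)
    return value or None


print(extract_custom_id("u1=CRM-123; u2=gold"))
```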

You can of course use SQL to analyze the data in place. But we also support exports to Excel, and you can also pull data from Google Sheets. And for the technical members of the audience, we have JDBC and ODBC connectivity as well. On top of that, BigQuery works with a lot of the existing visualization partners listed here: Tableau, Looker, Google Data Studio, and many others work with BigQuery out of the box. We're doing all of this to accelerate your time to insight. So next I'd like to hand the mic over to a few customers we've invited. We'll have Mercedes-Benz USA come up, as well as their agency, SapientRazorfish. Then we'll transition over to Zenith. And after that we'll have a fireside chat with "The New York Times" and Hearst. MATTHEW GRANISH: Thank you, Matt. Good afternoon. I'm Matthew Granish from Mercedes-Benz USA, and it's a real pleasure and privilege to be speaking with all of you today. I think most of you are a lot smarter than I am.
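
Analyzing the data in place with SQL might look like the query below, which computes cost per acquisition (CPA) by channel. It runs against Python's built-in `sqlite3` purely so the sketch is self-contained; BigQuery's Standard SQL differs in places, and the `ad_performance` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_performance (channel TEXT, cost REAL, conversions INTEGER)")
conn.executemany(
    "INSERT INTO ad_performance VALUES (?, ?, ?)",
    [("search", 100.0, 5), ("search", 50.0, 5), ("video", 200.0, 4), ("display", 40.0, 0)],
)

# Cost per acquisition by channel; NULLIF turns zero conversions into NULL
# so the division yields NULL instead of failing.
QUERY = """
SELECT channel,
       SUM(cost) AS spend,
       SUM(conversions) AS conversions,
       SUM(cost) / NULLIF(SUM(conversions), 0) AS cpa
FROM ad_performance
GROUP BY channel
ORDER BY spend DESC
"""
results = conn.execute(QUERY).fetchall()
for channel, spend, conversions, cpa in results:
    print(channel, spend, conversions, cpa)
```

A channel with spend but no conversions shows up with a NULL CPA, which is often the first clue to which half of the budget is being wasted.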

And I'm the business guy, not the programmer. So it's been really cool to see the breakout sessions and the mechanics behind all the questions I ask our data scientists and engineers, what actually makes the answers possible. So it's really cool to be here. So I'll talk about what we're doing. As you can imagine, like most clients, we're getting data from all kinds of different places: from our websites, from our media, from our dealerships, from after-sales. It just goes on and on. And like Matt was saying, most of it is in silos, so we're trying to bring all that together. What we've noticed is that a lot of the companies that are doing really well are working in a kind of data-as-a-platform space. To do that, our vision is to be at an intersection where not only are we superior technology-wise, but we can be flexible, agile, and creative. We can let the data power the decisions about which marketing strategies we use, instead of just sticking our thumb up in the air and hoping we're targeting the right people with the right message, and then connect it across all the departments within MBUSA.

So is what we're doing helping our vans department? Is what we're doing helping our after-sales department, et cetera? Some growth opportunities for us from having this connected data: it's really about being a customer-centric company. Matt mentioned having a complete view of your business. For us it's really about having a complete view of your customer. So it's getting away from aggregate data at the broad, demographic level, to what is Matthew Granish doing? Because the way I buy a car is probably going to be different from the way the next guy buys a car. We have KPI buckets and intuition that tell us you start here, you do this, and you end up there. But in reality, everybody does it differently. So we really need to use these things to generate internal efficiencies so we can track all of this. And this is just reinforcing that: putting the customer first. And this is the path along which most companies are probably progressing.

We start off with a very simple, basic, retrospective look at what we did. Then we're trying to build the systems, through the interconnectivity we use the cloud for, to get real-time optimizations and to identify what the people who are converting are doing, and how that differs from what the people who are not converting are doing, so we can target the converters more efficiently. As we get toward the highest bar, customer centricity, there's obviously a lot of data that has to be processed, and very quickly, to make all that happen. You'll hear from our data scientists on why the Google platform works so well. For me, it just works really well because it's fast. We don't have a petabyte of data yet. But if you saw that example at 12:00, they analyzed a petabyte of data in two minutes or something, almost three minutes. So if they can do that in three minutes, then ours is going to be a walk in the park. So that's why I'm really happy: they're able to answer my questions very quickly, and I don't have to wait days to get the information I need to inform marketing choices.
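
Comparing what converters do with what non-converters do can start with something as simple as how often each touchpoint appears in each group's journeys. This Python sketch uses invented journeys and touchpoint names; a real analysis over billions of interactions would run in the warehouse and would need to control for journey length and exposure.

```python
from collections import Counter

# Hypothetical journeys: (touchpoints seen, converted?)
journeys = [
    (["search", "configurator", "dealer_locator"], True),
    (["video", "configurator", "dealer_locator"], True),
    (["search", "video"], False),
    (["display"], False),
    (["search", "configurator"], False),
]


def touchpoint_rates(data):
    """For each touchpoint, return (share of converting journeys, share of non-converting journeys)."""
    conv = [set(t) for t, c in data if c]
    non = [set(t) for t, c in data if not c]
    conv_counts = Counter(tp for j in conv for tp in j)
    non_counts = Counter(tp for j in non for tp in j)
    rates = {}
    for tp in conv_counts.keys() | non_counts.keys():
        rates[tp] = (
            conv_counts[tp] / len(conv) if conv else 0.0,
            non_counts[tp] / len(non) if non else 0.0,
        )
    return rates


rates = touchpoint_rates(journeys)
# Touchpoints far more common among converters are candidates for targeting.
gaps = sorted(rates, key=lambda tp: rates[tp][0] - rates[tp][1], reverse=True)
print(gaps)
```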

So just some reasons why GCP works: it's quick, it costs less, and there's not a big implementation time. Most of our systems already use a Google product of some sort, so the integration was very easy for us. And now I'll bring up Evan Rowe, the science director, to get more into the nitty-gritty. [APPLAUSE] EVAN ROWE: Thank you. So Matt talked a little bit about creating this integrated view of the customer and how that's going to be a big part of driving engagement with the brand. And we see that across the board for marketing services: across the digital properties, across CRM and direct marketing, across media, and really across the board for things like understanding lifetime value and even driving accountability through ROI analyses. And of course this isn't always that simple, right? There are a lot of challenges we're going to face to make this vision come to life. And one of the biggest challenges is that there's a really, really big ecosystem of data here.

So we have all this social and media data, which, as you know, is massive. We also have all of these digital properties, all of these apps. We have CRM and loyalty systems. And then we have, as Matt mentioned, this distributed dealer network that comes with its own data. And of course, all of these have their own platforms, systems, and tools for us to wrestle with to bring this together. Coordinating all of that is, I think, pretty much a universal challenge. Obviously the automotive industry has its eccentricities. But I think every marketer sees this landscape and asks, how can we create a single view of the customer? And to talk a little bit more about the actual scale here: with Mercedes, it should be easy to make this choice. It's a superior product, and it should be an easy decision. But the car buying experience has a long consideration cycle, right? It's a big decision. There's a lot of planning and research that goes into making it.

And so over this months-long decision process, we have so many interactions and touch points. Think about the millions of customers and all the platforms we were just talking about, and look at that over the three to six months while someone is really making this decision. We're talking about billions upon billions of individual interactions, and we can't really make sense of those without using statistical models. So one of the things we do is use models to systematically help us determine who the right people are to target and what decisions we should be making. That comes with its own challenge, because it takes time and resources to develop, build, and test the models, and that process doesn't always lend itself to more real-time use cases like serving digital media. On top of this, there was the initial Google Cloud for Marketers rollout last fall, where they talked about some of the benefits of this platform and what they envision for it.
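
As one example of the kind of statistical model described here, this is a minimal logistic regression trained with batch gradient descent in plain Python and used to rank prospects. The features, data, and choice of logistic regression are illustrative assumptions, not the team's actual models; in practice a library such as scikit-learn would typically do the fitting.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Probability of conversion; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # gradient of log loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical features per prospect: [scaled site visits, used configurator (0/1)]
X = [[0.1, 0], [0.2, 0], [0.9, 1], [0.8, 1], [0.3, 0], [0.7, 1]]
y = [0, 0, 1, 1, 0, 1]
w = train_logistic(X, y)

prospects = {"a": [0.85, 1], "b": [0.15, 0]}
ranked = sorted(prospects, key=lambda k: predict(w, prospects[k]), reverse=True)
print(ranked)
```

Training is the slow, iterative part the speaker mentions; once the weights exist, scoring a prospect is a single cheap calculation, which is what makes real-time uses such as media serving feasible.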

It talked about collecting, analyzing, visualizing, and iterating on data. And I think that's a very, very good way to position the value of this for marketers. But for future Cloud for Marketers collateral, we're adding one more: activate, because we also see this as an activation platform. If you create an ecosystem where you integrate all of your customer data across the board, I think there are a lot of additional use cases you can activate beyond just time to insight and driving efficiencies in the analytics process. So here's a quick overview of our challenges and how we're addressing them with the cloud. As far as data collection goes, we have all these platforms. But a lot of them, as Matt mentioned, are already Google platforms, so we were able to speed up the data collection process by taking advantage of that. As far as analysis goes, we've been building an EDW within BigQuery, which, as we know, greatly reduces query time.

And we don't have any operational concerns anymore. As far as visualization goes, for a lot of our use cases we've been using Google Data Studio as a dashboarding tool, and it's met most of our needs so far. Obviously we can integrate with additional platforms if we need to. As far as iteration goes, all that time we were talking about that goes into developing models: we're able to automate the models in the cloud now, and that's really speeding up the process. And as far as activation goes, we have all of these use cases around personalization for the site and for CRM, and also things like audience management for media, which we see as a really, really big opportunity here. So we see a lot of extensibility to this platform and a lot of potential use cases we can apply it to. And with that I'm going to hand it off to Jeremy, who's going to talk through our actual architecture and our experience so far working with Google Cloud.

Thanks. [APPLAUSE] JEREMY TERUYA: Hello. I'm Jeremy, an associate director here at SapientRazorfish. I'm going to walk through some of the actual architecture of how this was accomplished. This is essentially our basic architecture right now. As Evan mentioned, there are a lot of disparate data sources that we collect. These are just a few examples: DoubleClick, Google Analytics 360, YouTube, Kenshoo for paid search, even some data from MBIT for CRM, after-sales, and dealership relationships. So here's the basic walkthrough. I should probably have put icons there labeling what each of these are, so if you're unfamiliar with the actual Google products, this may not be that helpful. Reading from left to right, we use a combination of Dataflow and the connectors, the Data Transfer Service that Matt Tai mentioned was announced earlier today. During the daily ETL process, we also write backups to Google Cloud Storage in case anything goes awry.
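
The backup step follows a common pattern: persist the raw daily extract before loading it, so a failed load can be replayed. This Python sketch shows the pattern with a local temporary directory standing in for Google Cloud Storage and a hypothetical `load_fn` standing in for the BigQuery load; it does not call any Google Cloud API.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def run_daily_etl(records, run_date, backup_dir, load_fn):
    """Back up the day's raw extract, then hand it to the load step.

    `backup_dir` stands in for a Cloud Storage bucket and `load_fn` for the
    warehouse load; both are hypothetical placeholders in this sketch.
    Returns (backup_path, load_succeeded).
    """
    backup_path = Path(backup_dir) / f"extract_{run_date.isoformat()}.jsonl"
    with backup_path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    try:
        load_fn(records)
    except Exception:
        # The backup survives a failed load, so the day can be replayed later.
        return backup_path, False
    return backup_path, True


def replay(backup_path):
    """Read a backed-up extract so a failed day can be reloaded."""
    with Path(backup_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]


def failing_load(rows):
    raise RuntimeError("warehouse unavailable")  # simulate an outage


with tempfile.TemporaryDirectory() as tmp:
    path, ok = run_daily_etl([{"id": 1}], date(2017, 3, 8), tmp, failing_load)
    restored = replay(path)  # read before the temporary directory is removed

print(ok, restored)
```

Writing the backup first keeps the load step free to fail without losing the day's data.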

That's going down. Then, reading to the right, BigQuery is obviously our enterprise data warehouse solution. Visualization is through Google Data Studio and a mix of Tableau, depending on what our individual stakeholders need. Then, moving toward the automation and activation portion: Evan mentioned that we rely heavily on modeling and statistical methods to make sense of all of this information. We've been playing around with Datalab over the last three to six months, but a lot of our major development work is done on Jupyter as a platform, and we use Dataproc to automate a lot of the modeling code. That's on the far right. Much of the model output is reingested into BigQuery, both to validate it and to monitor performance, and we also use specific, custom integrations with different platforms to push information out. So the activation portion that Evan was talking about is a really, really important component of our success.
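
Reingesting model output to monitor performance can be as simple as joining each day's scores to the conversions observed afterwards and tracking lift. This Python sketch assumes that join has already produced `(score, converted)` pairs; the data is invented, and top-20% lift is just one common monitoring metric, not necessarily the one used in this architecture.

```python
def top_fraction_lift(scored, fraction=0.2):
    """Conversion rate in the top `fraction` of scores divided by the overall rate.

    `scored` is a list of (model_score, converted) pairs, for example model
    output that has been joined back to observed outcomes. Returns None when
    there are no conversions at all, because lift is undefined then.
    """
    if not scored:
        raise ValueError("no scored rows")
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(c for _, c in ranked[:k]) / k
    overall_rate = sum(c for _, c in ranked) / len(ranked)
    if overall_rate == 0:
        return None
    return top_rate / overall_rate


# Hypothetical scores from a daily model run, with the conversions later observed.
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.0, 0)]
lift = top_fraction_lift(scored)
print(f"top-20% lift: {lift:.2f}")  # a lift near 1.0 would mean the model adds little
```

Tracking this number day over day is one simple way to notice a model drifting before it affects activation.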

This entire solution architecture has now officially been registered as a partner solution within Google. So this is essentially how we've laid it out for Mercedes. As for why we went through this whole migration process: we were on AWS before. I don't really need to spend a whole lot of time on this; I'm sure everyone in here has seen all the demonstrations over the last day, and potentially in the past. BigQuery is very fast. This is not as impressive as the petabyte example they showed at the 12 o'clock session, but essentially what we found is that speed no longer determines how many questions we ask. Before, I could run a query across gigabytes and gigabytes or terabytes of data, go get a coffee or go to a meeting, and when I came back it may or may not have been done. I don't really have to worry about that anymore. I can iterate on my questions, and on all of the questions from Mercedes, and get answers back in a timely manner.

We're also mitigating the loss of productivity due to human error. A lot of times when our analysts ran queries against other solutions, you'd write your code, execute it, wait a good amount of time, and it would come out wrong. I can mitigate that loss because it's very, very fast, and I don't have to worry about, oh crap, I just messed up and wasted two hours. So that's essentially the basis. We knew BigQuery was fast and we knew it was a great architecture, but we already had the system set up in AWS. So really, was the juice worth the squeeze in terms of doing this whole migration process? One of the big hangups for us was that we'd architected a lot of custom integrations into Redshift. Did we want to repeat that again? With the connectors, the Data Transfer Service, as Matt mentioned, Mercedes is a brand that has a lot of products on Google, so leveraging the connectors was a very, very simple solution. This was roughly a one-week process for, I believe, DoubleClick at the time.

For engineers, this may be very similar to the processes you've architected in the past for getting those data transfer files into a suitable SQL warehouse. But essentially the gist of it is that we've gone from debugging Spark code and making sure the scripts run correctly in a daily ETL process to: can you or can you not fill out a form? It's very, very simple to set up the data transfer process. You literally open it up, type in your network IDs, set a cadence, and then it just goes. And it's very fast. So we have an enterprise data warehouse in BigQuery that's very, very fast, and the connectors sped up the infrastructure setup as well. Essentially, we reduced our delivery time for performance reporting or analysis on any questions Mercedes had by around 60%. That clears the way for answering a lot of their base questions, as well as increasing the sophistication of the questions they'd like to ask, because now we have a lot more time.

And this is a little quote from Deloitte's analytics trends report from last year: "Big data and traditional analytics are merging." It's really, really getting harder to distinguish the two, because the asks are getting more sophisticated. Setting up a platform like this has allowed us to scale our ability to answer those questions in a very, very timely manner. It has increased the time available for exploratory analysis for Mercedes, and it has an added benefit for us at SapientRazorfish: you now have this great architecture, and there are potentially other great analytic storytellers, data analysts, or traditional business analysts who may want to get their hands on it and cross-train across departments. For us at least it's been a very, very great benefit. A lot of people have flocked over to us, and we do little training sessions here and there. So, net-net, cost, time, and scale have all improved over the course of our migration.

An 80% reduction in startup costs, not just time and money. BigQuery is very fast, so you can answer more of the questions you need answered, increase the time you allocate to generating insight, and scale. Scale was the biggest thing for us. As the questions become more and more complex, we need more horsepower. So while sometimes a general four-core setup is enough, we scale up in Dataproc as needed. And with that, I'll hand it back to Matt. [APPLAUSE] MATT TAI: Next up I'd like to introduce Ian Liddicoat from Zenith. IAN LIDDICOAT: Thank you, sir. Good afternoon, everyone. I'm Ian, head of data, technology, and analytics at Zenith. I'm a data scientist and I'm still taking the drugs. Just to orient you to our organization, Zenith is part of Publicis, the marketing services company, which is the second largest in the world. You just heard from Razorfish, which is part of Sapient, so we are brothers and sisters, you might say. You'll see in the next slide that digital and the use of technology are having a huge impact on our business.

And it's really for this reason that we see Google, and particularly the cloud, as a very significant partner to us going forward. You can see the scale of the evolution in our business between 2009 and 2015. It's definitely connected consumers driving our clients toward increased personalization, which in turn drives our business, and I don't see that trend changing at all. So Google is a very important partner to us, and I distinguish Google in particular as a partner rather than a software supplier. We are heavy users of their technology, the cloud included. There are probably many people in this audience who have traditionally been AWS users, and we are no different. But we didn't really have any interest in just switching cloud platforms if there wasn't a long-term benefit. I certainly see Google as the long-term partner, particularly in the use case I'm going to talk to you about today. As a media agency in particular, we are heavy users of the DoubleClick platform, and we have spent many years encouraging many of our biggest global clients to switch to an integrated Google ad tech stack, for a lot of reasons.

And it's enabled us to do some of the things I'm going to talk about in a second. So what's our particular use case for the cloud? Well, it actually involves artificial intelligence, which you might not expect from a traditional media agency. But that's because we started to reimagine what the digital media agency is going to be in the future. What we're doing here is taking very, very granular cookie-level data from the ad server, fusing it with first-party data from all our clients, and then applying a set of machine learning algorithms to all of that data to determine what it is that's driving the conversion, or whatever the given KPI might be. But then, in a world first, we've taken that machine learning result and pushed it back to DoubleClick. That's never been done before. The reason we're doing that is not because it's clever and technically difficult to achieve, which it is. It's because we really want to think about the automated digital media agency of the future: closed loop and highly automated.
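
The closed loop has three stages: fuse cookie-level ad-server data with first-party data, score each cookie, and push the scores back to the ad server. The Python sketch below mirrors those stages under loud assumptions: the data and field names are invented, `toy_model` is a fixed rule standing in for a trained machine learning model, and the push to DoubleClick is represented only by the list of `(cookie_id, score)` rows it would send, not by any real DoubleClick API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cookie-level aggregates from an ad server export.
ad_server_rows = [
    {"cookie_id": "c1", "clicks": 4, "customer_id": "A"},
    {"cookie_id": "c2", "clicks": 0, "customer_id": None},
]

# Hypothetical first-party attributes keyed by customer ID.
first_party = {"A": {"elasticity": "low"}}


def fuse(rows: List[dict], attrs_by_customer: Dict[str, dict]) -> List[dict]:
    """Attach first-party attributes to each cookie row (left join on customer_id)."""
    return [{**r, **attrs_by_customer.get(r.get("customer_id"), {})} for r in rows]


def toy_model(r: dict) -> float:
    """Placeholder for a trained model: a fixed rule, capped at 1.0."""
    s = 0.1 + 0.05 * r.get("clicks", 0)
    if r.get("elasticity") == "low":
        s += 0.3
    return min(s, 1.0)


def build_upload(fused: List[dict], model: Callable[[dict], float]) -> List[Tuple[str, float]]:
    """Score each cookie and emit (cookie_id, score) rows for a hypothetical push step."""
    return [(r["cookie_id"], round(model(r), 3)) for r in fused]


upload = build_upload(fuse(ad_server_rows, first_party), toy_model)
print(upload)
```

Keeping the three stages as separate functions makes it straightforward to swap the placeholder model for a real one without touching the fuse or push steps.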

And that's exactly what you're seeing here. This process started nearly two years ago with Aviva as the founding client. Just to give you an example, Aviva gave us access to their entire CRM data, and we are only the second organization in the world to have access to their price elasticity data. That means we know the relationship between the digital traffic for an individual, their relative price elasticity, and anything else Aviva knows about the individual, such as the application data you provide when you sign on for an insurance quote. We fused that data together, applied a machine learning technique, and then pushed that result to DoubleClick. We're steadily rolling this out for a number of our global clients, which means we need a cloud partner that will scale with us in this way. In this particular example, the volumes of data aren't particularly huge: we're talking about between 50 and 100 gigabytes a day. So it's not enormous.

But the point is this is for every single client. This is not one use case for one client. This is all our major global clients. So as we move forward, that score that we're generating through machine learning is now a very rich picture of what we know about that cookie. It's what we call a smart ID. So that means that the factors that we're passing back to DoubleClick are a very rich picture of what role has content, for example, played in the conversion. What role has an individual campaign played in its conversion? And we're starting to unpick content, for example, so assigning numerical values to text, video, and graphics. Because we want to understand, what is it that's really driven the conversion? Now I'm slightly heretical in my organization, and I don't believe that media always drives the conversion. It absolutely doesn't. There are many other factors that drive a conversion. And what we're starting to unpick here is what is the relationship between some of those other factors.
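
Assigning numerical values to content can be as simple as turning each cookie's content exposures into a count vector that a model can consume. This Python sketch assumes three hypothetical content types and also tabulates the conversion rate among cookies exposed to each type; that tabulation is descriptive, not causal, which fits the point that media is not the only driver of conversion.

```python
CONTENT_TYPES = ("text", "video", "graphics")  # hypothetical content taxonomy


def encode_exposures(exposures):
    """Map a list of content-type exposures to a fixed-length count vector."""
    vec = [0] * len(CONTENT_TYPES)
    for kind in exposures:
        vec[CONTENT_TYPES.index(kind)] += 1  # raises ValueError for unknown types
    return vec


def conversion_rate_by_content(cookies):
    """Conversion rate among cookies exposed at least once to each content type.

    `cookies` is a list of (exposures, converted) pairs with converted in {0, 1}.
    """
    encoded = [(encode_exposures(ex), conv) for ex, conv in cookies]
    rates = {}
    for i, kind in enumerate(CONTENT_TYPES):
        exposed = [conv for vec, conv in encoded if vec[i] > 0]
        rates[kind] = sum(exposed) / len(exposed) if exposed else None
    return rates


cookies = [
    (["video", "text"], 1),
    (["text"], 0),
    (["graphics", "video"], 1),
    (["text", "text"], 0),
]
print(encode_exposures(["text", "text", "video"]))  # [2, 1, 0]
print(conversion_rate_by_content(cookies))
```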

And this is the primary use case to which we've applied Google Cloud. In terms of performance, just to give you some idea of scale in that Aviva example, this is transformational stuff. We identified 83,000 additional quotes in our initial exercise in the first three months. And if you sum this up over the course of the relationship with Aviva, just on this single project, it's a 15% reduction in CPA, which for them is very significant. And they're the first insurance organization to even attempt this. So, specifically wrapping up on the benefits of GCP for us: we've seen a 30% reduction in the client setup process, which for us is fairly significant because use of talent is a challenge; a 20% reduction in hardware resources for machine learning solutions; and a 20% reduction in computation costs. Given the scale of what we're trying to do across something like 150 global clients, that is really quite significant. But probably the most important thing, just to close, is that I see Google as a partner, not a software supplier.
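
CPA is spend divided by conversions, so at a fixed budget a 15% CPA reduction buys 1 / (1 - 0.15) - 1, or roughly 17.6%, more conversions. The Python sketch below works that through; the budget and conversion figures are hypothetical, not Aviva's.

```python
def cpa(spend: float, conversions: int) -> float:
    """Cost per acquisition: spend divided by conversions."""
    if conversions <= 0:
        raise ValueError("conversions must be positive")
    return spend / conversions


def conversion_uplift_from_cpa_cut(cut: float) -> float:
    """Relative gain in conversions at a fixed budget when CPA falls by `cut` (0 <= cut < 1)."""
    if not 0 <= cut < 1:
        raise ValueError("cut must be in [0, 1)")
    return 1.0 / (1.0 - cut) - 1.0


# Hypothetical campaign: the figures are illustrative only.
budget = 100_000.0
baseline_cpa = cpa(budget, 2_000)            # 50.0
improved_cpa = baseline_cpa * (1 - 0.15)     # about 42.5
conversions_after = budget / improved_cpa    # about 2,353 at the same budget
print(round(conversions_after), f"{conversion_uplift_from_cpa_cut(0.15):.1%}")
```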

[APPLAUSE] MATT TAI: Thank you. Next up I'd like to introduce Damian Lawlor, who will help wrap us up. DAMIAN LAWLOR: Thank you, Matt. That's machine learning in action. You can probably guess from my accent that I'm Irish, and I love collecting pithy quotes. So to share, I loved "was the juice worth the squeeze." I'm delighted to hear that it actually was, and I'm going to remember that one. I'm just going to run very quickly through what Matt said. This is all about accelerating your time to insight. We recognize that time and speed of response are becoming more and more important to being able to use your customer data, or your customers' customers' data. The faster you can combine every piece of data you can get from all of those silos and drive insight, the faster you're going to differentiate yourself and drive long-term, sustainable competitive advantage. Our goal is to help you along that path.

That's why we announced the BigQuery Data Transfer Service today. There are some great stories about how people have been able to use it to make life easier, taking advantage of us building these connectors for you rather than you having to build them yourselves. Beyond that, we want to keep working at this, adding connectivity to more and more data sources and making it as easy as possible to break down that huge problem we all have of data silos. Because if we can get all that data together, we can unlock very, very rich value. Then Matt walked through the three steps to getting insight: using Cloud Dataprep and Cloud Dataflow to get clean data into BigQuery, which we know is a really, really big investment step for many, many people right after collecting as many data sources as possible, and then analyzing and visualizing. That's not where we want to stop.

This is really just the beginning of what we're going to be announcing over the next months, because what we want to do continually is bring out solutions that make it as easy as possible for you to engage with cloud computing and really get marketing analytics in the cloud working for you. Over the next couple of months, you'll see us bringing out packages of solutions that let you consume cloud in as simple a way as possible: for example, a package that includes BigQuery Compute, BigQuery Storage, Dataprep, and various other parts of the cloud platform, in a way that allows you and your teams to expand the amount of querying you can do exponentially from where you might be now. Our belief is that if we can give you, for example, a fixed price for unlimited querying on a monthly or annual basis, that will really unlock the creativity of your teams. So please do come and talk to us about the BigQuery Data Transfer Service, which was announced today, and Dataprep.
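
Whether a fixed monthly price beats per-query billing comes down to a break-even point: the flat fee divided by the on-demand price per terabyte scanned. The Python sketch below computes it; the fee and per-TB price are placeholders, not Google's actual pricing, and real plans may be structured differently.

```python
def on_demand_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Monthly cost when billed per terabyte scanned."""
    return tb_scanned * price_per_tb


def break_even_tb(flat_monthly_fee: float, price_per_tb: float) -> float:
    """Terabytes scanned per month above which the flat fee is the cheaper option."""
    if price_per_tb <= 0:
        raise ValueError("price_per_tb must be positive")
    return flat_monthly_fee / price_per_tb


def cheaper_option(tb_scanned: float, flat_monthly_fee: float, price_per_tb: float) -> str:
    """Return "flat" or "on_demand", whichever costs less for this month's usage."""
    if flat_monthly_fee < on_demand_cost(tb_scanned, price_per_tb):
        return "flat"
    return "on_demand"


# Placeholder numbers for illustration only.
FEE, PRICE = 10_000.0, 8.0
print(break_even_tb(FEE, PRICE))            # 1250.0
print(cheaper_option(1_000, FEE, PRICE))    # on_demand
print(cheaper_option(2_000, FEE, PRICE))    # flat
```

Past the break-even point each additional query costs nothing extra, which is the reasoning behind a fixed price "unlocking the creativity" of analysts.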

And in the coming weeks you'll hear more and more from us about other solutions we're bringing you, and we'd love you to engage with us directly so we can talk them through. In fact, we'd like to show you, because showing is much more powerful than telling. Anyone who was at the 12 o'clock presentation and saw how fast that petabyte got queried knows that's more impressive than me telling you about it. What I'd like to do now is invite two of our partners to come up, share some more stories, and talk us through their experience to date working with cloud, and specifically with Google Cloud. So Allan Beaufour, Senior Vice President of Engineering at "The New York Times," and Peter Jaffe, the Data Lead from Hearst, are going to join me for a fireside chat. But it's California, so it's without the fire. So we've all got microphones. I'd like to start by saying, we're at Next, so a lot of today was about what's coming in the future.

But if you don't mind, I'd really love for both of you to talk us through your route into the cloud. What were you hoping to achieve on that road into cloud computing? And the follow-up question is going to be, where do you think you are against those original goals? Maybe Allan, if you want to start. ALLAN BEAUFOUR: Sure. What did we hope to do? A lot of different things. We had a mix of AWS and a lot of not very well maintained data stores, which is probably common among a lot of us. We had Redshift. We had on-premise Hadoop. I know who had that idea. We wanted to collect a lot of that and put it in one place, you know, this data lake concept: put it online and make all the data joinable. Where we are now: we've killed Redshift. We've shut down Hadoop. We have a fair amount of data in BigQuery. It's up and running. We're using Dataflow. We're using Pub/Sub. And BigQuery is driving a lot of our analysis, not all of it yet. We still have data sets to be moved.

One of the things is, we're not in the business of running infrastructure. We were to some extent, and we want to get away from that. We want to publish content, write content, and distribute it. So one of the things that drove us to Google Cloud in particular was the no-ops nature of BigQuery. Redshift needs some ops, and we didn't do that very well. BigQuery needs no ops. So far, it's working out really, really well for us, and we don't need anyone thinking about operations, which is amazing. DAMIAN LAWLOR: Peter? PETER JAFFE: So at Hearst in general, enterprise technology decided a few years ago, as a general initiative, to try to move as much as they could into the cloud and away from on-premise. They'd been doing that because of the efficiencies, the ease of use, and how easy it is to add resources, things like that. And in general they've been very successful with that. Our specific group, the data engineering and data science group, was only born four years ago.

And so we've been entirely on cloud since the beginning. That was initially all AWS. And similar to Allan's story, we've been migrating over to Google Cloud over the last year, initially because of BigQuery: how fast it is and how cost-effective it is for our use case compared to Redshift. We have a bunch of Redshift clusters. We used to have more. And they're really big and really expensive. It's been very successful for us to move over to BigQuery, where we can have a lot more data available for immediate querying at a much lower cost than it would take to keep all of that in Redshift. DAMIAN LAWLOR: In terms of the move to cloud, is the data and analytics platform a good place to start for somebody who is thinking about the journey to cloud? PETER JAFFE: Yes. [LAUGHTER] I mean, it's been some years since I did analytics on data sets that were on-premise. And it's just very liberating, as the person who's interacting with the data, to have it on the cloud, because it's so easy to spin up additional resources, to move data around, and to get data into the database where you're going to perform your analysis.

It's so easy to put Tableau or Looker on top of that. I can't imagine, at this point, doing it any other way. ALLAN BEAUFOUR: Data is really where you need elasticity and growth. We're not going to get less data in the world, right? There's only going to be more. It's going to be more and more prevalent. And whatever business function you're running, there will be more data. If you do that on-premise, you'll be provisioning for now and forever. So yeah, I would say it's a good place to start. You need it. DAMIAN LAWLOR: And I remember you telling a story in New York about how it was actually the data analytics that spurred some thoughts about moving other workloads over to the cloud. Can you tell us a bit more about that journey? ALLAN BEAUFOUR: Yeah. So before BigQuery, or rather, I would say BigQuery has really transformed how a lot of analysts do their work. They do the same work, but BigQuery has enabled them to stay in this inquisitive, asking-the-data-questions mindset.

Whereas before, they came up with a theory, ran a query in either Redshift or Hive, went to lunch or to a meeting, and then came back and thought, oh, what if I ask this question instead, or this and this and this? Now it's a much more interactive session with BigQuery, which has really unleashed a lot of analytics, thinking, and creativity: they can query the data, see something, come back, query it another way, add more filters. It's a much more fluid workflow instead of context switching all the time. You do this, go to lunch, come back: oh, what was I doing? So it's really opened up so many more things for us, having a much more interactive session with the data. DAMIAN LAWLOR: Just a quick question to the audience. How many in the audience are already doing data analytics in the cloud? OK, so then I think this next question is relevant. For those that aren't, what are the pitfalls that you think people should think through as they start this journey?

Peter, if you want to... PETER JAFFE: Pitfalls to moving your data into the cloud, we're talking about? Yeah. Well, gee, that's a tough one. I mean, there are so many advantages to moving to the cloud. I think that if you've got a lot of infrastructure on top of your on-premise data, you've got a lot of stuff to migrate: getting buy-in, getting people on board, figuring out what tools you're going to use. My experience with data sets that were all on-premise was that the tools on top of them were not that great. It was a lot harder to get insights out. And maybe it's the specific tools that are responsible for that. But aside from getting your organization behind it, I don't think there's a lot of danger. I think people are more afraid that there are going to be security gaps and things like that. ALLAN BEAUFOUR: I don't know, it's the same for me. I don't know if there are any big pitfalls. There are some challenges, like making sure your security department, your infosec department, understands what it's about.

And data center security is hard, but Google actually has a lot of people who know how to do that, and that's hard to compete with. It is easier to burn through some money if you're not careful. If you have provisioned hardware you've already paid for, you're not burning any more money there. In the cloud you can make some missteps and burn through money, but set up some quotas and that stops there. DAMIAN LAWLOR: OK. One of the things I hear from other people who are looking at this is how difficult it can be to break down the data silos, or to actually get individual areas of their business to cough up the data, if we can put it like that. Was that an issue for either of you? PETER JAFFE: That has not been an issue at Hearst. I would say the opposite: everybody wants to have their data in a central location. Maybe this is organization-specific, but at Hearst we have a lot of use of our common data set across the organization.

And in fact this has been great in Google Cloud in particular, compared with what we were experiencing in AWS. It's just been so easy to provision users and give them access to different data sets, and not just data sets in BigQuery, but other resources within the cloud like Compute and Storage and whatever else you might want. We've had such great adoption this year. It's amazing to me how many different people from different businesses across Hearst are using our data, data we've been wanting them to use for years and that they've been wanting to use, but it's just been harder for them to get at it. ALLAN BEAUFOUR: Yeah, I think once people... we haven't had too much trouble. There are some old flaky systems that are a little hard to get the data out of, but it's not because people have been actively resisting. We were talking about that just before: you shouldn't underestimate the benefit of having that Google login and a web UI on top of BigQuery.

It sounds silly that that's an enabler, but it really is. Once people see they can just click a link and they're already logged in, here's a SQL interface, and they see the speed and the ease of getting to the data, people are more than happy to get their data in there, because they can see what that enables. DAMIAN LAWLOR: In terms of enabling, we had an interesting conversation earlier today about this concept of the democratization of data science: the idea that these tools are actually going to open up access to data to a much broader group within the organization, and that this would unleash creativity within your organization. Is that something you're seeing or you believe in? Where are we on that? ALLAN BEAUFOUR: So, yes and no. As I think was also said earlier, after the initial excitement there's usually some frustration that follows quickly. Just having access to the data doesn't mean you understand it. You can give people all the self-service access to the data you want, and they poke around in it, come up with reports that don't make a lot of sense, and then there's a reconciliation nightmare afterwards.

So I 100% believe that data should be accessible to everyone, regulation allowing, because it does open things up. But the idea that it just makes data self-service, and everyone across the company will dig into it and create their own reports or data science? Yeah, I don't really see that working out in practice. DAMIAN LAWLOR: Anything different at Hearst? PETER JAFFE: No, I would generally agree with everything Allan said. There are some products coming out now that are trying to get at that, for example with the data transfer from DFP into BigQuery. Looker has been experimenting with what they call Blocks, which are ready-made dashboards that you can plug in on top of your DFP data once it has been imported into BigQuery through the data transfer tool. A couple of people at Hearst have been playing around with that, and it looks kind of promising. So I think maybe we'll move more in that direction. But I completely agree that opening data up to people who were not inclined to analyze it is not going to get them to analyze it.

DAMIAN LAWLOR: Some of the case studies we heard earlier talked about the benefits, the actual business outcomes that they were able to drive from this move to cloud data and analytics. Are you comfortable sharing any of the transformations or positive impacts that you've had in your own organizations? PETER JAFFE: I can talk about specific products that we've built on that. We take our click data from our 180-odd websites, put it through a pipeline, and grab the impressions that come back from DFP. And I've got a real-time revenue estimation tool built on top of that. So I'm pulling in all the clicks and the impressions, merging them together, and estimating how much revenue is being generated. I'm pushing that out to a tool with a dashboard on top of it that our traffic buyers use to see high-performing CPMs. They look for opportunities to buy traffic at lower CPMs and push it to the URLs that are currently earning high CPMs.
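Peter's revenue estimation tool boils down to joining clicks and impressions on URL and applying the standard CPM arithmetic: CPM is revenue per thousand impressions, so one impression served at a given CPM earns CPM / 1000. The Python sketch below is purely illustrative. The record shapes (one URL string per click, one `(url, cpm)` pair per DFP impression), the in-memory approach, and the names `UrlStats`, `merge_clicks_and_impressions`, and `high_cpm_urls` are hypothetical assumptions, not Hearst's actual pipeline, which the talk does not describe in detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UrlStats:
    # Hypothetical per-URL aggregate; not Hearst's real schema.
    clicks: int = 0
    impressions: int = 0
    revenue: float = 0.0  # estimated revenue, in the CPM's currency

    @property
    def ecpm(self) -> float:
        # Effective CPM: estimated revenue per 1,000 impressions.
        if not self.impressions:
            return 0.0
        return 1000.0 * self.revenue / self.impressions


def merge_clicks_and_impressions(clicks, impressions):
    """Join click events and ad impressions on URL and estimate revenue.

    clicks: iterable of URL strings, one per click (assumed shape).
    impressions: iterable of (url, cpm) pairs, one per impression (assumed
    shape); each impression is assumed to earn cpm / 1000.
    """
    stats = defaultdict(UrlStats)
    for url in clicks:
        stats[url].clicks += 1
    for url, cpm in impressions:
        entry = stats[url]
        entry.impressions += 1
        entry.revenue += cpm / 1000.0
    return dict(stats)


def high_cpm_urls(stats, min_ecpm):
    """Return (url, ecpm) pairs at or above min_ecpm, highest eCPM first."""
    ranked = [(url, s.ecpm) for url, s in stats.items() if s.ecpm >= min_ecpm]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    stats = merge_clicks_and_impressions(
        ["/a", "/a", "/b", "/c"],
        [("/a", 4.0), ("/a", 6.0), ("/b", 1.0)],
    )
    for url, ecpm in high_cpm_urls(stats, min_ecpm=2.0):
        print(url, round(ecpm, 2))
```

A traffic buyer's dashboard would then surface something like `high_cpm_urls(stats, min_ecpm=...)` as candidate destinations for bought traffic. In practice the same join and aggregation could equally be written as a query over click and impression tables rather than Python loops.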

So that's a nice example, I think, of real-world revenue being driven off of the data. DAMIAN LAWLOR: Thank you. Allan? ALLAN BEAUFOUR: We're a little further behind than that. It's more exploration right now: more [INAUDIBLE] enabled, and having more data sets in BigQuery that we can join together has enabled more exploratory analysis for us. For example, in advertising, getting DFP data and our click traffic in there and exploring other advertising scenarios and how they would actually affect our revenue. But we're not yet at the point where we're building new products on it. We've migrated data products into BigQuery, but we haven't built any new ones. So I have no fancy things yet. DAMIAN LAWLOR: Well, I think that leads into my next question, which is, what's next? What are you looking forward to building out over the next 12 to 18 months? ALLAN BEAUFOUR: Oh, a lot. More data sets, obviously: the more data we can get in and make a join away, the better, and then we'll clean up all the data quality issues that come out of seeing the data together.

Now, for me, there are a couple of key things. One is building data products, and streamlining the funnel from a prototype on a data scientist's laptop to a data product running in production. So really empowering our data science team to do all their crazy stuff with R, Python notebooks, whatever, and then putting the machinery and the platform together underneath that makes it very easy for them to run that in production and feed into whatever it is, whether they put stuff into BigQuery or feed our product. There are a lot of commonalities between all data products. Getting that up so we can really empower the data scientists is something I really, really look forward to. And part of that is closing the feedback loop. A lot of what we've been striving for with Pub/Sub, Dataflow, and BigQuery (maybe not BigQuery in that sense, actually) is streaming: having most of our data streaming. I want more streaming data sets in there so we can run real-time models that feed into the product, change the product behavior, then see the user actions flow back into that, and close that feedback loop in many more ways.

That's what I'm really, really excited to see. DAMIAN LAWLOR: So that goes back to the comment earlier about activation being a really important outcome of this: we want to move past just analysis to actually automating the outcomes, and then use those outcomes to drive the product. ALLAN BEAUFOUR: In real time. Right now, from when a click hits our website to when it's available for querying in BigQuery, it's 45 seconds. So that's pretty fast. DAMIAN LAWLOR: And Peter? PETER JAFFE: Yeah, we have [INAUDIBLE] an experience. Our click-to-BigQuery time is also under a minute. It's pretty fast. Similarly, we have more data sets that we want to bring in. And because the technology's evolving so quickly in this space, we keep iterating on how our pipelines work and making them better and better. Right now, we're interested in incorporating Dataflow and some other products like that which we're not really using yet, and in making our pipeline faster and more robust.

It's fast and robust, but it can also be faster and more robust. DAMIAN LAWLOR: And maybe as a final question from me: you mentioned that technology is evolving very quickly. Where are you hoping it evolves to over the next 12 to 18 months? What evolution would add the most value for your businesses and the business outcomes you're looking to achieve? No pressure. PETER JAFFE: That's tough. I mean, just make everything easier and better. [LAUGHTER] DAMIAN LAWLOR: We do have a few product managers and engineers in the audience. ALLAN BEAUFOUR: Just like BigQuery is no-ops, more tools like that, fully getting away from spinning up servers or instances or containers or whatever it is. The more no-ops tools I can put in there, the better. I was pretty happy with... I can't remember the name of it, but the PII detection tool you guys launched: getting that stuff in there, like automatically detecting PII and regulatory things that aren't as they should be. The more integration I can get between all those products in a no-opsy way, the better. That would be amazing.

I'm pretty happy to see Dataprep. But as we were talking about earlier, Dataprep is at one end of the scale and Dataflow is at the other. Somewhere in between, I'm looking for something that is not the heavy hand of Dataflow and not the somewhat simple Dataprep, but in between, and in a no-opsy way. That's what I'm looking forward to. DAMIAN LAWLOR: OK, so I'd like to close off the fireside. Thank you very, very much to Allan and Peter. [APPLAUSE] [MUSIC PLAYING]



Learn how advertisers and publishers shorten their time to customer insight on Google Cloud. Damian Lawlor, Matt Tai, Ian Liddicoat, Matthew Granish, Peter Jaffe, and Allan Beaufour show you how GCP’s data products and data/visualization partner ecosystems provide a clearer view of your business by breaking down your data silos.
