Google Cloud NEXT '17 - News and Updates

How to implement Cloud-scale rendering in Cloud Platform (Google Cloud Next ’17)

(Video Transcript)
[MUSIC PLAYING] JEFF KEMBER: Hello, hi. My name is Jeff Kember. I'm a Technical Director in the Office of the CTO at Google, and I've had the opportunity to work at such companies as Pixar, Framestore, and ILM. I've been with Google about 2 and 1/2 years, and I'm really excited to be here to talk with you guys today about cloud rendering. We also have Hannes Ricklefs here today with us from MPC, and he'll be talking about some exciting work that they did on our cloud in the fall of last year. So the concept of batch rendering– batch computing, batch rendering, it's all kind of synonymous. When we talk about batch compute, rendering is one of the many use cases of it. So if anyone is here from other industries, be it finance or oil and gas or other high performance computing, the same architectures and technologies, network transfer, storage techniques, and such that we're going to be talking about today can be utilized across the realm. When we look at an architecture diagram later, I'll point out some of the intricacies to the different markets as well.

So Google has a unique global network, and that's something I'm really excited about, in the fact that you can onboard in Helsinki and jump off in Taiwan, or get on in Tokyo and jump off in Frankfurt, and you're on our private network, under the ocean, 100% of the time. As soon as you've peered with us in a POP, you're encrypted in motion and at rest. So the ability to offer that level of service is because of this backbone. The content delivery network we have as well is peered in more than 100 different locations. You can see we have additional transatlantic and Pacific links being lit up in 2018. So this road map is obviously subject to change for timing, but this is where we are today. So when we talk about cloud scale rendering, it's exciting to me. I get to meet with customers who have a whole bunch of compute on prem, and they say we want 5,000 cores or 20,000 cores. And after we've finished the architectural diagrams and conversations and looked at the workloads and such, it's quite common to come out with a quota that's 100,000 cores.

And it's not that they are going to run that sustained, but they're in a situation where they can use preemptibles, which we'll talk about later, for shorter periods of time. So instead of running for 24 hours as they would normally, they can run for maybe four or five hours at really high compute numbers in order to get the frames back quickly. This is a list of some of the things we're going to talk about in terms of being able to put together a successful cloud rendering strategy on Google. So the first thing you need to do is to be able to connect with us. And with the 100 plus points of presence, in most major cities on the planet we have a fiber line into the exchange there, or multiple exchanges in each of those cities. There's the opportunity to connect with us in a variety of different ways, and we have a new announcement at the show that occurred in the keynote earlier this week. We can talk about our connection options.

So we have a direct peering option, we have carrier interconnect, and then of course we have a managed VPN offering. And the opportunity to connect with us directly over the internet for initial proofs of concept and to be able to put smaller files up on the cloud and such, that's super successful. We also have customers who have multi-10 or 40 gig bonded lines into us, so there's a variety of scaling opportunities. This Google Cloud Interconnect slide– this was announced this week, and so I'm pretty excited about the fact that we have a private connection into our switch at a colo facility. And if you end up going through a third party, this is what that diagram looks like in terms of being able to have a service provider providing the feed for you, and then a connection to you. So the opportunity to connect directly with us is available, and we're keen for customers to be able to have that functionality. One thing I encourage people to look at if they're on the nerdy side, as I am– Jupiter Fabric.

So we name our different network fabrics after celestial objects, it's true. So Jupiter is the name of the network fabric, and there's basically a circa-2012 research paper on it. There's a blog post that's really accessible, and then there's this multi-page paper. This image comes from the blog post and is also in the paper itself. What I think is really cool about it is there are some top researchers who diagram and explain how the software defined networking works at Google. There are pictures of the racks within the data center, there are diagrams explaining how interconnects work, how top of rack switches work. As opposed to the maybe 10 gig connections you may have on prem, we're running 40 gig interconnects between our various hosts and such. So the opportunity to have control over the complete software stack as well as the hardware side enables us to have really flexible quality of service, and be able to run at really high rates.

And that difference in network is a huge differentiator for us. And I encourage anyone to Google that, the paper will just pop up. Jupiter fabric is a great keyword. So in terms of how do you store the data for render jobs, yeah, there's all sorts of different offerings at Google. It's all object and block for us, and so we'll just dig into that a little bit more. On the object side, we have Google Cloud Storage. I'll talk about the different tiers of that, and the availability of them and how they work and durability and such. But the block side, that's really more analogous to what you're used to dealing with on prem. So everybody is using an NFS, POSIX-compliant file system on prem, that's just what the tools need. And in that context, we offer persistent disk, both the HDD, or hard drive, variant, which is our standard persistent disk, and we also offer SSD persistent disk. Now, these PD offerings are both network attached storage, and they are incredibly fast.

What's cool about them is they can scale as well, so expect these numbers to go up over time. The other cool thing that I love is local SSD, and we'll go into some use cases later, but that is super high speed storage attached to the VM physically. It's 700,000 IOPS, basically the speed of DDR RAM, DDR2, in terms of use capabilities, and it's less expensive than buying RAM in the machine. So you may want to have an option to control the size of your VM and augment it with local SSD. We can talk about those strategies in a bit. In terms of the object store, Google Cloud Storage, multi-regional is a fantastic offering in the fact that it lets you store data in two locations that are more than 150 miles apart. So if you're serving content, you want to have disaster recovery, you want to guarantee that you have the data in multiple locations that are physically far apart and disparate, then multi-regional is your friend. For most VFX workloads for storing the data, we recommend regional.

That guarantees the data's in the region you're doing the compute in. It's inexpensive, highly available. We also have Nearline and Coldline as longer term storage options. So the neat part with all this is that it all runs in the same subsystem, with the same API for access. You have the ability to get all the data available to you on a per object basis within milliseconds. There's no difference in the SLA for access time between these. You can see the pricing, this is all available on the web as well. The– I guess the kind of relevant bits are, with regional and multi-regional, you're just paying for what you use as you use it. Nearline has a 30 day minimum, and Coldline has a 60– sorry, 90, I should say. The use case that we see is people aging the data off. So you set it on regional and use it, and you can, if you choose, set an auto timer that will move it to the longer term tiers, or you can do so yourself, as I mentioned, on a per-object basis. With Nearline, that is– that's where we tend to put data for– kind of about a year.
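The "auto timer" for aging data off can be expressed as a Cloud Storage lifecycle configuration. The following is a minimal sketch, assuming the JSON rule shape that Cloud Storage lifecycle management uses (`SetStorageClass` actions with an `age` condition in days). The function name and the 30-day and 365-day thresholds are illustrative assumptions, not values given in the talk.

```python
import json


def build_lifecycle_config(nearline_after_days=30, coldline_after_days=365):
    """Build a lifecycle configuration that ages objects to colder tiers.

    The thresholds are hypothetical; pick them from your own access patterns.
    """
    if coldline_after_days <= nearline_after_days:
        raise ValueError("Coldline threshold must be later than Nearline threshold")
    return {
        "rule": [
            {
                # Move objects to Nearline once they reach the first age threshold.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Move objects to Coldline once they reach the second threshold.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": coldline_after_days},
            },
        ]
    }


def write_lifecycle_file(path, config):
    """Write the configuration as JSON so it can be applied to a bucket."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
```

A file written this way could then be applied with `gsutil lifecycle set <file> gs://<bucket>`, where the bucket name is whatever you use for the show.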

If you're going to access it within a year, Nearline makes the best economic sense. If you're going to leave it parked for more than a year and it's kind of your LTO tape replacement as a long term storage option, we encourage people to put it on Coldline, and that gives you the option of being able to park the data there for a very long period of time. I had a question in my talk yesterday about bit rot, what do we do about that. We have some proprietary technology that goes through and ensures the data's fresh and ensures the integrity of it. So you can park your data on Coldline with the confidence that it's going to be there years from now when you go to read it. So using them all together. As I mentioned, the per object storage class is really cool. Say you've put 15 petabytes on and you're working with 300 terabytes, which may be large, and someone comes to you and it's a VFX show, and you're working on a sequel, because there are so many of those these days, and you need to go back and get the geometry and texture files.

You're going to rerig the character, you're going to put a different muscle setup on it and such, but you want the texture files. You want the MARI archive, you want to get the geometry that was used. You can go through and just pull those objects out, and you can get access to them incredibly quickly. If it's something you're going to be rereading multiple times, move it to regional. If you're just going to read it once and restore it locally, then just grab it from Coldline. So in terms of NFS file system options, there's a couple of different options. We have a roll your own single node filer, which we think is pretty great. It's inexpensive in the fact that it's open source, so it's essentially free, but you're paying for the VM, the storage attached to it, and the network that it runs on. But there's no additional cost beyond that. It is highly performant because of our network, and the local SSD is [INAUDIBLE] cache, and PDS is [INAUDIBLE] it.

So we find that we can typically serve line rate to a significant number of machines. You can also build out multiple single node filers. An example would be, you have a kangaroo and a robot fight sequence, and you want to put the kangaroo on one and the robot on the other. So you put the Alembic data on those two. You can put the textures across those. You can put a write node elsewhere, or you can split Alembic caches on one and textures on another. If you have six different characters in some superhero film, you can split them out across multiples, and that's a fairly common strategy. That's fine if you want to manage your nodes and if that's how you're architected on prem. There's also Gluster, which is available as a managed open source solution. By managed, I should say supported, so a supported or an open source self-supported solution. Both of those are cloud file systems, in the fact that you have to synchronize the data to get there, and we'll talk about that in asset management in a couple of minutes.

In terms of caching file systems, though, we have Avere vFXT and also Pixcache. The cool part with those is they're cloud file systems that are NFS, POSIX-compliant, but they also have a read through cache functionality. So you export your on premises, high performance file system to the cloud, and then as your render nodes read additional data, they hit the cloud file system first. If it's not there, they go back on prem, grab it, and then pull it up, and then you're able to read it multiple times in the cloud. So this is a fantastic diagram made by Adrian Graham, a Solutions Architect in LA. He put this together, and it explains pretty much how you'd architect for the cloud. This is something he and I collaborated on. In terms of– maybe you have a license server on prem and a cloud based license server. There's a high speed accelerated UDP transfer. We can talk a little bit about how to get things to the cloud. The neat thing about this, though, is if you remove the license server and leave everything else up, pretty much as it is, you can go through and take the rendering VMs over in the corner and change those to GPUs.
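The read-through behaviour described above (check the cloud cache first, fall back to on prem on a miss, then serve repeat reads from the cloud) can be illustrated in a few lines. This is a toy sketch of the general technique only, not how Avere or any other product is implemented; the class name and the `fetch_from_origin` callable are hypothetical.

```python
class ReadThroughCache:
    """Toy read-through cache: misses are fetched from the origin once."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin stands in for a read against the on-prem filer.
        self._fetch_from_origin = fetch_from_origin
        self._store = {}
        self.origin_reads = 0

    def read(self, path):
        if path not in self._store:
            # Cache miss: go back on prem, grab the data, and keep a copy.
            self._store[path] = self._fetch_from_origin(path)
            self.origin_reads += 1
        # Cache hit (or freshly filled): serve from the cloud-side copy.
        return self._store[path]
```

If 500 render nodes read the same texture through one cache, the on-prem filer sees a single read for it, and the other 499 are served in the cloud.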

Put an oil and gas workload on this, and suddenly you have a high performance compute scenario for that workload. You can either choose to go with local SSDs attached to the machines for simulation workloads, or you can have a large NFS file system in the cloud, and your choice of read-through cache is just dependent on how your pipeline works. I'll wait until everyone gets their photograph. This will be available on YouTube shortly, so, excited about that. But again, for me, the most exciting thing is the fact that any of the different high performance compute workloads I mentioned earlier run with this same architecture. We have genomics and financial services all running on this similar architecture with really solid success. Now, once you're committed to us, getting the data into the cloud, that's the important part. So we offer gsutil, which is a command line tool you have on your local machine that you can use to push data to and from the cloud. But simpler technologies– or, actually, alternate technologies, such as rsync or Parsync, along with commercial offerings from partners such as Aspera and Signiant, those provide significant threaded upload capabilities.
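As a concrete illustration of pushing data up with gsutil, the sketch below builds a `gsutil -m rsync -r` command line (`-m` asks gsutil to run operations in parallel, `-r` recurses into directories). The helper name, the local path, and the bucket URL are hypothetical; the function only assembles the command and leaves running it to the caller.

```python
def build_gsutil_rsync(local_dir, bucket_url, parallel=True):
    """Return the argument list for syncing a local directory to a bucket."""
    cmd = ["gsutil"]
    if parallel:
        # Top-level gsutil flag: perform the copy operations in parallel.
        cmd.append("-m")
    # rsync -r: recursively make the destination match the source.
    cmd += ["rsync", "-r", local_dir, bucket_url]
    return cmd


# Example usage (paths and bucket are placeholders):
#   import subprocess
#   subprocess.run(
#       build_gsutil_rsync("/show/publish", "gs://example-render-bucket/publish"),
#       check=True,
#   )
```

The same call could sit in a pipeline's publish step so that each new version is pushed as it is committed.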

I'm sure everybody in this room is making pictures, and you guys are all familiar with the technology. The important thing here is that Google is open. If you want to run your open source, if you want to grab Tsunami and compile it and put it up and use that as your transport protocol, we're cool with that. Our transfer mechanism, I should say. If you want a commercial managed service, that's fine too. In terms of asset management, getting the data to the cloud, that's– it's a good problem to have. The easy model is, OK, we've done this stereo film, we finaled the left eye, now we want to take all the data, we know everything that we need, push it all up to the cloud, and then spin the VMs up and render the second eye. We've had a couple of customers do that. That's not a common workload, it's really easy in the fact that, again, you know everything you need from a dependency standpoint in advance, you put it up, you make the pictures. More realistically, as a commercial workflow or film workflow, where you're updating assets on the fly.

You've got your lighter making changes in Katana, they're pushing the button, textures may have changed. You're in a scenario where there's always something that's changed before the render goes up, and you need to make sure it's there. So these are a couple of different strategies to be able to get the data up. The simplest thing we do from a pipeline standpoint is encourage people to just put a simple rsync in their publishing step. So when they publish and commit and version up within their pipeline, and then let the next downstream user know, or inform Shotgun, or however you've chosen to architect it, a simple rsync will push that data up to the cloud. If you have a caching file system, you can have a micro node sitting on the back side that will read that data through the cache, so it will at least be warm when the render needs it. One thing I should point out on this slide, we have one customer who I think does a very clever thing. After the render's complete, they create an H264 render of that.

And they do that at a fairly low resolution. They don't care about color accuracy, what they want is a really low bit rate, but– I should clarify. A reduced bit rate, but sufficient quality to make sure the render worked properly. Did the textures fall off, are the lights in the right place, does the render look the way it should? Then they bring down their 2 or 4K, or however large it's going to be, giant EXR files with 50 AOVs and such. So that's a way to manage your egress costs, to ensure that you're not just pulling 100% of the data that you've written. That's one thing we discourage, in the fact that if you pull every temp file, every log file, every simulation cache down, your egress costs will be higher than they need to be. You can leave all that data sitting in the cloud, and just grab the bits you want and pull them back out. If you also leave the data in the cloud when you want to run compositing passes later, that's a really nice idea, because you've already got it there.
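One way to avoid pulling every temp file, log file, and simulation cache is to filter the object listing before downloading anything. The sketch below shows that idea with glob patterns; the function name, the patterns, and the example paths are all hypothetical and would need to match your own output layout.

```python
from fnmatch import fnmatchcase


def select_for_download(object_names,
                        include=("*.exr",),
                        exclude=("*.log", "*.tmp", "*/cache/*")):
    """Return only the objects worth paying egress for.

    An object is kept if it matches an include pattern and no exclude pattern.
    """
    selected = []
    for name in object_names:
        if any(fnmatchcase(name, pattern) for pattern in exclude):
            continue  # Leave logs, temp files, and caches in the cloud.
        if any(fnmatchcase(name, pattern) for pattern in include):
            selected.append(name)
    return selected
```

For example, given a beauty EXR, a render log, a cached simulation EXR, and a preview movie for one shot, only the beauty EXR would be selected for download.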

So as far as scheduling jobs, we work with a number of commercial queuing options, and bringing your own is absolutely acceptable. So in terms of block storage for rendering, persistent disk, most customers are using PD SSD in terms of backing their filers with that. As an example, the Avere will run on PD SSD, but it will also run on local SSD, and there's a significant performance enhancement when you do that. If you run the Avere on local SSD, you get roughly a 2x enhancement across the board in terms of sequential reads, writes, and random writes, but you get a four and a half times improvement on random reads. And for VFX workloads, it's pretty much a random read situation. So it's super fast with local SSD. In terms of virtual machines– so because we don't have an entirely containerized pipeline running on Kubernetes at this time, for the most part we're using standard VMs for important workloads, such as file systems, gateways, and such, and we're using preemptible VMs for render nodes.

Now preemptible VMs are significantly discounted. They're 80% off, it's a penny a core. So a 32 core machine is 32 cents US per hour. So it's– I made the joke it's $3, $4. In that context, you're in a scenario where you can run machines that are incredibly inexpensive, and at the same time you can run a whole bunch of them. And you can turn them on and off on a per frame basis, and we'll talk a little more about that. You'll notice as well that it's the same subsystem you're running on. There's no difference. The VM is identical, it's just how we bill it and how we support it. We don't offer live migration on the preemptible machines. If we need to take some of them back, we will. It's excess capacity that we sell. So for workloads that are important and critical to the infrastructure, that would be gateways, a license server, file systems, we encourage you to keep those on a standard VM, and we will migrate those within our data center if we have a service requirement for the rack that they're on.
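The pricing arithmetic above is easy to check. The sketch below uses the penny-per-core-hour figure quoted in the talk; the function names are hypothetical, and the job size in the usage note is an invented example. It also assumes the job scales perfectly across cores, which real renders only approximate.

```python
# Rate quoted in the talk: roughly one US cent per preemptible core per hour.
PREEMPTIBLE_USD_PER_CORE_HOUR = 0.01


def preemptible_cost_usd(cores, hours, rate=PREEMPTIBLE_USD_PER_CORE_HOUR):
    """Cost of running `cores` preemptible cores for `hours` hours."""
    return cores * hours * rate


def hours_for_job(core_hours, cores):
    """Wall-clock hours for a job of `core_hours`, assuming perfect scaling."""
    return core_hours / cores
```

A 32 core machine for one hour comes to about $0.32, matching the figure in the talk. A hypothetical 480,000 core-hour job would take 24 hours on 20,000 cores or 4.8 hours on 100,000 cores, and under this simple model the cost is the same either way, about $4,800, which is why bursting wide for a few hours is attractive.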

We'll move them somewhere else while they're running. I think it's really cool to see a single node filer running at full load, and then have it migrate in real time and switch over in an instantaneous period– instantaneous, yeah, I know. It's super fast, and it's a really cool technology that only we have. With the preemptible VMs, we encourage customers to run with checkpoints, snapshots, and such. So that way you can run an 8 hour, a 12 hour, a really long render, and be in a scenario where if the machine goes away, you can spin another machine up. And you get a 30 second warning on a preemptible VM that it's going away. So you can chase the log on that. You can use our cloud logging to inform your queuing system that the node is going away. When that happens, you can immediately spin up another machine. You know, I haven't personally seen any situations where we stock out. We don't guarantee availability, obviously, however, I personally haven't seen a scenario where a customer has completely stocked out a zone in such a way that they can't fire up additional machines.
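The checkpoint-and-resume pattern recommended above can be sketched as follows. This is a minimal illustration, assuming the preemption notice reaches the render process as a SIGTERM (for example, forwarded by a shutdown script); that delivery mechanism, the class name, and the JSON checkpoint format are assumptions made for the example. The sketch checkpoints after every frame so that a replacement machine can pick up where the old one stopped.

```python
import json
import os
import signal


class CheckpointedRender:
    """Render frames in order, recording progress so a new VM can resume."""

    def __init__(self, checkpoint_path, total_frames):
        self.checkpoint_path = checkpoint_path
        self.total_frames = total_frames
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Called when the preemption warning arrives; finish the current frame.
        self.stop_requested = True

    def load_next_frame(self):
        if not os.path.exists(self.checkpoint_path):
            return 0
        with open(self.checkpoint_path) as handle:
            return json.load(handle)["next_frame"]

    def save_next_frame(self, next_frame):
        # Write to a temp file and rename so a checkpoint is never half-written.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump({"next_frame": next_frame}, handle)
        os.replace(tmp_path, self.checkpoint_path)

    def run(self, render_frame):
        next_frame = self.load_next_frame()
        while next_frame < self.total_frames and not self.stop_requested:
            render_frame(next_frame)
            next_frame += 1
            self.save_next_frame(next_frame)
        return next_frame


def install_sigterm_handler(job):
    """Assumption: the preemption notice is surfaced to the process as SIGTERM."""
    signal.signal(signal.SIGTERM, job.request_stop)
```

For the checkpoint to survive the machine going away, `checkpoint_path` would need to live on storage that outlasts the VM, such as a shared filer rather than the boot disk.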

So in that situation, that means there's a machine on that rack over there, and you can spin that one up. That machine comes up in about 35 seconds, so you can start loading the scene as that other one is still writing its checkpoint, if you can get that last checkpoint off. So in terms of different models when you're looking at VFX rendering, the most common one we run into with people who haven't done any rendering in the cloud yet is a rental model, where they're used to going to Bob and Sarah's rental service and a truck rolls in and a couple of racks come off and they rack them up and they plug them in and they bin pack the jobs. That means they're renting 300 machines for four days or three weeks. And they wait a couple of days to get the machines. And they have them for a period of time and back they go. Another model beyond that is a colocation model. So you don't have enough space on-prem. Perhaps you're in London. Cooling is an issue. Real estate is challenging.

You want artists there as opposed to machines in the closet. So you rent a colo space somewhere else, run some fiber to it. That's great. However, the scale is fixed in that scenario. You've just sort of moved the problem somewhere else in that regard. The dynamic model is the one that I'm personally most excited about and the fact that you take advantage of our per minute billing capability. And you can scale kind of as much as you need to. If you come to us and say, we need 50,000 cores, we don't have just unlimited cores sitting around all the time. But if you let us know what it is you need, we'll give you a quota that we can give you access to at any given time. But if you need something really massive– you're rendering a ride film and it's a high frame rate or it's crazy large and it's stereo, and so now you're dealing with 6K images and there's two of them and you're doing 60 FPS and such, if you need to render that at final production quality, you know in the schedule from production when that's going to occur, you'll let us know in advance and we can get you 120,000 cores or more.

It really just comes down to working with us from a scheduling standpoint, because Google Cloud Platform, running on the same infrastructure as the rest of Google, is in a scenario where we can work with the scheduling and capacity teams internally and turn up and make available a large block of machines for you across a couple of zones. But at the same time, that's a huge amount of scale. So in terms of workloads, I'm kind of excited about the fact that beyond rendering, and although this is a rendering-centric talk, the same infrastructure that you put in place architecturally to be able to run and make pictures on the cloud enables you to also do high performance geometry caching for really heavy scenes in Houdini, for instance, or in Bifrost with Maya, simulation, FLIP, fluid solvers, destruction, finite element, deep compositing in Nuke, all of that's available to you. And we're also able to handle everything from camera ingest from Camera Raw, debayering, editorial, all the way through to archive after you've done final content delivery.

So one of the workloads that I spent a bunch of time tuning in production was simulation. I'm going to talk a little bit about some fluid simulation and point out some of the differences between rendering and how to get around some of the common perceptions of, how do we do fluid sim in the cloud? That seems really complicated. There's a silly amount of compute and RAM required in order to be able to do simulations. The files they write are massive. 20 to 40 gigs per frame is what I was writing when I was on-prem at Animal Logic at one point for "Sucker Punch." I found that almost a third of my simulation time was write speeds across the network. And we had a great network there and a new high speed filer. But just they were large files. They took time to write. And the solver that I was using initially wouldn't initiate the second frame. So we had to make some tweaks and changes. But the opportunity to be able to write to local SSD is nearly instantaneous. It's super fast with the high speed storage.

So the wedging term is something that a simulation artist has to do. It's an old film term moved forward into digital land. But the idea would be that you want to make a fluid simulation, in this case. You have to figure out what the value is. Everybody who's run simulations in this room knows that it's a nonlinear situation in terms of the solver. In the solver you can't just say, oh, I want to make the force double, so I'm just going to double the force. It doesn't always work that way. You need time on the box in terms of number of hours of working with the simulator. So if I would start a shot on a Monday morning, it might be Wednesday before I have run enough simulation passes to figure out what is the vorticity going to look like, what does the impulse force look like. This is a simple resolution change. And you can see the delta in the number of voxels as well as the change in the look of the simulation. You can notice that the curling is completely different. It's not just a higher resolution version of it.

The solver actually looks a little bit different. So it's essential for the simulation artist to be able to have an opportunity to wedge quickly and be able to get responses back. Adrian Graham gave this a go. And this is some code that he put together, both a MEL and a Python version of it for wedging. And he used Maya Bifrost. I've also built this in Houdini. It's extensible in your own solver as well. This is how you actually run the individual simulations. The code samples will be visible on YouTube. So you'll be able to reproduce chunks of this if you wish. So in terms of how to deploy the wedges, this opportunity is really cool. I like the idea of a simulation artist being able to say, I want to run this workload. And I want to spin up eight 64 core machines with 250 gigs of RAM. You can go with custom shapes. So if your solver flattens out at 48 cores, no problem. 28 cores, whatever the optimum number is in terms of RAM and CPU, you only pay for what you use on a per-minute basis.
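The wedging setup described above is essentially a parameter sweep: one simulation per combination of candidate values, each on its own machine. The sketch below is not the MEL or Python code shown in the talk; it is a hypothetical illustration of generating eight wedges from two parameters the talk mentions (vorticity and impulse force), with invented values.

```python
from itertools import product


def build_wedges(vorticity_values, impulse_values):
    """Return one settings dict per wedge, numbered from 1."""
    combos = product(vorticity_values, impulse_values)
    return [
        {"wedge": index, "vorticity": vorticity, "impulse_force": impulse}
        for index, (vorticity, impulse) in enumerate(combos, start=1)
    ]


# Four vorticity values times two impulse values gives eight wedges,
# one per simulation VM. The numbers themselves are placeholders.
WEDGES = build_wedges([0.5, 1.0, 2.0, 4.0], [10.0, 20.0])
```

Each dict could then be handed to whatever launches the solver on a given VM, and wedges that are clearly wrong can be shut down early.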

So in this context you're going to do the base simulation. And then you're going to have to mesh that so you have a render object. And then on top of that, you're going to do additional passes based on that surface object you've made. And then you end up with the test render. The key though in this scenario is to keep all the data on the cloud. You write the really expensive base simulation out. Leave it there. The high resolution mesh you generate, you can leave it there as well. And when you do the render, you don't care about color accuracy for a simulation. You're just looking for motion. So you're going to have a shot camera, and if you wish, a witness camera. So you can have those two JPEGs generated. You can pull those down, display them on a simple HTML page if you want, and you can see all eight simulations running. And you can choose which ones you want to kill. So in terms of the architecture of how do you make that work, this is a nice little description of just saying, we have some sim VMs in the cloud, a license server to make them work.

You can point back on-prem, and then you have some local disk. It's really easy. You put the files in the cloud on a local SSD. You run the simulations. When you're done, you shut it off. And as the machines are running in this example, you can turn off three or four of them right away. And you can turn off additional ones. Say you've got one, three, and seven, and you think those are going to be good ones, you let those continue to run over lunch and come back and check them. And then you have some idea of what kind of numbers are going to work for you to get that look. If you want to do a full workstation build out, this is what that architecture looks like. And the cool part with this is you'll notice that we have both local workstations and sim VMs working here. So you can be in a scenario where you're running both hybrid and in the cloud for this. Something with our GPU offering is we offer compute GPUs. So if you want to use a renderer like Octane, for instance, and you want to be able to render on GPU or you wrote your own custom GPU fire solver, for instance, we have a significant number of compute GPUs that are available and becoming available in all of our data centers around the world.

We also have display GPUs becoming available in the next little bit. And the cool part with that is you can use something like Teradici for a PC over IP solution and you can have a remote desktop session. And there are a number of well-known software providers that we're working with who are in a position of being able to offer their software through this service. So the zero client is really nice. You get this tiny little box. It'll run multiple 2K monitors. The nice thing is if you need to hire 30 artists, you can go out and get 30 artists and you can give them monitors, mice, keyboards, Wacom tablets, and a thin client– done. And they're now able to work in the cloud. In a shorter term scenario, you can have artists who can't get that shot finished. They're in a difficult place. Their rig is too heavy. The muscle sim is just taking too long. You can give them a machine that has twice as much RAM or double the cores or some combination of that. And that's really compelling if you need to be able to get difficult shots through the pipeline.

So a sim factory in the cloud– we have a whole bunch of compute and storage. And now we have display and compute GPUs. So we're really trying to round out the complete offering of what's required to be able to have a completely virtualized pipeline in the cloud. The last thing that I'm going to talk about here is the idea of a production safety valve. It's important, I think, to be able to engage with us a little earlier. I do get calls from people and emails saying, hey, we're delivering this thing on Tuesday. It's Thursday at 2:00 in the morning. And there are some problems and they need to be able to get something out. We can help in that scenario, it's true. But it's quite a bit easier if you engage in a proof of concept a little earlier. And then we're in a scenario where you're set up and able to run on the cloud. You don't have to use it. You just have the capability. And that way if production comes to you and says, hey, the director changed their mind– that happens– and we need to be able to live with this, or another facility, schedule doesn't work out, we can take on an extra two sequences, but we don't have the render power for it, you can say no problem.

We can render this on Google. And we're already set up to do so. And with that, I'd like to bring Hannes up on stage. [APPLAUSE] Thank you sir. HANNES RICKLEFS: There we go. JEFF KEMBER: For you. HANNES RICKLEFS: Brain, thank you. And just leave my water here. Cool. Well, hi everybody. I'm really super excited to be here. What I'm going to do over the next kind of 30 minutes is to take you on a bit of a journey of how Google Cloud helped MPC kind of bring the "Jungle Book" to life, showing some of the key challenges that we faced, give you an overview of the technical design that we implemented, and then also give you kind of an outlook on where we kind of see the cloud going for us in the future. First, a little bit about MPC. So we craft visual experiences for both feature film VFX and advertising, also including virtual reality and digital installations. We've been around for over 25 years. These are some of the past movies that we've been working on. Here's an overview of all the projects that we did in 2016.

Well, actually not all of them. We completed a total of 15 projects on the film side. And on the advertising side, it was 2,019, which is a great number to remember if you have to put it on the same slide with 2016. So although we delivered 15 movies in 2016, we tend to work on roughly 20 projects at any given point in time. So these ones are about to deliver this year. But a lot of these were already in the making in 2016 and are going to be coming to a screen close to you at some point this year. MPC is roughly 2 and 1/2 thousand people worldwide, the majority being split between Vancouver, Montreal, London, and Bangalore. All of our productions are run across at least two of those sites. That's predominantly to leverage specialisms that are only available in some of our sites. So really close collaboration and data transfers between all of these sites is hugely important for us to deliver our productions. So talking about our productions, in order for us to create these visual effects, we constantly need to kind of plan how many people, how much compute, and how much storage we require.

To do that, we look at each of these shots that we can kind of see and we kind of break them down into the days that we think we're going to require in each of the departments that are required to make these shots. So this gives us the kind of like total amount of man-days that we will need for the project. And from that we derive schedules, which we then use to see kind of that's the people, that's the storage and compute we require. I'd kind of like to take you through that process in a bit of detail, because it's actually quite important for how we plan. And then you can kind of see really how the cloud allows us to change that. Due to the nature of the movies that we work on, we tend to see these kind of massive peaks throughout the year in predominantly kind of late spring and late autumn, which is due to the summer and winter blockbuster releases that we're working on. So as I mentioned, we do roughly 20 projects. What I'm going to do just for simplicity's sake is use an example of kind of three somewhat fictitious projects.

But the kind of scale that I mention kind of literally applies to all of them that we work on. So here is kind of the result of us breaking down all of the work that's going to be required and then kind of figuring out how many people we need. And this is really the starting point. From this, we then go and have our internal models that we then use to predict the amount of compute and the amount of storage that we require. What is kind of interesting is that depending on the project, so depending if it's a character-heavy shot, if it's an effects-heavy shot, environment-heavy shot, these profiles kind of change quite a lot. So you can see this kind of on the storage side. The big challenge is that all of these things peak at the same time throughout the production cycle literally. So just imagine 15 more of these kind of stacking up. To give you an idea of scale, when we talk people, that's roughly a couple of hundred per project. On the compute side, we're talking millions of hours to compute.

And on the storage side, our projects tend to be at around a petabyte when they kind of reach that peak capacity. So that means that we need to manage, plan, and provision for these peaks. We also need to build all of our infrastructure to constantly kind of be able to scale up and down for us to deliver our productions without any impact. The other thing is when we look at the trends– now this is kind of like a 12 month schedule right here– we see these shrinking quite significantly. The thing that isn't shrinking is the complexity of the shot work that's required. So these peaks are just going to go and increase and increase and increase. Just an idea of scale– for the "Jungle Book," we had 800 people working on it. In total it was 35 million wall clock hours to compute the movie. And at peak, the "Jungle Book" required 2 petabytes of storage. So let me give you a bit of background detail on the scale and kind of complexity that we faced on the "Jungle Book." In 2016, Disney awarded us with the majority of the work on the "Jungle Book." We were tasked with creating 1,200 shots with over hundreds of characters that we needed to build, 60 environments, over 2,000 environment assets.

That resulted in roughly 12 square miles of jungle, which according to Google is a quarter of the size of San Francisco. So you can kind of get the perspective of how much we actually had to build. This watering hole shot that you can see right here had a total of kind of like 200 characters within it. And it was also the first show to move completely to a new software stack. Now our software stack uses a large amount of third party software, but then a huge amount of custom tooling that we have to build. And this was really important because we needed to hit the creative and artistic vision that Jon Favreau had set out, who was the director on this movie. It's been an absolutely amazing experience for us, really satisfying due to just the recognition we've received through the industry. Because it was a couple of weeks ago, I think, that we finally won the Oscar for the movie. So it's been amazing. But rather than talking, I think lights and the audio please, because this is a breakdown of some of the FX work that was required for the movie.

[VIDEO PLAYBACK] [MUSIC PLAYING] [END PLAYBACK] HANNES RICKLEFS: So as you can see, quite a lot of fluid simulation, as Jeff was– [APPLAUSE] Oh, thank you. Thank you. Right. So as I just mentioned before the clip, there were quite a lot of new technologies that were required for us to deliver the movie. One of those particular pieces of third party technology was a new version of RenderMan, which has really moved away from its more traditional REYES-based pipeline to a completely native path tracer. We have a really great relationship with Pixar. And some of the initial tests that we did with this new version gave us absolutely great confidence that the photo realistic kind of visuals that we were required to deliver would be possible with this new version. The thing is, we fine tune kind of like all of our compute resources specifically for the task. So we look at what's the optimum in terms of CPU versus RAM versus storage per task. In addition, we kind of had to revamp our internal rendering software stack to make use of this new version.

So it's actually quite a significant risk for a production to say, well, we're going to change our renderer. To give you an idea of that scale, when I mentioned that the "Jungle Book" took 35 million render hours to finish– so that includes everything, like simulations and all that– 80% of that is RenderMan related hours. So it was kind of a significant risk for us to do it. But I'm really glad we did. The other thing is, there was a huge amount of internal development that we needed to do, ranging from how do we model the environments, how do we simulate all the jungle, how do we improve our character animation, and also various workflow improvements. So one of the ones that I want to pick on is, we really needed to move up our quality for our QC and kind of just approval renders, which traditionally would be more of these kind of like puppets cut up, and tend to use kind of like OpenGL GPU-based visuals. So what we did is we actually said no, we're going to use RenderMan for that as well, which meant that we were going to go from something like this to something like this as part of the QC step.

It was really important for our animation department to see the impact of fur, all of those kind of little subtleties, in relation to the performance. So this was really important for us to get right for this movie. Now what that meant is that due to the volume of "Jungle Book," due to all of this new technology, our plans, when I showed you those graphs for compute, they reached new heights. So in essence we needed to go and get more compute. Now our existing approach, like Jeff mentioned, we would go out and we would go and purchase and we would look at our data center and get all of that stuff in there and get ready and then use it. Or we would go out and lease it. But that, for the time frame that we needed, was going to be very cost ineffective, and also kind of challenging to get in time for when it was needed. Now we had already had multiple conversations with Google to see how can we use the cloud to actually give us a third option. And this time, all parties said, we're going to go and do it.

So what did we set out? We wanted to add 10,000 cores for a duration of two months. However– and this is really important– it was crucial to provide this additional compute with an official approval from our client. So that meant we needed it to meet theirs and our internal security standards for cloud-based compute. All of our sites are regularly audited literally multiple times throughout the year on our internal architecture and our technical implementation to ensure that they're aligned with the security practices that are part of the industry. To do that, MPC and Technicolor– so Technicolor is MPC's parent company– we partnered with ISE– they're the Independent Security Evaluators– to provide a security assessment of our kind of remote compute platform, before then submitting that proposal to Disney for approval. This was an ongoing process throughout the whole project because any alterations, any new appliances that we wanted to introduce needed to be re-evaluated and then resubmitted for approval.

So here is the kind of technical design that we ended up with. MPC's existing infrastructure is built around centralized NFS filers for both software and production content. Both of these receive multiple updates per day in terms of, on the software side, new tools, new configurations. So they're in constant kind of flux and being worked on. All of MPC's sites are connected via Technicolor's production WAN. But all of these kind of film production zones, which is one of the big security requirements, do not have any external connectivity– so no access to the internet from them. There are some shared services zones that you've got here that do have external connectivity. But these are tightly controlled with application and kind of like network [INAUDIBLE]. So the initial design that we kind of put forward was to use VPN tunnels to provide burst capacity into Google's cloud. The big thing was that there was going to be no production content in the cloud, only temporary logs and some scratch space that was required for some of our applications.

To provide these VPN end points, MPC worked with Sohonet. So they have a product called FastLane which enabled us to get the kind of VPN connectivity that we required given the network throughput requirements that we had. So since this initial setup, we've actually started to use the Google managed VPN on the GCP side. But we're still on the FastLane product on the Technicolor side. With this initial design, we knew from the beginning that we had to deal with latencies, because we weren't going to rearchitect any of our internals. So we chose Europe-West to just reduce the amount of latency that we had from our London site. However, the kind of nine milliseconds that we measured was too much for some of our tools that really were used to really quick access to our application filers. Various different interesting timeouts were encountered. In addition, when we increased to more and more cores, they just became more and more apparent. So to deal with this– it was kind of on one of the slides from Jeff– what we initially did is we put our application filer out into the cloud.
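To make the latency point above concrete, here is a rough back-of-the-envelope sketch in Python. It is not MPC's tooling: the number of sequential file operations and the on-premises round-trip time are assumptions chosen only for illustration, and only the roughly nine millisecond figure comes from the talk.

```python
def added_wait_seconds(sequential_ops: int, rtt_ms: float) -> float:
    """Time spent purely waiting on the network when file operations
    are issued one after another (each one pays a full round trip)."""
    return sequential_ops * rtt_ms / 1000.0


# Hypothetical tool start-up that stats/opens 20,000 small files serially.
SEQUENTIAL_OPS = 20_000   # assumption, not a figure from the talk
ON_PREM_RTT_MS = 0.2      # assumption: typical local filer round trip
CLOUD_RTT_MS = 9.0        # the ~9 ms measured between London and the cloud region

on_prem_wait = added_wait_seconds(SEQUENTIAL_OPS, ON_PREM_RTT_MS)
cloud_wait = added_wait_seconds(SEQUENTIAL_OPS, CLOUD_RTT_MS)
print(f"on-prem wait: {on_prem_wait:.1f} s, over VPN: {cloud_wait:.1f} s")
```

Under these assumed numbers, a start-up that waits a few seconds locally waits minutes over the VPN, which is the kind of gap that can trip timeouts in tools written for a nearby filer.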

Our preference would have been to just use the Avere virtual appliance that existed. Unfortunately, this didn't quite work out of the box. Because of our security restrictions, what the Avere needs to do is to actually speak to the Google APIs, which wasn't possible in the setup that we had. Now we already use Avere internally. We front all of our application and content filers with the Avere. So we have a great connection with them. They came in and they actually helped build us a custom virtual appliance that was then able to perform within our system architecture. Now along with the Avere, we also had kind of internal tooling that needed to talk to the Google API because, hey, we need to provision machines, right? We want new PVMs. We want new standard instances. So we needed to figure out a way for, how do we make this kind of communication happen? And the proposal that we put forward was to use kind of like a tiered proxy, which kind of enabled bootstrapping and any of the kind of API requirements to get GCP resources and for the Avere.

Now all of this was of course resubmitted to ISE, and then also to Disney for approval. So this is kind of the final design that we put in place. So on the GCP side, we had render nodes. So these were dedicated to run the render tasks. They were a mixture of PVM and standard instances running an MPC custom CentOS 6 image. We had file servers, again, purely for kind of logs and any temporary scratch that was required by some of our applications, also running a custom MPC CentOS 6 image. Then the VPN end points, they were worked on by Sohonet to provide us the network performance through our VPN setups, and then the Avere virtual appliance, which was the custom MPC Avere build to really help on optimizing the performance that we required. And really most importantly, no content was ever stored at rest in the cloud. You can assume that behind the scenes we had host and key management all in place. And there was strong role-based authorization to ensure strict access for any kind of admin and support throughout this platform.

Now just– I like my graphs. So here is to show you kind of the importance of the Avere. So this was a graph of one of the render nodes without the Avere. And if you kind of pay attention to the left hand side if I change the scale, so once the Avere was put in place, we moved from literally 50 reads per second up to 2,000 reads in a second. Now this is really important because when you look at the performance of a renderer, they like to read a lot of data. That's due to the amount of texture, to the amount of geometry, and just various different data types. So this was really important for us to put in place. The other thing that was important was– or literally– as a consequence, we obviously had a huge amount of network traffic drop, with the red line kind of marking when we had the Avere in place. So at this point in time, we're really happy, right? We solved the security. We got all of the throughput that we needed. So it was now questions like, how much can we fire up the cloud?

And this is where I'm going to hand over to the video again to just show you the choice of what we could put up on the cloud. [VIDEO PLAYBACK] [MUSIC PLAYING] HANNES RICKLEFS: And there's audio again. [END PLAYBACK] HANNES RICKLEFS: All right. [APPLAUSE] Thank you again. Now unfortunately, our production budgets are not quite as infinite as the resources that you can get into the cloud. So what we had to do is kind of a bit of work of, how do we identify what kind of jobs make the most sense to send to the cloud, whilst keeping within our production constraints? First off, we kind of went into, well what's the ideal criteria to send tasks to the cloud? For us, that was tasks with limited I/O, to just ensure we were not going to– because remember, the production content was always on our end. So any kind of download and any writes would cost egress charges– and generally to be quite compute heavy. So we just make sure we make use of the great instances that we had out there.

They should be tolerant to preemption so that we could make use of the lower cost PVMs wherever possible. In general, that meant that our lighting and rendering tasks just fitted that profile a lot better than comp and FX, which tend to be extremely high I/O, and also very intolerant of preemption if you just kill the simulation halfway through. So our RenderMan tasks were the ones that we targeted first. The improvements to our QC process to move to RenderMan in comparison to kind of the OpenGL renders, plus the volume of shots that we had, meant we had thousands of tasks in our internal backlog. So luckily from the QC kind of perspective, you don't necessarily need a full sequence of frames to ensure that things like lighting directions– I mean, I can look at this image and say yep, the lighting direction is OK. I don't need to see everything else. We can just move that along and do a final render. And also kind of from an animation perspective, you were OK to just see the performance even if kind of frames were missing.

It was more important to just have a reference point to ensure, yes, from a QC perspective, we're actually able to move these things along. So these kind of QC renders were a perfect target to send to the PVMs. Now our production moves through a large number of departments that are kind of shown here. All of our work is kind of split into two main areas. We have what we call the build areas where we construct all of our kind of characters and environments. And then we reuse those on all the various different shots. For the "Jungle Book," I mean, every single department was important. But the kind of two key ones were animation, from just a performance perspective that we had to deliver, and lighting to ensure that we can deliver the photo realistic visuals that were required to meet the vision of the director. Now our ability to add additional resources for the animation QC, and then from a lighting QC and final render standpoint, enabled production to be really confident that we are going to hit the demands of the shows.
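The selection criteria described above (limited I/O, compute heavy, tolerant to preemption) can be sketched as a simple filter. This is only an illustration in Python: the `Task` fields, the I/O-per-CPU-hour threshold, and the sample numbers are all assumptions, not MPC's actual scheduling logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    department: str            # e.g. "lighting", "fx", "comp"
    io_gb: float               # estimated data read + written (assumed metric)
    cpu_hours: float           # estimated compute time
    preemption_tolerant: bool  # safe to kill and retry part-way through?


def cloud_candidates(tasks: List[Task],
                     max_io_gb_per_cpu_hour: float = 1.0) -> List[Task]:
    """Keep tasks that are compute heavy relative to their I/O and that
    can survive being preempted. The threshold is a made-up default."""
    return [
        t for t in tasks
        if t.preemption_tolerant
        and t.cpu_hours > 0
        and t.io_gb / t.cpu_hours <= max_io_gb_per_cpu_hour
    ]


backlog = [
    Task("qc_render_0010", "lighting", io_gb=4.0, cpu_hours=12.0, preemption_tolerant=True),
    Task("water_sim_0042", "fx", io_gb=900.0, cpu_hours=30.0, preemption_tolerant=False),
    Task("comp_0107", "comp", io_gb=120.0, cpu_hours=2.0, preemption_tolerant=True),
]
print([t.name for t in cloud_candidates(backlog)])
```

With these invented numbers only the lighting QC render passes the filter, which matches the profile the talk describes: simulations fail on preemption tolerance, and comp fails on I/O.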

So on a daily basis, kind of what happened was that we looked at our daily backlog in terms of volume and priorities. If required, a budget was set. And also priorities were provided to then ensure we knew kind of like what we wanted to get through in the night. So we used Tractor, one of the schedulers that was mentioned by Jeff, for our internal management of our hosts and the submission and processing of the tasks. So there were kind of like two main areas that we needed to develop our own tools for. One was for the automatic provisioning. So this was to say given the budget, how many PVMs and how many standard instances do we require? And then on the other side was kind of like how do we ensure that we can get kind of the right tasks sent over to the cloud? What was important here is in order to make best use of the Avere, we kind of needed to ensure that we would only send kind of like the same kind of scenes or shots to the farm. So make sure that, hey, let's just make sure the water hole shots all go to the cloud, or let's make sure that only the jungle sequence here on the bottom goes.

So that way the cache wouldn't constantly kind of thrash itself. And that worked really well. And then also to say that Tractor in itself really handled the load quite well of all of these new additional machines that we were going to add on a nightly basis. So we didn't need to do any work to just handle the increased load that was on Tractor itself. All right, just leaves me to conclude. First off, here's some statistics of how much we used the cloud. At kind of peak, we added in excess of 14,000 cores per night. Overall, they computed 360,000 hours. That's roughly 41 years. 1.5 million tasks were completed. Internally, we think this was a really great success across our technology, operations, and production teams because everybody came together. We needed to control the amount of money that we were going to spend, and we needed to know that we actually can get the throughput and have all of the tech ready. The agility and just general availability of the cloud resources enabled literally millions of tasks to be completed in a very short time.
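The idea of sending only a few sequences to the cloud at a time, so that the cache in front of the filers keeps serving the same assets, can be sketched as follows. This Python snippet is a hypothetical illustration: the function name, the "largest backlog first" rule, and the sample sequence names are assumptions and say nothing about how MPC's Tractor integration actually chose work.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pick_sequences_for_cloud(tasks: Iterable[Tuple[str, str]],
                             max_sequences: int) -> Dict[str, List[str]]:
    """Group (sequence, task_id) pairs by sequence and keep only the
    sequences with the largest backlog, so cloud nodes mostly read the
    same textures and geometry and the cache stays warm."""
    by_sequence: Dict[str, List[str]] = defaultdict(list)
    for sequence, task_id in tasks:
        by_sequence[sequence].append(task_id)

    # Largest backlog first; ties broken alphabetically for determinism.
    chosen = sorted(by_sequence, key=lambda s: (-len(by_sequence[s]), s))
    return {seq: by_sequence[seq] for seq in chosen[:max_sequences]}


nightly_backlog = [
    ("waterhole", "t1"), ("waterhole", "t2"), ("jungle", "t3"),
    ("cave", "t4"), ("jungle", "t5"), ("waterhole", "t6"),
]
print(pick_sequences_for_cloud(nightly_backlog, max_sequences=2))
```

Everything not selected stays on the on-premises farm, which is one simple way to avoid many unrelated shots evicting each other's data from the cache.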

So if we look at kind of what I call the key benefits– so the quickly added capacity, this was really important. When a production of that scale is in demand, you want to ensure that they feel as confident as they can that yes, we're going to be able to deliver the movie. This was really important for us. The agility, just in terms of being able to say, hey, today I want 70% PVMs and only 30% standard instances was really important so we can guide it on a daily basis. The other thing that we found, which was really cool, was that as I mentioned in the beginning, we kind of fine tune our processes to say, all right, what's the split of CPU versus RAM on these particular machines? On the cloud, actually getting a new machine profile is really easy. You just say, hey, I want this kind of configuration and be done. If I would have to do that internally, it's actually quite a lot of effort to make sure that we can re-apportion parts of our farm. So this actually helped us to do a lot of playground testing. The other part is, we now have established security processes and documentation and everything that we can now go and engage with a lot of our other clients to also ensure that we can use the cloud for their projects.
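The daily decision mentioned above, a budget plus a split such as 70% PVMs and 30% standard instances, can be turned into instance counts with simple arithmetic. The Python sketch below is only an illustration: the function, the budget, and both hourly prices are invented for the example and are not real GCP prices or MPC's provisioning tool. Integer cents are used to avoid floating point rounding.

```python
from typing import Tuple


def plan_instances(budget_cents: int,
                   hours: int,
                   pvm_percent: int,
                   pvm_cents_per_hour: int,
                   std_cents_per_hour: int) -> Tuple[int, int]:
    """Split a nightly budget between preemptible (PVM) and standard
    instances and return how many of each can run for `hours`."""
    pvm_budget = budget_cents * pvm_percent // 100
    std_budget = budget_cents - pvm_budget
    pvm_count = pvm_budget // (pvm_cents_per_hour * hours)
    std_count = std_budget // (std_cents_per_hour * hours)
    return pvm_count, std_count


# Hypothetical night: $1,000 budget, 10 hours, 70/30 split,
# with made-up prices of $0.25/h (PVM) and $1.00/h (standard).
pvms, standards = plan_instances(
    budget_cents=100_000, hours=10, pvm_percent=70,
    pvm_cents_per_hour=25, std_cents_per_hour=100,
)
print(pvms, standards)
```

Changing `pvm_percent` from night to night is the knob the talk refers to: more preemptible capacity for retry-friendly QC renders, more standard capacity when final renders must not be interrupted.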

Then controlled cost and [INAUDIBLE], again, our budgets are quite constrained. So knowing what we can compute, what we're able to deliver, and the associated costs with this is really important. And then finally, we've now really established some really close partnerships across the various different third party– well, the cloud, Jeff and all these guys, which really helps us kind of look into the future. So let's look into the future. We strongly believe in the future of the cloud for visual effects. So we're now working on a project to literally take our whole platform cloud native to enable us to process our visual effects. Now our core platform tooling that we already have is built around concepts or cloud-like concepts. So we have a strong microservices framework. We have a well established asset management platform. And our software and kind of build and configuration system is all built around containers. We have all of these things. What's nice with the cloud, I'm now actually able to run that in an environment which is kind of separated from our existing infrastructure.

So I can iterate a lot quicker, which then helps me to build out that new platform and then be able to change my internal stack a lot easier. Also, if this stuff is of interest to you, we are looking. We are hiring. So do have a look at that link and/or grab me afterwards. Now here's kind of the outlook where we see us kind of moving in terms of the cloud. Rather than going through every single one of those departments one by one, what we actually want to do is just take the whole lot and move to the cloud, and then be able to literally just pull down the QC and kind of like final renders to look at locally. Now that would mean we could use the cloud to also manage our storage constraints and more importantly, deal with the kind of like intersite connectivity, because the global nature of our productions just is what we do. So this just leaves me to thank Disney for just allowing me to show you not only the amazing visuals but also the technical artistry that was required to bring the "Jungle Book" to life, my colleagues at Technicolor, my team at MPC– it's been an ongoing dedication and passion to just embrace this new technology– Sohonet and Avere for really being behind us for their availability and their ongoing support, and the GCP team.

I mean, Thomas, [INAUDIBLE], [INAUDIBLE], and Jeff, these guys are amazing. And just this ongoing collaboration, we're looking forward to see what the future brings. So with this, just want to thank you all for listening. And I guess– [APPLAUSE] [MUSIC PLAYING]



In this video, Jeff Kember and Hannes Ricklefs discuss high-performance rendering on Google Cloud Platform. Topics covered include security, connecting to the cloud, effective use of preemptible VMs, cloud file systems, licensing and additional VFX workloads including simulation and deep compositing. Hannes Ricklefs, Head of Core-Engineering at MPC, speaks about their Academy Award-winning VFX work. Disney’s The Jungle Book required MPC to deliver work of unprecedented visual complexity. MPC wanted to ensure excess capacity was provided through the flexible and scalable nature of cloud-based resources, whilst meeting the strict security requirements of their clients.
