Google Cloud NEXT '17 - News and Updates

Best practices for managing Container Engine Clusters across multiple teams (Google Cloud Next ’17)

(Video Transcript)
[MUSIC PLAYING] ALAN NAIM: Good afternoon everyone. My name is Alan Naim. I am a product manager at Google. I work on Google Container Engine. And I'm joined here by David Oppenheimer, who's a lead software engineer for Kubernetes and Google Container Engine, and he's going to be presenting with me on the demos. So how's everybody doing today? Good? Friday, all set. Hope you're having a good conference. So the title of our talk today is really– it's around best practices associated with managing Container Engine Clusters across multiple teams. So show of hands, how many people here are actually running containers in production? That's pretty impressive. All right, I guess there's so much excitement around containers in production. And then of the folks that are running containers in production, who is using Kubernetes? And keep your hands up. Who's using Kubernetes in production? All right, awesome. So the agenda today is really around– let's start by talking about how you would run Container Engine Clusters across multiple teams.

We're going to break the talk into three pieces, and the first piece is really about Kubernetes resource hierarchies. So those of you that are running on Google Cloud, you understand the concept of a project, and then you have a cluster. And we're going to walk through some best practices around what we're seeing out there with customers that we're talking to, around how they package these components within these hierarchies. The second is really around resource management. So now that you have your cluster or clusters, how do you run heterogeneous applications that have different resource requirements and share those resources so that the applications are getting the best quality of service associated with their needs? And then finally, we're going to talk about role-based access control in Kubernetes, and walk through some of the features that we have, in particular in 1.6, that enable finer-grained control around applying access control to something like a cluster namespace.

And then throughout the talk I'll be inviting David Oppenheimer up to take everything that I'm presenting and make it real, show you a demo of how it works. So let's get started. So those of you that are familiar with Kubernetes know that this is something that is based on 10 years of experience Google's had in running and managing containers. We took all these various best practices associated with how we do things and applied them to today's world in an open-source project called Kubernetes, that's run today by the Cloud Native Computing Foundation. It's 100% open source, written in Go, and it'll run anywhere. So these are things to remember about Kubernetes. For those of you that are moving to the cloud and going down that journey, this is an opportunity to really have a central way of doing things and deploying these hermetic environments that are very predictable across different environments. So as you all know, containers are great. They add a lot of value, but typically after the first five hours, you encounter challenges around scale, health checks.

How do you get containers networked together, and how do you operate an environment and ensure you're able to push things into production quickly, and do rolling updates, and all these various things? How do you take containers that typically don't have any notion of data, and attach volumes to them, and manage the entire life cycle of these things? This is where Kubernetes comes in, and it provides this environment, this platform for you, in terms of running and managing containers at scale. So that was my two-minute Kubernetes 101, assuming most of you already understand bits and pieces of it, and you're interested mostly in how you manage different clusters across different teams. So the typical Kubernetes journey usually starts with, hey, we're getting into containers, we need an orchestrator. Let's find an application that we think makes a good fit for containers, and let's go ahead and do a proof of concept. You identify a group of developers. They take their application, they containerize it, they run it on Kubernetes.

A month later you show it to the rest of the team and everybody's like, this is cool, this is awesome. How do we get some of this? Another team comes along and says, hey, we want access to this cluster. So in some cases you'd let them spin up their own cluster, in other cases you basically give them access to the sandbox cluster that they go to town with and try and run their applications and so on. Then you get to the point where, OK, we're ready to go to production. Now what? Right? There's some choices that you have to make at this point, and some of you I've talked to are facing these things. One choice is– do I create a project per team? Do I create a cluster per team? Do I run one shared cluster and have teams run their own namespace? What do I do? I have applications that have different regulatory requirements. Do I put them in a different cluster, in a different project? How should I look at these requirements, and what's the best practice associated with that?

There's really no single perfect answer. It all comes down to how your organization's structured, how your teams are structured, and really, the requirements of the application. So what I want to do is just walk you through some examples of things that we see out there and then provide you our opinion around some best practices. So one common example we see out there is you have one project and one cluster and then different namespaces, and these namespaces map to different applications, different teams. So we see quite a bit of that. The pro is that it's very easy to manage. It's a single cluster context that you're dealing with, and you're basically running all your applications using that single context. There's cost savings associated with it. You're taking advantage of the namespace for isolation. So you're able to pack more resources into that one cluster. So you're getting some potential cost savings. Your resource isolation is really based on the namespace quota that you're defining, and we'll talk a bit about that later on.

And then you can take advantage of some of the features that we'll talk about later that we introduced in Kubernetes 1.6 that apply role-based access control to the namespace itself. Some of the cons associated with that is you're limited in terms of what you can isolate from a namespace perspective– compute and memory. So if you want to isolate with more resources, these things are not available yet. And really, you're basically leveraging that quota on that namespace as the only parameter for doing the isolation. So that's one example. Another example is you have one project but then different clusters. And this one we actually see the most. So in this particular case, you have a staging cluster, or potentially like a dev/test/staging cluster, and then you have a prod cluster. And within each one of these clusters you have different applications that run in their own namespace. Some pros with this are you now have the ability to isolate some of the cluster resources.

Like your prod master is isolated from the masters for staging and test. You're not exposing your master to a potential DDoS attack that could happen all of a sudden. You have the ability now to run two different networks. So you're not sharing that same network across these two different operating environments. You can basically have dedicated virtual machines and DNS service for each one of these environments. So this is something that we see quite a bit of and typically it meets, say, 80% of the requirements for companies that we're talking to. And then the other important advantage: a cluster can be deployed in a single zone or it can be deployed across multiple zones. So in prod you might want to deploy across multiple zones. In dev/test you probably want to deploy in a single zone. So you have the ability to assign different availability characteristics across these cluster environments. Disadvantage, again, you lose some of the management benefits. So you're managing two different clusters and you have to manage across different contexts and so on.

The third case is really different project, different cluster, and different namespace. This is something that we see with some of the bigger organizations that want to put their production environment in a separate project. And the advantage for that is you have your own quotas, and these quotas really are your compute engine quotas that you can allocate specifically for that particular project. Somebody running dev/test/staging can't come along and consume your quota. The next thing you know your application has a spike. And you're calling your cloud provider, and you're like, hey, I need for you to increase my limits. And, you know, these things take time. Another advantage is if you're doing chargeback. Today, unfortunately, there's really no way within a cluster to do chargeback based on applications that are running within that cluster. So you'd have to run it in its own project and then do chargeback based on the billing that you get for that particular project.

In the future this is an area that we're very interested in addressing, but for today, if you have a requirement around chargeback, run it in its own project. And then greater control around identity and access management, definitely. And again, like the previous slide, you lose some of the management benefits because now you're dealing with multiple projects, multiple clusters. But you can take advantage of some of the things that we've introduced around cross-project networking and all these various things that we're doing in Google Cloud Platform that are very beneficial. So the suggested best practice is really: run your production environment in a separate project, put your dev/test/staging in its own project, split them up based on clusters depending on your needs, and then use namespaces to break out your applications within that cluster. And here's just a list of the recommendations that I just mentioned. So most customers create a cluster per environment, and then this is just a repeat of what I just said.

Complete isolation across namespaces is something that's still a work in progress. We'd love to hear your feedback on some of your requirements in this space. So after the talk please tell us. So let's talk about resource control and resource management. So most of you probably understand what a pod is, but for those of you that don't, any time you have containers that potentially need to be tightly coupled, or share a network, or share volumes, you put those containers in a pod. So a pod is this architectural pattern that was introduced in Kubernetes that really is about how you run and deploy containers. It is the unit of scale. Everything that you run in Kubernetes runs inside a pod. And when your application scales, it is these pods that are actually scaling across the nodes within your cluster. So everything within a pod lives and dies together, and basically it's how you deploy your application in Kubernetes. What we've done with Kubernetes is give you the ability to isolate some of these resources.

And today we have various compute resources that we expose to enable you to do some isolation in terms of the containers that you're deploying in Kubernetes. So some examples are CPU and memory. So based on your application's CPU and memory requirements, you can specify these when you create your containers, and then the scheduler will automatically apply the right quality of service for your application, depending on how much CPU and memory you require. In the future we're looking at also extending this to local storage. So why would you need strong isolation for pods? Well, suppose you have certain applications that are running in the same cluster, but you want to ensure that they don't interfere with each other. You would isolate your pods, and this goes back to the whole notion of cgroups. Those of you that use Google Cloud may have noticed that our virtual machines spin up very quickly– it's because our virtual machines actually run inside containers.

And the fact that you don't see a lot of noisy neighbor issues with Google Cloud Platform– a lot of this comes from the benefits associated with using cgroups and so on. Better predictability. By isolating your pods, you're actually able to– for certain applications that you want guaranteed, be able to specify that and ensure that they're always going to be able to run. Say a web serving application and a monitoring application are critical for your business. So you can't handle a situation where you can't schedule the web serving application. Some of the cons associated with it would be oftentimes you don't know what your application needs. Perhaps you don't have the historical data, and it's not trivial to actually figure these things out. We've tried to make it easier, but oftentimes you just have to start with, perhaps, no isolation and then learn as your application's running. Look at cases where you have evictions, and based on these evictions, tune, and tweak, and figure out the right limits to set for your pods and containers.

And then utilization– when you isolate, you're actually perhaps reducing your efficiency. Because you're asking for guarantees, and your application may not take advantage of all these guarantees. So there might be some unused resources that end up lowering your efficiency. So we have this concept of what's called a request and limit. So when you create a container you basically specify a request, and a request is how much resource my application actually needs. And you specify this in CPU and RAM. That's pretty much it. So you can actually specify these things at container creation time. Now, suppose that you want to oversubscribe or overcommit. We have this other parameter, actually, that's called a limit. And if you set your limit greater than your request, then you're actually overpromising. Similar to an airline that overbooks and assumes not everybody is going to show up to the gate, and if everybody shows up to the gate, someone is going to get pushed to the next plane or given a free ticket.
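To make the request/limit distinction concrete, here's a minimal illustrative pod spec (the names and values are ours, not from the talk) where the limit is set above the request to allow overcommit:

```yaml
# Illustrative pod spec: the request is what the scheduler reserves;
# the higher limit lets the container burst into spare capacity.
apiVersion: v1
kind: Pod
metadata:
  name: web-app            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx           # any image; nginx is just an example
    resources:
      requests:
        cpu: 250m          # 0.25 CPU reserved by the scheduler
        memory: 256Mi
      limits:
        cpu: 500m          # may burst up to 0.5 CPU
        memory: 512Mi      # killed if memory use exceeds this
```

Because the limit exceeds the request, this container can use more than its guaranteed share when the node has spare capacity.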

So these two parameters, request and limit, are things that you can configure at the container level. Now, a pod actually inherits all these requests and limits that are defined at the container level. So a pod request becomes the sum of all the container requests, and a pod limit becomes the sum of all the container limits. OK. So based on your request and limits, the scheduler actually figures out what the quality of service is for your pod. So if you set your request greater than zero, but you set your request equal to your limit– so you don't want overcommit, you don't want oversubscription. You set those two values as equal. Your pod is actually going to be guaranteed to run. Now, if you set your request to be a certain value but then set your limit greater than your request, then your pod is going to get what's called a burstable quality of service. Meaning that if there are no available resources, it will do best effort. It will try to deploy this pod on these nodes, but there's no real guarantee.
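As a sketch of the guaranteed case (again with hypothetical names and values), setting the request equal to the limit looks like this:

```yaml
# Illustrative pod spec for the guaranteed quality of service:
# requests equal limits, so there is no overcommitment for this pod.
apiVersion: v1
kind: Pod
metadata:
  name: critical-web       # hypothetical name
spec:
  containers:
  - name: web
    image: nginx           # any image; nginx is just an example
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
      limits:
        cpu: "1"           # equal to the request: no oversubscription
        memory: 1Gi
```

Such a pod is the last to be evicted, behind burstable and best-effort pods.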

Now, by default, if you don't specify anything and you just keep your requests at zero or don't provide any value, then your pods are scheduled as what's called best effort. And they're the ones that will typically get evicted first when you don't have enough resources. So in the order of eviction, it's best effort, burstable, and then guaranteed. Now, your namespaces also have requests and limits that you can specify. So you provide these requests and limits on the namespace level. And what actually happens is you're guaranteeing that the sum of the requests and limits across all pods running within that namespace will not exceed the quota that you're specifying at the namespace level. So a good use case is if you have different teams that are sharing a cluster, they each have their own namespace. One team you know traditionally has gone off and done some things that literally are unpredictable, and gone off and done some resource utilization.

They haven't shared much insight with you in the past. Well, you can set a quota limit for their namespace and just kind of ring fence them in terms of usage. So how would you overcommit resources on the cluster? So one way to do so, as I mentioned earlier, is on your container. For your container, set your limit higher than your request. And once you do that, you will actually now have the ability to promise more resources than what's available. This is good for things like burstable type workloads. Perhaps you want to take advantage of as much efficiency as possible. For pods that you think always need to run, always have to be up and running, then you basically set your request equal to your limit for those containers, and those will get guaranteed. Best practice is– oftentimes most of us don't know what the limits and requests are. We don't know what our applications are going to consume. So start with best effort or even burstable, and then see how things come along, and then keep track of pod evictions.
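A namespace quota of the kind described here could be written as follows (the team name and the numbers are hypothetical):

```yaml
# Illustrative ResourceQuota "ring-fencing" one team's namespace: the
# sums of pod requests and limits there cannot exceed these caps.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota         # hypothetical name
  namespace: team-a        # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"        # limits capped above requests: overcommit
    limits.memory: 16Gi
    pods: "20"             # quotas can also cap object counts
```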

And when you see a particular pod getting evicted more than others, then you know there's something wrong, and perhaps take that pod and set it as guaranteed. And then there could be other pods that literally are– they can handle failure. They don't really need to be running all the time. Set those as best effort, so more like the batch type workloads. So let's move to the demo, and I'm going to invite David on stage to walk through it. DAVID OPPENHEIMER: OK. Thanks, Alan. Can we switch to the input from the PC? Perfect. OK. I'm going to try to make this– can people see? Well, it's right at the top. Is that large enough? I can make it a little bit larger. Let's start and see how this goes. So I'm going to show two demos– one of the quota functionality that Alan talked about, and then I'll show you one of resource isolation. So the quota demo, it's pretty straightforward. Can people see this? I can make it one more level bigger if– yes, no? It's good enough.

OK. People are saying it's good enough. So we're going to create a namespace just to run this demo in. And to start with, you'll see that there's no quota associated with the namespace. By the way, this command, kubectl, is the command-line client that you use to interact with a Kubernetes cluster. And so we're saying here, tell me what quotas are associated with this namespace called demos that we just created, and it's saying there's no quota associated with it. And so let's set up a quota. And the way you set up a quota in Kubernetes, in Container Engine, is that you create a yaml file that defines a resource quota object and specifies the quota. Like Alan said before, you can set a quota for the total amount of requests, or total amount of limit, or total number of pods that are allowed in a namespace. So this quota is fairly simple. It's saying that we're going to set a quota of 2 CPU and 2 gigs of memory, maximum, across all of the pods that are going to run in this namespace.
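A plausible reconstruction of that quota file (the exact field names used in the demo may differ) is:

```yaml
# Reconstruction of the demo's quota: 2 CPU and 2Gi of memory,
# maximum, across all pods in the "demos" namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: demo-quota
  namespace: demos
spec:
  hard:
    cpu: "2"               # caps the sum of pod CPU requests
    memory: 2Gi            # caps the sum of pod memory requests
```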

So that's what the file looks like, and then we run kubectl to push that file to the server, which will create the quota. So now we've created the quota, and let's see how it works. Before we do that, we'll run this kubectl describe, which will show us what quota we've installed. And you can see it says that there is a quota called demo quota that's associated with the demo's namespace, and there are zero resources, zero CPU and memory resources in use right now, and the quota is 2 CPU and 2 gigs of memory. So first we're going to create a pod that uses 60% of the quota, and that should succeed because you're allowed to use up to 100% of a quota. This is the yaml definition of a pod. It's very simple. It's saying that the pod should run in the demo namespace, or the demo's namespace. I don't know if I highlight stuff if you can see it. Yeah, I guess you probably can. That's what this line is doing. The pod has a name, and it's saying that we're going to run this very simple test pod called hostname that is just a server that serves the hostname if you connect to it.

And then we're setting these resource requirements on the pod. Like Alan said before, you can associate limits and requests with your containers. Here we're just going to associate a limit with the container because Kubernetes defaults the request to be equal to the limit if you don't specify it, and it fits better on the screen if we don't. But what will happen is that when we create this, both the limits and the request will be set to 1.2 CPU and 1.2 gigs of memory, which again, is 60% of the quota. So now that was just showing you what the yaml file looks like. Now we actually ask kubectl to create that pod, and it says the pod was created. So it was created successfully. We can run kubectl describe. And there is tons of output here, but you can see that it says that the pod is running. And you can see that the limits are 1.2. By the way, these units– the m is milli-CPU, and the M for the memory is also like millibytes. I'm not sure that that's as useful as milli-CPUs.
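Putting the pieces David describes together, the demo pod file plausibly looked something like this (the image name is an assumption):

```yaml
# Reconstruction of the demo pod: only limits are set, so Kubernetes
# defaults the requests to match (1.2 CPU / 1.2Gi, roughly 60% of the
# 2-CPU / 2Gi quota on the "demos" namespace).
apiVersion: v1
kind: Pod
metadata:
  name: hostname-1
  namespace: demos
spec:
  containers:
  - name: hostname
    image: gcr.io/google_containers/serve_hostname   # assumed image
    resources:
      limits:
        cpu: 1200m
        memory: 1200Mi
```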

But anyway, it's showing that the limits are 1.2 CPU and 1.2 gigs of memory, and the request is 1.2 CPU and 1.2 gigs of memory, and the pod is running. By the way, all of the demos that I'm showing are running against a live cluster. Obviously the typing part has been pre-scripted, but everything that we're doing here is actually running against a live cluster. So that was just to show you the pod is running. And now we can try to create another pod that's also going to use 60% of the quota, and this should fail because then 120% of the quota would be in use. So this is another pod. It has a different name, but it's otherwise identical. It has the same resource limits and the same resource requirements. So we ask Kubernetes to create this pod, and then it gives an error. It says, you tried to create this pod but it was forbidden because it exceeded the quota. You requested such and such amount of CPU and memory, and then it tells you how much was already in use and what the quota limit was.

And so then you can see why it was rejected– it would have used more than the limit if the system had allowed you to create it. Oh, and this is just to show that the pod was actually not created. We do kubectl describe to show us the pod, and the pod is not found because it was rejected at creation time, since it would have exceeded the quota. So then the last piece is just to show you that we can create a pod that uses 30% of the quota because then a total of 90% will be in use. So this is, again, a third pod identical to the first two, except the limits– you can see down at the bottom, 0.6 CPU, 0.6 gigs of memory. We create that. It says it was successfully created. And then when we run kubectl describe, you can see that it's running and has the requests and limits that we specified. And lastly, we can ask Kubernetes to tell us how much quota is in use by using this kubectl describe on the quota object, and that tells us both how much is in use and how much is the maximum amount of quota.

So since we asked for 60% and then 30%, you can see 90% of the CPU quota and 90% of the memory quota are in use. So that's a pretty simple demo just to show you how you can set resource quotas on namespaces– at namespace granularity, for limits and for requests. I didn't show number of pods, but you can also set quotas on the number of pods that can be created in a namespace to limit that resource as well. So I'm going to shift to a second resource demo. And then after that, Alan's going to continue with the next part of the talk. The second resource related demo that I'm going to show you will be about resource isolation, which is one of the things that Alan talked about. The way that Kubernetes is able to guarantee that containers that have that guaranteed quality of service are able to run and not be evicted is that it constrains the amount of resources that other containers can use, and that is done based on the limit. So as Alan mentioned, the request is the amount of resources that you tell the scheduler that you need, and that amount is always guaranteed to you.

It always puts the pod on a machine that has enough resources to guarantee that that request is satisfied. And then the limit can be higher than the request if you want to do overcommitment. But the limit is where the system essentially cuts you off. Like if you exceed your memory limit, then the system will kill your container. If you exceed the CPU limit, it will throttle the container. And these enforcement mechanisms are the way that the system can then provide the guarantees to the guaranteed quality of service pods. So I'm going to do a quick demo to show you that. We're going to, again, create a namespace for this demo, and we're going to run a container that just burns a bunch of memory and CPU. So what we're doing here, I'll explain what this is doing. It's going to set the limits to 0.5 CPU and 200 megs of memory, and then the amount of resources that it's actually going to use is going to be 150 megs of memory and 2 CPUs. So for this first part of the demo, it'll be a container that is using less memory than the limit but more CPU than its limit.

And you'll see what happens. So first, let's do kubectl describe to verify that it's running. And we can see that it's running, and the limits and requests are 500 milli-CPU– so 0.5 CPU. And also you see the memory there. And then we can look at the container usage. So unfortunately, this takes a minute or so to start up. But in a minute, once this has started up, you'll see that the actual usage, the CPU usage, is constrained to be within the limit. So just to recap again how this container was configured, it's going to try to use 2 CPU. That's what this --cpus 2 flag is saying. It's going to try to use 2 CPU, but the limit is set at 0.5 CPU. And so the resource isolation at the kernel level should prevent it from using more than 0.5 CPUs. So hopefully this is working now, which it is. [COUGHS] Pardon me. And you can see that it's using 500 milli-CPU and the full amount of memory it requested, because that was within its limit. But the CPU is being constrained to 500 milli-CPU.
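A sketch of how such a pod could be expressed (the stress image and its flags are assumptions, not from the demo):

```yaml
# Illustrative CPU-throttling pod: the workload tries to burn 2 CPUs
# and 150M of memory, but its limits are 0.5 CPU / 200Mi, so the
# kernel throttles CPU to ~500m while memory stays within the limit.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-hog            # hypothetical name
spec:
  containers:
  - name: stress
    image: polinux/stress  # assumed stress-testing image
    command: ["stress"]
    args: ["--cpu", "2", "--vm", "1", "--vm-bytes", "150M", "--vm-hang", "0"]
    resources:
      limits:
        cpu: 500m          # CPU is throttled at this ceiling
        memory: 200Mi      # 150M of use fits under this
```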

This command, by the way, that I ran, this kubectl top, it's a very cool command. It will show you the resource usage of your containers running in the cluster. And so like Alan was talking before about how if you want to provide good resource isolation, it's useful to provide these resource requests and limits, but you may not know the first time you run your application how to set those. And so this kubectl top command is very useful for identifying how much resources your container is using, and you can watch that over time. You can also, of course, use a monitoring tool like Stackdriver and Container Engine, or something like that, and then use that to set the requests and limits. But anyway, that's what this kubectl top command is. So then the second thing that I'll show– that was showing how limits can limit the CPU consumption by throttling if the container tries to exceed the CPU limit. And then the second part is to show you that if you try to exceed your memory limit, then your container will get killed by the system.

So here we're going to run another container. We deleted the first one. And this one we're setting the memory limit at 200 meg, and we're going to use 250 meg. In other words, the container is going to try to use more memory than specified in the limit. And you can see here when we do the kubectl get pods, it shows that the container was killed due to exceeding its memory limit. This says restarts one because Kubernetes will automatically– this container was configured to automatically restart on failure, and so it's going to restart the container. Sometimes people want containers to restart even if they use too much memory, because they know they have a memory leak and they just want the system to periodically restart it, which is one way to deal with memory leaks. And so you can also set the container to not restart in this condition. But anyway, this is just showing that now it's restarted a second time. Again, it's showing you the status was that it exceeded the memory limit.
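The memory variant of that pod could be written as follows (again, the image and flags are assumptions):

```yaml
# Illustrative OOM-kill pod: the workload allocates 250M against a
# 200Mi limit, so the kernel kills it and the kubelet restarts it,
# eventually entering the CrashLoopBackOff state.
apiVersion: v1
kind: Pod
metadata:
  name: mem-hog            # hypothetical name
spec:
  restartPolicy: Always    # restart the container after the OOM kill
  containers:
  - name: stress
    image: polinux/stress  # assumed stress-testing image
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "0"]
    resources:
      limits:
        memory: 200Mi      # 250M of use exceeds this, triggering the kill
```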

And we do it a third time, and now it's in this CrashLoopBackOff state, which is to prevent all the resources on the system from being consumed by restarting the container over and over again. It notices that it's killed it a couple of times already, and then it will continue restarting it but at a slower and slower rate to prevent consuming all the resources on just doing restarts. And here we caught it on the third try. It's in the running state. You get the idea. This is just showing that you can do kubectl describe. And it will show you– like if you walk up to the system and you wonder why the site's not running, you can see that it says that the last state of this container was a terminated state. And the reason was that it was killed due to exceeding the memory limit. So that's the end of the demo. That is just showing you that we use resource isolation techniques in the kernel to limit the resource consumption for memory and CPU of containers.

And the reason that's useful is so that then we can provide the guarantees to the other containers that are requesting resources. Yeah. So I'll let Alan come back up and continue with the next part of the talk. ALAN NAIM: Thank you, David. Awesome demo. [APPLAUSE] Let's switch back to the slides. So there's also this resource or object that we have in Kubernetes called initial resources. Originally I talked about the case where you're deploying a container, but you actually don't know how much to set your request or how much to set your limit. Well, you can take advantage of this initial resources feature, which actually learns how your container is utilizing resources and will set the appropriate request and limit for your container. This is only available today in Kubernetes open source– it's currently an alpha feature– but it will make its way into Container Engine shortly. So the idea here is think of a world where you have containers that you're provisioning and scheduling.

You're taking advantage of initial resources to do some learning around resource utilization, and then you plug that all into the horizontal pod autoscaler, and then plug down into cluster autoscaling. So you have the system now that's operating. Everything is automated, and it's running itself. And that's the goal in terms of what a lot of customers want to achieve. And we provided these patterns for you to be able to start taking steps towards that. So in terms of where we're going with cluster resource management– as I mentioned earlier, we're looking at adding disk as a first-class resource. There's some work that's being done around GPU support as well that will make its way into Container Engine. Our goal is to really give you the ability to extend these resources. So perhaps there are certain resources that you want to meter by or do resource management by. So you could extend your API and bring those resources along. Things like usage-based scheduling is another area we're looking at.

Dedicated nodes, so environments where, perhaps, you have different organizations that are doing things very differently, but you'd like to allocate n number of nodes for each one of these organizations and ensure that everything they schedule actually goes on those nodes. So you're actually getting intra-cluster node isolation. So it's a pretty powerful thing that we're investing in. Priority and preemption, that's another area, and then improving quality of service enforcement. Today, it's really around the containers, but we want to bring some of these things up to the pod level. And then Linux disk quotas for tracking and isolation, and node allocatable enforcement. Those are areas we're going into. And if you're interested in learning more about these, please reach out to David and me after this talk. So let's switch gears into identity and access management. So today, Kubernetes is at 1.5, and we're going to be releasing 1.6 very shortly. So today, most of the identity and access management roles that are defined at the project level really apply to the Kubernetes cluster itself.
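Until that dedicated-node support lands, one way to approximate the pattern with what Kubernetes already has is node labels plus a nodeSelector. A sketch, with illustrative label, node, and pod names:

```yaml
# Assumes the nodes were labeled first, e.g.:
#   kubectl label nodes node-1 node-2 team=blue
apiVersion: v1
kind: Pod
metadata:
  name: blue-workload
spec:
  nodeSelector:
    team: blue            # only schedule onto nodes labeled team=blue
  containers:
  - name: app
    image: nginx:1.11
```

Note that this only steers blue pods onto those nodes; keeping other teams' pods off them requires the taint mechanism as well, which is part of what the dedicated-nodes work is about.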

If you want finer-grained control around things like namespaces, that was not available in Kubernetes 1.5 or earlier. So the idea with project-level identity and access management is that you have these roles that you define within your project, and now you can associate them with your clusters. So for example, how we typically see people doing things is taking these roles that are predefined, like Container Developer, Container Admin, Container Viewer, and defining groups that map to specific teams within your organization. And then specifying the right level of permission for a Container Engine Cluster associated with these groups that you're defining. So in this example, you have a developer group that maps to the Container Developer IAM role that maps to a dev project, and you set the specific permissions around that. So that's available today. In Kubernetes 1.6, we're introducing this concept of role-based access control down to the namespace level. So David will come up on stage in a minute or so to walk you through a demo here.
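That group-to-role mapping lives in the project's IAM policy. A sketch of what the relevant bindings might look like (the group addresses are made up), in the YAML format that `gcloud projects get-iam-policy` prints:

```yaml
bindings:
- role: roles/container.developer   # Container Developer: read/write access to Kubernetes API objects
  members:
  - group:dev-team@example.com
- role: roles/container.admin       # Container Admin: can also manage the clusters themselves
  members:
  - group:platform-ops@example.com
- role: roles/container.viewer      # Container Viewer: read-only access
  members:
  - group:qa-team@example.com
```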

But what he'll be showing you is basically the ability to have an admin come along, reach out to John, who is on the blue team, and basically provide John with access to the blue team namespace. Then come along to Lisa, who is on the green team, and ensure that Lisa only has read/write access to the green team namespace. And you can even take it one step further and say, well, I want Lisa to actually have view access into the blue team namespace. So as you can see, that level of granularity is actually improved in terms of what you can do in really connecting service accounts, and users, and roles to the underlying resources within a cluster. And eventually we want to take that even further. So without further ado, I will hand it back to David for a namespace access control demo. DAVID OPPENHEIMER: OK. Great. So in this demo, we're going to use three windows. Because as Alan mentioned, we're going to show how you can have an administrator, and then this blue user, and this green user, and they all have different access permissions.

So hopefully it will be clear enough when I'm jumping around between the windows. And also, one thing I wanted to mention is that the different users are going to be based on having different service accounts. It's a little tricky on a single machine to be logged in as multiple users at the same time. So what I'll show you is based on these users having different service accounts associated with them, but in the real world you can also just have them be based on users without having to create service accounts. So the first window here is going to be the admin. And they're going to run this gcloud command to ask the Google Cloud Platform IAM service to create a service account for the blue dev. You don't have to worry about the details here. It's just saying create a service account and then fetch the key into a local file. We're going to do the same thing for the green team dev, create a service account called green team dev at such and such and such, and then fetch the key into a local file.

Then we're going to go over here into the blue team window– sorry, the windows aren't color coded, but I'll tell you what I'm switching to– and configure kubectl to use the credentials that we just set up for the blue service account. This is all just setup stuff so far. Last bit of setup– we're going to go into the third window for the green user and fetch the data into the kubectl config file so that now the green user has the credentials for the green service account. So now we can actually start doing interesting things. So the first thing that we're going to do is to create a namespace that's called blue. And what you'll see is that the blue team dev shouldn't have any access to that namespace yet because we haven't explicitly granted permissions. So we're in the second window now, the blue window, and we do kubectl get pods, namespace blue, and you can see that the server gives an error saying that the blue user is not allowed to do this operation on the blue namespace.

But we can now go back to the admin's window– oh sorry, I didn't make these larger. I probably should have asked about that earlier. Luckily you didn't miss too much. That might be too big– but anyway. So we go back to the admin window, and now we can do the fun part and give the blue user access to the blue namespace using the role-based access control mechanism. So the role-based access control mechanism is based on the concept of cluster roles, and we have a number of predefined cluster roles in Container Engine, like admin, edit, view, and so on. This is just showing you the ones that the system comes with. And associated with each of these cluster roles is a set of permissions for each type of object in the system. So this is just a dozen lines from the file. The file's much longer, but it doesn't fit on the screen. So I'm just showing you part of the definition of what the admin cluster role has access to. So you can see here, it has a list of resources and sub-resources associated with pods, and a set of verbs that someone with admin access is allowed to perform on those resources, like create, delete, do a watch, and so on.
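A fragment of what such a cluster role definition looks like, abbreviated (the real admin role covers many more resource types than shown here):

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1   # the RBAC API group as of Kubernetes 1.6
kind: ClusterRole
metadata:
  name: admin
rules:
- apiGroups: [""]                               # "" is the core API group
  resources: ["pods", "pods/log", "pods/exec"]  # resources and sub-resources
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
```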

So that was a cluster role, in particular, the admin cluster role. And now what we're going to do is associate that admin cluster role with the blue user. And the way that's done is by creating a role binding object. This looks a little complicated, but it's not too bad. What it's saying here is that the user called the blue team dev is going to get the admin cluster role in the blue namespace. So that's what this is doing. It's binding the blue user (in particular their service account, because we're using service accounts, but conceptually it's the blue user) to the admin cluster role in the blue namespace. So now we ask the server to create that binding. And now we can go back to the window with the blue user, where previously, we saw that they didn't have access to the blue namespace. And now when they do kubectl get pods, they get no resources found. That's what we'd expect. We haven't created any pods yet, but you see they're not getting the permission denied error.
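The role binding being created might look roughly like this (the binding name and service account address are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: blueteam-admin-binding
  namespace: blue            # the grant is scoped to the blue namespace
subjects:
- kind: User
  name: blueteamdev@my-project.iam.gserviceaccount.com  # the blue dev's service account
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin                # the predefined admin cluster role
  apiGroup: rbac.authorization.k8s.io
```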

They're just being told that the resources are not available yet. And they can do other things. They can get services. We haven't created any services yet. That's not too exciting. They can run kubectl run, which creates a deployment that starts up an Nginx server in the blue namespace. So now, like I said, you can see that they have permissions in the blue namespace. So now let's go back to the admin user, and we're going to set up the permissions for a green user. So first, we create a green user– sorry, the green namespace, and then another role binding. This time the user is the green team dev, but it's the same permissions. It's the admin cluster role. It's being bound to the green namespace. So this is identical to the last role binding we saw, but instead of the blue user, it's the green user, and instead of the blue namespace, it's the green namespace. And so let's create that. And after we create that, before we go and look at what the green user can do, let's go back to the blue user and just verify that they can't do anything in this green namespace.

So we do kubectl get pods in the green namespace. You can see that that's forbidden. Same with get services. They can't see that, and they can't request to see all namespaces either. So the blue user, as we intended, doesn't have access in the green namespace. But we can go over to the third window, which is the one where we have the green user set up, and you can see that they have permission in the green namespace. We do get pods, namespace green. It says no resources found. We haven't created any resources, but this shows they have permission to look at the resources. They can't look in the blue namespace, because we didn't give them permission to look in the blue namespace, but they do have permission in the green namespace. They can run Nginx or do whatever they want inside the green namespace. So now we have the blue user with full permission in the blue namespace, the green user with full permission in the green namespace. And I'll show you one last piece, which is that view thing that Alan mentioned a minute ago.

Let's say hypothetically that the green user, in addition to having permissions to do whatever they want in their green namespace, is also supposed to have view permissions in the blue namespace. Maybe they're allowed to monitor the objects that are created by that blue user. And so what we can do is– oh sorry, well, first we'll show that the admin user can see all the resources in all of the namespaces. So some of these are system resources at the bottom, but you can see the top two lines are the Nginx deployments that the blue user and the green user started. So now onto what I was just saying a second ago. We're going to create a third role binding. This is going to give the view permission. You can see down here, cluster role view. It's going to give the view permission to the green team user on the blue namespace. So we create this role binding. We can go back to the window for the green user, do get deployments, namespace blue, and you can see that this green user can view objects, in particular, the deployment objects in the blue namespace, but they can't do other mutations.

For example, this green user can't do this kubectl delete and try to delete the deployment, because they only have view access to it. And so that's pretty much the end of the demo. Just showing again that you can give fairly fine-grained access control to users on a namespace granularity, giving them permissions to create, delete, view objects, and a bunch of other operations based on what permissions you're trying to set up. So I'll hand it back to you, Alan. ALAN NAIM: Thank you. So putting it all together to summarize everything we've talked about today so far. The recommendation is to break out your operating environments; in particular, production goes in a separate project. And then put your dev, test, and staging in another project. And depending on the requirements for your teams, you could put them in the same cluster or separate dev, test, and staging into different clusters. Namespaces are a great boundary, especially in Kubernetes 1.6, for having user-level control. So for cases where you're running a cluster with multiple applications, you'll be able to associate roles with these underlying namespace objects.

Leverage groups to encapsulate the actual identity and access management roles. You can create these groups that actually map back to your processes, and that's really the benefit they provide you. Leverage Kubernetes namespaces and assign resource quotas for cases where you have different departments with different requirements that are sharing resources, and where, potentially, you could have abuse. So leverage these quotas to prevent those situations from happening. And in cases where you are running heterogeneous applications within a cluster, where you're sharing cluster resources, really start thinking about using requests and limits. It's an area that not too many people really understand, but I think it's definitely worth looking at and using to provide the best quality of service for your applications as they run within your environments. And then take advantage of some of these objects, like initial resources, to help you better understand usage and utilization for your containers.
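The per-namespace quotas mentioned here are ResourceQuota objects; a sketch with illustrative values:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: blue            # the quota applies to everything in this namespace
spec:
  hard:
    requests.cpu: "10"       # total CPU all pods here may request
    requests.memory: 20Gi
    limits.cpu: "20"         # total CPU limit across the namespace
    limits.memory: 40Gi
    pods: "50"               # cap on the number of pods
```

Once a compute quota like this is in place, pods in that namespace must declare requests and limits for the quota'd resources, which is part of why quotas pair naturally with the requests-and-limits advice above.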

That being said, here's some links and resources. There's much more we can provide you, but the idea is– for those of you that are new to Kubernetes, there's some phenomenal training courses available online. And our documentation goes into a lot of detail on requests and limits and using those types of quality-of-service controls for your cluster. That being said, I will open it up to Q&A. And perhaps, David, you want to come back up, and we'll take your questions. AUDIENCE: Hi. So I had a quick question about the role-based access control that you had shown in the previous slide. I saw that you used service accounts. Does it also work with user accounts and Google Groups if you use that? DAVID OPPENHEIMER: Yeah. It definitely works with user accounts. Like I said, the only reason we were using service accounts there was because it's hard to kind of log in from multiple users at the same time. AUDIENCE: And this is in 1.6, right?

And you were using it against the cluster, so the cluster is upgraded to 1.6? DAVID OPPENHEIMER: No. To be clear– that's a great question. I was running against a test cluster, like an internal test cluster that we have up. So it's not available yet. 1.6 will be rolled out sometime in the next month or so. AUDIENCE: OK. Thank you. AUDIENCE: For the instances that are running on best effort with overcommitment of resources, before something would be evicted, is there any type of notification that would happen for the workload to say, hey, prepare myself to go away? Any type of warning, or would it be an immediate eviction? DAVID OPPENHEIMER: No. For the memory eviction, right now there's no notification. The system immediately needs to reclaim the memory, and so you get evicted right away. There's other kinds of preemption-type operations in Kubernetes, like when you're doing a rolling update and you're restarting your pod onto a new container image version and things like that, where we do provide notice.

But this is kind of the one situation where there really isn't an opportunity to run some other code to notify your application, because there just isn't any memory left on the node, and so we kill it without notification. AUDIENCE: Hey. So you guys didn't set any limits on the amount of questions that can be asked, right? OK. DAVID OPPENHEIMER: Very good pun, yes. AUDIENCE: OK. So I just have a few. So the IAM stuff, it only works in GCP, right? DAVID OPPENHEIMER: Not exactly. So the fundamental role-based access control mechanism is part of open source Kubernetes. The tie-in into the Google Cloud Platform IAM, that's how we were using the GCP service accounts and stuff like that. That part is only on Google Cloud Platform, but all of the role-based access control mechanisms, the underlying stuff, is all in the open source. AUDIENCE: OK. So does it already work on AWS today? DAVID OPPENHEIMER: I don't know how connected– whether people have done that last mile of plumbing to connect it to the AWS IAM mechanism.

We have a special interest group, Kubernetes special interest group for AWS, and you can ask a question there. I don't know the answer to the question, but in principle, it's possible, yeah. AUDIENCE: OK. Also, are there any plans on more granular authorization? So I know you guys do the namespace authorization. It seems like that's the lowest you go. Are there any plans on not allowing certain aspects of the API to be used? So say I don't want someone to create a storage set in a certain role, or I don't want them to create something without a health check, or I don't want them to open up a certain port– things along those lines? Or is that always going to be above the Kubernetes level? DAVID OPPENHEIMER: I don't have a good answer for that. Today the lowest level of granularity is the namespace and sort of what you can do in a namespace, but in the future, there might be those other possibilities. I'm not aware of discussions going on yet about that, but those do seem like reasonable use cases.

AUDIENCE: For the CPU limiting, is that done through some combination of CPU throttling or some sort of combination of CPU affinity and throttling? DAVID OPPENHEIMER: So the way it works under the covers is that the CPU request maps to CPU shares in the CFS scheduler in the Linux kernel. And the CPU limit maps to some other CPU– I don't remember the details. Something else in the kernel that throttles it. So it's using kernel CPU scheduling mechanisms to enforce that. AUDIENCE: OK. So you're not pinning it to a specific CPU. DAVID OPPENHEIMER: No. We don't have support yet for pinning containers to specific CPUs, and that's not the mechanism that's used there. AUDIENCE: Cool, thanks. DAVID OPPENHEIMER: Yeah, sure. AUDIENCE: In the beginning you talked about people splitting things up in terms of just like projects, or namespaces, or clusters. From my experience, in production people are running services that are kind of similar.

So let's say I have a bunch of stateful sets. I have some things running JVMs, so they need the JVM memory, or I have some Redis clusters. Are people looking at splitting these things up just in terms of namespaces? Or is there kind of an incentive to split this up by cluster? Because then you can optimize the parameter settings by cluster instead of having to do it by namespace. Are there any pros and cons with that? DAVID OPPENHEIMER: I don't know. Maybe Alan could talk. Do you want to take that, or do you want me to take that? ALAN NAIM: Yeah. It really depends on your application. But what we see out there is people running these within their own namespace and sharing a particular cluster. There are cases, though, where the applications have certain stringent data requirements that really require true isolation from a networking perspective as well, and those are cases where people end up splitting them into different clusters. You want to add to that?

DAVID OPPENHEIMER: Yeah. I agree. It kind of depends on what specific kind of isolation you're trying to achieve. If you wanted to say more about what you were trying to do, we could give you a recommendation. You can do it now, or you can do it after the talk. It's up to you. [MUSIC PLAYING]



Your responsibility is to provide services for internal users. How do you run Google Container Engine (GKE) cluster(s) to enable multiple teams to run applications in a heterogeneous environment? Do you create a single cluster and partition it with namespaces, or do you create a cluster per team? For environments with multiple clusters, how do you federate access control? How do you ensure the right level of access for each cluster? This video walks through some of the best practices associated with running GKE clusters in large team environments.
