Communication is Key: the Kubernetes Way

Communication is important in every aspect of life, and Kubernetes is no exception. This post delves into the major concepts you need to understand how networking works in Kubernetes. Read on!

Did you know that a cat has three names? Uh, no?

Okay, there's this poem by T. S. Eliot called "The Naming of Cats". You'd know if you've read it. For now, I'll tell you about it. A cat has three names: a common name used daily and known to all, a more dignified name known only to the cat's close ones, and a "private" name known only to the cat.

But you know that poets aren't that straightforward, and you've got to read between the lines... yada yada yada. This poem is about how people have different "personas", or behaviors, in different settings: how they behave in public, with close friends and family, and when they're alone. There's more to it, but let's not go there.

That's all for the philosophy/psychology class today, folks! Trust me, you're not on the wrong site. Yes, this is a blog about technology, and yes, we are talking Kubernetes.

And if you've been with me for a while now, you know that these random digressions are not in vain. I always circle back somehow.

So what's the connection between Kubernetes and the cat's three names? You'll have to read this post to find out (and the previous posts too, please!).

It's called "Networking"

You meet someone new... What do you tell them about yourself first? Your name! Your name is how others can communicate with you, and you specifically. It belongs to you; it's part of your identity.

Now, if you roughly translate the "name" into the world of technology, what do you get? IP addresses! Most networking concepts revolve around IP addresses. And well... I'm not here to tell you how important networking is.

So it's kind of natural that there will be some provisions made for networking within the Kubernetes ecosystem.

You've got a cluster that's running multiple pods. The pods might want to communicate with each other. Or maybe an external source wants to connect to a pod. No wait, don't go that far. A pod runs containers within it. How do those containers talk to each other? And this is INSIDE the pod.

To summarise, if you list the kinds of communication taking place on a cluster, you've got:

Within a single Pod between containers
Between two Pods
Pod and the rest of the world

And what about the cat and its names? The cat is actually a pod, and the cat's names represent the different ways a pod can communicate.

Cat = Pod
Cat's Name = Way a Pod Communicates

Here's how it connects...

The Cat's Private Name = Communication within a Pod

A cat has a private name known only to the cat. No one else.

That's how containers within a pod communicate with each other: via the "private name". What's that, you may ask? It's "localhost".

Remember how we said a pod is like a miniature version of a full-blown computer (or VM)? A pod has its own network namespace, and all its containers share the same IP and MAC address. So when one container wants to talk to another, it simply uses localhost along with the other container's port number. (This also means two containers in the same pod can't listen on the same port.)
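To make the localhost idea concrete, here's a minimal sketch. It assumes two threads can stand in for two containers: in a real pod the containers are separate processes sharing one network namespace, while here both threads simply share this one process's host. The message text is made up, and the OS picks the port.

```python
import socket
import threading


def start_server(ready: threading.Event, port_holder: list) -> None:
    """The 'server container': listen on localhost and echo one message back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        srv.listen(1)
        port_holder.append(srv.getsockname()[1])
        ready.set()  # tell the 'client container' which port to use
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"hello from the server container, you said: " + data)


def talk_over_localhost(message: bytes) -> bytes:
    """The 'client container': reach the other container via localhost + port."""
    ready = threading.Event()
    port_holder: list = []
    server = threading.Thread(target=start_server, args=(ready, port_holder))
    server.start()
    ready.wait(timeout=5)
    with socket.create_connection(("127.0.0.1", port_holder[0]), timeout=5) as client:
        client.sendall(message)
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:  # empty bytes: the server closed the connection
                break
            chunks.append(chunk)
    server.join(timeout=5)
    return b"".join(chunks)


if __name__ == "__main__":
    print(talk_over_localhost(b"ping").decode())
```

Note that the client never needs to know an IP address other than 127.0.0.1; the port number is the only thing that tells the two "containers" apart.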

So the scope of the name "localhost" is limited to the pod, and you can say it's the pod's "private name", just like our cat's!

The Cat's Dignified Name = Communications between Pods

A cat has a dignified name known only to the cat's close friends and family.

If you look at a cluster in another light, it's just a large bunch of pods running together, just like a family. You'll probably relate here if you've ever had a name that only your close ones called you. In this case, that name is the pod's IP address, and the "close ones" are all the other pods.

Simply put, a pod gets an IP address at creation, and any other pod can communicate with it using this IP. That means the IP address is reachable across the cluster; the node the pod runs on doesn't matter!

Here's a rule worth knowing: pods on a cluster can communicate with each other using these IP addresses, irrespective of which node they're running on. On top of that, agents on a node, such as system daemons or the kubelet, can communicate with all the pods running on that same node. So better remember that.

But wait, did you know that the IP address dies with the pod? That's a bummer! If a pod suddenly errors out and "dies", all the other pods that were depending on it will be affected. They need to be informed somehow.

And that's only half the problem. With Deployment workloads, pods are created and destroyed dynamically, so there's a high chance that pod IP addresses will keep changing. You can't keep informing all the dependent pods.

The Cat's Common Name = External Communications

Services

So what do we do? We briefly talked about this problem in our previous posts, where we introduced the concept of a "Service".

A Service gives your pods a stable IP address with no strings attached. It picks the pods it sends traffic to using a label selector, so even if a pod dies, the Service stays, and when a replacement pod with the same labels comes up in its place, the Service simply starts routing to it.
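How does a Service "attach" to whatever pod comes up next? Through labels. Here's a toy model of that idea: it is not the real EndpointSlice or kube-proxy machinery, just a sketch assuming a Service is a fixed IP plus a label selector. All names, labels and IPs are made up.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str      # dies with the pod
    labels: dict


@dataclass
class Service:
    name: str
    cluster_ip: str  # stays the same for the Service's whole lifetime
    selector: dict   # e.g. {"app": "cart"}

    def endpoints(self, pods: list) -> list:
        """IPs of the pods whose labels contain every key/value in the selector."""
        return [
            pod.ip
            for pod in pods
            if all(pod.labels.get(key) == value for key, value in self.selector.items())
        ]


if __name__ == "__main__":
    svc = Service("cart", "10.96.0.42", {"app": "cart"})
    pods = [
        Pod("cart-abc", "10.244.1.7", {"app": "cart"}),
        Pod("db-xyz", "10.244.2.3", {"app": "db"}),
    ]
    print(svc.cluster_ip, svc.endpoints(pods))  # 10.96.0.42 ['10.244.1.7']

    # The cart pod dies and a replacement comes up with a brand-new IP...
    pods[0] = Pod("cart-def", "10.244.3.9", {"app": "cart"})
    print(svc.cluster_ip, svc.endpoints(pods))  # 10.96.0.42 ['10.244.3.9']
```

The pod IP changed, but anyone talking to the Service's IP never had to be told.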

There are a few intricacies here. Services come in a few "flavors" (there's also a fourth type, ExternalName, which we'll skip here). The first one is pretty straightforward:

ClusterIP

If you don't specify a type, Kubernetes creates a Service of type ClusterIP by default. You get a ...err... IP address. But it's only reachable from within the cluster; outside the cluster, this IP has no meaning. So it's also kind of like the cat's dignified name: something known only to the close ones.
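Here's a sketch of what a ClusterIP Service manifest looks like, built as a Python dict and printed as JSON (kubectl also accepts JSON manifests). The field names are the real Service API fields; the service name, label and port numbers are made up. Notice there's no `type` at all, which is exactly the "by default" case.

```python
import json


def cluster_ip_service(name: str, app_label: str, port: int, target_port: int) -> dict:
    """Build a Service manifest that relies on the default type (ClusterIP)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # No "type" key: Kubernetes treats this Service as ClusterIP.
            "selector": {"app": app_label},  # which pods receive the traffic
            "ports": [{"protocol": "TCP", "port": port, "targetPort": target_port}],
        },
    }


def effective_type(manifest: dict) -> str:
    """Mirror the default: a Service without spec.type is a ClusterIP Service."""
    return manifest["spec"].get("type", "ClusterIP")


if __name__ == "__main__":
    svc = cluster_ip_service("cart", "cart", port=80, target_port=8080)
    print(json.dumps(svc, indent=2))
    print(effective_type(svc))  # ClusterIP
```

`port` is what other pods connect to on the Service's IP; `targetPort` is the port the container is actually listening on.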

.....But maybe you want your pod to communicate with the world. That's where the other two types of services come in.

NodePort Service Type

A cat has a public name that is used daily. It is known to everybody. Anyone who doesn't know the cat very well can use this name to call it.

Translate this into our context. An external entity can communicate with a pod running on the cluster using this "common name". What's this "public" name, you might ask?

The first is the NodePort type. This Service type exposes your pod on a fixed port (by default picked from the range 30000-32767) on the nodes' IP addresses. (Well, hello! A node also has an IP address.)

So the pod's common name here becomes a combination of:

The IP address of a node (any node in the cluster, by default) + the port number assigned to the Service

Remember, it's a one-port-per-Service deal, and a Service fronts one pod or a set of pods. So let's say you have a microservice running in a pod: you get a single port for it.
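Here's a sketch of a NodePort Service and the "common names" it ends up with. It assumes the default port range of 30000-32767 (the API server can be configured with a different one via `--service-node-port-range`), and that the port is opened on every node, which is the default behavior. The node IPs, names and ports are made up. If you leave `nodePort` out, Kubernetes picks a free one from the range for you.

```python
# Default NodePort range; the upper bound of range() is exclusive.
NODE_PORT_RANGE = range(30000, 32768)


def node_port_service(name: str, app_label: str, port: int,
                      target_port: int, node_port: int) -> dict:
    """Build a NodePort Service manifest with an explicitly chosen nodePort."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"nodePort {node_port} is outside the default range 30000-32767")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port, "nodePort": node_port}],
        },
    }


def external_addresses(manifest: dict, node_ips: list) -> list:
    """The Service's 'common name': <node IP>:<nodePort>, one per node."""
    node_port = manifest["spec"]["ports"][0]["nodePort"]
    return [f"{ip}:{node_port}" for ip in node_ips]


if __name__ == "__main__":
    svc = node_port_service("cart", "cart", 80, 8080, 30080)
    print(external_addresses(svc, ["192.168.1.10", "192.168.1.11"]))
    # ['192.168.1.10:30080', '192.168.1.11:30080']
```

A quick bit of arithmetic for the next paragraph: the default range holds `len(NODE_PORT_RANGE)`, i.e. 2768 ports, so a thousand NodePort services would use up more than a third of it.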

So NodePort is simple, right? But it's not really a good idea for production systems. Why? Tell me, do you really want to keep track of which ports are used by which service?

I mean, if you have a single microservice, it's no problem. But what if you've got a thousand? That's around a thousand ports to manage.

Or say your microservice runs in replica pods on different nodes. You'll also need to choose which node to send your traffic to (because it's the node's IP that you have to use). Are you going to monitor which nodes or pods are overloaded or under-loaded? That's a bit too much, huh?
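To see what "a bit too much" means, here's a hypothetical client-side workaround. This is not something Kubernetes provides; it's a sketch of the bookkeeping you'd be stuck with: round-robin over the node addresses yourself and skip the ones you believe are down. The addresses are made up.

```python
from itertools import cycle


class NaiveNodeBalancer:
    """Hypothetical client-side round-robin over <node IP>:<nodePort> addresses."""

    def __init__(self, addresses: list):
        self.addresses = list(addresses)
        self.unhealthy = set()
        self._ring = cycle(self.addresses)

    def mark_down(self, address: str) -> None:
        """You have to detect and record failed nodes yourself."""
        self.unhealthy.add(address)

    def next_address(self) -> str:
        """Return the next address in turn, skipping ones marked unhealthy."""
        for _ in range(len(self.addresses)):
            candidate = next(self._ring)
            if candidate not in self.unhealthy:
                return candidate
        raise RuntimeError("no healthy node addresses left")


if __name__ == "__main__":
    lb = NaiveNodeBalancer(["192.168.1.10:30080", "192.168.1.11:30080", "192.168.1.12:30080"])
    lb.mark_down("192.168.1.11:30080")
    print([lb.next_address() for _ in range(4)])
    # ['192.168.1.10:30080', '192.168.1.12:30080', '192.168.1.10:30080', '192.168.1.12:30080']
```

And this still ignores load, retries, and nodes joining or leaving the cluster, which is exactly the work a real load balancer takes off your hands.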

That's where the next service type comes in.

LoadBalancer

Let's get real here, people. Hardly anyone these days ventures outside the cloud. It's gotten so convenient ever since all the major cloud providers started supporting Kubernetes.

That's why you've got the LoadBalancer Service type. As the name suggests, it uses a load balancer from your cloud provider. In the end, you get a dedicated external IP address for each of your microservices, since Kubernetes has a network load balancer created for you on your cloud provider!

So even if you have a microservice running in multiple pods on different nodes, the load balancer routes traffic to the appropriate pods and you don't have to worry at all.

Ah, but there's a little "gotcha" here. It's still a one-to-one mapping: each Service of type LoadBalancer creates one load balancer in your cloud account. If you have a lot of services running, you're going to have a ton of load balancers. There's no technical problem with that; it's just going to cost you. A lot!
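Here's a sketch of that one-to-one mapping. It assumes the typical behavior where every Service of type LoadBalancer gets its own cloud load balancer (some providers offer ways to share one, which this ignores). The service names and ports are made up.

```python
def load_balancer_service(name: str, port: int, target_port: int) -> dict:
    """Build a Service manifest of type LoadBalancer for one microservice."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # the cloud provider provisions a load balancer for this
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def cloud_load_balancers_requested(manifests: list) -> int:
    """Count one cloud load balancer per LoadBalancer-type Service."""
    return sum(1 for m in manifests if m["spec"].get("type") == "LoadBalancer")


if __name__ == "__main__":
    services = [load_balancer_service(f"microservice-{i}", 80, 8080) for i in range(25)]
    print(cloud_load_balancers_requested(services))  # 25
```

Twenty-five microservices, twenty-five load balancers on the bill.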

So how do we solve this?

Ingress and Ingress Controllers

Folks, there's a new resource type to handle all your routing for you. Ingress.

In short. One IP. Multiple Services.

Here's how it works: you have a set of rules.

Imagine a post office that receives packages every day and each package is sent to the right house based on the address on the package.

Just as an address determines where a package goes, an Ingress contains a set of rules that define how traffic should be routed.

An example rule could be something like:
route all traffic on the path "/myPathA" to the Service for microservice A, and all traffic on "/myPathB" to the Service for microservice B.
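Here's a sketch of that example rule as a real `networking.k8s.io/v1` Ingress manifest, built as a Python dict. The Ingress name, Service names and port are made up, and `ingressClassName: "nginx"` assumes an nginx ingress controller is installed on the cluster.

```python
import json


def path_ingress(name: str, routes: dict, port: int = 80, ingress_class: str = "nginx") -> dict:
    """Build an Ingress that sends each path prefix in `routes` to the named Service."""
    paths = [
        {
            "path": path,
            "pathType": "Prefix",  # match this path and anything below it
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
        for path, service in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": ingress_class,  # assumes a controller of this class exists
            "rules": [{"http": {"paths": paths}}],
        },
    }


if __name__ == "__main__":
    ingress = path_ingress("my-ingress", {"/myPathA": "service-a", "/myPathB": "service-b"})
    print(json.dumps(ingress, indent=2))
```

Note that the backends are Services, not pods: the Ingress hands traffic to a Service, and the Service takes care of finding the pods.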

The post office acts as a central location for distributing packages. In the same way, you have a single load balancer, usually working at the application layer (HTTP), that acts as the single entry point for all traffic coming into the cluster. So you have a single IP address, or even a domain, that receives all the traffic, yet you can reach multiple services through it.

Ingress Controllers

Ah, but simply defining an Ingress won't do. It works in combination with an Ingress controller. An Ingress is an abstract set of rules for routing traffic, and the Ingress controller is the actual implementation that carries them out.

In simple words

Ingress is the rule and the Ingress controller follows it.

So if the Ingress says "send everything on /myPathA to microservice A", the Ingress controller follows it. No questions asked. Implementation-wise, the Ingress controller is typically just a pod (or a few) running on your cluster.
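What does "following the rules" look like? Here's a toy routing function, assuming the `Prefix` path matching described in the Kubernetes Ingress docs: paths are compared element by element (split on "/"), and when several rules match, the longest one wins. Real controllers like nginx do far more (host matching, `Exact` paths, TLS); the service names here are made up, and `None` stands in for the default backend a real controller would use.

```python
def split_path(path: str) -> list:
    """'/myPathA/orders/' -> ['myPathA', 'orders'] (empty elements dropped)."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Element-wise prefix match: '/foo' matches '/foo/bar' but not '/foobar'."""
    rule = split_path(rule_path)
    return split_path(request_path)[: len(rule)] == rule


def route(rules: list, request_path: str):
    """rules: list of (path, service). Return the service of the longest matching path."""
    matches = [
        (len(split_path(path)), service)
        for path, service in rules
        if prefix_matches(path, request_path)
    ]
    if not matches:
        return None  # a real controller would send this to its default backend
    return max(matches, key=lambda match: match[0])[1]


if __name__ == "__main__":
    rules = [("/myPathA", "service-a"), ("/myPathB", "service-b")]
    print(route(rules, "/myPathA/orders"))  # service-a
    print(route(rules, "/myPathB"))         # service-b
    print(route(rules, "/myPathAB"))        # None
```

The element-wise comparison is why "/myPathAB" doesn't sneak into microservice A's traffic.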

There are a number of Ingress controllers available for Kubernetes. The Kubernetes project itself supports and maintains the AWS, GCE, and nginx ingress controllers, with nginx being the most popular. Cloud providers have their own controllers, like the AWS Load Balancer Controller (formerly the ALB Ingress Controller) on AWS and the GCE ingress controller on GCP. And there are other players too, like Traefik, Kong, Istio, Envoy-based controllers, and a lot more.

In Summary

Well, that was a lot to take in! We hope you sailed through that ocean of information. (You did read the whole post, didn't you?) Of course, we can't go into all the nitty-gritty here. It's a blog, not a novel.

But yes, that's the gist of how networking works in Kubernetes. Because it's all about communication.

Stay tuned for more posts in this series. Till then... Ciao!