Kubernetized

Last time we discussed what pods are, how they relate to containers and how services expose them for consumption. We learned that pods alone aren't resilient - now we'll look at what types of workloads are supported and some of the use-cases they cover.

A common misunderstanding needs clearing up first, though: Kubernetes does not change the applications it hosts. The software is still the same, verbatim as packaged in an OCI image. How well it behaves is therefore entirely up to its developer - Kubernetes won't make it any better or worse, for what that's worth. How it's run and configured, on the other hand, is up to Kubernetes, which merely provides the platform underneath: it makes sure the application is started - wherever it should be running, with all the required compute and storage resources at its disposal - and kept alive under constant supervision with no manual intervention. That's what it does: it helps you run stuff. I promise I'll also tell you how - for now, though, let's stick to the "what" part of the iceberg.

At any rate, "stuff" comes in a number of varieties. You're given a set of generic tools, and you'll be able to cover most use-cases and scenarios with these (or else roll your own). Remember that you shouldn't be creating pods directly if you have several of them and want them to survive faults. Resilience is provided through so-called workload resources that manage pods, also taking care of re-scheduling if that's what keeping them running takes. Common characteristics include scaling - for most of them by changing the replica count, that is, the number of pods to run - and housekeeping: as a rule of thumb, workloads manage their pods throughout their lifecycle, including removal upon their own deletion. (How these recognize their charges boils down to labels, which are a specific kind of identifying metadata - you'll see a lot of them and they therefore warrant a future article of their own.)
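To make the label business a little less abstract, here's a minimal sketch of equality-based selector matching in plain Python. It's an illustration only, not how Kubernetes implements it: the pod names and labels are made up, and real selectors also support set-based expressions that are ignored here.

```python
def matches(selector: dict, labels: dict) -> bool:
    """Return True when every key/value pair of the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods and labels, invented for this example.
pods = [
    {"name": "web-abc12", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-def34", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-0", "labels": {"app": "db"}},
]

# The equality-based part of a selector: all listed pairs must match.
selector = {"app": "web"}

owned = [pod["name"] for pod in pods if matches(selector, pod["labels"])]
print(owned)  # ['web-abc12', 'web-def34']
```

Extra labels on a pod don't hurt: a pod matches as long as it carries everything the selector asks for.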

The resource intended for stateless applications is called Deployment. Stateless applications maintain no state directly and therefore require no stable storage or network identity - their replicas are equal as well as identical and requests might hit any of those. Persistent storage isn't normally part of this equation. Deployments are designed to ensure a suitable number of replicas of your application are running at any given point in time with no service disruption even during a version upgrade. They're special in the sense that they make use of another resource type named ReplicaSet in order to implement this: your pods aren't actually managed by the Deployment directly but by a ReplicaSet, which is, in turn, managed by the Deployment. A version upgrade carried out in the default manner is called a "rolling update". Beneath the hood, it means the new ReplicaSet is automatically scaled up while the old one is scaled down, phasing out pods running containers based on the previous version of the image packaging your application and introducing pods running containers based on the new version. The opposite is also supported by a rollback capability, which performs the same thing, only the other way 'round. For this purpose a number of ReplicaSet objects are kept as configurational placeholders in spite of their respective current replica counts of zero. (Historically - prior to the introduction of ReplicaSet and Deployment resources - another resource named ReplicationController was intended to serve this purpose. It did have its quirks: the above logic was not part of its implementation, rolling upgrades were actively orchestrated on the client side and had better run to completion without interruption... It may also be worth noting that while there's a separate resource targeted at stateful applications, a Deployment may also do with pre-provisioned persistent volumes and a Recreate strategy - to be elaborated on later - provided there's only a single replica at play.) The scheduler tries to avoid colocating replicas as long as the cluster has enough nodes, but there is no guarantee in place by default. (Affinities may be configured to influence scheduling decisions.) Typical applications managed via Deployment workloads include frontends and highly-scalable API servers.
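The give-and-take between the old and the new ReplicaSet during a rolling update can be sketched as a toy simulation. This is a simplification under stated assumptions, not the actual controller logic: every new pod is assumed to become ready instantly, and the surge and unavailability limits are given as absolute numbers here (the function and parameter names are made up for the example).

```python
def rolling_update(replicas: int, max_surge: int = 1, max_unavailable: int = 0):
    """Return the (old, new) replica counts seen while pods shift to the new ReplicaSet."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be zero")

    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale the new ReplicaSet up without exceeding replicas + max_surge pods in total.
        grow = min(replicas - new, replicas + max_surge - (old + new))
        if grow > 0:
            new += grow
            steps.append((old, new))
        # Scale the old ReplicaSet down while keeping at least
        # replicas - max_unavailable pods around.
        shrink = min(old, (old + new) - (replicas - max_unavailable))
        if shrink > 0:
            old -= shrink
            steps.append((old, new))
    return steps


for old, new in rolling_update(replicas=3):
    print(f"old ReplicaSet: {old}  new ReplicaSet: {new}")
```

Note how the old ReplicaSet ends up with zero replicas rather than disappearing. A rollback would run the same dance in the opposite direction.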

A different species of animal entirely is a stateful application. These play specific roles implementing application tiers and maintain their own state on persistent storage - they definitely need to retain their identity and attached storage in a manner that transcends re-scheduling or upgrades lest they "lose their minds" and are rendered inoperable. The resource accommodating such applications by fulfilling these requirements is named StatefulSet. The following holds true for every replica: its persistent volumes are its alone and its network identity is unique. The former are instantiated based on templates ("volumeClaimTemplates" to be more precise, and I'll tell you about them later) while the latter incorporates an ordinal, which is also honoured during startup and shutdown sequences: by default, pods start in ascending and stop in descending ordinal order. There is an additional requirement of a pre-defined Service - and a so-called "headless" one at that, serving a subdomain under which the backing pods get registered by name and address - that is referenced in the StatefulSet's specification. (I'll be telling you more about Services too, just you wait!) A typical use-case is some kind of database.
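Here's a small sketch of what those stable identities look like, following the usual naming conventions: pods are named after the workload plus an ordinal, claims after the template plus the pod name, and DNS names live under the headless Service. The names "db", "db-headless" and "data" are hypothetical, and the "default" namespace and "cluster.local" domain are assumptions about the cluster.

```python
def stateful_identities(name, service, replicas, claim_template="data",
                        namespace="default", cluster_domain="cluster.local"):
    """Derive the per-replica pod name, DNS name and volume claim name."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            "claim": f"{claim_template}-{pod}",
        })
    return identities


ids = stateful_identities("db", "db-headless", replicas=3)

# Startup proceeds in ascending ordinal order...
for identity in ids:
    print("start", identity["pod"], identity["dns"], identity["claim"])

# ...and shutdown in descending order.
for identity in reversed(ids):
    print("stop ", identity["pod"])
```

The point to take away is that "db-1" stays "db-1" and keeps its claim no matter which node it lands on after re-scheduling.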

A DaemonSet can be considered a variant of Deployment that has no replica count to scale, running a single pod (exactly one) on every node in the cluster. (It's not really this simple, but there's more to be told about architecture and scheduling before it's worth doubting.) It does support rolling updates, but manages its pods directly, so it won't create ReplicaSets to do so. It follows node fluctuation in the cluster by extending to new nodes or garbage-collecting pods lost owing to node removal. It lends itself to the purpose of cluster-wide networking, logging, monitoring or storage by making sure components run all across the cluster.
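Following node fluctuation is essentially a reconciliation exercise, which the sketch below illustrates in a heavily simplified form. Node and pod names are invented, and real scheduling constraints (taints, node selectors and the like) are left out entirely.

```python
def reconcile(nodes: set, pods_by_node: dict):
    """Work out which nodes still need a pod and which pods lost their node."""
    to_create = sorted(nodes - set(pods_by_node))
    to_delete = sorted(pod for node, pod in pods_by_node.items() if node not in nodes)
    return to_create, to_delete


# Hypothetical cluster state: node-c has just joined, node-x has been removed.
nodes = {"node-a", "node-b", "node-c"}
pods_by_node = {
    "node-a": "logger-7k2dq",
    "node-b": "logger-x9f4m",
    "node-x": "logger-p3w8z",
}

to_create, to_delete = reconcile(nodes, pods_by_node)
print("needs a pod:", to_create)   # ['node-c']
print("to clean up:", to_delete)   # ['logger-p3w8z']
```

Run this whenever the node list changes and you end up with exactly one pod per node again.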

A Job is used to run pods in order to ensure an operation is seen through to completion a specific number of times, optionally involving parallelism. Unlike a ReplicaSet, it is designed for short-lived pods executing finite operations. It's suitable for tasks like batch processing.
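For the curious, a Job asking for six completions, two at a time, might look roughly like the manifest below, written as a Python dictionary and printed as JSON (which kubectl accepts alongside YAML). The Job name and the image are hypothetical placeholders. The field names are those of the batch/v1 API as far as I know them, so check them against the reference before relying on this.

```python
import json
import math

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "resize-images"},  # hypothetical name
    "spec": {
        "completions": 6,   # how many pods must finish successfully
        "parallelism": 2,   # how many pods may run at the same time
        "backoffLimit": 4,  # retries before the Job is marked as failed
        "template": {
            "spec": {
                # Job pods must not restart forever: Never or OnFailure.
                "restartPolicy": "Never",
                "containers": [
                    # Hypothetical image, stands in for your batch worker.
                    {"name": "worker", "image": "registry.example.com/resize:1.0"}
                ],
            }
        },
    },
}

print(json.dumps(job, indent=2))

# With no failures, the work gets done in at least this many "waves" of pods.
waves = math.ceil(job["spec"]["completions"] / job["spec"]["parallelism"])
print("minimum waves:", waves)
```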

A CronJob resource is technically a scheduling mechanism for Jobs, supporting one-off as well as recurring executions of pods, capable of handling concurrency and suspension. Often seen implementing backup solutions, taking scheduled snapshots or performing similar actions.
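A nightly backup could be sketched along these lines, again as a Python dictionary dumped to JSON. The name, image and schedule are made up for the example, and the fields shown (schedule, concurrencyPolicy, suspend, jobTemplate) are the batch/v1 ones to the best of my knowledge, so treat this as an illustration rather than a reference.

```python
import json

cronjob = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "nightly-backup"},  # hypothetical name
    "spec": {
        "schedule": "0 2 * * *",        # standard cron syntax: every day at 02:00
        "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still going
        "suspend": False,               # flip to True to pause scheduling
        "jobTemplate": {
            # Everything below is an ordinary Job specification.
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "OnFailure",
                        "containers": [
                            # Hypothetical image, stands in for your backup tool.
                            {"name": "backup", "image": "registry.example.com/backup:1.0"}
                        ],
                    }
                }
            }
        },
    },
}

print(json.dumps(cronjob, indent=2))
```

The nesting tells the story: a CronJob wraps a Job template, which in turn wraps a pod template.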

There's still a lot to say, but for now I'd rather let this settle somewhat. I'll be back!
