Portal | Level: L2: Operations | Topics: Kubernetes Core, Kubernetes Networking, Node Lifecycle & Maintenance | Domain: Kubernetes
Kubernetes Under the Covers¶
A practical, internals-first learning guide (with diagrams, “what actually happens”, and drills).
Scope / versions: This describes how Kubernetes works in mainstream, production clusters today. Exact behavior varies by distribution, CNI, CSI, and Kubernetes version. Where it varies, this guide says so explicitly (no fairy tales).
How to use this guide (fast)¶
- Skim Part A once to get the “mental model”.
- Then pick one operation from Part C per session and do: 1) Read “Under the covers” 2) Run the “Watch it live” commands 3) Do the “Drill” 4) Run the “Cleanup”
Table of contents¶
- Part A - The mental model
- Part B - Core components and data flows
- Part C - What happens when you do common operations
- Part D - Troubleshooting map (symptom → likely layer)
- Part E - Unknown unknowns (stuff that surprises smart sysadmins)
- Appendix - Minimal lab manifests
Part A - The mental model¶
A1) Kubernetes is a distributed control system¶
Kubernetes is not “a container launcher”. It’s a declarative control plane that: 1. Accepts desired state via the API. 2. Stores it durably (etcd). 3. Runs control loops that continuously push reality toward that desired state. 4. Delegates the actual “Linux work” to agents on each node (kubelet + container runtime + CNI/CSI).
The core loop (visual)¶
You (kubectl / API client)
|
v
[ kube-apiserver ]
authn/authz/admission
|
v
[ etcd ] <-- the committed desired state
|
v
watchers / informers (caches)
|
v
controllers + scheduler decide actions
|
v
[ kubelet on node ]
runtime (CRI) + CNI + CSI
|
v
Linux primitives (namespaces/cgroups, routes, mounts, processes)
A2) Desired state vs observed state¶
- Spec: what you want (desired state).
- Status: what Kubernetes observes (current state).
Most “why is it stuck?” problems are spec says X but status can’t reach X.
A3) One sentence summary you should memorize¶
“Kubernetes is a set of controllers watching etcd and reconciling the world.”
Part B - Core components and data flows¶
B1) API server pipeline (what happens to every create/update/delete)¶
When you kubectl apply/create/delete, the request typically goes:
- Authentication: who are you?
- Authorization: are you allowed? (RBAC, etc.)
- Admission (before persistence):
- Mutating admission runs first (may change the object).
- Validating admission runs after (may reject).
- Persistence: the object is stored in etcd with metadata like `uid` and `resourceVersion`.
- Watch events: clients (controllers, scheduler, kubectl -w) get notified.
Key takeaway: Kubernetes “does stuff” after the API write lands in etcd.
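You can see the persisted metadata and the resulting watch traffic directly; a minimal sketch (the Pod name and `demo` namespace are assumptions, matching the Appendix manifests):

```shell
# Every persisted object carries a unique uid and a resourceVersion,
# both assigned by the API machinery at write time.
kubectl get pod hello-pod -n demo \
  -o jsonpath='{.metadata.uid}{"\n"}{.metadata.resourceVersion}{"\n"}'

# Watch events flow as controllers react to writes:
kubectl get events -n demo --watch
```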
Admission phases (visual)¶
request -> authn -> authz -> mutating admission -> validating admission -> persist to etcd
B2) etcd: the source of truth¶
- etcd is a consistent KV store used by the control plane to store cluster state.
- Controllers and scheduler watch the API server (which reads/writes etcd).
Practical implication: if etcd/API is unhealthy, nothing converges.
B3) Controllers: the engine of “make it so”¶
Controllers are loops that:
- watch certain object types
- compute what should exist
- create/update/delete other objects accordingly
Examples:
- Deployment controller creates ReplicaSets.
- ReplicaSet controller creates Pods.
- EndpointSlice controller creates/updates EndpointSlices for Services.
- Node controller reacts to node health.
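The Deployment → ReplicaSet → Pod chain is visible on any object via `ownerReferences`; a sketch (object names are assumptions from the Appendix manifests):

```shell
# Pod -> ReplicaSet: each Pod records who owns it
kubectl get pods -n demo -o jsonpath='{range .items[*]}{.metadata.name}{" owned by "}{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}{"\n"}{end}'

# ReplicaSet -> Deployment: same pattern one level up
kubectl get rs -n demo -o jsonpath='{range .items[*]}{.metadata.name}{" owned by "}{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}{"\n"}{end}'
```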
B4) Scheduler: “where should this Pod run?”¶
The scheduler:
- watches for Pods without a node assignment
- picks a node using plugins (filtering + scoring + additional phases)
- writes the binding (spec.nodeName) back through the API
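You can watch the binding the scheduler writes back (the Pod name is an assumption):

```shell
# Empty while the Pod is unscheduled; the chosen node name after binding.
kubectl get pod hello-pod -n demo -o jsonpath='{.spec.nodeName}{"\n"}'

# The scheduler also records its decision as an event:
kubectl get events -n demo --field-selector reason=Scheduled
```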
B5) kubelet: “make Pods on this node real”¶
kubelet:
- watches Pods bound to its node
- calls the container runtime via CRI
- coordinates volume setup (with CSI / host volume types)
- triggers networking setup through CNI (via the runtime)
- updates Pod status and posts events
B6) CRI / container runtime (containerd, CRI-O, etc.)¶
- Kubernetes talks to runtimes through the Container Runtime Interface (CRI).
- kubelet typically asks the runtime to:
- create a Pod sandbox (network namespace and related setup)
- then create/start containers within that sandbox
Many setups use a “pause” (sandbox) image, but the exact mechanics can vary by runtime/config.
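On a containerd node you can see the sandbox/container split directly with crictl; a sketch (requires node access, and typically root/sudo):

```shell
# One sandbox per Pod; the sandbox holds the network namespace
crictl pods

# The app containers created inside those sandboxes
crictl ps

# Inspect a sandbox (ID from `crictl pods`) to see its IP and namespaces
crictl inspectp <sandbox-id>
```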
B7) CNI: Pod networking¶
CNI plugins commonly:
- create interfaces (veth), assign Pod IPs
- program routes
- implement NetworkPolicy (depends on plugin)
- optionally handle Service routing via eBPF (plugin-dependent)
B8) kube-proxy / eBPF dataplanes: Service traffic¶
- Traditional Kubernetes uses kube-proxy on each node to program rules to route Service traffic to backends.
- Some CNIs replace kube-proxy behavior with eBPF.
EndpointSlices are the common “source of truth” for backend sets.
B9) CSI: storage¶
CSI involves two broad sides:
- Controller side: provisioning, attach/detach (often control-plane pods)
- Node side: mount/unmount on the node
Part C - What happens when you do common operations¶
C0) Baseline: What to watch live (use this constantly)¶
Open 2-4 terminals:
T1: watch the object
T2: watch events (the story)
T3: describe the object when stuck
T4 (node-level): kubelet logs on the node (or via SSH); if you use containerd, crictl helps too
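A minimal set of commands for those terminals (object names such as `hello-deploy`/`demo` are assumptions from the Appendix; adjust to your workload):

```shell
# T1: watch the object
kubectl get deploy hello-deploy -n demo -o wide --watch

# T2: watch events, the "story" of what each component did
kubectl get events -n demo --watch

# T3: describe when stuck (the Events section at the bottom is gold)
kubectl describe pod <pod-name> -n demo

# T4: kubelet logs on the node (systemd-based nodes)
journalctl -u kubelet -f

# If you use containerd:
crictl ps
```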
If you only remember one debugging rule: Events + kubelet logs.
C1) Operation: Create a Pod (kubectl apply -f pod.yaml)¶
Under the covers (sequence)¶
kubectl
|
| 1) POST/PUT Pod to API server
v
kube-apiserver
| 2) authn/authz
| 3) admission (mutating -> validating)
| 4) persist to etcd
v
etcd (Pod now exists as desired state)
|
| 5) scheduler sees unscheduled Pod
v
kube-scheduler
| 6) filter/score nodes
| 7) bind Pod to node (spec.nodeName)
v
kube-apiserver/etcd
|
| 8) kubelet on chosen node sees Pod
v
kubelet
| 9) prepare volumes
| 10) CRI RunPodSandbox (network namespace)
| 11) CNI ADD (Pod IP + routes)
| 12) pull images
| 13) run init containers (if any)
| 14) start app containers
| 15) update PodStatus + events
v
Pod becomes Running; readiness gates decide Ready/NotReady
The “why it gets stuck” hotspots¶
- Pending: scheduler can’t find a node (resources, taints, affinities, volumes).
- ContainerCreating: volume mount or CNI problem.
- ImagePullBackOff: registry/auth/DNS/network.
- CrashLoopBackOff: app exits, liveness fails, bad command/args/env.
Drill (do this with a simple Pod)¶
- Apply the Pod.
- Watch events and identify which component emitted each event:
- Scheduled → scheduler
- Pulling/Started → kubelet
- Explain the difference between `spec.nodeName` and the Pod IP.
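The drill above as commands (Pod name and manifest filename are assumptions):

```shell
kubectl apply -f pod.yaml

# Watch the story unfold; note which component emits each event:
# Scheduled -> scheduler; Pulling/Pulled/Created/Started -> kubelet
kubectl get events -n demo --watch

# Both spec.nodeName and the Pod IP in one view
kubectl get pod hello-pod -n demo -o wide
```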
Cleanup¶
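A one-line cleanup (name from the Appendix Pod):

```shell
kubectl delete pod hello-pod -n demo
```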
C2) Operation: Create a Deployment (Pods appear “by magic”)¶
Under the covers¶
When you apply a Deployment, you are creating a controller input, not Pods directly.
Deployment created
|
v
Deployment controller -> creates ReplicaSet
|
v
ReplicaSet controller -> creates Pods
|
v
Scheduler -> binds each Pod
|
v
Kubelet -> runs each Pod
Why this matters¶
- If you delete a Pod under a Deployment, it comes back (ReplicaSet notices desired replicas).
- Scaling is just changing the desired replica count; controller does the rest.
Drill¶
- Create a Deployment with 2 replicas.
- Delete one Pod.
- Observe that the ReplicaSet creates a replacement.
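The steps above as commands (filename and names are assumptions from the Appendix):

```shell
kubectl apply -f deployment.yaml

# Terminal 1: watch Pods come and go
kubectl get pods -n demo --watch

# Terminal 2: delete one Pod; the ReplicaSet creates a replacement
kubectl delete pod <one-pod-name> -n demo
```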
Cleanup¶
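A one-line cleanup (name from the Appendix Deployment):

```shell
kubectl delete deploy hello-deploy -n demo
```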
C3) Operation: Scale a Deployment (kubectl scale)¶
Under the covers¶
- You update `spec.replicas` on the Deployment (API write).
- Deployment controller updates ReplicaSet desired replicas.
- ReplicaSet controller creates/deletes Pods to match.
- Scheduler/kubelet do their normal job for new Pods.
Drill¶
- Scale from 1 → 5.
- In events, count how many Pods were created and scheduled.
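The drill as commands (Deployment name is an assumption):

```shell
kubectl scale deploy hello-deploy -n demo --replicas=5

# One Scheduled event per new Pod
kubectl get events -n demo --field-selector reason=Scheduled
```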
Cleanup¶
Scale back down or delete the Deployment.
C4) Operation: Rolling update (kubectl set image / apply new template)¶
Under the covers¶
A rolling update is a controlled replacement of Pods:
- Deployment template changes (new image tag, env, etc.)
- Deployment controller creates a new ReplicaSet
- It increases new RS replicas and decreases old RS replicas according to:
- maxUnavailable
- maxSurge
- Readiness gates determine when a new Pod counts as “available”
Drill¶
- Change the image to a new version.
- Watch ReplicaSets:
- Explain why you temporarily have extra Pods (surge) or fewer (unavailable).
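The drill as commands (image tag and names are assumptions):

```shell
# Trigger a rollout by changing the Pod template
kubectl set image deploy/hello-deploy web=nginx:1.27 -n demo

# Watch old and new ReplicaSets trade replicas (surge/unavailable in action)
kubectl get rs -n demo --watch

kubectl rollout status deploy/hello-deploy -n demo
```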
Cleanup¶
Rollback or delete Deployment:
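Either of these works (names are assumptions):

```shell
# Roll back to the previous ReplicaSet's template
kubectl rollout undo deploy/hello-deploy -n demo

# ...or remove the whole thing
kubectl delete deploy hello-deploy -n demo
```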
C5) Operation: Delete a Pod / Deployment (kubectl delete)¶
Under the covers¶
Delete is not always “kill immediately”.
- API server marks the object with a `deletionTimestamp` (and increments `generation`/`resourceVersion`).
- Finalizers (if any) must clear before actual removal.
- For Pods:
- kubelet receives a “stop” directive
- sends SIGTERM to containers, waits `terminationGracePeriodSeconds`
- then SIGKILL if needed
- tears down sandbox + CNI DEL for networking
- unmounts volumes as appropriate
Drill¶
- Delete a Pod with a sleep loop.
- Observe termination grace: does it stop instantly or wait?
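The drill as commands (Pod name and image are assumptions):

```shell
# Run a Pod whose main process does not exit promptly on SIGTERM
kubectl run sleeper -n demo --image=busybox:stable --restart=Never -- sleep 3600

# Time the delete: sleep runs as PID 1 with no SIGTERM handler, so this
# typically waits out terminationGracePeriodSeconds (default 30s) before SIGKILL.
time kubectl delete pod sleeper -n demo
```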
Cleanup¶
Already deleted.
C6) Operation: Exec into a container (kubectl exec)¶
Under the covers (high level)¶
- kubectl asks the API server for an exec session.
- API server upgrades to a streaming connection (SPDY/WebSocket depending on setup).
- API server proxies the stream to the kubelet on the node.
- kubelet asks the runtime to create an exec process in the container’s namespaces/cgroups.
Key implication: exec is a control plane → kubelet → runtime path, not “SSH”.
Drill¶
- Exec into a Pod and run `ps`.
- Explain why you see only processes in that container/Pod's namespaces.
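The drill as commands; note `ps` may be missing in minimal images, in which case `/proc` shows the visible PIDs (Pod name is an assumption):

```shell
# If ps exists in the image:
kubectl exec -it hello-pod -n demo -- ps aux

# Fallback that works in any Linux image: only this PID namespace's
# processes appear, not the node's full process table.
kubectl exec hello-pod -n demo -- ls /proc
```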
Cleanup¶
Exit the shell.
C7) Operation: Stream logs (kubectl logs -f)¶
Under the covers¶
- kubelet exposes container logs (runtime-dependent storage path).
- API server proxies the `kubectl logs` request to kubelet.
- kubelet streams logs back.
Important: log retention depends on node disk + runtime log rotation settings.
Drill¶
- Run a Pod that prints a counter.
- Stream logs and kill/restart the container.
- Observe how “previous” logs work:
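The drill as commands (Pod name and image are assumptions; the script exits after 60 iterations so the container restarts in place, since `kubectl run`'s default restartPolicy is Always):

```shell
kubectl run counter -n demo --image=busybox:stable -- \
  sh -c 'i=0; while [ $i -lt 60 ]; do echo "count=$i"; i=$((i+1)); sleep 1; done'

# Stream the current instance
kubectl logs -f counter -n demo

# After it exits and restarts, the prior instance's logs:
kubectl logs counter -n demo --previous
```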
Cleanup¶
Delete Pod.
C8) Operation: Create a Service (ClusterIP) and route traffic¶
Under the covers¶
When you create a Service:
- API stores the Service object.
- EndpointSlice controller creates/updates EndpointSlices matching the Service selector.
- kube-proxy (or eBPF dataplane) uses EndpointSlices to program routing rules.
- CoreDNS provides name resolution for the Service.
Visual: Service traffic path (classic kube-proxy model)¶
Client Pod -> Service ClusterIP:port
|
v
node routing (kube-proxy rules)
|
v
one backend PodIP:targetPort (selected from EndpointSlices)
Drill¶
- Create Deployment + Service.
- Verify EndpointSlices:
- Curl the Service and then curl a specific Pod IP. Explain the difference.
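The drill as commands (names come from the Appendix manifests; the throwaway curl Pod image is an assumption):

```shell
# Backends the dataplane will route to; EndpointSlices carry the standard
# kubernetes.io/service-name label linking them to their Service
kubectl get endpointslices -n demo -l kubernetes.io/service-name=hello-svc

# Curl the Service VIP (load-balanced), then a specific Pod IP (direct)
kubectl run curl -n demo --image=curlimages/curl --restart=Never --rm -it -- \
  curl -s http://hello-svc.demo.svc.cluster.local
```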
Cleanup¶
C9) Operation: Update a ConfigMap used by a Pod¶
Under the covers (gotcha-heavy)¶
Two common consumption patterns:
1) Env var from ConfigMap
- ConfigMap change does not automatically update running container env.
- You need a restart/rollout.
2) Volume mount from ConfigMap
- kubelet updates projected volume content (eventually) on the node.
- The app must re-read files to see changes.
Drill¶
- Mount a ConfigMap as a file and watch it change.
- Then use env-from and notice it doesn’t update without restart.
Cleanup¶
Delete the ConfigMap and workload.
C10) Operation: Create a PVC (storage)¶
Under the covers (typical dynamic provisioning)¶
- You create a PVC.
- A controller provisions a PV (via CSI provisioner) if StorageClass supports dynamic provisioning.
- PV binds to PVC.
- When a Pod uses the PVC:
- controller side may attach a volume to the node (cloud/provider dependent)
- node plugin mounts it into the Pod
Drill¶
- Create PVC, then a Pod that mounts it.
- Observe events for attach/mount.
Cleanup¶
Delete Pod, then PVC (and possibly PV depending on reclaim policy):
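A sketch (object names are placeholders):

```shell
kubectl delete pod <pod-name> -n demo
kubectl delete pvc <pvc-name> -n demo

# Whether the PV (and backing volume) survives depends on the reclaim policy:
kubectl get storageclass -o custom-columns=NAME:.metadata.name,RECLAIM:.reclaimPolicy
```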
C11) Operation: Cordon/Drain a Node (eviction + reschedule)¶
Under the covers¶
- cordon marks node unschedulable.
- drain:
- evicts Pods (through API eviction subresource)
- respects PodDisruptionBudgets
- deletes Pods (controllers recreate elsewhere)
Drill¶
- Cordon a node.
- Drain it.
- Watch Pods reschedule.
Cleanup¶
Uncordon:
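```shell
kubectl uncordon <node-name>

# Confirm it can take Pods again (empty output means schedulable)
kubectl get node <node-name> -o jsonpath='{.spec.unschedulable}{"\n"}'
```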
C12) Operation: Apply vs Replace vs Patch (why “apply” is special)¶
Under the covers¶
- `kubectl apply` uses a three-way merge concept (client-side and/or server-side apply, depending on usage).
- It tracks field ownership and tries to change only what you “own”.
- This is why apply plays better with other controllers editing objects.
Drill¶
- Apply a manifest.
- Edit the live object with `kubectl edit`.
- Apply again and see what changes, and what gets preserved.
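The drill as commands (filename and names are assumptions):

```shell
kubectl apply -f deployment.yaml
kubectl edit deploy hello-deploy -n demo   # hand-edit a field

# Preview what a re-apply would change before doing it
kubectl diff -f deployment.yaml
kubectl apply -f deployment.yaml

# With server-side apply, field ownership is visible in managedFields:
kubectl get deploy hello-deploy -n demo --show-managed-fields -o yaml
```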
Cleanup¶
Delete the objects you created.
Part D - Troubleshooting map (symptom → likely layer)¶
D1) Pod stuck in Pending¶
Most common layers:
- Scheduler constraints (resources/taints/affinity)
- Volume constraints (PV node affinity/zone)
Check:
- kubectl describe pod ... → Events
- kubectl get nodes -o wide
D2) Pod stuck in ContainerCreating¶
Most common layers:
- CNI networking failing
- CSI mount failing
Check:
- Pod events for FailedMount / CNI errors
- kubelet logs on the node
D3) ImagePullBackOff¶
Layers:
- Registry auth
- DNS/network path to registry
- Wrong image name/tag
Check:
- Events show the exact pull error
- node network/DNS
D4) CrashLoopBackOff¶
Layers:
- Your app exits
- probe failure
- wrong command/args/env
Check:
- kubectl logs
- kubectl describe pod (probe failures)
D5) Service exists but no traffic¶
Layers:
- selector mismatch → no endpoints
- readiness not satisfied → endpoints not “ready”
- NetworkPolicy blocking
- kube-proxy / dataplane issue
Check:
- EndpointSlices
- readiness conditions
- policy rules
Part E - Unknown unknowns (the “why is this weird?” list)¶
- A Pod is not a process. It’s a bundle of namespaces + cgroups + containers.
- Most things are eventually consistent (controllers catch up via watch).
- Status is not spec. You can “ask” for 3 replicas and only have 1 running.
- ConfigMap env vars don’t live-update (needs restart).
- Delete is often two-phase (deletionTimestamp + finalizers).
- Services don’t “own” traffic routing; kube-proxy/eBPF does.
- “Ready” is an application contract (readiness probes gate traffic).
- A Deployment is a policy object controlling ReplicaSets, not Pods.
- NetworkPolicy behavior depends on the CNI (some enforce, some don’t).
- kubectl is a client, not a “cluster command runner”.
Appendix - Minimal lab manifests¶
Put these in a directory and apply them. They’re intentionally tiny.
A) namespace¶
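The namespace used by every other lab object; a minimal manifest (the name `demo` matches the rest of the Appendix):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
```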
B) pod (simple)¶
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  namespace: demo
spec:
  containers:
    - name: web
      image: nginx:stable
      ports:
        - containerPort: 80
C) deployment + service¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-deploy
  namespace: demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: web
          image: nginx:stable
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello-svc
  namespace: demo
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
D) cleanup-all for the demo namespace¶
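Deleting the namespace removes everything created in it (Pods, Deployments, Services, PVCs):

```shell
kubectl delete namespace demo
```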
Sources (primary, official)¶
(These are here so you can verify details and avoid cargo-culting.)
Kubernetes Admission Control:
- https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
- https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
EndpointSlices:
- https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/
Scheduling framework & scheduler phases:
- https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/
- https://kubernetes.io/docs/reference/scheduling/config/
- https://kubernetes.io/docs/reference/config-api/kube-scheduler-config.v1/
CRI and Pod sandbox concept:
- https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/