
What Happens When You `kubectl apply`

  • lesson
  • kubernetes-api-server
  • etcd
  • scheduler
  • kubelet
  • container-runtime
  • networking
  • probes

Topics: Kubernetes API server, etcd, scheduler, kubelet, container runtime, networking, probes
Level: L1–L2 (Foundations → Operations)
Time: 60–90 minutes
Prerequisites: None (Kubernetes concepts explained as we go)


The Mission

You type kubectl apply -f deployment.yaml and press Enter. Kubernetes says:

deployment.apps/myapp created

Sixty seconds later, three pods are running, each with a unique IP, health checks passing, and traffic flowing. In those 60 seconds, at least 7 different components collaborated through a pattern borrowed from industrial control theory. None of them talked to each other directly — they all watched a shared database and reacted to changes.

This lesson follows kubectl apply from YAML to running pod through every component, explaining what each one does and how they coordinate.


The Architecture in 30 Seconds

                    ┌──────────────┐
    kubectl ──────→ │  API Server  │ ←──── Controllers
                    └──────┬───────┘       (watch + react)
                     ┌─────┴─────┐
                     │    etcd    │        (source of truth)
                     └───────────┘
         ┌─────────────────┼──────────────┐
         │                 │              │
    ┌────┴─────┐    ┌──────┴──┐    ┌──────┴──┐
    │ Scheduler │    │ kubelet │    │ kubelet │    (per-node)
    └──────────┘    └────┬────┘    └────┬────┘
                         │              │
                    ┌─────┴──────┐ ┌─────┴──────┐
                    │ containerd │ │ containerd │   (container runtime)
                    └────────────┘ └────────────┘

Everything works through the control loop pattern: components watch the API server for changes, compare desired state to actual state, and take action to close the gap. Nobody gives orders — everyone reacts to the shared truth in etcd.
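The control loop above can be sketched in a few lines of Python. This is a toy model with hypothetical names (`reconcile`, the `myapp-` pod names), not real Kubernetes API calls; the point is the shape of the loop: compare desired to actual, take the smallest action that closes the gap.

```python
# A toy sketch of the control-loop pattern. One pass of a
# ReplicaSet-style reconciliation: observe, compare, act.

def reconcile(desired_replicas, actual_pods):
    """Return the pod list after one reconciliation pass."""
    pods = list(actual_pods)
    if len(pods) < desired_replicas:
        # Too few: create pods until the count matches.
        for i in range(len(pods), desired_replicas):
            pods.append(f"myapp-{i}")
    elif len(pods) > desired_replicas:
        # Too many: delete the excess.
        pods = pods[:desired_replicas]
    return pods

# Converges regardless of the starting state:
print(reconcile(3, []))               # scale up from zero
print(reconcile(3, ["a", "b", "c", "d", "e"]))  # scale down
print(reconcile(3, ["a", "b", "c"]))  # already converged: no action
```

Because the loop only looks at current state, it is naturally self-healing: if a pod dies between passes, the next pass sees 2 instead of 3 and creates a replacement.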

Name Origin: Kubernetes is Greek for "helmsman" or "pilot." The logo is a ship's wheel with 7 spokes, a nod to the project's internal codename "Seven": a reference to Seven of Nine from Star Trek, which was itself a reference to the Borg. Google's internal predecessor to Kubernetes was literally called Borg. The naming wasn't subtle.

Trivia: "k8s" is a numeronym — 8 letters between "k" and "s." Same pattern as "i18n" (internationalization) and "l10n" (localization). It was adopted because "kubernetes" is long to type and hard to spell.


Step 1: kubectl Sends YAML to the API Server

When you run kubectl apply -f deployment.yaml, kubectl:

  1. Reads the YAML file
  2. Converts it to JSON (the API server speaks JSON, not YAML)
  3. Sends an HTTP request to the API server
# See what kubectl actually sends (dry-run, no changes)
kubectl apply -f deployment.yaml --dry-run=server -o yaml

# See the raw HTTP request
kubectl apply -f deployment.yaml -v=8
# → I0322 14:23:01 round_trippers.go:463]
#   POST https://api-server:6443/apis/apps/v1/namespaces/default/deployments
#   Request Body: {"apiVersion":"apps/v1","kind":"Deployment",...}

Under the Hood: Kubernetes chose YAML for human-readability and comment support. JSON doesn't support comments, which makes it terrible for configuration that humans edit. But the API server internally works entirely with JSON — your YAML is converted at the kubectl layer before it hits the wire.


Step 2: The API Server Validates and Stores

The API server is the only component that talks to etcd. Everything else goes through it.

When the Deployment arrives:

Request arrives
  → Authentication: who is this? (certificates, tokens, OIDC)
  → Authorization: can they do this? (RBAC check)
  → Admission control:
      → Mutating webhooks: modify the request (inject sidecars, add labels)
      → Validating webhooks: reject bad requests (policy enforcement)
  → Schema validation: does the YAML match the Deployment spec?
  → Write to etcd: store the desired state
  → Return 201 Created to kubectl

Name Origin: etcd stands for "distributed /etc" — the /etc directory where Unix stores configuration files, plus "d" for distributed. It's a distributed key-value store that provides strong consistency (Raft consensus). Every Kubernetes object — every Pod, Service, ConfigMap, Secret — lives in etcd.

At this point, nothing has happened yet. No pod is running. The Deployment object exists in etcd as desired state. The system needs to make reality match.


Step 3: The Deployment Controller Creates a ReplicaSet

The Deployment controller is one of many controllers running in the kube-controller-manager. It watches the API server for Deployment objects.

Deployment controller sees: new Deployment "myapp" with replicas: 3
Deployment controller creates: ReplicaSet "myapp-7d8f9c4b5f" with replicas: 3

Why a ReplicaSet and not Pods directly? Because the Deployment manages rolling updates. When you change the image tag, the Deployment creates a new ReplicaSet and scales it up while scaling the old one down. The ReplicaSet is the unit of "this exact version with this exact config."


Step 4: The ReplicaSet Controller Creates Pods

The ReplicaSet controller watches ReplicaSets. It sees a new one with replicas: 3 and 0 matching pods. It creates 3 Pod objects in etcd.

ReplicaSet controller sees: ReplicaSet wants 3 pods, has 0
ReplicaSet controller creates: Pod myapp-7d8f9c4b5f-abc12
                               Pod myapp-7d8f9c4b5f-def34
                               Pod myapp-7d8f9c4b5f-ghi56

These Pod objects exist in etcd with spec.nodeName: "" — they haven't been assigned to a node yet. They're in Pending state.
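The naming scheme in those pod names encodes the object hierarchy. The sketch below mimics it in Python; the hash and suffix algorithms here are simplified stand-ins, not Kubernetes' exact implementations.

```python
# Deployment -> ReplicaSet (pod-template-hash) -> Pods (random suffix).
# Simplified stand-in algorithms for illustration only.

import hashlib
import random
import string

def replicaset_name(deployment, pod_template):
    # Kubernetes hashes the pod template, so a changed template
    # (e.g. a new image tag) produces a brand-new ReplicaSet.
    h = hashlib.sha1(pod_template.encode()).hexdigest()[:10]
    return f"{deployment}-{h}"

def pod_name(replicaset):
    # Each pod gets a short random suffix appended to the RS name.
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
    return f"{replicaset}-{suffix}"

rs_v1 = replicaset_name("myapp", "image: myapp:v1")
rs_v2 = replicaset_name("myapp", "image: myapp:v2")
print(rs_v1 != rs_v2)   # different template -> different ReplicaSet
print(pod_name(rs_v1))  # e.g. myapp-<hash>-ab3k9
```

This is why you can read a pod name like myapp-7d8f9c4b5f-abc12 from right to left: random pod suffix, pod-template hash, Deployment name.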

Mental Model: Think of Kubernetes like a hiring pipeline. The Deployment is the job posting ("we need 3 engineers"). The ReplicaSet is a specific batch of hires ("these 3, with this exact job description"). The Pods are individual hires. The Scheduler is HR assigning them to offices. The kubelet is the office manager making sure they show up.


Step 5: The Scheduler Assigns Nodes

The scheduler watches for Pods with no nodeName. For each unassigned Pod, it runs a two-phase algorithm:

Phase 1 — Filtering: Which nodes can run this Pod?

  • Does the node have enough CPU and memory (based on requests)?
  • Does it match any nodeSelector or nodeAffinity rules?
  • Do any taints on the node prevent this Pod (unless the Pod tolerates them)?
  • Is the node healthy and schedulable?

Phase 2 — Scoring: Of the eligible nodes, which is best?

  • Spread pods across zones/nodes (topology spreading)
  • Prefer nodes with less resource pressure
  • Co-locate or separate from specific pods (affinity/anti-affinity)
  • Prefer nodes that already have the image cached

The scheduler writes spec.nodeName on the Pod. This is the only thing it does — it doesn't start anything.

# See scheduling decisions
kubectl get events --sort-by=.metadata.creationTimestamp
# → Successfully assigned default/myapp-7d8f9c4b5f-abc12 to node-2

# Why is a pod stuck in Pending?
kubectl describe pod myapp-xxx
# → Events:
# →   Warning  FailedScheduling  0/3 nodes are available:
# →   3 Insufficient memory

Step 6: The Kubelet Starts the Container

The kubelet runs on every node. It watches the API server for Pods assigned to its node.

When it sees a new Pod for its node:

  1. Pull the image (if not cached locally)
  2. Create the pod sandbox — a network namespace shared by all containers in the pod
  3. Call the CNI plugin — sets up networking (veth pair, IP address, routes)
  4. Mount volumes — PVCs, ConfigMaps, Secrets, emptyDir
  5. Run init containers — sequential, each must complete before the next starts
  6. Run app containers — parallel by default
  7. Start health probes — readiness and liveness checks
# Watch the kubelet's progress
kubectl get events -w
# → Pulling image "myapp:v1"
# → Successfully pulled image "myapp:v1" in 3.2s
# → Created container myapp
# → Started container myapp

The kubelet doesn't create containers itself. It talks to the container runtime (usually containerd) over the Container Runtime Interface (CRI), and the runtime in turn invokes the low-level runtime (runc) to create the namespaced, cgroup-limited process.

kubelet → containerd (CRI) → runc → your process

Under the Hood: runc is the OCI-compliant runtime that actually creates the container. It calls clone() with namespace flags, sets up cgroups, mounts the filesystem (OverlayFS), drops capabilities, applies seccomp filters, then calls execve() with your entrypoint. It does this in about 100 milliseconds.


Step 7: Networking — The Pod Gets an IP

The CNI (Container Network Interface) plugin gives the Pod its own IP address. Unlike Docker's port-mapping model, Kubernetes mandates flat networking: every Pod gets a routable IP, and all Pods can reach each other without NAT.

# See pod IPs
kubectl get pods -o wide
# → NAME                     IP           NODE
# → myapp-7d8f9c4b5f-abc12  10.244.1.15  node-2
# → myapp-7d8f9c4b5f-def34  10.244.2.23  node-3
# → myapp-7d8f9c4b5f-ghi56  10.244.1.16  node-2

The Service provides a stable IP that load-balances across these Pod IPs:

kubectl get svc myapp
# → NAME    TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)
# → myapp   ClusterIP   10.96.45.12   <none>        80/TCP

This ClusterIP (10.96.45.12) is virtual — nothing actually listens on it. Instead, kube-proxy programs iptables (or IPVS/eBPF) rules on every node that intercept packets to 10.96.45.12 and DNAT them to one of the Pod IPs.
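What those NAT rules accomplish can be modeled in a few lines of Python. This is a stand-in for iptables, not real packet handling; the service table and IPs echo the example above.

```python
# A toy model of kube-proxy's DNAT behavior: packets addressed to a
# virtual ClusterIP are rewritten to one of the real pod IPs.

import random

SERVICE_TABLE = {
    # ClusterIP -> backend pod IPs (populated from the Endpoints object)
    "10.96.45.12": ["10.244.1.15", "10.244.2.23", "10.244.1.16"],
}

def dnat(dst_ip):
    """Rewrite a Service VIP to a real pod IP; pass other traffic through."""
    backends = SERVICE_TABLE.get(dst_ip)
    if backends is None:
        return dst_ip  # not a Service IP: deliver unchanged
    # iptables mode picks a backend probabilistically (the `statistic`
    # module); IPVS mode also supports round-robin, least-connections, etc.
    return random.choice(backends)

print(dnat("10.96.45.12"))  # one of the three pod IPs
print(dnat("8.8.8.8"))      # untouched: not a Service IP
```

This also explains why pinging a ClusterIP often fails: the rules match specific protocols and ports, and there is no host behind the IP to answer.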


Step 8: Health Checks Gate Traffic

The Pod is running, but is it ready? Kubernetes has three types of probes:

Probe Purpose What happens on failure
Startup "Has the app finished initializing?" Keep checking (don't run other probes yet)
Readiness "Can the app handle requests?" Remove from Service endpoints (no traffic)
Liveness "Is the app still alive?" Restart the container
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 15
  periodSeconds: 20

Until the readiness probe passes, the Pod exists but receives no traffic. This is how zero-downtime deployments work: new Pods must prove they're healthy before old Pods are removed.

Gotcha: A liveness probe that checks the database ("am I healthy?" → "can I query the DB?") will restart your pod when the database is slow. The pod is fine — it's the database that's struggling. Now you have pod restarts + database load from reconnections. Liveness probes should check "is this process fundamentally stuck?" not "are my dependencies healthy."
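The gotcha can be made concrete with two hypothetical probe handlers (the checks here are illustrative, not a prescribed implementation): readiness may reflect dependency health, liveness deliberately should not.

```python
# Sketch of the probe-semantics gotcha above (hypothetical checks).

def readiness(db_reachable):
    # Failing readiness only removes the pod from the Service, so it
    # is safe to reflect dependency health here.
    return db_reachable

def liveness(event_loop_responsive, db_reachable):
    # Failing liveness RESTARTS the container. Restarting won't fix a
    # slow database, so deliberately ignore db_reachable here.
    return event_loop_responsive

# During a database outage: stop receiving traffic, but don't restart.
print(readiness(db_reachable=False))                             # False
print(liveness(event_loop_responsive=True, db_reachable=False))  # True
```

With probes split this way, a database outage drains traffic from the pods instead of triggering a restart storm on top of it.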


The Complete Flow — One Picture

[1] kubectl apply -f deployment.yaml
    → YAML → JSON → HTTP POST to API server

[2] API server validates
    → AuthN → AuthZ (RBAC) → Admission webhooks → Schema validation
    → Store Deployment in etcd

[3] Deployment controller (watches Deployments)
    → Creates ReplicaSet

[4] ReplicaSet controller (watches ReplicaSets)
    → Creates 3 Pods (Pending, no node assigned)

[5] Scheduler (watches unassigned Pods)
    → Filter nodes → Score nodes → Assign Pod to node (write spec.nodeName)

[6] Kubelet on assigned node (watches Pods for its node)
    → Pull image → Create sandbox → CNI networking → Mount volumes
    → Run init containers → Run app containers

[7] Kube-proxy (watches Services + Endpoints)
    → Programs iptables/IPVS rules for Service → Pod routing

[8] Readiness probe passes
    → Pod added to Endpoints → Traffic flows

Time from kubectl apply to traffic flowing: typically 30–90 seconds (dominated by image pull and readiness probe initial delay).


Rolling Updates: What Happens When You Change the Image

kubectl set image deployment/myapp myapp=registry/myapp:v2

The Deployment controller:

  1. Creates a new ReplicaSet (for v2)
  2. Scales up the new RS by 1 (now: 3 old + 1 new)
  3. Waits for the new Pod's readiness probe to pass
  4. Scales down the old RS by 1 (now: 2 old + 1 new)
  5. Repeats until all replicas are new (0 old + 3 new)
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # How many extra pods during update (1 = 4 total at peak)
    maxUnavailable: 0  # How many can be down (0 = no downtime)
# Watch a rolling update
kubectl rollout status deployment/myapp
# → Waiting for deployment "myapp" rollout to finish: 1 out of 3 new replicas updated...

# Undo if something goes wrong
kubectl rollout undo deployment/myapp
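The five steps above can be simulated directly. This simplified model moves one pod at a time and assumes every new pod passes readiness; it shows how maxSurge: 1 / maxUnavailable: 0 bounds the pod counts.

```python
# Simulation of a rolling update with maxSurge: 1, maxUnavailable: 0.
# Simplified: one pod at a time, readiness always passes.

def rolling_update(replicas):
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0:
        new += 1                  # surge: scale new RS up by 1
        steps.append((old, new))  # peak: old + new = replicas + maxSurge
        # ...wait for the new pod's readiness probe to pass...
        old -= 1                  # then scale old RS down by 1
        steps.append((old, new))
    return steps

for old, new in rolling_update(3):
    print(f"old={old} new={new} total={old + new}")
```

In this model the total never drops below 3 (maxUnavailable: 0) and peaks at 4 (3 + maxSurge), which is why the update is zero-downtime but briefly needs capacity for one extra pod.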

Flashcard Check

Q1: Where does the Deployment object live after kubectl apply?

In etcd, accessed through the API server. The API server is the only component that talks to etcd directly.

Q2: What does the scheduler actually do?

It watches for Pods with no nodeName, runs filtering (which nodes can?) and scoring (which is best?), and writes spec.nodeName. It doesn't start anything.

Q3: Readiness probe fails. What happens?

The Pod is removed from Service endpoints. No traffic is routed to it. The Pod keeps running — it's not restarted (that's the liveness probe's job).

Q4: Why does a Deployment create a ReplicaSet instead of Pods directly?

Because rolling updates need two sets of Pods simultaneously (old version + new version). The ReplicaSet is the unit of "this exact version."

Q5: kube-proxy manages the ClusterIP. Is there a process listening on that IP?

No. The ClusterIP is virtual. kube-proxy programs iptables/IPVS rules that intercept packets and DNAT them to real Pod IPs.

Q6: Pod is stuck in Pending. What should you check first?

kubectl describe pod — look at Events for scheduling failures. Common: insufficient CPU/memory, no matching nodes for nodeSelector, untoleratable taints.


Exercises

Exercise 1: Watch the chain (hands-on)

# In one terminal, watch events
kubectl get events -w

# In another terminal, create a deployment
kubectl create deployment test --image=nginx --replicas=2

# Watch the events and identify each step:
# - Deployment created
# - ReplicaSet created
# - Pods created (Pending)
# - Pods scheduled
# - Image pulled
# - Containers started

# Clean up
kubectl delete deployment test

Exercise 2: Break the scheduler (hands-on)

Create a Pod that can't be scheduled:

apiVersion: v1
kind: Pod
metadata:
  name: unschedulable
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          memory: "999Gi"  # More than any node has

Apply it, check kubectl describe pod unschedulable, see the scheduling failure event. Then delete it.

Exercise 3: The decision (think)

A kubectl apply succeeded but the Pod is stuck in various states. What's wrong?

  1. Pod status: Pending for 5 minutes, no events about scheduling
  2. Pod status: ContainerCreating for 3 minutes
  3. Pod status: Running but 0/1 Ready
  4. Pod status: CrashLoopBackOff
  5. Pod status: ImagePullBackOff
Answers

  1. Scheduler can't find a node. Check kubectl describe pod for scheduling events. Usually: insufficient resources, unsatisfiable nodeSelector, or all nodes tainted.
  2. Image pull or volume mount is slow/failing. Check events for image pull progress. Could also be a volume (PVC) that can't bind, or a CNI plugin issue.
  3. Readiness probe failing. The container is running but not passing its health check. Check kubectl logs pod and kubectl describe pod for probe failure events.
  4. App crashes immediately after starting. Check kubectl logs pod --previous for the crash output. Common: missing env var, wrong config path, port conflict.
  5. Can't pull the image. Wrong image name, auth failure (imagePullSecrets), or registry unreachable. Check kubectl describe pod for the pull error details.

Cheat Sheet

Debugging Flow

Symptom Command What to look for
Pending kubectl describe pod Scheduling events
ContainerCreating kubectl describe pod Image pull, volume mount
CrashLoopBackOff kubectl logs --previous App crash output
Running but not Ready kubectl describe pod Probe failure events
ImagePullBackOff kubectl describe pod Registry auth, image name

Useful Commands

Task Command
Watch events kubectl get events -w --sort-by=.metadata.creationTimestamp
Rollout status kubectl rollout status deployment/NAME
Rollback kubectl rollout undo deployment/NAME
Force re-schedule kubectl delete pod NAME (controller recreates)
Check scheduler kubectl describe pod NAME \| grep -A5 Events

Takeaways

  1. Everything is a control loop. No component gives orders. They watch etcd (through the API server) and react to changes. Desired state in, actual state converges.

  2. The API server is the only door to etcd. Every kubectl command, every controller, every kubelet goes through the API server. It handles auth, authorization, validation, and admission control.

  3. The scheduler only assigns — it doesn't start. It picks a node and writes spec.nodeName. The kubelet on that node does the actual work.

  4. Readiness probes gate traffic. Until a Pod passes readiness, it exists but receives nothing. This is how zero-downtime deploys work.

  5. Rolling updates use two ReplicaSets. Old and new versions coexist briefly. The Deployment controller orchestrates the gradual swap.


Related Lessons

  • Connection Refused — what goes wrong at the Kubernetes Service layer
  • Out of Memory — Kubernetes resource limits and the OOM killer
  • The Hanging Deploy — process lifecycle inside containers