Container Runtime

25 cards: 🟢 7 easy | 🟡 10 medium | 🔴 8 hard

🟢 Easy (7)

1. What is containerd and how does it relate to Docker?

containerd is a high-level container runtime that manages the complete container lifecycle (image pull, storage, execution, networking). Docker uses containerd as its underlying runtime. Kubernetes can use containerd directly via the CRI plugin, bypassing Docker entirely.

Remember: containerd = the industry-standard container runtime. Kubernetes uses it via CRI. Docker uses it under the hood. It manages the full container lifecycle.

2. How does Kubernetes communicate with containerd?

Kubernetes uses the Container Runtime Interface (CRI), a gRPC API. containerd exposes a CRI plugin that listens on a Unix socket (default: /run/containerd/containerd.sock). The kubelet calls this socket to create, start, stop, and remove containers.

3. What are Linux namespaces and which ones does a container typically use?

Namespaces isolate what a process can see. A typical container uses: pid (process tree), net (network stack), mnt (filesystem mounts), uts (hostname), ipc (inter-process communication), user (UID/GID mapping), and cgroup (cgroup root view). Each namespace gives the container the illusion of its own isolated system.

Remember: Linux namespaces provide isolation: PID (process), NET (network), MNT (filesystem), UTS (hostname), IPC (inter-process), USER (UID mapping). Mnemonic: 'Processes Need Mount points, UTSnames, IPC, and Users.'
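
On Linux, the namespaces a process belongs to show up as symlinks under /proc/PID/ns, each with a target of the form type:[inode]. The sketch below reads them; it assumes a Linux host, and the helper names (parse_ns_link, list_namespaces, same_namespace) are illustrative, not part of any runtime's API.

```python
import os


def parse_ns_link(target):
    """Split a namespace symlink target such as 'pid:[4026531836]'."""
    ns_type, _, rest = target.partition(":")
    return ns_type, int(rest.strip("[]"))


def list_namespaces(pid="self"):
    """Return {entry_name: inode} for one process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def same_namespace(a, b, ns_type):
    """Two processes share a namespace when the inode numbers match."""
    return ns_type in a and a.get(ns_type) == b.get(ns_type)
```

Comparing list_namespaces() for a host shell and for a container's PID 1 (by its host PID) should show different inodes for each namespace the container does not share with the host.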

4. What are cgroups and how do container runtimes use them?

cgroups (control groups) limit and account for resource usage -- CPU, memory, I/O, PIDs. The runtime creates a cgroup for each container and sets limits (e.g., memory.max and cpu.max on cgroups v2). When a container exceeds its memory limit and the kernel cannot reclaim enough pages, the kernel OOM-kills a process in that cgroup. cgroups v2 is the modern unified hierarchy; v1 mounts a separate hierarchy per controller.

Remember: cgroups = resource limits. Control CPU, memory, disk I/O, and PID count per container. OOM killer triggers when memory cgroup limit is exceeded.
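
The cgroups v2 limit files are plain text, so they are easy to interpret by hand or in a script. This is a minimal sketch that parses the contents of cpu.max ("QUOTA PERIOD", where the quota may be the literal "max") and memory.max (a byte count or "max"). The function names are illustrative.

```python
def parse_cpu_max(text):
    """Turn cpu.max contents into a CPU count, or None when unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text):
    """Turn memory.max contents into bytes, or None when unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)
```

For example, a cpu.max of "50000 100000" means the cgroup may use 50 ms of CPU time per 100 ms period, which is half a CPU.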

5. What is the OCI specification and what does it define?

The Open Container Initiative (OCI) defines three specs: 1) Image Spec -- format for container images (manifest, config, layers). 2) Runtime Spec -- how to run a container (config.json defines root filesystem, mounts, namespaces, cgroups, hooks). 3) Distribution Spec -- API for pushing/pulling images to/from registries. OCI ensures interoperability between runtimes and registries.

Remember: OCI = Open Container Initiative. Defines standards for container images (image-spec), runtimes (runtime-spec), and registry APIs (distribution-spec). runc is the reference OCI runtime.

6. What is runc and what role does it play?

runc is the reference low-level OCI container runtime. It takes an OCI bundle (config.json + rootfs) and creates a container by setting up namespaces, cgroups, mounts, and seccomp filters, then exec's the container process. containerd and CRI-O call runc (or compatible runtimes like crun, youki) to actually create containers.

Remember: runc = low-level OCI runtime that actually creates containers using Linux namespaces and cgroups. containerd calls runc to start containers.

7. How does container networking work at a basic level?

Each container gets its own network namespace with a virtual ethernet pair (veth). One end is in the container namespace, the other is on the host bridge (e.g., docker0, cni0). The bridge connects containers on the same host. Packets leaving the host are NATed via iptables/nftables. In Kubernetes, CNI plugins (Calico, Flannel, Cilium) manage the network plumbing.

Remember: containers are NOT lightweight VMs. They share the host kernel. A kernel exploit in one container can affect all containers on the host. This is why security layers (seccomp, capabilities) matter.

🟡 Medium (10)

1. Walk through the image pull lifecycle in containerd.

1) Client requests image by reference (e.g., docker.io/library/nginx:1.25). 2) containerd resolves the reference to a registry endpoint. 3) It fetches the manifest; if the registry returns an OCI index or manifest list, it selects the manifest for the node's platform. 4) It downloads the image config and each layer blob (content-addressed by sha256 digest) into the content store. 5) Layers are unpacked into snapshots by the snapshotter (e.g., overlayfs). 6) The image record is saved in containerd's metadata store, pointing at the manifest in the content store.
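
"Content-addressed" means a blob's name is the hash of its bytes, so a downloaded layer can be checked against the digest listed in the manifest. The sketch below illustrates that check. It is not containerd's code, and verify_blob is a hypothetical helper name.

```python
import hashlib


def verify_blob(blob: bytes, digest: str) -> bool:
    """Check blob bytes against a digest string like 'sha256:<hex>'."""
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        # Only sha256 is handled in this sketch.
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob).hexdigest() == expected_hex


layer = b"example layer bytes"
layer_digest = "sha256:" + hashlib.sha256(layer).hexdigest()
```

Because the digest depends only on the bytes, two images that share a layer store that layer once.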

2. What is the difference between docker exec and docker run at the runtime level?

docker run creates a new container (new namespaces, cgroups, rootfs) and starts its entrypoint process. docker exec joins an existing container's namespaces (via setns syscall) and spawns an additional process inside them. exec does not create new cgroups or mount a new rootfs -- it reuses the running container's environment.

3. How does the container logging model work for stdout and stderr?

The stdout and stderr of the container's PID 1 are captured by the runtime shim via pipes. On Kubernetes nodes, containerd writes these streams to log files under /var/log/pods/ (symlinked from /var/log/containers/) in a newline-delimited format with timestamps and stream tags (stdout/stderr). kubectl logs reads these files through the kubelet. If PID 1 is not the app (e.g., a shell wrapper), output from child processes appears only if they inherit PID 1's file descriptors.
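
A short parser makes the on-disk format concrete. This sketch assumes the CRI log line layout of "TIMESTAMP STREAM TAG MESSAGE", where the tag is F for a full line or P for a partial one. The tag field is not mentioned in the card above and is stated here as an assumption about the CRI format; the names LogLine and parse_cri_log_line are illustrative.

```python
from typing import NamedTuple


class LogLine(NamedTuple):
    timestamp: str
    stream: str   # "stdout" or "stderr"
    tag: str      # "F" (full line) or "P" (partial line)
    message: str


def parse_cri_log_line(line):
    """Split one CRI-format log line into its four fields."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        raise ValueError("not a CRI log line")
    timestamp, stream, tag = parts[:3]
    if stream not in ("stdout", "stderr"):
        raise ValueError(f"unexpected stream: {stream}")
    message = parts[3] if len(parts) == 4 else ""
    return LogLine(timestamp, stream, tag, message)
```

Splitting at most three times keeps spaces inside the message intact.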

4. How do image layers work and what is a union filesystem?

An OCI image is a stack of read-only layers, each a tar of filesystem diffs. A union filesystem (e.g., overlayfs) merges these layers into a single coherent view. The container gets a thin read-write layer on top. Writes use copy-on-write: modifying a file from a lower layer copies it up to the writable layer first. Deletes create whiteout files.
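
The merge rule can be simulated in a few lines. This toy model treats each layer as a dict of relative path to content and applies layers lowest first; an entry whose file name starts with ".wh." (the OCI whiteout prefix) hides the matching path from lower layers. It is a teaching sketch, not how overlayfs is implemented, and merge_layers is a hypothetical name.

```python
WHITEOUT_PREFIX = ".wh."


def merge_layers(layers):
    """Merge layer dicts (lowest first) into the view a container would see."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            name = path.rpartition("/")[2]
            if name.startswith(WHITEOUT_PREFIX):
                parent = path[: len(path) - len(name)]
                hidden = parent + name[len(WHITEOUT_PREFIX):]
                # Hide the file itself and anything beneath it.
                for existing in list(view):
                    if existing == hidden or existing.startswith(hidden + "/"):
                        del view[existing]
            else:
                view[path] = content  # upper layers shadow lower ones
    return view


base = {"etc/app.conf": "v1", "tmp/cache": "data"}
upper = {"etc/app.conf": "v2", "tmp/.wh.cache": ""}
merged = merge_layers([base, upper])
```

Here merged contains only etc/app.conf with the upper layer's content, because tmp/cache was whited out.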

5. You get 'unauthorized: authentication required' when pulling an image. How do you debug?

1) Verify credentials: docker login or check imagePullSecrets. 2) Ensure the secret is in the correct namespace. 3) Check if the token has expired (common with ECR tokens that expire every 12 hours). 4) Confirm the image path matches the registry the credentials are for. 5) Check if the registry requires a specific scope or project access (e.g., GCR, ACR). 6) Inspect kubelet logs for detailed error messages.
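
Step 4 (credentials for the wrong registry) can be checked mechanically. The sketch below decodes a .dockerconfigjson payload and looks up the entry for the registry host of an image reference. It assumes exact host-name keys under "auths" and an "auth" field holding base64 "user:password". Real configs may use other key forms (for example a full URL for Docker Hub), which this sketch does not handle, and both function names are illustrative.

```python
import base64
import json


def registry_of(image):
    """Return the registry host of an image reference (docker.io if none)."""
    first, slash, _ = image.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def credentials_for(dockerconfigjson, image):
    """Return (user, password) for the image's registry, or None."""
    auths = json.loads(dockerconfigjson).get("auths", {})
    entry = auths.get(registry_of(image))
    if entry is None or "auth" not in entry:
        return None
    user, _, password = base64.b64decode(entry["auth"]).decode().partition(":")
    return user, password
```

A None result for the image being pulled suggests the secret was created for a different registry host.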

6. What does an OCI runtime config.json contain?

config.json is the runtime bundle specification. It defines: root (path to rootfs), mounts (bind mounts, tmpfs), process (entrypoint, args, env, cwd, user, capabilities), linux (namespaces, cgroups path, seccomp profile, rlimits, sysctl), and hooks (prestart, poststart, poststop). runc reads this file to create and start the container.
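
To make the structure concrete, this sketch builds a deliberately small config.json as a Python dict. The field names follow the OCI runtime spec, but the values (ociVersion string, PATH, the single /proc mount, the namespace list) are illustrative choices and the result has not been validated against runc. minimal_oci_config is a hypothetical helper.

```python
import json


def minimal_oci_config(args, rootfs="rootfs"):
    """Build a small OCI runtime config as a dict (illustrative values)."""
    return {
        "ociVersion": "1.0.2",
        "root": {"path": rootfs, "readonly": True},
        "process": {
            "args": list(args),
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
            "user": {"uid": 0, "gid": 0},
        },
        "mounts": [
            {"destination": "/proc", "type": "proc", "source": "proc"},
        ],
        "linux": {
            # Each listed type asks the runtime for a new namespace of that kind.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "mount", "uts", "ipc")
            ],
        },
    }


config_text = json.dumps(minimal_oci_config(["/bin/sh"]), indent=2)
```

A real bundle generated by runc spec includes more fields (capabilities, rlimits, masked paths), so treat this as a reading aid rather than a usable bundle.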

7. What is a container shim process and why is it needed?

The shim (containerd-shim-runc-v2) is an intermediate process between containerd and the container. It serves as the parent of the container's PID 1. It allows containerd to restart without killing running containers. It captures exit codes, manages stdio pipes for logging, and reaps zombie processes. Each standalone container gets its own shim; under Kubernetes, the containers of a pod share one shim.

8. A container can reach the internet but not other containers on the same host. What do you check?

1) Verify the bridge interface exists and both veth ends are up (ip link). 2) Check iptables FORWARD chain -- Docker or CNI may have DROP rules. 3) Inspect that both containers are on the same bridge/network (docker network inspect or CNI config). 4) Check for network policy enforcement blocking inter-pod traffic. 5) Verify ARP resolution works within the bridge (arping).

9. What is seccomp and how do container runtimes use it?

seccomp (secure computing mode) filters which syscalls a process can make. Container runtimes apply a default seccomp profile that blocks dangerous syscalls (e.g., reboot, mount, kexec_load, bpf). The profile is a JSON allowlist/denylist embedded in the OCI config.json. Kubernetes lets you set custom profiles via securityContext.seccompProfile. Running with Unconfined disables the filter entirely and is a security risk.
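
The allowlist idea can be shown with a tiny evaluator. This sketch assumes the Docker-style profile layout (a defaultAction plus a syscalls list of names/action rules) and ignores argument conditions and architecture sections, so it is a simplification of what the kernel filter actually does. action_for and the sample profile are illustrative.

```python
def action_for(profile, syscall):
    """Return the action a profile assigns to a syscall name (first match wins)."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


sample_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",  # deny anything not listed
    "syscalls": [
        {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"},
    ],
}
```

With this sample, read is allowed while reboot falls through to the default and fails with an error instead of executing.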

10. What is AppArmor and how does it differ from seccomp in container security?

AppArmor is a Linux Security Module that restricts programs based on per-program profiles controlling file access, network access, and capabilities. Seccomp filters syscalls. They are complementary: seccomp limits what syscalls are available, AppArmor limits what resources those syscalls can access (e.g., which paths can be read/written). Container runtimes apply a default AppArmor profile (docker-default or runtime/default) unless overridden.

🔴 Hard (8)

1. A container produces logs but kubectl logs shows nothing. What are common causes?

1) The application writes to a file instead of stdout/stderr. 2) The entrypoint is a shell script that backgrounds the real process, breaking FD inheritance. 3) The log file on the node was rotated or truncated. 4) The container runtime's log driver is misconfigured. 5) The container restarted and --previous flag is needed to see prior logs.

2. A container image is unexpectedly large. How do you diagnose which layers are bloated?

Use 'docker history IMAGE' or 'crane manifest IMAGE' to inspect layer sizes. Tools like dive show each layer's filesystem diff interactively. Common causes: package manager caches not cleaned in the same RUN instruction, large build artifacts copied but not removed, secrets accidentally baked in, or base image is oversized. Each RUN/COPY/ADD creates a layer, so combining commands and using multi-stage builds reduces size.
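
An image manifest lists every layer with its digest and size, so ranking layers takes only a few lines once the manifest JSON is in hand. This sketch assumes a single-platform manifest with a "layers" array; an index or manifest list has no such array and would return an empty result. Sizes in a manifest are the compressed blob sizes. layers_by_size and the sample manifest are illustrative.

```python
import json


def layers_by_size(manifest_json):
    """Return (size, digest) pairs from an image manifest, largest first."""
    manifest = json.loads(manifest_json)
    layers = manifest.get("layers", [])
    return sorted(((l["size"], l["digest"]) for l in layers), reverse=True)


sample_manifest = json.dumps({
    "schemaVersion": 2,
    "layers": [
        {"digest": "sha256:aaa", "size": 3_000_000},
        {"digest": "sha256:bbb", "size": 250_000_000},
        {"digest": "sha256:ccc", "size": 1_200},
    ],
})
largest = layers_by_size(sample_manifest)[0]
```

The largest layer's position in the list maps back to the build instruction that created it.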

3. A pod starts but the app behaves unexpectedly, as if running old code. What image-related issues could cause this?

1) The tag (e.g., :latest) was not updated and a cached image is used -- set imagePullPolicy: Always or use digest-pinned references. 2) The image was pushed to the wrong tag. 3) A registry mirror or cache is serving a stale image. 4) The node has a cached layer from a previous pull. 5) Verify with: crictl images | grep IMAGE_NAME and compare the digest to what the registry reports.

4. You see 'exec format error' when a container starts. What causes this at the runtime level?

runc tries to execve the entrypoint binary and the kernel rejects it. Causes: 1) Architecture mismatch -- an amd64 image on an arm64 node (or vice versa) without QEMU/binfmt_misc. 2) The entrypoint is a script without a shebang line (#!/bin/sh). 3) The binary is dynamically linked against libraries missing from the image (not exec format error per se, but similar symptom). 4) Corrupt binary or wrong file format.
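
The first two causes can be told apart by looking at the first bytes of the entrypoint file. This sketch checks for a shebang, then for the ELF magic, and reads the e_machine field (offset 18) to name the target architecture. The machine table covers only a few common values, and classify_entrypoint is a hypothetical helper.

```python
import struct

# A few common ELF e_machine values; not an exhaustive table.
ELF_MACHINES = {3: "386", 40: "arm", 62: "amd64", 183: "arm64", 243: "riscv"}


def classify_entrypoint(header: bytes) -> str:
    """Describe a file from its first 20 bytes."""
    if header[:2] == b"#!":
        return "script with shebang"
    if header[:4] != b"\x7fELF":
        return "unknown format (no ELF magic or shebang)"
    if len(header) < 20:
        return "truncated ELF header"
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return "ELF " + ELF_MACHINES.get(machine, f"machine {machine}")


# Synthetic 64-bit little-endian header with e_machine = 183.
arm64_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 183)
```

Passing the first 20 bytes of the image's entrypoint (read in binary mode) and comparing the result with the node's architecture narrows the cause quickly.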

5. You see many zombie (defunct) processes on a container host. What runtime mechanism is failing?

The shim process acts as a subreaper for container processes. If the shim is stuck or crashed, it cannot call wait() to reap child exit statuses, causing zombies. Other causes: the container's PID 1 is not handling SIGCHLD (not reaping its own children), or an init process like tini is missing. Check with ps aux | grep defunct and verify shim health with crictl ps.
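
Besides grepping ps output, zombies can be listed together with their parent PID, which shows whether a shim or a container's PID 1 is failing to reap. This sketch assumes a Linux /proc filesystem and parses /proc/PID/stat, whose third field is the process state (Z for zombie). The function names are illustrative.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from one /proc/PID/stat line."""
    pid_part, _, rest = stat_line.partition(" (")
    # comm may itself contain ')' or spaces, so split on the last ') '.
    comm, _, after = rest.rpartition(") ")
    fields = after.split()
    return int(pid_part), comm, fields[0], int(fields[1])


def find_zombies(proc="/proc"):
    """List (pid, comm, ppid) for every zombie process (Linux only)."""
    zombies = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies
```

Grouping the result by ppid points at the parent that is not calling wait().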

6. A containerized app fails with 'operation not permitted' but runs fine outside a container. How do you diagnose a seccomp or capability issue?

1) Run with --security-opt seccomp=unconfined to test if seccomp is the cause. 2) Use strace or auditd to identify the blocked syscall. 3) Check dmesg or /var/log/audit/audit.log for seccomp or AppArmor denials (type=SECCOMP or type=AVC). 4) Review the container's capabilities with getpcaps or /proc/1/status CapEff. 5) Add the specific capability (e.g., SYS_PTRACE, NET_RAW) rather than running privileged.
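
The CapEff value in /proc/1/status is a hexadecimal bitmask, one bit per capability. This sketch decodes such a mask using the Linux capability bit numbering (bit 0 = CAP_CHOWN through bit 40 = CAP_CHECKPOINT_RESTORE). The sample mask is one commonly seen for a default Docker container and is used here only as illustrative input; decode_caps is a hypothetical helper.

```python
# Capability names indexed by bit number.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask):
    """Return the capability names whose bits are set in a CapEff mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


sample_caps = decode_caps("00000000a80425fb")
```

If the capability the app needs (say CAP_SYS_PTRACE) is absent from the decoded list, adding that one capability is the targeted fix described in step 5.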

7. containerd becomes unresponsive on a node. What is your diagnostic process?

1) Check containerd service status: systemctl status containerd, journalctl -u containerd for errors.
2) Check disk space -- containerd hangs if the content store or snapshotter partition is full.
3) Check for too many concurrent image pulls exhausting file descriptors (ls /proc/$(pidof containerd)/fd | wc -l).
4) Check for stuck shim processes consuming resources (ps aux | grep shim).
5) Send SIGUSR1 to the containerd process to write a goroutine stack dump to its log for deadlock analysis.
6) Restart containerd -- running containers survive because shims are independent.

8. A container is repeatedly OOMKilled but the application's memory usage looks normal. What could explain this?

1) The memory limit is too low -- check cgroup memory.max vs actual RSS+cache usage (cat /sys/fs/cgroup/memory.current). 2) Kernel memory (kmem) accounting is included in cgroups v1 and can push usage over the limit. 3) tmpfs mounts (e.g., /dev/shm, emptyDir medium: Memory) count against the container's memory cgroup. 4) Memory is fragmented and the kernel cannot reclaim pages fast enough. 5) A sidecar container in the same pod is consuming shared memory limits.
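
When the application's own numbers look normal, the cgroup's memory.stat file shows what else is being charged. This sketch parses the cgroups v2 memory.stat format ("key value" per line) and pulls out anon, file, and shmem. To the best of my understanding shmem (tmpfs and shared memory) is counted inside file, so a large shmem value points at cause 3. The function names and sample text are illustrative.

```python
def parse_memory_stat(text):
    """Parse cgroup v2 memory.stat contents into {key: bytes}."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def breakdown(stats):
    """Pick the counters most useful for an OOM investigation."""
    return {key: stats.get(key, 0) for key in ("anon", "file", "shmem")}


sample_stat = "anon 1048576\nfile 4194304\nshmem 3145728\n"
usage = breakdown(parse_memory_stat(sample_stat))
```

In the sample, most of the charged memory is file-backed and three quarters of that is shmem, which the application's heap metrics would never show.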
