DevOps & Linux Phone Interview Cheat Sheet

One-page-ish quick reference for phone screens. Skim before the call, keep on screen during.


Linux Essentials

Process & Systemd

ps aux | grep <name>              # Find process
ps -eo pid,%cpu,%mem,cmd --sort=-%cpu | head   # Top consumers
kill -15 PID                      # SIGTERM (graceful)
kill -9 PID                       # SIGKILL (last resort)
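
The SIGTERM-then-SIGKILL escalation above can be scripted. This is a minimal sketch, not a standard tool: the function name graceful_kill and the 10-second default timeout are assumptions made for illustration.

```shell
# graceful_kill PID [TIMEOUT_SECONDS]
# Sends SIGTERM, polls once per second, and sends SIGKILL only if the
# process is still alive after the timeout (default 10 s, an assumption).
graceful_kill() {
    pid=$1
    timeout=${2:-10}

    # Nothing to do if the process is already gone (or not ours to signal).
    kill -0 "$pid" 2>/dev/null || return 0

    kill -15 "$pid" 2>/dev/null
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
        # kill -0 sends no signal; it only tests whether the PID can be signalled.
        kill -0 "$pid" 2>/dev/null || return 0
    done

    echo "PID $pid still alive after ${timeout}s, sending SIGKILL" >&2
    kill -9 "$pid" 2>/dev/null
}
```

Usage: graceful_kill 4242 30 gives the process 30 seconds to shut down cleanly before it is killed.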

systemctl status/start/stop/restart <svc>
systemctl enable <svc>            # Start on boot
journalctl -u <svc> -f           # Follow logs
journalctl -p err -b              # Errors since boot

Filesystem & Disk

df -h                             # Disk usage
df -i                             # Inode usage (hidden full-disk cause)
du -sh /var/log/* | sort -rh      # Biggest dirs
lsblk                             # Block devices
mount | column -t                 # Mounts
find / -xdev -type f -size +100M  # Large files

Memory & Performance

free -h                           # Memory overview
vmstat 1 5                        # CPU/mem/swap snapshot
uptime                            # Load averages
iostat -xz 1 3                    # Disk I/O
top -bn1 | head -20               # Quick process view
dmesg -T | tail                   # Kernel messages

USE Method — for each resource check Utilization, Saturation, Errors:

CPU:     uptime  mpstat  pidstat
Memory:  free  vmstat  slabtop
Disk:    iostat  iotop  df -i
Network: sar -n DEV  ss  nstat
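
For CPU saturation, the load average only means something relative to core count. A small sketch of that normalisation follows; load_per_core is a hypothetical helper name, and the "above 1.0 per core" threshold is a rule of thumb, since Linux load also counts tasks stuck in uninterruptible I/O wait.

```shell
# load_per_core LOAD CORES -> prints LOAD divided by CORES with two decimals.
# A result that stays above 1.00 hints at CPU (or I/O) saturation.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a Linux host the inputs could come from /proc/loadavg and nproc:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```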

Networking

ss -tlnp                          # Listening TCP ports
ip addr show                      # IP addresses
ip route show                     # Routing table
dig example.com +short            # DNS lookup
curl -v telnet://host:port        # Test connectivity
traceroute host                   # Trace path
iptables -L -n -v                 # Firewall rules

File Permissions

rwxrwxrwx = user / group / other
755 = rwxr-xr-x   (dirs, scripts)
644 = rw-r--r--   (config files)
SUID (u+s) — run as file owner
SGID (g+s) — inherit group on dir
Sticky (+t) — only owner can delete
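
Each octal digit is a bitmask: 4 = read, 2 = write, 1 = execute. The sketch below decodes a three-digit mode into its rwx string. The helper names digit_to_rwx and mode_to_rwx are made up for illustration, and the special bits (SUID, SGID, sticky) are deliberately not handled.

```shell
# digit_to_rwx DIGIT -> prints the rwx triplet for one octal digit (0-7).
digit_to_rwx() {
    d=$1
    out=""
    if [ $((d & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((d & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((d & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
    printf '%s' "$out"
}

# mode_to_rwx MODE -> prints the 9-character string for a 3-digit octal mode.
mode_to_rwx() {
    mode=$1
    u=${mode%??}              # first digit: user
    g=${mode#?}; g=${g%?}     # middle digit: group
    o=${mode#??}              # last digit: other
    printf '%s%s%s\n' "$(digit_to_rwx "$u")" "$(digit_to_rwx "$g")" "$(digit_to_rwx "$o")"
}
```

Usage: mode_to_rwx 755 prints rwxr-xr-x and mode_to_rwx 644 prints rw-r--r--, matching the two lines above.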

Users & SSH

useradd -m -s /bin/bash user
usermod -aG sudo user                  # Grant sudo (group is wheel on RHEL/CentOS)
ssh-keygen -t ed25519
ssh -L 8080:localhost:80 user@remote   # Local tunnel
ssh -R 8080:localhost:3000 user@remote # Remote tunnel

Package Management

# Debian/Ubuntu              # RHEL/CentOS
apt update && apt upgrade     dnf update
apt install <pkg>             dnf install <pkg>
dpkg -l | grep <pkg>          rpm -qa | grep <pkg>

Key Log Locations

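/var/log/syslog (Debian/Ubuntu) or /var/log/messages (RHEL/CentOS)
/var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/CentOS)
/var/log/kern.log (Debian/Ubuntu; elsewhere use journalctl -k or dmesg)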

Git (Quick Hits)

git status
git add -p                        # Stage hunks interactively
git commit -m "message"
git pull --rebase origin main     # Linear history
git log --oneline -10
git stash / git stash pop
git revert HEAD                   # Safe undo on shared branch
git reset --soft HEAD~1           # Undo commit, keep changes

Docker

docker run -d --name app -p 8080:80 nginx:1.25
docker ps / docker ps -a
docker logs -f <container>
docker exec -it <container> bash      # Use sh if the image has no bash
docker build -t myapp:v1 .
docker images / docker image prune -a

Dockerfile best practices: multi-stage builds, pin versions, non-root user, COPY not ADD, order layers by change frequency, use .dockerignore.


Kubernetes

Top 5 kubectl commands

kubectl get pods -A -o wide           # What's running
kubectl describe pod <name>           # Events & conditions
kubectl logs <pod> --tail=50 -f       # Stream logs
kubectl exec -it <pod> -- bash        # Shell in
kubectl apply -f manifest.yaml        # Declarative apply

Resource chain

Deployment → ReplicaSet → Pod → Container(s)

Probes

Probe      Purpose                   On failure
liveness   Is the process alive?     Container restarted
readiness  Can it serve traffic?     Pod removed from Service endpoints
startup    Has it finished booting?  Container restarted; liveness/readiness wait until it passes

Service types

Type          Scope
ClusterIP     Internal only (default)
NodePort      Expose on node port 30000-32767
LoadBalancer  Cloud LB → NodePort → Pods

Debugging flow

pod not starting?    kubectl describe pod (events)
pod crashing?        kubectl logs --previous
can't reach svc?     kubectl get endpoints (empty = selector mismatch)
slow/unhealthy?      check readiness probe, resource limits

Terraform

terraform init          # Download providers
terraform plan          # Dry run
terraform apply         # Apply changes
terraform destroy       # Tear down (careful!)

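Key concepts: the state file is Terraform's record of what it manages (protect and lock it); modules for reuse; in CI, run terraform plan -out=plan.tfplan and then terraform apply plan.tfplan so that exactly the reviewed plan is applied. HCL blocks: resource, variable, output, data, module, locals.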


CI/CD

CI = every commit triggers build + test (catch bugs early). CD means either Continuous Delivery (every change is kept deployable, release is a manual step) or Continuous Deployment (every passing change is released automatically).

Pipeline stages: lint → test → build → push artifact → deploy staging → deploy prod.
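
The key property of the stage chain is fail-fast ordering: a later stage never runs if an earlier one fails. A minimal sketch of that control flow, with stub stages standing in for real commands (the stage_* functions and their echo bodies are placeholders, not a real project's steps):

```shell
# Stub stages: each would wrap a real command (for example a linter or a
# registry push). The bodies here are placeholders for illustration only.
stage_lint()           { echo "linting"; }
stage_test()           { echo "running tests"; }
stage_build()          { echo "building artifact"; }
stage_push()           { echo "pushing artifact"; }
stage_deploy_staging() { echo "deploying to staging"; }
stage_deploy_prod()    { echo "deploying to prod"; }

# Runs the stages in order and stops at the first one that fails.
run_pipeline() {
    for stage in lint test build push deploy_staging deploy_prod; do
        echo "==> $stage"
        if ! "stage_$stage"; then
            echo "pipeline stopped: $stage failed" >&2
            return 1
        fi
    done
    echo "pipeline succeeded"
}
```

CI systems give the same behaviour for free: a failing step fails the job, and dependent jobs are skipped.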

GitHub Actions key syntax:

on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - run: make test


Observability (The Three Pillars)

Pillar   Tool                      What
Metrics  Prometheus + Grafana      Numeric time-series (CPU, latency, error rate)
Logs     Loki / ELK / CloudWatch   Event records, grep-able
Traces   Jaeger / Tempo / Datadog  Request path across services

Golden signals (from Google SRE): latency, traffic, errors, saturation.

SLI/SLO/SLA:
- SLI = measurement (e.g. 99.2% of requests < 200ms)
- SLO = target (e.g. 99.5% of requests < 200ms)
- SLA = contract with consequences (e.g. refund if SLO missed)
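
An SLO implies an error budget: the share of requests allowed to miss the target. A rough sketch using the example figures above. The helper names error_budget and slo_met and the 1,000,000-request window are assumptions for illustration.

```shell
# error_budget TOTAL_REQUESTS SLO_PERCENT
# Prints how many requests may miss the target before the SLO is broken.
error_budget() {
    awk -v total="$1" -v slo="$2" \
        'BEGIN { printf "%d\n", total * (100 - slo) / 100 + 0.5 }'
}

# slo_met SLI_PERCENT SLO_PERCENT
# Exit status 0 if the measured SLI meets or beats the SLO, 1 otherwise.
slo_met() {
    awk -v sli="$1" -v slo="$2" 'BEGIN { exit !(sli >= slo) }'
}
```

Usage: error_budget 1000000 99.5 prints 5000, so a 99.5% SLO over a million requests tolerates 5000 slow ones. An SLI of 99.2% means about 8000 were slow, so slo_met 99.2 99.5 fails.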


Ansible (Quick Hits)

ansible-playbook -i inventory playbook.yml
ansible all -m ping                      # Connectivity check
ansible all -a "uptime"                  # Ad-hoc command

Key concepts: idempotent, agentless (SSH), YAML playbooks, roles for reuse, Jinja2 templates, Vault for secrets.


Networking Fundamentals

OSI layers to know: L3 (IP/routing), L4 (TCP/UDP), L7 (HTTP/DNS).

Protocol    Port     Notes
SSH         22       Encrypted remote access
HTTP/HTTPS  80/443   Web traffic
DNS         53       Name resolution
SMTP        25/587   Email

TCP vs UDP: TCP = reliable, ordered, connection-based. UDP = fast, fire-and-forget (DNS, video, gaming).

DNS resolution: client → resolver cache → recursive resolver → root → TLD → authoritative.

Subnetting shorthand: /24 = 256 IPs (254 usable hosts), /16 = ~65K, /8 = ~16M. Private ranges (RFC 1918): 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
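
The shorthand follows from one formula: a /N IPv4 network holds 2^(32-N) addresses, and two of them (network and broadcast) are not assignable to hosts. A small sketch with hypothetical helper names cidr_size and usable_hosts; it assumes 64-bit shell arithmetic and ignores the /31 and /32 special cases.

```shell
# cidr_size PREFIX -> total IPv4 addresses in a /PREFIX network.
cidr_size() {
    echo $((1 << (32 - $1)))
}

# usable_hosts PREFIX -> addresses left after removing network and broadcast.
# Only meaningful for /30 and larger networks.
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}
```

Usage: cidr_size 24 prints 256, cidr_size 16 prints 65536, and usable_hosts 24 prints 254.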


Common Interview Questions — Quick Answers

"Walk me through what happens when you type a URL in a browser." DNS lookup → TCP handshake → TLS handshake → HTTP request → server processes → HTTP response → browser renders.

"How do you troubleshoot a slow Linux server?" USE method: uptime (CPU), free (memory), iostat (disk I/O), ss (network). Check dmesg, journalctl for errors. Identify top consumers with top/htop.

"Explain the difference between a container and a VM." VM = full guest OS on a hypervisor, heavier, slower to boot (often tens of seconds to minutes). Container = shares the host kernel, isolated via namespaces + cgroups, lightweight, starts in seconds or less.

"What is Infrastructure as Code?" Manage infra through version-controlled config files instead of manual changes. Benefits: repeatability, audit trail, collaboration, disaster recovery. Tools: Terraform, Ansible, Pulumi, CloudFormation.

"Describe a CI/CD pipeline you've built." Push → lint + unit tests → build Docker image → push to registry → deploy to staging → integration tests → promote to prod (canary or blue-green).