
Portal | Level: L1: Foundations | Domain: DevOps & Tooling

DevOps Roadmap - Dense Skill Check (Q&A)

This is the roadmap (in the order mentioned in the transcript), with 10 bullet questions per topic and short indented answers, roughly easy -> harder within each section.


1) Foundations - Linux fundamentals (10, easy -> harder)

  • What’s a PID?
  • Unique ID for a running process in the kernel.
  • Where do config files usually live?
  • /etc (system-wide); user-specific usually under ~/.config or dotfiles.
  • Absolute vs relative path?
  • Absolute starts at /; relative starts from current working directory.
  • /tmp vs /var/tmp?
  • /tmp is often cleared on reboot; /var/tmp is intended to persist longer.
  • Explain stdin/stdout/stderr.
  • Input stream, normal output, error output (FD 0/1/2).
  • What does chmod 750 file mean?
  • Owner rwx, group r-x, others ---.
  • What is a “mount”?
  • Attaching a filesystem to a directory in the unified tree.
  • Hardlink vs symlink?
  • Hardlink = another name for same inode; symlink = path pointer (can break).
  • What does systemctl actually control?
  • systemd units: services, timers, mounts, sockets, targets.
  • Why align partitions (e.g., 1MiB boundaries)?
  • Avoid misaligned I/O penalties on 4K/RAID/SSD erase blocks; better performance/endurance.
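
The chmod 750 answer above can be checked mechanically. The following is a minimal sketch using Python's standard stat module; the helper name describe_mode is illustrative and not part of the roadmap.

```python
import stat


def describe_mode(octal_text: str) -> str:
    """Render an octal permission string (e.g. "750") the way `ls -l` shows it."""
    mode = int(octal_text, 8)
    # S_IFREG marks a regular file, so the leading type character is "-".
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    # Owner rwx, group r-x, others ---
    print(describe_mode("750"))
```

Each octal digit is a 3-bit mask (r=4, w=2, x=1), so 7 = rwx, 5 = r-x, 0 = ---.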

2) Foundations - Bash (10, easy -> harder)

  • Single quotes vs double quotes?
  • '...' no expansion; "..." expands variables/escapes.
  • "$@" vs "$*"?
  • "$@" preserves args; "$*" joins into one string.
  • What does set -euo pipefail do?
  • Exit on error/unset var; fail pipelines if any command fails.
  • Safely loop over filenames with spaces?
  • while IFS= read -r -d '' f; do ...; done < <(find ... -print0)
  • stdout+stderr to file?
  • cmd >out 2>&1 (or cmd &>out)
  • stderr only to file?
  • cmd 2>err
  • Strip suffix .log from var?
  • ${x%.log}
  • Default if unset/empty?
  • ${x:-default}
  • Trap cleanup on exit?
  • trap 'rm -f "$tmp"' EXIT
  • Why do pipelines hide failures, and what is the fix?
  • A pipeline’s exit code is that of its last command; use set -o pipefail or inspect "${PIPESTATUS[@]}" (Bash-specific).
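
The pipeline answer above can be observed from a script. This is a rough sketch that assumes a bash binary is on PATH (it returns None otherwise); the function name pipeline_exit_codes is hypothetical.

```python
import shutil
import subprocess


def pipeline_exit_codes():
    """Return (without_pipefail, with_pipefail) exit codes for `false | true`.

    Returns None when bash is not installed (assumption: bash is on PATH).
    """
    bash = shutil.which("bash")
    if bash is None:
        return None
    # Without pipefail the pipeline reports the status of the last command (true).
    plain = subprocess.run([bash, "-c", "false | true"]).returncode
    # With pipefail the failing `false` makes the whole pipeline fail.
    strict = subprocess.run(
        [bash, "-c", "set -o pipefail; false | true"]
    ).returncode
    return plain, strict


if __name__ == "__main__":
    print(pipeline_exit_codes())
```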

3) Foundations - Git (10, easy -> harder)

  • What is a commit (really)?
  • Snapshot pointer + metadata + parent refs.
  • What is HEAD?
  • Pointer to current ref/commit you’re “on”.
  • What’s in .git/objects/?
  • Compressed blobs/trees/commits/tags addressed by hash.
  • Merge vs rebase?
  • Merge preserves history; rebase rewrites for linear history.
  • Undo last commit, keep staged?
  • git reset --soft HEAD~1
  • Undo last commit, discard changes?
  • git reset --hard HEAD~1
  • --soft vs --mixed vs --hard?
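  • Soft: move HEAD only (changes stay staged); Mixed (default): also reset the index (changes become unstaged); Hard: also reset the working tree (changes discarded).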
  • Rebase conflict workflow?
  • Fix -> git add -> git rebase --continue (or --abort)
  • Safest way to rewrite pushed history?
  • Prefer revert; if forced: git push --force-with-lease.
  • What does reflog give you?
  • Local history of ref movement; recovery from “lost” commits.
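
To make "addressed by hash" concrete: a blob's ID is the hash of a small header plus the file content. The sketch below assumes the default SHA-1 object format (not a SHA-256 repository); git_blob_id is an illustrative name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content (SHA-1 repos)."""
    # Git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match: printf '' | git hash-object --stdin
    print(git_blob_id(b""))
```

Because the ID depends only on content, two identical files anywhere in history share one stored object.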

4) Phase 2 - Cloud basics (10, easy -> harder)

  • What is “cloud” in one line?
  • On-demand infrastructure APIs (compute/storage/network) with metering.
  • Compute vs storage vs networking?
  • VMs/containers; disks/objects; routing/firewalls/IPs/DNS.
  • IaaS vs PaaS vs SaaS?
  • Infra building blocks vs managed platforms vs finished apps.
  • Region vs AZ?
  • Region = geographic area; AZ = isolated datacenter group in region.
  • Security group vs NACL (conceptually)?
  • Instance-level stateful rules vs subnet-level stateless rules.
  • Object vs block storage?
  • Objects via HTTP keys; blocks attach as disks with filesystems.
  • IAM basics: auth vs authz?
  • Authentication = who; authorization = what allowed (policy).
  • Shared responsibility model?
  • Provider secures underlying cloud; you secure configs, data, identity.
  • High availability basics?
  • Multi-AZ design + health checks + failover + statelessness.
  • What actually causes big cloud bills?
  • Egress, oversized compute, orphaned storage/snapshots, idle managed services.

5) Phase 3 - Infrastructure as Code (10, easy -> harder)

  • What problem does IaC solve?
  • Repeatable, reviewable, versioned infrastructure changes.
  • Declarative vs imperative?
  • Desired state vs step-by-step commands.
  • What is “state” and why does it matter?
  • Mapping between code and real resources; drift detection and updates.
  • Plan vs apply mental model?
  • Preview changes vs perform changes.
  • Modules: why use them?
  • Reuse, standardization, guardrails.
  • Variables/outputs: why?
  • Parameterize + connect stacks cleanly.
  • Drift: what is it?
  • Reality changed outside IaC; code/state no longer match.
  • Remote state + locking: why?
  • Team safety, prevents concurrent corruption.
  • Secrets handling rule of thumb?
  • Don’t hardcode; use secret stores/encrypted vars; least exposure.
  • Idempotency and “immutable” patterns?
  • Reapply safely; prefer replace-over-mutate for risky changes.
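
The plan/apply and idempotency ideas above can be sketched with a toy model. This is not how any real IaC tool is implemented; the dictionaries stand in for desired configuration and real resources, and plan and apply_plan are hypothetical names.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Preview: compare desired vs actual resources, keyed by resource name."""
    return {
        "create": sorted(k for k in desired if k not in actual),
        "update": sorted(
            k for k in desired if k in actual and desired[k] != actual[k]
        ),
        "delete": sorted(k for k in actual if k not in desired),
    }


def apply_plan(desired: dict, actual: dict) -> dict:
    """Perform: the toy 'provider' simply converges actual to desired."""
    return {name: dict(spec) for name, spec in desired.items()}


if __name__ == "__main__":
    desired = {"web": {"size": "small"}, "db": {"size": "large"}}
    actual = {"web": {"size": "tiny"}, "old": {"size": "small"}}
    print(plan(desired, actual))
    actual = apply_plan(desired, actual)
    # A second plan is empty: re-applying changes nothing (idempotent).
    print(plan(desired, actual))
```

Drift shows up in this model as a non-empty plan even though the code did not change.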

6) Phase 4 - Containers + Kubernetes (10, easy -> harder)

  • What is a container (really)?
  • Isolated process via namespaces + cgroups, not a VM.
  • Image vs container?
  • Image = immutable template; container = running instance.
  • Why does “works on my machine” happen?
  • Missing deps/env/config; container pins runtime and deps.
  • What is a volume used for?
  • Persist data outside container lifecycle.
  • Port mapping meaning?
  • Host port forwards to container port.
  • Kubernetes: Pod vs Deployment?
  • Pod = unit of running containers; Deployment manages replicas/rollouts.
  • Service vs Ingress (conceptually)?
  • Service = stable endpoint; Ingress = HTTP routing layer.
  • ConfigMap vs Secret?
  • ConfigMap for non-secret config; Secret for sensitive data (values are base64-encoded, not encrypted, in the manifest; still handle carefully).
  • Liveness vs readiness probes?
  • Liveness restarts; readiness controls traffic routing.
  • Teardown/cleanup basics (K8s)?
  • Delete what you created: kubectl delete -f <manifests> or kubectl delete deploy,svc,ing,cm,secret -l app=<x>; namespace cleanup: kubectl delete ns <name>.
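
On the ConfigMap vs Secret point: values under a Secret's data field are base64-encoded, which is an encoding and not encryption. A short sketch (the helper names are illustrative) shows that anyone who can read the manifest can recover the value.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears under `data:` in a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is needed."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_secret_value("admin")
    print(encoded)
    print(decode_secret_value(encoded))
```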

7) Phase 5 - CI/CD (10, easy -> harder)

  • CI vs CD?
  • CI validates/builds; CD deploys.
  • What triggers a pipeline?
  • Push/PR/tag/schedule/manual.
  • What is an artifact?
  • Build output stored for later steps (image/package/binary).
  • Why run tests in CI?
  • Catch regressions before merge/deploy.
  • Why build container images in CI?
  • Reproducible deployable unit with version tag.
  • What is “pipeline as code”?
  • Pipeline definition in repo; reviewed like code.
  • Deploy strategies: rolling vs blue/green vs canary?
  • Gradual replace vs parallel cutover vs partial traffic shift.
  • What is a promotion flow?
  • Same artifact promoted dev->stage->prod with approvals.
  • Secrets in CI: main rule?
  • Inject at runtime via secret store; never echo; restrict scopes.
  • What makes CI/CD “production-grade”?
  • Deterministic builds, caching, least-privilege deploy keys, rollback path, audit logs.
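
The "partial traffic shift" of a canary can be sketched as deterministic bucketing. This is one possible approach under stated assumptions (routing by a user ID string, percentages as whole numbers), not how any particular load balancer or service mesh does it; route is a hypothetical helper.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Deterministically send roughly canary_percent% of users to the canary."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the user to a stable bucket in 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(route(u, 10) == "canary" for u in users) / len(users)
    print(f"canary share at 10%: {share:.2%}")
```

Hashing keeps each user on the same side between requests, and raising the percentage only ever moves users from stable to canary.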

8) Final - Observability (10, easy -> harder)

  • Monitoring vs logging vs tracing?
  • Metrics for trends, logs for events, traces for request flow.
  • Golden signals?
  • Latency, traffic, errors, saturation.
  • What is a time series metric?
  • Value indexed by time + labels.
  • Counter vs gauge vs histogram?
  • Monotonic count vs current value vs distribution.
  • Why can labels hurt you?
  • High cardinality explodes storage/query cost.
  • Alerting goal?
  • Actionable signals, not noise.
  • SLI vs SLO?
  • Indicator measurement vs target objective.
  • What is “burn rate”?
  • Speed of consuming error budget; guides urgency.
  • Dashboards: what’s the trap?
  • Vanity charts without decision value; no link to alerts/SLOs.
  • Root cause workflow (tight)?
  • Correlate alerts -> inspect deploys -> check metrics/logs/traces -> narrow blast radius -> rollback/mitigate -> postmortem.
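
Burn rate from the list above is simple arithmetic: the observed error ratio divided by the error budget (1 minus the SLO target). A minimal sketch, with illustrative function names and a 30-day window assumed in the example:

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def hours_to_exhaust(rate: float, window_hours: float) -> float:
    """Time until the whole budget is gone if the current burn rate holds."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return window_hours / rate


if __name__ == "__main__":
    rate = burn_rate(error_ratio=0.01, slo_target=0.999)  # 1% errors vs 99.9% SLO
    print(round(rate, 2))
    print(round(hours_to_exhaust(rate, window_hours=30 * 24), 1))
```

A burn rate of 1 spends the budget exactly over the SLO window; higher values mean the budget runs out proportionally sooner, which is why it guides alert urgency.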

Additional Resume-Relevant DevOps Sections (Current + In-Demand)

These sections do not replace anything above. They extend the skill-check in a practical learning order.


9) Foundations - Networking fundamentals (10, easy -> harder)

  • What’s the difference: IP vs TCP?
  • IP routes packets; TCP provides reliable ordered streams on top of IP.
  • What’s a subnet (CIDR) in one line?
  • A block of IPs defined by prefix length like /24.
  • What does /24 mean?
  • 24 network bits; 256 addresses total (typically 254 usable in classic IPv4 subnets).
  • What’s the difference: private vs public IP?
  • Private is non-routable on the public internet; public is internet-routable.
  • What’s a default gateway?
  • The router your host uses for non-local destinations.
  • DNS: A vs CNAME vs TXT?
  • A maps name->IPv4; CNAME aliases name->name; TXT stores arbitrary text (often verification/SPF/etc).
  • What’s NAT and why does it exist?
  • Translates addresses (often many private -> one public) to conserve IPv4 and simplify networks.
  • TCP 3-way handshake?
  • SYN -> SYN/ACK -> ACK to establish a connection.
  • How do you troubleshoot “can’t reach host” quickly?
  • Check link/IP/route/DNS: ip a, ip r, ping, traceroute, dig, ss -tulpn.
  • Explain MTU and a common failure mode.
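  • Largest packet a link carries without fragmentation (1500 bytes on standard Ethernet); a mismatch along the path, combined with blocked ICMP, can cause PMTU black holes where small packets pass but large ones hang.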
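
The /24 arithmetic above can be verified with Python's standard ipaddress module. A small sketch; the example network 192.168.1.0/24 and the helper name summarize are chosen for illustration.

```python
import ipaddress


def summarize(cidr: str) -> dict:
    """Report prefix length, address counts, and private/public for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return {
        "prefix_bits": net.prefixlen,
        "total": net.num_addresses,
        # hosts() skips the network and broadcast addresses for a /24.
        "usable": sum(1 for _ in net.hosts()),
        "private": net.is_private,
    }


if __name__ == "__main__":
    print(summarize("192.168.1.0/24"))
```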

10) Python for DevOps (automation) (10, easy -> harder)

  • What’s the difference: list vs tuple?
  • List mutable; tuple immutable.
  • What’s a dict used for?
  • Key/value mapping; fast lookup by key.
  • What is a virtual environment and why use it?
  • Isolated dependency set per project; avoids system Python conflicts.
  • How do you read a file safely?
  • with open(...) as f: ensures close even on exceptions.
  • Exceptions: try/except/finally purpose?
  • Handle failures; finally runs cleanup regardless.
  • What’s the difference: subprocess.run() vs os.system()?
  • subprocess gives control over args/exit/stdout/stderr safely; os.system is blunt and shell-ish.
  • JSON/YAML parsing basics?
  • Use json module; YAML via PyYAML/ruamel; validate shapes after parsing.
  • Write a CLI in Python (the right way)?
  • argparse (or click/typer in many shops) + clear subcommands.
  • Concurrency: threads vs processes (in one line)?
  • Threads share memory (GIL affects CPU-bound); processes isolate and parallelize CPU-bound work.
  • Packaging for reuse?
  • pyproject.toml + pinned deps + entrypoints for CLI; versioning and tests.
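
The subprocess.run() vs os.system() answer is easier to see in code. A minimal sketch: arguments are passed as a list (no shell parsing), and the exit code and both output streams are captured. The wrapper name run_command is illustrative.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_command(args: Sequence[str]) -> Tuple[int, str, str]:
    """Run a command without a shell and return (exit_code, stdout, stderr)."""
    completed = subprocess.run(
        list(args),
        capture_output=True,  # collect stdout and stderr separately
        text=True,            # decode bytes to str
        check=False,          # inspect the exit code instead of raising
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # sys.executable is used so the example does not depend on external tools.
    code, out, err = run_command([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

Passing check=True instead would raise subprocess.CalledProcessError on a non-zero exit, which is often the safer default in automation scripts.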

11) Ansible (Config Management + Automation) (10, easy -> harder)

  • What is Ansible (one line)?
  • Agentless automation over SSH using YAML playbooks.
  • Inventory: what is it?
  • The list/grouping of hosts (static or dynamic) Ansible targets.
  • Play vs task vs role?
  • Play targets hosts; tasks are steps; roles package reusable tasks/vars/templates/handlers.
  • Idempotency meaning in Ansible?
  • Re-running yields same end state without repeated changes.
  • Variables precedence: why it matters?
  • Same var defined multiple places; precedence controls which wins.
  • Handlers: what are they for?
  • Run actions (like restart) only when notified by changed tasks.
  • Templates vs files?
  • Templates (Jinja2) render variables; files are copied as-is.
  • Facts + gather_facts: what does it do?
  • Collect host info (OS, interfaces, etc.) for conditional logic.
  • Vault: what problem does it solve?
  • Encrypt secrets in repo (still manage access/rotation carefully).
  • Collections + modules: how to stay sane?
  • Pin collection versions; prefer well-known modules; avoid shell where module exists.
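
Idempotency and handlers can be illustrated with a toy model in plain Python. This is not Ansible's implementation or API, only a sketch of the pattern: each task reports whether it changed anything, and the handler runs once only if something did. ensure_line and run_play are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def run_play(path: Path, lines: Iterable[str]) -> bool:
    """Run every task, then report whether the 'restart' handler should fire."""
    changed = [ensure_line(path, line) for line in lines]
    return any(changed)


if __name__ == "__main__":
    config = Path(tempfile.mkdtemp()) / "app.conf"
    print(run_play(config, ["port=8080", "debug=false"]))  # first run changes the file
    print(run_play(config, ["port=8080", "debug=false"]))  # second run is a no-op
```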

12) Terraform (practical + resume-grade) (10, easy -> harder)

  • What is Terraform (one line)?
  • Declarative IaC that plans/applies changes to real infrastructure.
  • Provider vs resource vs data source?
  • Provider talks to API; resource creates/changes; data reads existing things.
  • Variables: input vs local vs output?
  • Inputs parameterize; locals compute; outputs expose values to other layers.
  • What is state and where should it live?
  • Resource mapping; store remotely with locking for teams.
  • plan vs apply vs destroy?
  • Preview vs execute vs delete managed resources.
  • Modules: why and how?
  • Reuse patterns; version modules; pass inputs; expose outputs.
  • Drift and how to detect it?
  • Reality != state; plan reveals; avoid out-of-band changes.
  • Secrets handling in Terraform?
  • Avoid plaintext outputs/state; use secret managers and sensitive vars; restrict state access.
  • Workspaces vs separate state files?
  • Workspaces for simple env splits; separate states often cleaner for prod separation.
  • Safe change patterns?
  • Use lifecycle cautiously; create_before_destroy for cutover; small blast radius; peer review.

13) Docker (hands-on + production-minded) (10, easy -> harder)

  • Image vs container?
  • Image template; container running instance.
  • What’s in a Dockerfile layer?
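  • Filesystem-changing instructions (RUN, COPY, ADD) create layers, others only add metadata; the build cache keys on the instruction plus, for COPY/ADD, the copied files, and a miss invalidates every later step.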
  • Why is build context dangerous?
  • Sending too much to daemon; leaks secrets; slow builds; use .dockerignore.
  • How do you pass config to containers?
  • Env vars, mounted files, ConfigMaps/Secrets (in K8s), or runtime flags.
  • Ports: EXPOSE vs -p?
  • EXPOSE documents; -p actually publishes.
  • Volumes: bind mount vs named volume?
  • Bind uses host path; named managed by Docker (portable-ish).
  • Multi-stage builds: why?
  • Smaller runtime images; separate build tooling from runtime.
  • Rootless / least privilege basics?
  • Run as non-root; drop caps; read-only FS when possible.
  • Networking: bridge vs host vs overlay (conceptually)?
  • Bridge = default, NAT behind the host; host = shares the host network stack (no network isolation); overlay = multi-host (Swarm; K8s CNIs solve a similar problem).
  • Debug a broken container fast?
  • docker logs, docker exec -it, inspect env/volumes, check exit code, run shell in image.

14) Kubernetes (resume-grade essentials) (10, easy -> harder)

  • What is Kubernetes (one line)?
  • Orchestrates containers: scheduling, scaling, service discovery, rollout.
  • Pod vs ReplicaSet vs Deployment?
  • Pod runs; ReplicaSet keeps count; Deployment manages rollout/updates.
  • Service types: ClusterIP vs NodePort vs LoadBalancer?
  • Internal, node-exposed, cloud LB-backed.
  • Namespace: why?
  • Isolation, quotas, scoping of names and RBAC.
  • ConfigMap vs Secret?
  • Non-secret config vs secret material (still treat carefully).
  • Readiness vs liveness vs startup probes?
  • Ready for traffic vs restart if dead vs slow-start protection.
  • Resource requests vs limits?
  • Requests reserve capacity for scheduling; limits cap usage (CPU over the limit is throttled, memory over the limit gets the container OOM-killed).
  • Rolling update vs rollback?
  • Gradual replace; rollback returns to previous ReplicaSet revision.
  • RBAC: Role vs ClusterRole?
  • Namespace-scoped vs cluster-scoped permissions.
  • Practical teardown patterns?
  • kubectl delete -f <manifests>; or delete namespace: kubectl delete ns <ns> (nukes everything in it).

15) CI/CD (tooling specifics that show up in job reqs) (10, easy -> harder)

  • What’s a runner/agent?
  • Worker that executes pipeline jobs (self-hosted or managed).
  • How do you cache dependencies safely?
  • Cache keyed by lockfile; avoid caching secrets/build outputs incorrectly.
  • What’s the difference: artifact vs cache?
  • Artifacts are outputs to pass along; caches speed builds and are reusable.
  • What is “environment promotion”?
  • Same build artifact promoted through dev->stage->prod.
  • How do you prevent accidental prod deploy?
  • Protected branches, approvals, manual gates, environment rules.
  • How do you manage secrets in pipelines?
  • Secret store/integration; masked variables; least privilege; short-lived tokens.
  • Why pin tool versions in CI?
  • Reproducibility; avoids surprise breakage.
  • GitHub Actions: what are “permissions” and why important?
  • Scopes granted to the job’s GITHUB_TOKEN; defaults can be broader than needed, so set permissions explicitly to least privilege.
  • GitLab CI: what are “stages” and “needs”?
  • Stages order; needs enables DAG and faster pipelines.
  • Supply-chain controls (baseline)?
  • Signed artifacts, SBOMs, provenance, scanning, and protected deployment keys.
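
"Cache keyed by lockfile" usually means deriving the cache key from a hash of the lockfile's content, so the cache is reused only while dependencies are unchanged. A rough sketch; the key format and the name cache_key are assumptions, not any CI system's built-in scheme.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Derive a cache key that changes whenever the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    # A truncated digest keeps keys readable; the full digest also works.
    return f"{prefix}-{digest[:16]}"


if __name__ == "__main__":
    old = cache_key(b"requests==2.31.0\n")
    new = cache_key(b"requests==2.32.0\n")
    print(old != new)  # a dependency bump produces a fresh cache entry
```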

16) Observability (metrics/logs/traces in production) (10, easy -> harder)

  • What is Prometheus scraping?
  • Pull model: Prometheus polls /metrics endpoints.
  • Pushgateway: when use it?
  • Short-lived jobs that can’t be scraped reliably.
  • What is a label cardinality blow-up?
  • Too many unique label values; costs storage/query performance.
  • Basic PromQL: rate vs irate?
  • rate smoother over window; irate more spiky/instant-ish.
  • Alert fatigue: primary cause?
  • Non-actionable alerts; missing thresholds/SLO framing; no dedupe.
  • What does a good alert description include?
  • Symptom, impact, likely causes, and first 3 debug steps.
  • Logs: structured vs unstructured?
  • Structured (JSON) is queryable; unstructured is grep-only pain.
  • Tracing: what’s a span?
  • Timed unit of work within a trace (parent/child relationships).
  • SLOs: why companies care?
  • Turns reliability into measurable targets; manages tradeoffs via error budgets.
  • Incident loop (tight)?
  • Detect -> triage -> mitigate -> root cause -> follow-up actions -> measure improvement.
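
The cardinality blow-up above follows from multiplication: the worst-case number of time series for one metric is the product of the distinct values of each label. A minimal sketch with made-up label sets; max_series is an illustrative name.

```python
import math
from typing import Dict, Sequence


def max_series(label_values: Dict[str, Sequence[str]]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return math.prod(len(set(values)) for values in label_values.values())


if __name__ == "__main__":
    labels = {"method": ["GET", "POST"], "status": ["200", "404", "500"]}
    print(max_series(labels))
    # Adding an unbounded label such as a user ID multiplies the total.
    labels["user_id"] = [str(i) for i in range(1000)]
    print(max_series(labels))
```

This is why unbounded values (user IDs, request IDs, raw URLs) belong in logs or traces rather than metric labels.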