Portal | Level: L1: Foundations | Domain: DevOps & Tooling
DevOps Roadmap - Dense Skill Check (Q&A)
This is the roadmap (in the order mentioned in the transcript), with 10 bullet questions per topic and short indented answers, roughly easy -> harder within each section.
1) Foundations - Linux fundamentals (10, easy -> harder)
- What’s a PID?
- Unique ID for a running process in the kernel.
- Where do config files usually live?
- `/etc` (system-wide); user-specific usually under `~/.config` or dotfiles.
- Absolute vs relative path?
- Absolute starts at `/`; relative starts from the current working directory.
- `/tmp` vs `/var/tmp`?
- `/tmp` is often cleared on reboot; `/var/tmp` is intended to persist longer.
- Explain `stdin`/`stdout`/`stderr`.
- Input stream, normal output, error output (FD 0/1/2).
- What does `chmod 750 file` mean?
- Owner `rwx`, group `r-x`, others `---`.
- What is a “mount”?
- Attaching a filesystem to a directory in the unified tree.
- Hardlink vs symlink?
- Hardlink = another name for same inode; symlink = path pointer (can break).
- What does
systemctlactually control? - systemd units: services, timers, mounts, sockets, targets.
- Why align partitions (e.g., 1MiB boundaries)?
- Avoid misaligned I/O penalties on 4K/RAID/SSD erase blocks; better performance/endurance.
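The `chmod 750` answer above can be checked mechanically. The following is a minimal sketch using Python's standard `stat` module; the helper name `mode_to_string` is made up for illustration and is not part of any tool mentioned here.

```python
import stat


def mode_to_string(octal_mode: int) -> str:
    """Render permission bits the way `ls -l` does, without the file-type char."""
    # stat.filemode expects a full st_mode, so mark the mode as a regular file
    # first, then drop the leading '-' that denotes the file type.
    return stat.filemode(stat.S_IFREG | octal_mode)[1:]


if __name__ == "__main__":
    for mode in (0o750, 0o644, 0o600):
        print(f"{mode:o} -> {mode_to_string(mode)}")
```

Reading the result in groups of three characters gives owner, group, and others, which is the same split the octal digits encode.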
2) Foundations - Bash (10, easy -> harder)
- Single quotes vs double quotes?
- `'...'` no expansion; `"..."` expands variables/escapes.
- `"$@"` vs `"$*"`?
- `"$@"` preserves args; `"$*"` joins into one string.
- What does `set -euo pipefail` do?
- Exit on error/unset var; fail pipelines if any command fails.
- Safely loop over filenames with spaces?
- `while IFS= read -r -d '' f; do ...; done < <(find ... -print0)`
- stdout+stderr to file?
- `cmd >out 2>&1` (or `cmd &>out`)
- stderr only to file?
- `cmd 2>err`
- Strip suffix `.log` from var?
- `${x%.log}`
- Default if unset/empty?
- `${x:-default}`
- Trap cleanup on exit?
- `trap 'rm -f "$tmp"' EXIT`
- Why pipelines hide failures + fix?
- Exit code is that of the last command; use `pipefail` or inspect `${PIPESTATUS[@]}`.
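The last question (pipelines hiding failures) is easy to reproduce outside the shell. The sketch below wires two processes together with Python's `subprocess` module and collects both exit codes, roughly what `${PIPESTATUS[@]}` exposes in Bash. The names `run_pipeline`, `pipefail`, `FAILING`, and `CONSUMER` are hypothetical, and the two commands are stand-ins for real pipeline stages.

```python
import subprocess
import sys


def run_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` and return both exit codes, like PIPESTATUS."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        second_cmd, stdin=first.stdout, stdout=subprocess.DEVNULL
    )
    first.stdout.close()  # let `first` see SIGPIPE if `second` exits early
    second.wait()
    first.wait()
    return [first.returncode, second.returncode]


def pipefail(statuses):
    """Mimic `set -o pipefail`: rightmost non-zero status, or 0 if all passed."""
    failures = [code for code in statuses if code != 0]
    return failures[-1] if failures else 0


# Hypothetical stand-ins: a producer that fails and a consumer that succeeds.
FAILING = [sys.executable, "-c", "import sys; sys.exit(3)"]
CONSUMER = [sys.executable, "-c", "import sys; sys.stdin.read()"]

if __name__ == "__main__":
    statuses = run_pipeline(FAILING, CONSUMER)
    print("per-stage exit codes:", statuses)
    print("default pipeline status (last stage only):", statuses[-1])
    print("status with pipefail semantics:", pipefail(statuses))
```

Without pipefail-style handling, only the consumer's status is visible, so the producer's failure goes unnoticed.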
3) Foundations - Git (10, easy -> harder)
- What is a commit (really)?
- Snapshot pointer + metadata + parent refs.
- What is HEAD?
- Pointer to current ref/commit you’re “on”.
- What’s in `.git/objects/`?
- Compressed blobs/trees/commits/tags addressed by hash.
- Merge vs rebase?
- Merge preserves history; rebase rewrites for linear history.
- Undo last commit, keep staged?
- `git reset --soft HEAD~1`
- Undo last commit, discard changes?
- `git reset --hard HEAD~1`
- `--soft` vs `--mixed` vs `--hard`?
- Soft: move HEAD only (changes stay staged); Mixed: also reset the index (changes become unstaged); Hard: also overwrite the working tree.
- Rebase conflict workflow?
- Fix -> `git add` -> `git rebase --continue` (or `--abort`)
- Safest way to rewrite pushed history?
- Prefer revert; if forced: `git push --force-with-lease`.
- What does `reflog` give you?
- Local history of ref movement; recovery from “lost” commits.
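"Addressed by hash" in the `.git/objects/` answer can be made concrete. This is a sketch of how a blob's ID is derived in a repository using the SHA-1 object format: Git hashes a small header plus the raw content, and the file on disk is the zlib-compressed form of those same bytes. The function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob (SHA-1 object format)."""
    # Header is the object type, a space, the content length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID in SHA-1 repositories.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The first two hex characters become the directory name under `.git/objects/` and the remaining 38 the file name, which is why identical content is stored only once.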
4) Phase 2 - Cloud basics (10, easy -> harder)
- What is “cloud” in one line?
- On-demand infrastructure APIs (compute/storage/network) with metering.
- Compute vs storage vs networking?
- VMs/containers; disks/objects; routing/firewalls/IPs/DNS.
- IaaS vs PaaS vs SaaS?
- Infra building blocks vs managed platforms vs finished apps.
- Region vs AZ?
- Region = geographic area; AZ = isolated datacenter group in region.
- Security group vs NACL (conceptually)?
- Instance-level stateful rules vs subnet-level stateless rules.
- Object vs block storage?
- Objects via HTTP keys; blocks attach as disks with filesystems.
- IAM basics: auth vs authz?
- Authentication = who; authorization = what allowed (policy).
- Shared responsibility model?
- Provider secures underlying cloud; you secure configs, data, identity.
- High availability basics?
- Multi-AZ design + health checks + failover + statelessness.
- What actually causes big cloud bills?
- Egress, oversized compute, orphaned storage/snapshots, idle managed services.
5) Phase 3 - Infrastructure as Code (10, easy -> harder)
- What problem does IaC solve?
- Repeatable, reviewable, versioned infrastructure changes.
- Declarative vs imperative?
- Desired state vs step-by-step commands.
- What is “state” and why does it matter?
- Mapping between code and real resources; drift detection and updates.
- Plan vs apply mental model?
- Preview changes vs perform changes.
- Modules: why use them?
- Reuse, standardization, guardrails.
- Variables/outputs: why?
- Parameterize + connect stacks cleanly.
- Drift: what is it?
- Reality changed outside IaC; code/state no longer match.
- Remote state + locking: why?
- Team safety, prevents concurrent corruption.
- Secrets handling rule of thumb?
- Don’t hardcode; use secret stores/encrypted vars; least exposure.
- Idempotency and “immutable” patterns?
- Reapply safely; prefer replace-over-mutate for risky changes.
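The plan/apply, drift, and idempotency answers above fit in a toy model. The sketch below is not how any real IaC tool is implemented; it assumes resources are plain dictionaries keyed by name, and `plan` and `apply` are hypothetical helpers written only to show the desired-state idea.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]  # resource name -> attributes (toy model)


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """Preview: list the (action, name) pairs that would make actual match desired."""
    changes = []
    for name in sorted(desired.keys() - actual.keys()):
        changes.append(("create", name))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            changes.append(("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        changes.append(("delete", name))
    return changes


def apply(desired: Resources, actual: Resources) -> Resources:
    """Perform the planned changes on a copy of actual and return the new state."""
    result = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del result[name]
        else:  # create and update both converge on the desired attributes
            result[name] = dict(desired[name])
    return result


if __name__ == "__main__":
    desired = {"web": {"size": "small"}, "db": {"size": "large"}}
    actual = {"web": {"size": "tiny"}, "old": {"size": "small"}}
    print(plan(desired, actual))
    converged = apply(desired, actual)
    print(plan(desired, converged))  # nothing left to do: apply is idempotent
```

Drift is then any out-of-band edit to the actual state: the next `plan` reports it as an update.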
6) Phase 4 - Containers + Kubernetes (10, easy -> harder)
- What is a container (really)?
- Isolated process via namespaces + cgroups, not a VM.
- Image vs container?
- Image = immutable template; container = running instance.
- Why “works on my machine” happens?
- Missing deps/env/config; container pins runtime and deps.
- What is a volume used for?
- Persist data outside container lifecycle.
- Port mapping meaning?
- Host port forwards to container port.
- Kubernetes: Pod vs Deployment?
- Pod = unit of running containers; Deployment manages replicas/rollouts.
- Service vs Ingress (conceptually)?
- Service = stable endpoint; Ingress = HTTP routing layer.
- ConfigMap vs Secret?
- ConfigMap for non-secret config; Secret for sensitive data (still handle carefully).
- Liveness vs readiness probes?
- Liveness restarts; readiness controls traffic routing.
- Teardown/cleanup basics (K8s)?
- Delete what you created: `kubectl delete -f <manifests>` or `kubectl delete deploy,svc,ing,cm,secret -l app=<x>`; namespace cleanup: `kubectl delete ns <name>`.
7) Phase 5 - CI/CD (10, easy -> harder)
- CI vs CD?
- CI validates/builds; CD deploys.
- What triggers a pipeline?
- Push/PR/tag/schedule/manual.
- What is an artifact?
- Build output stored for later steps (image/package/binary).
- Why run tests in CI?
- Catch regressions before merge/deploy.
- Why build container images in CI?
- Reproducible deployable unit with version tag.
- What is “pipeline as code”?
- Pipeline definition in repo; reviewed like code.
- Deploy strategies: rolling vs blue/green vs canary?
- Gradual replace vs parallel cutover vs partial traffic shift.
- What is a promotion flow?
- Same artifact promoted dev->stage->prod with approvals.
- Secrets in CI: main rule?
- Inject at runtime via secret store; never echo; restrict scopes.
- What makes CI/CD “production-grade”?
- Deterministic builds, caching, least-privilege deploy keys, rollback path, audit logs.
8) Final - Observability (10, easy -> harder)
- Monitoring vs logging vs tracing?
- Metrics for trends, logs for events, traces for request flow.
- Golden signals?
- Latency, traffic, errors, saturation.
- What is a time series metric?
- Value indexed by time + labels.
- Counter vs gauge vs histogram?
- Monotonic count vs current value vs distribution.
- Why labels can hurt you?
- High cardinality explodes storage/query cost.
- Alerting goal?
- Actionable signals, not noise.
- SLI vs SLO?
- Indicator measurement vs target objective.
- What is “burn rate”?
- Speed of consuming error budget; guides urgency.
- Dashboards: what’s the trap?
- Vanity charts without decision value; no link to alerts/SLOs.
- Root cause workflow (tight)?
- Correlate alerts -> inspect deploys -> check metrics/logs/traces -> narrow blast radius -> rollback/mitigate -> postmortem.
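The burn-rate answer above reduces to simple arithmetic. A minimal sketch, assuming the SLI is a ratio of failed to total requests and the SLO target is strictly below 100%; all three function names are hypothetical.

```python
def error_budget(slo_target: float) -> float:
    """Allowed failure fraction, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    return observed_error_ratio / error_budget(slo_target)


def hours_to_exhaustion(rate: float, slo_window_hours: float) -> float:
    """Time until a full budget is spent if the current burn rate holds."""
    return slo_window_hours / rate


if __name__ == "__main__":
    rate = burn_rate(observed_error_ratio=0.01, slo_target=0.999)
    print(f"burn rate: {rate:.1f}x")
    print(f"budget gone in about {hours_to_exhaustion(rate, 30 * 24):.0f} hours")
```

A burn rate of 1 spends the budget exactly over the SLO window; higher values shorten that proportionally, which is why burn rate is a reasonable proxy for urgency.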
Additional Resume-Relevant DevOps Sections (Current + In-Demand)
These sections do not replace anything above. They extend the skill-check in a practical learning order.
9) Foundations - Networking fundamentals (10, easy -> harder)
- What’s the difference: IP vs TCP?
- IP routes packets; TCP provides reliable ordered streams on top of IP.
- What’s a subnet (CIDR) in one line?
- A block of IPs defined by prefix length like `/24`.
- What does `/24` mean?
- 24 network bits; 256 addresses total (typically 254 usable in classic IPv4 subnets).
- What’s the difference: private vs public IP?
- Private is non-routable on the public internet; public is internet-routable.
- What’s a default gateway?
- The router your host uses for non-local destinations.
- DNS: A vs CNAME vs TXT?
- A maps name->IPv4; CNAME aliases name->name; TXT stores arbitrary text (often verification/SPF/etc).
- What’s NAT and why does it exist?
- Translates addresses (often many private -> one public) to conserve IPv4 and simplify networks.
- TCP 3-way handshake?
- SYN -> SYN/ACK -> ACK to establish a connection.
- How do you troubleshoot “can’t reach host” quickly?
- Check link/IP/route/DNS: `ip a`, `ip r`, `ping`, `traceroute`, `dig`, `ss -tulpn`.
- Explain MTU and a common failure mode.
- Largest packet a link can carry without fragmentation; a mismatch along the path can cause PMTU black holes (small packets get through, large ones hang).
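The `/24` arithmetic and the private-vs-public distinction above can be verified with Python's standard `ipaddress` module. This is a small sketch; `describe_subnet` is a hypothetical helper, and counting hosts by iteration is only sensible for small subnets.

```python
import ipaddress


def describe_subnet(cidr: str) -> dict:
    """Summarize an IPv4/IPv6 network given in CIDR notation."""
    net = ipaddress.ip_network(cidr)
    return {
        "prefix_bits": net.prefixlen,
        "total_addresses": net.num_addresses,
        # hosts() skips the network and broadcast addresses on classic subnets.
        "usable_hosts": sum(1 for _ in net.hosts()),
        "private": net.is_private,
    }


if __name__ == "__main__":
    for cidr in ("192.168.1.0/24", "10.0.0.0/28", "8.8.8.0/24"):
        print(cidr, describe_subnet(cidr))
```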
10) Python for DevOps (automation) (10, easy -> harder)
- What’s the difference: list vs tuple?
- List mutable; tuple immutable.
- What’s a dict used for?
- Key/value mapping; fast lookup by key.
- What is a virtual environment and why use it?
- Isolated dependency set per project; avoids system Python conflicts.
- How do you read a file safely?
- `with open(...) as f:` ensures close even on exceptions.
- Exceptions: `try`/`except`/`finally` purpose?
- Handle failures; `finally` runs cleanup regardless.
- What’s the difference: `subprocess.run()` vs `os.system()`?
- `subprocess` gives control over args/exit/stdout/stderr safely; `os.system` is blunt and shell-ish.
- JSON/YAML parsing basics?
- Use `json` module; YAML via PyYAML/ruamel; validate shapes after parsing.
- Write a CLI in Python (the right way)?
- `argparse` (or `click`/`typer` in many shops) + clear subcommands.
- Concurrency: threads vs processes (in one line)?
- Threads share memory (GIL affects CPU-bound); processes isolate and parallelize CPU-bound work.
- Packaging for reuse?
- `pyproject.toml` + pinned deps + entrypoints for CLI; versioning and tests.
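Several of the answers above (context-managed file reads, `subprocess.run()`, JSON shape validation, `argparse` subcommands) combine naturally in one small tool. The sketch below uses only the standard library; the tool name `opsctl` and its two subcommands are invented for illustration.

```python
import argparse
import json
import subprocess
import sys


def run(cmd: list) -> str:
    """Run a command without a shell; raise CalledProcessError on non-zero exit."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def load_config(path: str) -> dict:
    """Read JSON with a context manager, then validate the top-level shape."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return data


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="opsctl")  # hypothetical tool name
    sub = parser.add_subparsers(dest="command", required=True)
    show = sub.add_parser("show-config", help="print a JSON config file")
    show.add_argument("path")
    sub.add_parser("pyversion", help="print the interpreter version")
    return parser


def main(argv=None) -> int:
    """Dispatch on the chosen subcommand; returns a process exit code."""
    args = build_parser().parse_args(argv)
    if args.command == "show-config":
        print(json.dumps(load_config(args.path), indent=2, sort_keys=True))
    elif args.command == "pyversion":
        print(run([sys.executable, "--version"]))
    return 0
```

Calling `main(["pyversion"])` exercises the subprocess path; in a packaged project `main` would typically be registered as a console entry point in `pyproject.toml`. Passing the command as a list rather than a string avoids shell interpretation of arguments.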
11) Ansible (Config Management + Automation) (10, easy -> harder)
- What is Ansible (one line)?
- Agentless automation over SSH using YAML playbooks.
- Inventory: what is it?
- The list/grouping of hosts (static or dynamic) Ansible targets.
- Play vs task vs role?
- Play targets hosts; tasks are steps; roles package reusable tasks/vars/templates/handlers.
- Idempotency meaning in Ansible?
- Re-running yields same end state without repeated changes.
- Variables precedence: why it matters?
- Same var defined multiple places; precedence controls which wins.
- Handlers: what are they for?
- Run actions (like restart) only when notified by changed tasks.
- Templates vs files?
- Templates (Jinja2) render variables; files are copied as-is.
- Facts + `gather_facts`: what does it do?
- Collect host info (OS, interfaces, etc.) for conditional logic.
- Vault: what problem does it solve?
- Encrypt secrets in repo (still manage access/rotation carefully).
- Collections + modules: how to stay sane?
- Pin collection versions; prefer well-known modules; avoid shell where module exists.
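Idempotency and handler notification, as described above, follow a simple pattern: check the current state, change only when needed, and report whether a change happened. The sketch below imitates that pattern in plain Python. It is an analogy, not Ansible code; `ensure_line` and `run_tasks` are hypothetical names.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if it was added."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state: change nothing
    path.write_text("\n".join(existing + [line]) + "\n")
    return True


def run_tasks(path: Path, lines: list) -> bool:
    """Apply every task; the result says whether a handler should be notified."""
    changed = [ensure_line(path, line) for line in lines]
    return any(changed)
```

A first run over a fresh file reports a change (so a restart handler would fire); a second run with the same inputs reports none and leaves the file untouched.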
12) Terraform (practical + resume-grade) (10, easy -> harder)
- What is Terraform (one line)?
- Declarative IaC that plans/applies changes to real infrastructure.
- Provider vs resource vs data source?
- Provider talks to API; resource creates/changes; data reads existing things.
- Variables: input vs local vs output?
- Inputs parameterize; locals compute; outputs expose values to other layers.
- What is state and where should it live?
- Resource mapping; store remotely with locking for teams.
- `plan` vs `apply` vs `destroy`?
- Preview vs execute vs delete managed resources.
- Modules: why and how?
- Reuse patterns; version modules; pass inputs; expose outputs.
- Drift and how to detect it?
- Reality != state; `plan` reveals; avoid out-of-band changes.
- Secrets handling in Terraform?
- Avoid plaintext outputs/state; use secret managers and sensitive vars; restrict state access.
- Workspaces vs separate state files?
- Workspaces for simple env splits; separate states often cleaner for prod separation.
- Safe change patterns?
- Use `lifecycle` cautiously; `create_before_destroy` for cutover; small blast radius; peer review.
13) Docker (hands-on + production-minded) (10, easy -> harder)
- Image vs container?
- Image template; container running instance.
- What’s in a Dockerfile layer?
- Instructions that change the filesystem (`RUN`, `COPY`, `ADD`) create layers; the build cache is keyed on the instruction plus the files it pulls from the build context.
- Why is build context dangerous?
- Sending too much to daemon; leaks secrets; slow builds; use `.dockerignore`.
- How do you pass config to containers?
- Env vars, mounted files, ConfigMaps/Secrets (in K8s), or runtime flags.
- Ports: `EXPOSE` vs `-p`?
- `EXPOSE` documents; `-p` actually publishes.
- Volumes: bind mount vs named volume?
- Bind uses host path; named managed by Docker (portable-ish).
- Multi-stage builds: why?
- Smaller runtime images; separate build tooling from runtime.
- Rootless / least privilege basics?
- Run as non-root; drop caps; read-only FS when possible.
- Networking: bridge vs host vs overlay (conceptually)?
- Bridge = default, NAT behind the host; host = shares the host's network stack (no network isolation); overlay = multi-host networking (Swarm; Kubernetes CNI plugins fill a similar role).
- Debug a broken container fast?
- `docker logs`, `docker exec -it`, inspect env/volumes, check exit code, run shell in image.
14) Kubernetes (resume-grade essentials) (10, easy -> harder)
- What is Kubernetes (one line)?
- Orchestrates containers: scheduling, scaling, service discovery, rollout.
- Pod vs ReplicaSet vs Deployment?
- Pod runs; ReplicaSet keeps count; Deployment manages rollout/updates.
- Service types: ClusterIP vs NodePort vs LoadBalancer?
- Internal, node-exposed, cloud LB-backed.
- Namespace: why?
- Isolation, quotas, scoping of names and RBAC.
- ConfigMap vs Secret?
- Non-secret config vs secret material (still treat carefully).
- Readiness vs liveness vs startup probes?
- Ready for traffic vs restart if dead vs slow-start protection.
- Resource requests vs limits?
- Requests reserve capacity for scheduling; limits cap usage (CPU gets throttled, memory overuse gets the container OOM-killed).
- Rolling update vs rollback?
- Gradual replace; rollback returns to previous ReplicaSet revision.
- RBAC: Role vs ClusterRole?
- Namespace-scoped vs cluster-scoped permissions.
- Practical teardown patterns?
- `kubectl delete -f <manifests>`; or delete namespace: `kubectl delete ns <ns>` (nukes everything in it).
15) CI/CD (tooling specifics that show up in job reqs) (10, easy -> harder)
- What’s a runner/agent?
- Worker that executes pipeline jobs (self-hosted or managed).
- How do you cache dependencies safely?
- Cache keyed by lockfile; avoid caching secrets/build outputs incorrectly.
- What’s the difference: artifact vs cache?
- Artifacts are outputs to pass along; caches speed builds and are reusable.
- What is “environment promotion”?
- Same build artifact promoted through dev->stage->prod.
- How do you prevent accidental prod deploy?
- Protected branches, approvals, manual gates, environment rules.
- How do you manage secrets in pipelines?
- Secret store/integration; masked variables; least privilege; short-lived tokens.
- Why pin tool versions in CI?
- Reproducibility; avoids surprise breakage.
- GitHub Actions: what are “permissions” and why important?
- Token scopes for `GITHUB_TOKEN`; default too broad sometimes; lock down.
- GitLab CI: what are “stages” and “needs”?
- Stages order; `needs` enables DAG and faster pipelines.
- Supply-chain controls (baseline)?
- Signed artifacts, SBOMs, provenance, scanning, and protected deployment keys.
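"Cache keyed by lockfile" from earlier in this section comes down to hashing the inputs that determine the dependency set. CI systems usually provide this as a built-in, so the sketch below only illustrates the idea; `cache_key` and its parameters are hypothetical.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, tool_version: str, prefix: str = "deps") -> str:
    """Derive a cache key that changes whenever the lockfile or toolchain changes."""
    digest = hashlib.sha256()
    digest.update(tool_version.encode("utf-8"))
    digest.update(b"\0")  # separator so ("a", b"bc") and ("ab", b"c") differ
    digest.update(lockfile_bytes)
    return f"{prefix}-{digest.hexdigest()[:16]}"


if __name__ == "__main__":
    lock = b"libfoo==1.0.0\nlibbar==2.3.1\n"  # made-up lockfile content
    print(cache_key(lock, tool_version="py3.12"))
```

Identical inputs always produce the same key, so the cache is reused; any dependency or toolchain change produces a new key and forces a clean restore.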
16) Observability (metrics/logs/traces in production) (10, easy -> harder)
- What is Prometheus scraping?
- Pull model: Prometheus polls `/metrics` endpoints.
- Pushgateway: when use it?
- Short-lived jobs that can’t be scraped reliably.
- What is a label cardinality blow-up?
- Too many unique label values; costs storage/query performance.
- Basic PromQL: rate vs irate?
- `rate` averages over the whole range window (smoother); `irate` uses only the last two samples (more spiky/instant-ish).
- Alert fatigue: primary cause?
- Non-actionable alerts; missing thresholds/SLO framing; no dedupe.
- What’s a good alert description include?
- Symptom, impact, likely causes, and first 3 debug steps.
- Logs: structured vs unstructured?
- Structured (JSON) is queryable; unstructured is grep-only pain.
- Tracing: what’s a span?
- Timed unit of work within a trace (parent/child relationships).
- SLOs: why companies care?
- Turns reliability into measurable targets; manages tradeoffs via error budgets.
- Incident loop (tight)?
- Detect -> triage -> mitigate -> root cause -> follow-up actions -> measure improvement.
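The `rate` vs `irate` distinction from earlier in this section is easiest to see on raw counter samples. The sketch below is a deliberate simplification: it ignores counter resets and the window-boundary extrapolation that Prometheus performs, and the names `simple_rate` and `simple_irate` are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase across the window (first vs last sample)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples: List[Sample]) -> float:
    """Per-second increase between only the last two samples."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    # Steady 1 req/s for 45 seconds, then a burst in the final 15 seconds.
    window = [(0, 0), (15, 15), (30, 30), (45, 45), (60, 120)]
    print("rate :", simple_rate(window))
    print("irate:", simple_irate(window))
```

The burst dominates the `irate`-style value but is averaged down in the `rate`-style value, which is why the smoother form is generally the safer choice for alert expressions.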