
DevOps Roadmap - Dense Skill Check (Q&A)

This is the roadmap (in the order mentioned in the transcript), with 10 bullet questions per topic and short indented answers, roughly easy -> harder within each section.

1) Foundations - Linux fundamentals (10, easy -> harder)

What’s a PID?
Unique ID for a running process in the kernel.
Where do config files usually live?
/etc (system-wide); user-specific usually under ~/.config or dotfiles.
Absolute vs relative path?
Absolute starts at /; relative starts from current working directory.
/tmp vs /var/tmp?
/tmp is often cleared on reboot; /var/tmp is intended to persist longer.
Explain stdin/stdout/stderr.
Input stream, normal output, error output (FD 0/1/2).
What does chmod 750 file mean?
Owner rwx, group r-x, others ---.
What is a “mount”?
Attaching a filesystem to a directory in the unified tree.
Hardlink vs symlink?
Hardlink = another name for same inode; symlink = path pointer (can break).
What does systemctl actually control?
systemd units: services, timers, mounts, sockets, targets.
Why align partitions (e.g., 1MiB boundaries)?
Avoid misaligned I/O penalties on 4K/RAID/SSD erase blocks; better performance/endurance.
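
The chmod 750 answer above can be checked mechanically. A minimal sketch, assuming Python's standard stat module; describe_mode is a hypothetical helper name, not something from the roadmap:

```python
import stat


def describe_mode(octal_digits: str) -> str:
    """Render permission digits such as "750" as an ls-style rwx string."""
    mode = int(octal_digits, 8)
    # stat.filemode expects a full st_mode, so mark it as a regular file.
    return stat.filemode(stat.S_IFREG | mode)


print(describe_mode("750"))  # -rwxr-x---
```

Each octal digit is a 3-bit mask (r=4, w=2, x=1) for owner, group, and others in that order.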

2) Foundations - Bash (10, easy -> harder)

Single quotes vs double quotes?
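'...' no expansion; "..." expands variables, command substitutions, and backslash escapes.
"$@" vs "$*"?
"$@" preserves each argument as a separate word; "$*" joins all arguments into one string (separated by the first character of IFS).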
What does set -euo pipefail do?
Exit on error (-e), treat unset variables as errors (-u), and fail a pipeline if any command in it fails (pipefail).
Safely loop over filenames with spaces?
while IFS= read -r -d '' f; do ...; done < <(find ... -print0)
stdout+stderr to file?
cmd >out 2>&1 (or cmd &>out)
stderr only to file?
cmd 2>err
Strip suffix .log from var?
${x%.log}
Default if unset/empty?
${x:-default}
Trap cleanup on exit?
trap 'rm -f "$tmp"' EXIT
Why pipelines hide failures + fix?
A pipeline's exit code is that of its last command; use set -o pipefail or inspect ${PIPESTATUS[@]}.
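
The pipeline-failure answer can be observed from a small driver script. This is a sketch that assumes a bash binary is on PATH (it returns None otherwise); pipeline_exit_codes is a hypothetical helper name:

```python
import shutil
import subprocess


def pipeline_exit_codes(pipeline: str):
    """Return (exit code without pipefail, exit code with pipefail).

    Returns None when bash is not installed, since the demo needs it.
    """
    bash = shutil.which("bash")
    if bash is None:
        return None
    plain = subprocess.run([bash, "-c", pipeline]).returncode
    strict = subprocess.run([bash, "-c", "set -o pipefail; " + pipeline]).returncode
    return plain, strict


# `false` fails, but `true` is the last command in the pipeline.
print(pipeline_exit_codes("false | true"))
```

Without pipefail the failure of `false` is masked by the trailing `true`; with pipefail the pipeline reports the failure.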

3) Foundations - Git (10, easy -> harder)

What is a commit (really)?
Snapshot pointer + metadata + parent refs.
What is HEAD?
Pointer to current ref/commit you’re “on”.
What’s in .git/objects/?
Compressed blobs/trees/commits/tags addressed by hash.
Merge vs rebase?
Merge preserves history; rebase rewrites for linear history.
Undo last commit, keep staged?
git reset --soft HEAD~1
Undo last commit, discard changes?
git reset --hard HEAD~1
--soft vs --mixed vs --hard?
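Soft: move HEAD only (changes stay staged); Mixed (default): also reset the index (changes become unstaged); Hard: also reset the working tree (changes are discarded).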
Rebase conflict workflow?
Fix -> git add -> git rebase --continue (or --abort)
Safest way to rewrite pushed history?
Prefer revert; if forced: git push --force-with-lease.
What does reflog give you?
Local history of ref movement; recovery from “lost” commits.
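
To make "addressed by hash" concrete, the sketch below recomputes the ID Git gives a blob. It assumes the default SHA-1 object format (repositories created with the SHA-256 format will differ); git_blob_id is a hypothetical helper name:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob: header, NUL byte, then the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID in SHA-1 repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, identical files are stored once no matter how many commits reference them.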

4) Phase 2 - Cloud basics (10, easy -> harder)

What is “cloud” in one line?
On-demand infrastructure APIs (compute/storage/network) with metering.
Compute vs storage vs networking?
VMs/containers; disks/objects; routing/firewalls/IPs/DNS.
IaaS vs PaaS vs SaaS?
Infra building blocks vs managed platforms vs finished apps.
Region vs AZ?
Region = geographic area; AZ = isolated datacenter group in region.
Security group vs NACL (conceptually)?
Instance-level stateful rules vs subnet-level stateless rules.
Object vs block storage?
Objects via HTTP keys; blocks attach as disks with filesystems.
IAM basics: auth vs authz?
Authentication = who; authorization = what allowed (policy).
Shared responsibility model?
Provider secures underlying cloud; you secure configs, data, identity.
High availability basics?
Multi-AZ design + health checks + failover + statelessness.
What actually causes big cloud bills?
Egress, oversized compute, orphaned storage/snapshots, idle managed services.

5) Phase 3 - Infrastructure as Code (10, easy -> harder)

What problem does IaC solve?
Repeatable, reviewable, versioned infrastructure changes.
Declarative vs imperative?
Desired state vs step-by-step commands.
What is “state” and why does it matter?
Mapping between code and real resources; drift detection and updates.
Plan vs apply mental model?
Preview changes vs perform changes.
Modules: why use them?
Reuse, standardization, guardrails.
Variables/outputs: why?
Parameterize + connect stacks cleanly.
Drift: what is it?
Reality changed outside IaC; code/state no longer match.
Remote state + locking: why?
Team safety, prevents concurrent corruption.
Secrets handling rule of thumb?
Don’t hardcode; use secret stores/encrypted vars; least exposure.
Idempotency and “immutable” patterns?
Reapply safely; prefer replace-over-mutate for risky changes.
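
The plan/apply, drift, and idempotency answers above can be sketched as a toy reconciler. This is not how any real IaC tool is implemented; the resource names and the plan/apply functions are illustrative assumptions:

```python
def plan(desired: dict, actual: dict) -> dict:
    """Preview the changes needed to make `actual` match `desired`."""
    return {
        "create": sorted(k for k in desired if k not in actual),
        "update": sorted(k for k in desired if k in actual and desired[k] != actual[k]),
        "delete": sorted(k for k in actual if k not in desired),
    }


def apply(desired: dict, actual: dict) -> dict:
    """Perform the changes; in this toy model the result is a copy of desired."""
    return dict(desired)


# Hypothetical resources: the code says one thing, reality has drifted.
desired = {"vm-web": {"size": "small"}, "bucket-logs": {"versioning": True}}
actual = {"vm-web": {"size": "large"}, "vm-old": {"size": "small"}}

first = plan(desired, actual)    # shows one create, one update, one delete
actual = apply(desired, actual)
second = plan(desired, actual)   # empty: reapplying is a no-op
print(first)
print(second)
```

An empty second plan is what idempotency looks like in practice: applying the same desired state twice changes nothing the second time.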

6) Phase 4 - Containers + Kubernetes (10, easy -> harder)

What is a container (really)?
Isolated process via namespaces + cgroups, not a VM.
Image vs container?
Image = immutable template; container = running instance.
Why “works on my machine” happens?
Missing deps/env/config; container pins runtime and deps.
What is a volume used for?
Persist data outside container lifecycle.
Port mapping meaning?
Host port forwards to container port.
Kubernetes: Pod vs Deployment?
Pod = unit of running containers; Deployment manages replicas/rollouts.
Service vs Ingress (conceptually)?
Service = stable endpoint; Ingress = HTTP routing layer.
ConfigMap vs Secret?
ConfigMap for non-secret config; Secret for sensitive data (still handle carefully).
Liveness vs readiness probes?
Liveness restarts; readiness controls traffic routing.
Teardown/cleanup basics (K8s)?
Delete what you created: kubectl delete -f <manifests> or kubectl delete deploy,svc,ing,cm,secret -l app=<x>; namespace cleanup: kubectl delete ns <name>.

7) Phase 5 - CI/CD (10, easy -> harder)

CI vs CD?
CI validates/builds; CD deploys.
What triggers a pipeline?
Push/PR/tag/schedule/manual.
What is an artifact?
Build output stored for later steps (image/package/binary).
Why run tests in CI?
Catch regressions before merge/deploy.
Why build container images in CI?
Reproducible deployable unit with version tag.
What is “pipeline as code”?
Pipeline definition in repo; reviewed like code.
Deploy strategies: rolling vs blue/green vs canary?
Gradual replace vs parallel cutover vs partial traffic shift.
What is a promotion flow?
Same artifact promoted dev->stage->prod with approvals.
Secrets in CI: main rule?
Inject at runtime via secret store; never echo; restrict scopes.
What makes CI/CD “production-grade”?
Deterministic builds, caching, least-privilege deploy keys, rollback path, audit logs.

8) Final - Observability (10, easy -> harder)

Monitoring vs logging vs tracing?
Metrics for trends, logs for events, traces for request flow.
Golden signals?
Latency, traffic, errors, saturation.
What is a time series metric?
Value indexed by time + labels.
Counter vs gauge vs histogram?
Monotonic count vs current value vs distribution.
Why labels can hurt you?
High cardinality explodes storage/query cost.
Alerting goal?
Actionable signals, not noise.
SLI vs SLO?
Indicator measurement vs target objective.
What is “burn rate”?
Speed of consuming error budget; guides urgency.
Dashboards: what’s the trap?
Vanity charts without decision value; no link to alerts/SLOs.
Root cause workflow (tight)?
Correlate alerts -> inspect deploys -> check metrics/logs/traces -> narrow blast radius -> rollback/mitigate -> postmortem.
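
Burn rate is easier to reason about with numbers. A minimal sketch, assuming the common definition (observed error ratio divided by the error ratio the SLO allows); the 99.9% target and 1% error ratio are made-up inputs:

```python
import math


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Error-budget burn rate: observed error ratio / allowed error ratio."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


# Hypothetical service: 99.9% SLO, currently failing 1% of requests.
rate = burn_rate(error_ratio=0.01, slo_target=0.999)
print(round(rate, 2))
```

Under this definition a burn rate of 1 spends the budget exactly over the SLO window, so a sustained value around 10 would exhaust it in roughly a tenth of that window.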

Additional Resume-Relevant DevOps Sections (Current + In-Demand)

These sections do not replace anything above. They extend the skill-check in a practical learning order.

9) Foundations - Networking fundamentals (10, easy -> harder)

What’s the difference: IP vs TCP?
IP routes packets; TCP provides reliable ordered streams on top of IP.
What’s a subnet (CIDR) in one line?
A block of IPs defined by prefix length like /24.
What does /24 mean?
24 network bits; 256 addresses total (typically 254 usable in classic IPv4 subnets).
What’s the difference: private vs public IP?
Private is non-routable on the public internet; public is internet-routable.
What’s a default gateway?
The router your host uses for non-local destinations.
DNS: A vs CNAME vs TXT?
A maps name->IPv4; CNAME aliases name->name; TXT stores arbitrary text (often verification/SPF/etc).
What’s NAT and why does it exist?
Translates addresses (often many private -> one public) to conserve IPv4 and simplify networks.
TCP 3-way handshake?
SYN -> SYN/ACK -> ACK to establish a connection.
How do you troubleshoot “can’t reach host” quickly?
Check link/IP/route/DNS: ip a, ip r, ping, traceroute, dig, ss -tulpn.
Explain MTU and a common failure mode.
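Largest packet a link carries in one frame without fragmentation (1500 bytes on standard Ethernet); a mismatch combined with blocked ICMP causes PMTU black holes, where small packets pass but large ones hang.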
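
The CIDR and private/public answers can be verified with Python's standard ipaddress module. The addresses below are arbitrary examples:

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")

total = net.num_addresses             # every address in the block
usable = sum(1 for _ in net.hosts())  # excludes network and broadcast addresses
print(net.prefixlen, total, usable)

# Private (RFC 1918) versus internet-routable addresses.
print(ipaddress.ip_address("10.0.0.5").is_private)
print(ipaddress.ip_address("8.8.8.8").is_private)

# Membership test: is this host inside the subnet?
print(ipaddress.ip_address("192.168.1.77") in net)
```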

10) Python for DevOps (automation) (10, easy -> harder)

What’s the difference: list vs tuple?
List mutable; tuple immutable.
What’s a dict used for?
Key/value mapping; fast lookup by key.
What is a virtual environment and why use it?
Isolated dependency set per project; avoids system Python conflicts.
How do you read a file safely?
with open(...) as f: ensures close even on exceptions.
Exceptions: try/except/finally purpose?
Handle failures; finally runs cleanup regardless.
What’s the difference: subprocess.run() vs os.system()?
subprocess gives control over args/exit/stdout/stderr safely; os.system is blunt and shell-ish.
JSON/YAML parsing basics?
Use json module; YAML via PyYAML/ruamel; validate shapes after parsing.
Write a CLI in Python (the right way)?
argparse (or click/typer in many shops) + clear subcommands.
Concurrency: threads vs processes (in one line)?
Threads share memory (GIL affects CPU-bound); processes isolate and parallelize CPU-bound work.
Packaging for reuse?
pyproject.toml + pinned deps + entrypoints for CLI; versioning and tests.
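
The subprocess.run() answer can be sketched as follows. run_command is a hypothetical wrapper name, and the child process is the current Python interpreter so the example does not depend on any external tool:

```python
import subprocess
import sys


def run_command(args: list[str]) -> tuple[int, str, str]:
    """Run a command without a shell and return (exit code, stdout, stderr)."""
    # Passing a list avoids shell parsing, so arguments are never re-interpreted.
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout, completed.stderr


code, out, err = run_command([sys.executable, "-c", "print('hello')"])
print(code, out.strip())

# A failing command: the exit code is data to inspect, not a crash.
fail_code, _, _ = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
print(fail_code)
```

With check=True instead, a non-zero exit raises subprocess.CalledProcessError, which suits scripts that should stop on the first failure.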

11) Ansible (Config Management + Automation) (10, easy -> harder)

What is Ansible (one line)?
Agentless automation over SSH using YAML playbooks.
Inventory: what is it?
The list/grouping of hosts (static or dynamic) Ansible targets.
Play vs task vs role?
Play targets hosts; tasks are steps; roles package reusable tasks/vars/templates/handlers.
Idempotency meaning in Ansible?
Re-running yields same end state without repeated changes.
Variables precedence: why it matters?
Same var defined multiple places; precedence controls which wins.
Handlers: what are they for?
Run actions (like restart) only when notified by changed tasks.
Templates vs files?
Templates (Jinja2) render variables; files are copied as-is.
Facts + gather_facts: what does it do?
Collect host info (OS, interfaces, etc.) for conditional logic.
Vault: what problem does it solve?
Encrypt secrets in repo (still manage access/rotation carefully).
Collections + modules: how to stay sane?
Pin collection versions; prefer well-known modules; avoid shell where module exists.
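
Idempotency and handlers can be illustrated with a toy model in Python. This mimics the behavior only and is not Ansible's implementation; ensure_line, run_play, and the config line are hypothetical:

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is in the file; return True only if the file changed."""
    existing = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = f.read().splitlines()
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(existing) + "\n")
    return True


def run_play(path: str, line: str, handler_log: list) -> bool:
    """Run the task, then fire the 'handler' only when the task reported a change."""
    changed = ensure_line(path, line)
    if changed:
        handler_log.append("restart service")
    return changed


with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "app.conf")
    handler_log = []
    first = run_play(conf, "max_connections=100", handler_log)
    second = run_play(conf, "max_connections=100", handler_log)

print(first, second, handler_log)
```

The second run reports no change, so the restart fires once rather than on every run.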

12) Terraform (practical + resume-grade) (10, easy -> harder)

What is Terraform (one line)?
Declarative IaC that plans/applies changes to real infrastructure.
Provider vs resource vs data source?
Provider talks to API; resource creates/changes; data reads existing things.
Variables: input vs local vs output?
Inputs parameterize; locals compute; outputs expose values to other layers.
What is state and where should it live?
Resource mapping; store remotely with locking for teams.
plan vs apply vs destroy?
Preview vs execute vs delete managed resources.
Modules: why and how?
Reuse patterns; version modules; pass inputs; expose outputs.
Drift and how to detect it?
Reality != state; plan reveals; avoid out-of-band changes.
Secrets handling in Terraform?
Avoid plaintext outputs/state; use secret managers and sensitive vars; restrict state access.
Workspaces vs separate state files?
Workspaces for simple env splits; separate states often cleaner for prod separation.
Safe change patterns?
Use lifecycle cautiously; create_before_destroy for cutover; small blast radius; peer review.

13) Docker (hands-on + production-minded) (10, easy -> harder)

Image vs container?
Image template; container running instance.
What’s in a Dockerfile layer?
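Filesystem-changing instructions (RUN, COPY, ADD) each create a layer; other instructions only add metadata. Cache reuse depends on the instruction text and, for COPY/ADD, the checksums of the copied files.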
Why is build context dangerous?
Sending too much to daemon; leaks secrets; slow builds; use .dockerignore.
How do you pass config to containers?
Env vars, mounted files, ConfigMaps/Secrets (in K8s), or runtime flags.
Ports: EXPOSE vs -p?
EXPOSE documents; -p actually publishes.
Volumes: bind mount vs named volume?
Bind uses host path; named managed by Docker (portable-ish).
Multi-stage builds: why?
Smaller runtime images; separate build tooling from runtime.
Rootless / least privilege basics?
Run as non-root; drop caps; read-only FS when possible.
Networking: bridge vs host vs overlay (conceptually)?
Bridge = default, NAT through the host; host = shares the host network stack (no network isolation); overlay = multi-host (Swarm; K8s CNIs handle similar).
Debug a broken container fast?
docker logs, docker exec -it, inspect env/volumes, check exit code, run shell in image.

14) Kubernetes (resume-grade essentials) (10, easy -> harder)

What is Kubernetes (one line)?
Orchestrates containers: scheduling, scaling, service discovery, rollout.
Pod vs ReplicaSet vs Deployment?
Pod runs; ReplicaSet keeps count; Deployment manages rollout/updates.
Service types: ClusterIP vs NodePort vs LoadBalancer?
Internal, node-exposed, cloud LB-backed.
Namespace: why?
Isolation, quotas, scoping of names and RBAC.
ConfigMap vs Secret?
Non-secret config vs secret material (still treat carefully).
Readiness vs liveness vs startup probes?
Ready for traffic vs restart if dead vs slow-start protection.
Resource requests vs limits?
Requests reserve capacity for scheduling; limits cap usage (CPU over the limit is throttled; memory over the limit gets the container OOM-killed).
Rolling update vs rollback?
Gradual replace; rollback returns to previous ReplicaSet revision.
RBAC: Role vs ClusterRole?
Namespace-scoped vs cluster-scoped permissions.
Practical teardown patterns?
kubectl delete -f <manifests>; or delete namespace: kubectl delete ns <ns> (nukes everything in it).

15) CI/CD (tooling specifics that show up in job reqs) (10, easy -> harder)

What’s a runner/agent?
Worker that executes pipeline jobs (self-hosted or managed).
How do you cache dependencies safely?
Cache keyed by lockfile; avoid caching secrets/build outputs incorrectly.
What’s the difference: artifact vs cache?
Artifacts are outputs to pass along; caches speed builds and are reusable.
What is “environment promotion”?
Same build artifact promoted through dev->stage->prod.
How do you prevent accidental prod deploy?
Protected branches, approvals, manual gates, environment rules.
How do you manage secrets in pipelines?
Secret store/integration; masked variables; least privilege; short-lived tokens.
Why pin tool versions in CI?
Reproducibility; avoids surprise breakage.
GitHub Actions: what are “permissions” and why important?
Scopes granted to the job's GITHUB_TOKEN; defaults can be broader than needed, so declare least-privilege permissions explicitly.
GitLab CI: what are “stages” and “needs”?
Stages order; needs enables DAG and faster pipelines.
Supply-chain controls (baseline)?
Signed artifacts, SBOMs, provenance, scanning, and protected deployment keys.
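
"Cache keyed by lockfile" can be sketched as deriving the key from a hash of the lockfile contents. The prefix and the lockfile bytes are made-up inputs, and cache_key is a hypothetical helper; real CI systems provide their own hashing helpers:

```python
import hashlib


def cache_key(prefix: str, lockfile_bytes: bytes) -> str:
    """Build a cache key that changes whenever the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"


key_a = cache_key("deps-linux", b"requests==2.32.0\n")
key_b = cache_key("deps-linux", b"requests==2.32.0\n")  # same lockfile, same key
key_c = cache_key("deps-linux", b"requests==2.32.3\n")  # changed pin, new key
print(key_a == key_b, key_a == key_c)
```

Identical lockfiles reuse the cache, while any dependency change produces a fresh key instead of restoring stale packages.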

16) Observability (metrics/logs/traces in production) (10, easy -> harder)

What is Prometheus scraping?
Pull model: Prometheus polls /metrics endpoints.
Pushgateway: when use it?
Short-lived jobs that can’t be scraped reliably.
What is a label cardinality blow-up?
Too many unique label values; costs storage/query performance.
Basic PromQL: rate vs irate?
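rate = average per-second increase over the whole range window (smoother, better for alerting); irate = per-second rate from the last two samples in the window (spiky, reacts fast).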
Alert fatigue: primary cause?
Non-actionable alerts; missing thresholds/SLO framing; no dedupe.
What’s a good alert description include?
Symptom, impact, likely causes, and first 3 debug steps.
Logs: structured vs unstructured?
Structured (JSON) is queryable; unstructured is grep-only pain.
Tracing: what’s a span?
Timed unit of work within a trace (parent/child relationships).
SLOs: why companies care?
Turns reliability into measurable targets; manages tradeoffs via error budgets.
Incident loop (tight)?
Detect -> triage -> mitigate -> root cause -> follow-up actions -> measure improvement.
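
The rate vs irate distinction can be shown on raw counter samples. This is a simplified sketch: it ignores counter resets and the window-boundary extrapolation Prometheus applies, and the samples are invented:

```python
def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# (timestamp in seconds, counter value); a burst arrives in the final interval.
samples = [(0, 100), (15, 115), (30, 130), (45, 145), (60, 400)]

print(simple_rate(samples))   # 5.0  -> the burst is averaged over the window
print(simple_irate(samples))  # 17.0 -> the burst dominates
```

The same burst moves the averaged value modestly but drives the instantaneous one, which is why irate reads as spiky.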