
Linux Process Scheduler

Scope

This document explains Linux scheduling from the viewpoint of operations, performance, and interview readiness.

It covers:

  • runnable tasks and run queues
  • scheduler classes
  • CFS/EEVDF-era intuition
  • fairness
  • priorities and nice values
  • realtime scheduling
  • CPU affinity
  • load balancing
  • cgroup CPU control
  • scheduling pathologies

Reference anchors:

  • https://docs.kernel.org/scheduler/index.html
  • https://docs.kernel.org/scheduler/sched-design-CFS.html
  • https://docs.kernel.org/admin-guide/cgroup-v2.html


Big Picture

The scheduler answers one relentless question:

Which runnable task should run on which CPU right now?

That is it.

Everything else is policy:

  • fairness
  • latency
  • throughput
  • realtime guarantees
  • power efficiency
  • CPU locality
  • cgroup bandwidth


Key Distinction: Runnable vs Running vs Sleeping

A process can exist without consuming CPU.

Important states conceptually:

  • running - currently executing on a CPU
  • runnable - ready to run, waiting for CPU
  • sleeping/blocking - waiting on IO, lock, timer, event, etc.

High load average often means many tasks are runnable or uninterruptibly blocked, not necessarily that CPUs are 100% busy with useful work.
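You can see this distinction directly on a live box with standard procps tools and /proc (no assumptions beyond a normal Linux userland):

```shell
# Count tasks by state: R = running/runnable, S = interruptible sleep,
# D = uninterruptible sleep (R and D count toward load average on Linux)
ps -eo state= | sort | uniq -c

# 1-, 5-, and 15-minute load averages
cut -d' ' -f1-3 /proc/loadavg
```

A machine with load 20 but mostly D-state tasks is waiting on IO, not burning CPU.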


Run Queues

Each CPU has scheduler state and runnable entities queued for selection.

That matters because a multicore system is not one giant single-file line. The kernel tries to place and balance tasks sensibly across CPUs.

Bad outcomes include:

  • one CPU hot, others cool
  • cache locality losses
  • migration overhead
  • NUMA pain
  • tail latency spikes
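A quick machine-wide sanity check uses counters the kernel already exports in /proc/stat:

```shell
# procs_running: tasks currently runnable (including those on-CPU)
# procs_blocked: tasks stuck in uninterruptible (D) sleep
grep -E '^procs_(running|blocked)' /proc/stat
```

If procs_running stays well above the CPU count, tasks are queueing for cores.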


Scheduler Classes

Linux has multiple scheduling classes.

At a high level:

  • normal/fair scheduling (SCHED_OTHER, plus SCHED_BATCH and SCHED_IDLE)
  • realtime scheduling (SCHED_FIFO, SCHED_RR)
  • deadline scheduling (SCHED_DEADLINE)

For most admins, normal/fair and realtime are the big ones to understand.
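You can check which class a task belongs to with ps (the cls column) or chrt; an ordinary shell will show TS / SCHED_OTHER:

```shell
# cls codes in ps: TS = SCHED_OTHER, FF = SCHED_FIFO,
# RR = SCHED_RR, DLN = SCHED_DEADLINE
ps -o pid,cls,pri,ni,comm -p $$

# Same information, by policy name
chrt -p $$
```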


Fair Scheduling Intuition

The fair scheduler is trying to share CPU over time in a way that approximates fairness while preserving responsiveness.

Conceptual model:

  • every runnable task "deserves" CPU time
  • tasks that have had less CPU recently should get preference
  • interactive behavior and weighting matter
  • the system is trying to avoid both starvation and awful latency

Nice values influence scheduling weights, not direct CPU percentages: each nice step changes a task's weight by roughly a factor of 1.25 relative to its competitors.


Nice Values

nice changes relative priority within the normal scheduler class.

Important point: nice is not a magic "take exactly X% CPU" knob. It changes weight relative to competitors.

So:

  • low nice value -> stronger claim on CPU
  • high nice value -> weaker claim

This only matters when there is contention.
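A minimal demonstration: raising your own niceness needs no privilege, and the child's nice value is visible via ps. Starting from the default niceness of 0, this prints 10:

```shell
# Start a child at nice 10 and print its nice value from inside it
# (lowering nice below 0 would require root/CAP_SYS_NICE)
nice -n 10 sh -c 'ps -o ni= -p $$'
```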


Realtime Scheduling

Linux also supports realtime policies: SCHED_FIFO (runs until it blocks, yields, or is preempted by a higher priority) and SCHED_RR (the same, plus round-robin time slicing among tasks of equal priority).

These are not toys. Misuse can starve ordinary work badly.

Realtime scheduling is appropriate when:

  • deterministic response is more important than fairness
  • workloads are designed carefully
  • priority inversion and starvation risks are understood

A box full of badly designed RT tasks can become a very expensive brick.
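chrt is the usual tool here; querying priority ranges is unprivileged, while actually granting an RT policy needs root or CAP_SYS_NICE. The daemon name below is a placeholder, not a real service:

```shell
# Valid static priority ranges per policy on this kernel
chrt -m

# Sketch: start a command under SCHED_FIFO priority 10 (root required)
# chrt -f 10 some_latency_sensitive_daemon
```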


Deadline Scheduling

There is also deadline-oriented scheduling machinery (SCHED_DEADLINE) for workloads needing explicit temporal guarantees.

For many ops interviews it is enough to know:

  • it exists
  • it is not the same as normal fair scheduling
  • it targets explicit timing constraints rather than generic fairness
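For completeness, util-linux chrt can launch a task under SCHED_DEADLINE; the parameters are nanoseconds, and the command below is a hypothetical sketch (root required, and the priority argument must be 0 for this policy):

```shell
# Sketch: 5 ms of guaranteed runtime every 10 ms period
# chrt -d --sched-runtime 5000000 --sched-deadline 10000000 \
#      --sched-period 10000000 0 my_periodic_task

# Unprivileged check that this chrt build knows the policy
chrt -m | grep -i deadline
```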


CPU Affinity and Placement

Tools like taskset, cpusets, and orchestrator policies can constrain where tasks run.

Reasons:

  • cache locality
  • NUMA locality
  • licensing weirdness
  • isolating noisy workloads
  • dedicating cores for latency-sensitive work

But manual pinning has costs:

  • imbalance
  • underutilization
  • operational complexity
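taskset handles both querying and setting, and restricting your own affinity is unprivileged. CPU 0 is used here only because it exists on any machine:

```shell
# Read the calling shell's allowed-CPU list
taskset -cp $$

# Run a child pinned to CPU 0 and confirm from /proc
taskset -c 0 sh -c 'grep Cpus_allowed_list /proc/self/status'
```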


Scheduler Load Balancing

The kernel periodically balances work across CPUs and scheduling domains.

It must trade off:

  • locality
  • fairness
  • migration cost
  • power model
  • asymmetric CPU capacity on some systems

This is why "just move the task to another core" is conceptually simple and operationally messy.


Cgroups and CPU Control

Cgroup v2 makes CPU control workload-aware.

Concepts include:

  • weighted CPU distribution
  • quotas/limits
  • hierarchical control

This matters for:

  • containers
  • systemd slices
  • noisy-neighbor control
  • service-level fairness

Put differently: Linux scheduling is no longer just "which task runs next?" but also "which policy domain does this workload belong to?"
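Assuming the unified cgroup v2 hierarchy mounted at /sys/fs/cgroup (the common layout on modern systemd distros), you can inspect the CPU controls applied to your own cgroup:

```shell
# Path of this process's cgroup within the v2 hierarchy
cg=$(cut -d: -f3- /proc/self/cgroup | head -1)

# Relative weight (default 100; higher = larger share under contention)
cat "/sys/fs/cgroup$cg/cpu.weight" 2>/dev/null

# Hard limit: "max" = unlimited, else "<quota_us> <period_us>"
cat "/sys/fs/cgroup$cg/cpu.max" 2>/dev/null
```

On cgroup v1 hosts these files live elsewhere, hence the hedged 2>/dev/null.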


Common Scheduler Pathologies

CPU saturation

Too many runnable tasks, not enough cores.

Run-queue latency

Tasks are runnable but wait too long to get CPU.

Lock contention

Looks like CPU trouble but is really threads fighting over shared resources.

Interrupt/softirq pressure

CPU time consumed by networking/storage/kernel work, not just user processes.

Bad cgroup quota settings

Artificial throttling that looks like mysterious slowness.

Affinity mistakes

Pinned workloads bottleneck one part of the machine.
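The cgroup quota pathology in particular is easy to confirm: cpu.stat in cgroup v2 counts how often the quota kicked in (path assumes the v2 hierarchy at /sys/fs/cgroup):

```shell
# Nonzero nr_throttled / throttled_usec means the cpu.max quota is biting
cg=$(cut -d: -f3- /proc/self/cgroup | head -1)
grep -E 'nr_throttled|throttled_usec' "/sys/fs/cgroup$cg/cpu.stat" 2>/dev/null
```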


Useful Commands

uptime
top
htop
ps -eo pid,comm,ni,pri,cls,psr,%cpu --sort=-%cpu | head
vmstat 1
pidstat -u -t 1
mpstat -P ALL 1
taskset -cp <pid>
chrt -p <pid>
cat /sys/kernel/debug/sched/debug   (on older kernels: cat /proc/sched_debug)

For deeper work:

  • perf sched
  • ftrace
  • eBPF sched tracing


How to Think About "High CPU"

Ask:

  1. Who is burning CPU?
  2. User, system, irq, softirq, or steal?
  3. Are tasks runnable or blocked?
  4. Is there contention or throttling?
  5. Is scheduler behavior the cause or just the messenger?
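The user/system/irq/softirq/steal split can be read straight from /proc/stat (cumulative jiffies since boot; mpstat gives the same split per second):

```shell
# Columns after "cpu": user nice system idle iowait irq softirq steal guest guest_nice
grep '^cpu ' /proc/stat
```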

A lot of "scheduler problems" are actually:

  • lock contention
  • bad query plans
  • garbage collection
  • interrupt storms
  • cgroup throttling
  • virtualization steal time


Interview-Level Things to Explain

You should be able to explain:

  • runnable vs running vs sleeping
  • what a run queue is
  • what nice values actually do
  • difference between fair and realtime scheduling
  • why affinity exists
  • how cgroups affect CPU scheduling
  • why high load average is not the same as "CPU is 100%"

Fast Mental Model

The scheduler is the kernel's traffic cop for CPU time: it selects which runnable task gets a core, under policies shaped by fairness, latency, realtime rules, and cgroup isolation.
