Kubernetes Scheduler

Scope

This document explains the Kubernetes scheduler as a real control-plane component, not just "the thing that picks a node."

It covers:

  • scheduling queue
  • filtering and scoring
  • requests/limits relevance
  • affinity/anti-affinity
  • taints and tolerations
  • preemption
  • plugins/profiles
  • common scheduling failures

Reference anchors:

  • https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/
  • https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/
  • https://kubernetes.io/docs/reference/scheduling/config/


Big Picture

The scheduler's job is:

unscheduled Pod -> pick the most suitable node -> bind Pod to node

That sounds simple. It is not.

The scheduler must account for:

  • resource requests
  • node constraints
  • topology
  • affinity rules
  • taints/tolerations
  • priority
  • preemption
  • policy plugins


What the Scheduler Actually Sees

The scheduler does not run your containers. The kubelet and container runtime do that on the chosen node.

The scheduler's domain is placement.

Its question is: "Given this pod spec and current cluster state, where is this pod allowed and preferred to run?"


Basic Scheduling Flow

Very roughly:

  1. Pod enters scheduling queue
  2. the scheduler notices it via its watch on the API server
  3. candidate nodes are filtered
  4. remaining nodes are scored
  5. best node is selected
  6. Pod is bound to node
  7. kubelet on that node takes over runtime execution

This is often described as filter then score.


Filtering Stage

Filtering removes nodes that are not valid.

Reasons a node may be filtered out:

  • insufficient CPU or memory based on requests
  • node selector mismatch
  • affinity rules not satisfied
  • taint not tolerated
  • volume constraints
  • port conflicts / topology constraints
  • node unschedulable state

This stage answers: "Can it run here at all?"
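As a sketch of what hard filters look like in a pod spec, both of the fields below remove nodes from consideration (the name, image, label, and values are illustrative, not from this document):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                # illustrative name
spec:
  nodeSelector:
    disktype: ssd          # filter: only nodes labeled disktype=ssd remain candidates
  containers:
  - name: app
    image: nginx           # illustrative image
    resources:
      requests:
        cpu: "500m"        # filter: nodes without 500m of free allocatable CPU are removed
        memory: 256Mi
```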


Scoring Stage

Scoring ranks the nodes that survived filtering.

Scoring can consider:

  • resource balance
  • spreading
  • affinity preferences
  • image locality
  • topology preferences
  • custom plugin behavior

This stage answers: "Among valid nodes, which looks best?"
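By contrast with hard filters, a preferred (soft) rule feeds scoring: a node that fails it stays eligible but ranks lower. A sketch, with a made-up zone value:

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                         # adds to the node's score; not a hard requirement
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]         # illustrative zone
```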


Requests and Limits

Scheduler decisions are primarily driven by requests, not limits.

That is a critical interview point.

If you set:

  • requests too low -> bin-packing overcommits nodes relative to real usage
  • requests too high -> pods stay Pending unnecessarily

The scheduler is making a placement bet using declared demand. Garbage declarations produce garbage placement.
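In pod-spec terms the split looks like this (values illustrative):

```yaml
resources:
  requests:              # the scheduler does its placement math against these
    cpu: "250m"
    memory: 128Mi
  limits:                # enforced at runtime by the kubelet/cgroups, not used for placement
    cpu: "1"
    memory: 512Mi
```

A pod with tiny requests and large limits schedules easily and can still starve its node later.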


Affinity and Anti-Affinity

Node affinity

Expresses preferred or required constraints on node labels.

Pod affinity

Place near certain other pods.

Pod anti-affinity

Avoid colocation with certain other pods.

These rules are powerful but can create fragile scheduling if overused.

A cluster can become "policy rich, schedulability poor."
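A sketch combining a required node-affinity rule with a required anti-affinity rule (the app label is illustrative):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard rule: acts as a filter
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values: ["amd64"]
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard rule: never colocate
      - labelSelector:
          matchLabels:
            app: web                                     # illustrative label
        topologyKey: kubernetes.io/hostname              # "per node" granularity
```

Every required rule shrinks the candidate set, which is exactly how clusters drift toward that state.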


Taints and Tolerations

Taints repel pods. Tolerations allow pods to stay eligible.

Use cases:

  • dedicated nodes
  • special hardware
  • quarantine/failure management
  • control-plane isolation

This is one of the cleaner ways to express "not everything belongs everywhere."
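A sketch of the pod side of the pair (key, value, and the matching node taint are illustrative; the taint itself would be applied with kubectl taint nodes <node> dedicated=gpu:NoSchedule):

```yaml
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"   # the pod tolerates the taint and stays eligible for the node
```

Note that a toleration only makes the node eligible again; it does not attract the pod there. Dedicated-node setups usually pair a taint with a node selector or affinity rule.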


Preemption and Priority

Higher-priority pods can trigger preemption of lower-priority pods if that is the only path to placement.

That does not mean "scheduler is evil." It means the cluster has policy saying some workloads matter more.

You need to understand the blast radius: preemption can fix one pending pod by disrupting several others.
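A sketch of the policy side: a PriorityClass (name, value, and description are illustrative) that a pod then references:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-batch                     # illustrative name
value: 100000                              # higher value = higher priority
preemptionPolicy: PreemptLowerPriority     # the default; Never opts this class out of preempting
globalDefault: false
description: "May evict lower-priority pods to get scheduled."
```

A pod opts in with spec.priorityClassName: critical-batch.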


Scheduling Framework / Plugins

Modern kube-scheduler is plugin-oriented.

Extension points allow behavior in stages such as:

  • queue sorting
  • pre-filter
  • filter
  • post-filter
  • score
  • reserve
  • permit
  • bind

This matters because "scheduler behavior" is not one giant monolith anymore. It is structured pipeline logic.
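As a sketch, that pipeline is tuned through a KubeSchedulerConfiguration. The plugin names below are real defaults; the choice to disable one and re-weight another is illustrative:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      disabled:
      - name: NodeResourcesBalancedAllocation   # drop one default scoring signal
      enabled:
      - name: NodeResourcesFit
        weight: 2                               # give another signal more influence
```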


Multiple Profiles / Custom Schedulers

You can configure different scheduling profiles and even run multiple schedulers.

Why this matters:

  • special workloads
  • experimental placement policies
  • custom platform behavior

But most environments are better served by understanding the default scheduler before getting cute.
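For example (the second profile's name and its bin-packing tweak are illustrative), one scheduler binary can carry two profiles:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler           # untouched default behavior
- schedulerName: bin-packing                 # illustrative profile name
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated                  # prefer filling nodes up
```

Pods that say nothing keep using default-scheduler; a pod sets spec.schedulerName: bin-packing to opt in.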


Common Reasons Pods Stay Pending

  • no nodes satisfy requests
  • affinity impossible
  • taints not tolerated
  • PVC/volume constraints
  • topology spread constraints too strict
  • node selectors wrong
  • cluster simply too small

The scheduler often gets blamed for what are really bad resource declarations or contradictory policies.


Useful Commands

kubectl get pods -A
kubectl describe pod <name>
kubectl get nodes
kubectl describe node <name>

Look for:

  • events
  • taints
  • allocatable resources
  • request totals
  • affinity/toleration rules


Interview-Level Things to Explain

You should be able to explain:

  • filter vs score
  • why requests matter more than limits for placement
  • what taints/tolerations do
  • what affinity/anti-affinity do
  • what preemption is
  • why a pod can remain Pending forever without any scheduler bug

Fast Mental Model

The Kubernetes scheduler is a policy engine that filters impossible nodes, scores viable ones, and binds each unscheduled Pod to the best currently acceptable placement based on declared constraints and cluster state.
