# Mental Model: Bisect

**Category:** Debugging & Diagnosis
**Origin:** Binary search algorithm (classical computer science); applied to version control debugging, most famously formalized as `git bisect` (Linus Torvalds, Linux kernel development, ~2005)
**One-liner:** Divide the known-good to known-bad range in half, test the midpoint, and repeat — logarithmically narrowing down which change introduced the fault.
## The Model
Bisect is the application of binary search to the problem of finding the change that introduced a regression. The precondition is simple and powerful: you have a known-good state and a known-bad state, and the transition from good to bad happened somewhere in a sequence of changes between them. Binary search guarantees you can find the breaking change in O(log N) steps instead of O(N) — which means finding one bad commit in a sequence of 1,024 takes at most 10 tests instead of potentially 1,024.
The core operation: take the midpoint of the unknown range, test it, and — depending on whether that state is good or bad — discard half the range. Repeat until you've converged on the exact change. The technique works on any ordered sequence where you can define a binary test (good vs bad): git commit history, deployment artifact versions, configuration file revisions, kernel versions, package changelog entries, or dates in a log timeline.
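The core loop can be written down directly. In this sketch, `is_good` is a hypothetical stand-in for whatever binary verdict you define; it pretends change 42 of 100 introduced the fault:

```shell
# Bisect loop over any ordered sequence of changes, indexed 0..100.
# Invariant: lo always points at a known-good state, hi at a known-bad one.
is_good() { [ "$1" -lt 42 ]; }   # hypothetical verdict: fault arrived at #42

lo=0; hi=100
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  if is_good "$mid"; then
    lo=$mid    # midpoint still good: discard the older half
  else
    hi=$mid    # midpoint already bad: discard the newer half
  fi
done
echo "first bad change: $hi"   # prints: first bad change: 42
```

Every bisect variant in this document is a specialization of this loop: only the index space and the `is_good` test change.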
The critical discipline is defining the test precisely before you start bisecting. The test must be deterministic and faithful: if your test is "does the bug reproduce?" and the bug is intermittent, you will get false results that send you in the wrong direction. Intermittent failures require you to either make the test more deterministic (increase test repetition count, instrument to capture the rare event) or switch to a different debugging approach. The test must also be testing the right thing — a false-positive good result (the bug is there but you missed it) will cause bisect to skip the actual breaking change.
Bisect is not always about code commits. The same mental model applies to: infrastructure configuration changes tracked in an audit log, package versions in a dependency tree (bisect the package version space), kernel parameter tuning (bisect the parameter value space to find where performance degrades), date ranges in logs when you know roughly when a problem started, and time-to-event windows in monitoring dashboards. Any ordered sequence with a binary verdict is bisectable.
The model's boundary condition: bisect requires an ordered sequence and the property that once the state flips from good to bad, it stays bad (monotonicity). If a regression was introduced by change X but partially masked by change Y later in the sequence, bisect will converge on Y, not X. Similarly, if the failure is environmental and not tied to a specific change, bisect will give inconsistent results.
## Visual

```
Bisect: O(log N) change isolation

Commit history (oldest left, newest right):

[GOOD]──●──●──●──●──●──●──●──●──●──●──●──●──●──[BAD]
 v1.0                                           v1.14

Round 1: test midpoint v1.7
[GOOD]──●──●──●──●──●──●──[TEST]──●──●──●──●──●──●──[BAD]
 v1.0                      v1.7                       v1.14
Result: v1.7 is GOOD → discard v1.1..v1.7, search v1.8..v1.13

Round 2: test midpoint v1.11
[GOOD]──●──●──●──[TEST]──●──●──[BAD]
 v1.7            v1.11         v1.14
Result: v1.11 is BAD → discard v1.12..v1.13, search v1.8..v1.10

Round 3: test midpoint v1.9
[GOOD]──●──[TEST]──●──[BAD]
 v1.7      v1.9       v1.11
Result: v1.9 is GOOD → discard v1.8..v1.9, only v1.10 remains

Round 4: test v1.10
[GOOD]──[TEST]──[BAD]
 v1.9    v1.10   v1.11
Result: v1.10 is BAD → FOUND: v1.10 introduced the regression

Total tests: 4 (vs up to 13 for linear search)
Range: 13 candidate commits → 4 tests (ceil(log₂ 13) = 4)
```
## When to Reach for This
- A regression was introduced somewhere in a known range of git commits, deployments, or configuration changes — bisect is your fastest path to the culprit
- "It worked last week, it doesn't work this week" — you have a time boundary; translate it to a commit or version boundary and bisect
- Dependency upgrades caused a failure: bisect across the dependency version space to find the exact breaking version
- Kernel version regression: bisect across kernel versions (git bisect works on the Linux kernel tree)
- Performance degradation: bisect across a deployment history to find when the throughput/latency metric crossed the threshold
- CI/CD pipeline failures: if a test suite starts failing and many commits have accumulated, bisect rather than manually reviewing each
## When NOT to Use This
- When the failure is not reproducible with a consistent test: a flaky failure will give bisect incorrect good/bad verdicts, leading to a wrong conclusion
- When changes are not independent — if commits have tangled dependencies (commit B requires commit A to compile), you cannot cleanly test B without A; use `git bisect start --first-parent` or cherry-pick into an isolated branch
- When there is no clear ordering to the change space: if you suspect an environmental factor (ambient temperature, load pattern, network topology) rather than a discrete change, bisect doesn't apply
- When the regression was introduced deliberately and reverted: bisect will find the revert, not the original introduction; inspect the commit log around the result
- When you already strongly suspect a specific change: just test that change directly instead of running bisect's full protocol
## Git Bisect Mechanics
git bisect automates the binary search over commit history. Understanding the mechanics prevents common errors:
```shell
# Start a bisect session
git bisect start

# Mark the current commit as bad (or provide a ref)
git bisect bad HEAD

# Mark a known-good commit (a tag, SHA, or relative ref)
git bisect good v1.14.3

# git bisect now checks out the midpoint commit.
# Test the current state manually, then report:
git bisect good   # this commit is good → the culprit is in the newer half
git bisect bad    # this commit is bad → the culprit is this commit or older

# Or provide an automated test script — git bisect runs it for you:
git bisect run ./test-script.sh
# Script exit codes: 0 = good, 1-124/126-127 = bad, 125 = skip (untestable)

# When done, reset HEAD back to the original position:
git bisect reset
```
Handling untestable commits: If a midpoint commit doesn't compile or is otherwise untestable, use git bisect skip to skip it. Bisect will find the closest testable commit. If too many commits are skipped, bisect cannot guarantee it found the first bad commit — it will report a range.
Branching bisect: If the breaking change was merged from a feature branch, and the merge commit is the "bad" one (not the individual commits), use `git bisect start --first-parent` (Git 2.29+) to walk only the mainline commits, ignoring the internal history of merged branches.
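A self-contained sketch of this behavior, assuming Git 2.29+ is installed; the repository, file names, and commit messages are all hypothetical:

```shell
# Build a throwaway repo whose regression arrives via a merge commit,
# then bisect with --first-parent so only mainline commits are visited.
dir=$(mktemp -d) && cd "$dir"
git init -q -b main                      # -b requires Git 2.28+
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3; do echo "$i" > f; git add f; git commit -qm "good $i"; done
git tag last-good
git checkout -qb feature
echo oops > bug; git add bug; git commit -qm "introduce bug on branch"
git checkout -q main
git merge -q --no-ff -m "merge feature" feature
echo 4 > f; git add f; git commit -qm "after merge"

git bisect start --first-parent          # requires Git 2.29+
git bisect bad HEAD
git bisect good last-good
git bisect run sh -c 'test ! -f bug'     # exit 0 = good, 1 = bad
blamed=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "first bad commit: $blamed"
```

With `--first-parent`, the "bad" verdict lands on the merge commit itself ("merge feature" here), not on the commit inside the feature branch.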
Parallelizing bisect: For CI-heavy projects where each bisect step takes 10+ minutes, calculate the bisect steps in advance and run them in parallel in separate worktrees or containers. Bisect is normally sequential (each step depends on the previous result), but the first step (midpoint) can be queued immediately. Some teams precompute the full decision tree and run the outer branches in parallel when the history is large and each test is expensive.
## Beyond Git — Bisect as a General Mental Model
The binary search mental model extends to any problem with an ordered search space and a binary verdict:
Dependency version bisect: A Python package upgrade broke something. You know it worked on version 2.3.1 and fails on 2.8.0. Use pip install 'package==2.5.5', test, repeat. This works for any dependency manager.
Configuration parameter bisect: A tuning parameter (e.g., vm.dirty_ratio) causes performance degradation at some value. Binary search the parameter space: if degradation occurs at 80, try 40; if 40 is fine, try 60; if 60 fails, try 50. Converges in log₂(range) tests.
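A sketch of that convergence; `degrades_at` is a hypothetical stand-in for "set the parameter, run the workload, check the metric" and pretends degradation starts at 55:

```shell
degrades_at() { [ "$1" -ge 55 ]; }   # hypothetical workload check

good=0; bad=100; probes=0            # known-fine and known-degraded values
while [ $((bad - good)) -gt 1 ]; do
  mid=$(( (good + bad) / 2 ))
  probes=$((probes + 1))
  if degrades_at "$mid"; then bad=$mid; else good=$mid; fi
done
echo "degradation starts at $bad after $probes probes"
# prints: degradation starts at 55 after 7 probes
```

Seven probes for a range of 100 values, as ceil(log₂ 100) = 7 predicts.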
Log timestamp bisect: You see a corrupt state in a database backup. You have daily backups for the last 30 days. Binary search the backups: restore day 15, check for corruption; if present, try day 7; if absent, try day 11. Find the first corrupted backup in 5 restores instead of 30.
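The same loop over backup days; `is_clean` is a hypothetical stand-in for "restore backup N and run the integrity check", and it pretends corruption began on day 19:

```shell
is_clean() { [ "$1" -lt 19 ]; }   # hypothetical restore-and-check

clean=1; corrupt=30; restores=0   # day 1 known clean, day 30 known corrupt
while [ $((corrupt - clean)) -gt 1 ]; do
  day=$(( (clean + corrupt) / 2 ))
  restores=$((restores + 1))
  if is_clean "$day"; then clean=$day; else corrupt=$day; fi
done
echo "first corrupted backup: day $corrupt ($restores restores)"
# prints: first corrupted backup: day 19 (5 restores)
```

The probe sequence is days 15, 22, 18, 20, 19 — the five restores the paragraph above counts.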
Infrastructure configuration bisect: Terraform state changes accumulated over 8 months. Something changed that broke a compliance check. If you have state snapshots or a VCS-tracked Terraform backend, bisect the state history.
## Applied Examples
### Example 1: Kernel soft lockup regression — Linux host
A production Linux host running a data processing workload starts reporting BUG: soft lockup kernel messages after a kernel upgrade from 5.15.89 to 5.15.102. The host crashes under heavy I/O. Bisect applies.
Define the range: good = 5.15.89, bad = 5.15.102. The Linux stable kernel tree contains patches between these two versions.
Define the test: A script that runs the workload for 10 minutes and checks /var/log/kern.log for soft lockup — exits 0 if clean, exits 1 if a lockup is detected.
Run `git bisect`:

```shell
git bisect start
git bisect bad v5.15.102
git bisect good v5.15.89
git bisect run ./test-softlockup.sh
```
After 4 iterations (ceil(log₂ 13) = 4 for the 13 intervening releases), bisect converges on a specific commit in the block I/O scheduler path. The commit message mentions "optimize request merging under high queue depth." This is the regression.
Resolution: The commit is reverted in a custom kernel build, or the production workload is pinned to 5.15.89 until the upstream fix arrives. The bisect result is attached to the upstream bug report.
### Example 2: Deployment-induced service latency spike — CI/CD pipeline
A microservice's p99 latency climbs from 80ms to 1.2 seconds starting sometime in the past two weeks. 23 deployments have occurred in that window. Linear inspection of 23 changelogs would take hours; bisect takes 5 test runs.
Define the range: current bad deploy (v2.3.23) and last known good (v2.3.0, confirmed from monitoring archives showing normal latency).
Define the test: Deploy the candidate version to a staging environment, run 60 seconds of synthetic load, check if p99 latency exceeds 200ms threshold.
Bisect process (manual):

- Test v2.3.11 → good (82ms p99)
- Test v2.3.17 → bad (1.1s p99)
- Test v2.3.14 → good (79ms p99)
- Test v2.3.16 → bad (1.3s p99)
- Test v2.3.15 → good (81ms p99)
- Conclusion: v2.3.16 introduced the regression
Inspect v2.3.16's diff: a database query was changed to include a JOIN on an unindexed column. The slow query was masked in development by a small dataset. Fix: add the index, redeploy.
## Efficiency Calculation: Why Bisect Matters
It is worth being concrete about why binary search is worth the discipline overhead, particularly when a team is under incident pressure and "just check the last few commits" feels faster.
| Range size (commits) | Linear worst-case (tests) | Bisect worst-case (tests) | Calculation |
|---|---|---|---|
| 10 | 10 | 4 | log₂(10) ≈ 3.3 |
| 50 | 50 | 6 | log₂(50) ≈ 5.6 |
| 100 | 100 | 7 | log₂(100) ≈ 6.6 |
| 500 | 500 | 9 | log₂(500) ≈ 8.9 |
| 1,000 | 1,000 | 10 | log₂(1000) ≈ 10 |
For a range of 100 commits where each test takes 5 minutes, linear search takes up to 500 minutes (8+ hours). Bisect takes at most 35 minutes. For a range of 10 commits where each test takes 1 minute, linear "wins" if the bad commit happens to be in the first half — but bisect is never worse than 4 minutes vs. 10 minutes.
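The "Bisect worst-case" column is ceil(log₂ N), the number of halvings needed to shrink N candidates down to one; a short loop reproduces the table:

```shell
# ceil(log2 n) computed by repeated halving (rounding up)
steps() {
  n=$1; s=0
  while [ "$n" -gt 1 ]; do n=$(( (n + 1) / 2 )); s=$((s + 1)); done
  echo "$s"
}
steps 10     # prints 4
steps 100    # prints 7
steps 1000   # prints 10
```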
The practical threshold: if the range has more than 5 commits and the test takes more than 2 minutes per run, bisect is almost always faster. Below that threshold, linear inspection is reasonable.
## The Junior vs Senior Gap
| Junior | Senior |
|---|---|
| Reviews commit history linearly, reading each diff, hoping to spot the bad change | Immediately calculates the range size and sets up bisect — O(log N) not O(N) |
| Tests changes in chronological order from oldest to newest | Tests the midpoint first, discarding half the range before doing any further work |
| Defines a vague test ("does it seem slow?") that produces inconsistent results | Writes a precise, automated, deterministic test before starting bisect |
| Abandons bisect when a test is inconclusive and reverts to manual review | Recognizes inconclusive results as a signal to fix the test, not to abandon the method |
| Finds the bad commit but only notes the fix needed | Reads the commit author, commit message, and related context to understand why the regression was introduced — feeds into Five Whys |
| Applies bisect only to git history | Applies the same mental model to deployment versions, config revisions, package versions, and time windows |
## Defining a Good Bisect Test
The automated test you provide to git bisect run is the most critical element of the process. A poor test produces a wrong answer faster — which is worse than no answer at all. Characteristics of a good bisect test:
Binary and deterministic. The test must produce the same result when run against the same commit in the same environment. Intermittent failures make bisect unusable — if the bug reproduces 30% of the time, bisect will frequently misclassify a "bad" commit as "good," pointing you to the wrong commit. If you cannot make the test deterministic, increase the repetition count (run the test 10 times and classify as bad if >0 failures) or instrument to capture the event more reliably.
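One way to sketch the repetition approach; `run_once` is a hypothetical stand-in for a single reproduction attempt, here rigged to fail on the fourth try. (With a bug that reproduces 30% of the time, 10 repetitions miss it only about 0.7¹⁰ ≈ 3% of the time.)

```shell
# Classify a commit as BAD if any of the repetitions reproduces the bug.
classify() {
  for _ in $(seq 10); do
    run_once || return 1   # any failure → BAD
  done
  return 0                 # all repetitions clean → GOOD
}

# hypothetical stand-in: the bug fires on the 4th attempt
attempt=0
run_once() { attempt=$((attempt + 1)); [ "$attempt" -ne 4 ]; }

classify; v=$?
echo "verdict: $v"   # prints: verdict: 1
```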
Faithful to the original symptom. The test must detect the actual failure, not a proxy. If the original symptom is "p99 latency > 1 second," your test should measure latency under realistic load, not just check whether the service starts. A test that checks "does the service start?" will never find the commit that introduced a slow query path.
Fast enough to make bisect practical. If each test takes 30 minutes, bisecting 100 commits takes about 3.5 hours (ceil(log₂ 100) = 7 steps × 30 min). Consider: can you run the test in a lighter environment? Can you reproduce the failure more quickly? Can you reduce the history range by narrowing the good/bad boundary from context?
Exit code discipline. git bisect run interprets:
- Exit 0: the current commit is GOOD
- Exit 1-127 (except 125): the current commit is BAD
- Exit 125: skip this commit (untestable)
- Exit 128+: abort bisect (script itself failed)
Never use exit 1 for script errors — bisect will classify the commit as "bad" when the script simply failed to run. Use exit 125 for commits that cannot be compiled or tested, and use explicit error handling in your test script.
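A skeleton with that discipline; `build_project` and `reproduce_bug` are hypothetical stand-ins for your build step and symptom check:

```shell
bisect_verdict() {
  build_project || return 125   # can't build → skip, never report "bad"
  if reproduce_bug; then
    return 1                    # symptom present → BAD
  fi
  return 0                      # symptom absent → GOOD
}

# stand-ins for the demo: build succeeds, symptom absent on this commit
build_project() { true; }
reproduce_bug() { false; }

bisect_verdict; v=$?
echo "verdict: $v"   # prints: verdict: 0
```

The ordering matters: the build check runs first so that a broken build can never be misreported as the regression.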
Isolation from environment. The test should produce the same result regardless of external state: whether prior tests ran on the same machine, what other processes are running, what data is in the database. If environment bleed between tests is possible, reset state between runs or run each test in a clean container.
## Connections
- Complements: Five Whys (Bisect identifies which change caused the regression; Five Whys then traces why that change had the effect it did and why it passed through to production — use them in sequence)
- Complements: Differential Diagnosis (Differential Diagnosis handles the case where the cause is unknown and there is no clear ordered sequence to search; Bisect handles the case where you have a change history and need to find the specific breaking point)
- Tensions: Correlation vs Causation (Bisect identifies the commit temporally correlated with the failure; causation must still be confirmed by reading the diff and understanding the mechanism — the bisect result is a strong lead, not a proof)
- Topic Packs: git, cicd
- Case Studies: kernel-soft-lockup (git bisect across the Linux stable kernel tree isolates the regressing patch in the block I/O scheduler path)