
Git Advanced - Primer

Why This Matters

Most engineers know git add, git commit, git push. That gets you through 80% of daily work. The other 20% — recovering from a bad rebase, finding which commit introduced a bug, understanding why a merge went sideways, cleaning sensitive data from history — requires understanding how Git actually works under the hood. This primer covers the internals and advanced operations that separate someone who uses Git from someone who understands it.

The Object Model

Everything in Git is stored as objects in .git/objects/. There are four types:

Object   What It Stores
blob     File contents (no filename, no metadata, just bytes)
tree     Directory listing: maps filenames to blob SHAs and subtree SHAs
commit   Points to a tree; records parent commit(s), author, committer, and message
tag      Points to a commit with a name, tagger, and optional GPG signature

# See what type an object is
git cat-file -t HEAD

# See a commit object
git cat-file -p HEAD
# tree 3f9e1c2...
# parent 8a3b5e7...
# author Jane Doe <jane@example.com> 1710000000 -0500
# committer Jane Doe <jane@example.com> 1710000000 -0500
#
# fix: handle nil pointer in auth middleware

# See the tree that commit points to
git cat-file -p HEAD^{tree}
# 100644 blob a1b2c3d4...    README.md
# 040000 tree e5f6a7b8...    src

# See a blob (raw file contents)
git cat-file -p a1b2c3d4
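Because a blob is only bytes plus a short header, its ID can be reproduced by hand. A minimal sketch, assuming bash, the SHA-1 object format (Git's default), ASCII content (so the character count equals the byte count), and GNU `sha1sum` (on macOS, `shasum` does the same job):

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # outside any repository, hash-object uses the default SHA-1 format

content='hello'

# Ask Git to hash the bytes as a blob (without -w nothing is written to .git/objects)
git_sha=$(printf '%s' "$content" | git hash-object --stdin)

# Reproduce it: SHA-1 over the header "blob <byte-count>\0" followed by the raw bytes
manual_sha=$(printf 'blob %d\0%s' "${#content}" "$content" | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_sha"
echo "manual sha1sum:  $manual_sha"
```

The filename never enters the hash, which is why two identical files anywhere in a repository share a single blob.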

How Refs Work

Refs are human-readable names that point to commit SHAs:

.git/refs/heads/main           SHA of latest commit on main
.git/refs/heads/feature/auth   SHA of latest commit on feature/auth
.git/refs/tags/v1.2.0          SHA of tag object (or commit directly)
.git/refs/remotes/origin/main  SHA of last known commit on remote main
.git/HEAD                      ref: refs/heads/main (current branch)
# What does HEAD point to right now?
git rev-parse HEAD
# a1b2c3d4e5f6...

# What branch is HEAD on?
git symbolic-ref HEAD
# refs/heads/main

# What does a tag resolve to?
git rev-parse v1.2.0

Understanding refs is critical for recovery. When you "lose" a commit, the object still exists — you just lost the ref pointing to it.
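The claim that a lost commit is only a lost ref can be checked directly. A minimal sketch in a throwaway repository, assuming bash and the default `files` ref backend (a freshly written branch lives in `.git/refs/heads/`; after `git pack-refs` it moves into `.git/packed-refs`); the branch names are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "first commit"

# A loose branch ref is a one-line file containing a commit SHA
ref_file_sha=$(cat .git/refs/heads/main)
resolved_sha=$(git rev-parse main)

# Commit on a throwaway branch, then delete the branch (i.e. delete the ref)
git checkout -q -b experiment
git commit -q --allow-empty -m "experiment"
lost_sha=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D experiment

# The name is gone, but the commit object is still in the object database
lost_type=$(git cat-file -t "$lost_sha")

# Recovery is just a new ref pointing at the old SHA
git branch experiment-restored "$lost_sha"
```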

Rebase vs Merge

Merge

Creates a merge commit with two parents. Preserves the full branch topology.

git checkout main
git merge feature/auth
# Creates a merge commit if there are divergent changes

Resulting history:

* M  merge feature/auth into main
|\
| * C3 feature: add token refresh
| * C2 feature: add auth middleware
|/
* C1 initial setup

Rebase

Replays your commits on top of the target branch. Produces a linear history.

git checkout feature/auth
git rebase main
# Replays C2, C3 on top of main's HEAD

Resulting history:

* C3' feature: add token refresh
* C2' feature: add auth middleware
* C1  initial setup (main)

The commits get new SHAs (C2', C3') because their parent changed. This is why rebasing published commits causes problems.
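A small reproduction of the picture above, assuming bash and a throwaway repository; the file names and messages are made up for illustration. It records the feature commit's SHA, lets main move on, rebases, and compares:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
echo setup > setup.txt && git add setup.txt && git commit -q -m "C1 initial setup"

git checkout -q -b feature/auth
echo auth > auth.txt && git add auth.txt && git commit -q -m "C2 add auth middleware"
sha_before=$(git rev-parse HEAD)

# main moves on independently
git checkout -q main
echo docs > docs.txt && git add docs.txt && git commit -q -m "main: update docs"

# Replay C2 on top of the new main tip
git checkout -q feature/auth
git rebase -q main
sha_after=$(git rev-parse HEAD)

echo "before rebase: $sha_before"
echo "after rebase:  $sha_after"
```

The message and diff are unchanged, but the parent is different, so the SHA is different. Anyone who already fetched the old SHA now holds a commit your branch no longer contains.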

Interactive Rebase

The power tool for cleaning up history before merging:

git rebase -i HEAD~5

Opens an editor with your last 5 commits:

pick a1b2c3d fix: typo in README
pick d4e5f6a feat: add user model
pick 7a8b9c0 wip: debugging auth
pick 1d2e3f4 fix: auth actually works now
pick 5a6b7c8 feat: add user endpoints

Commands you can use:

  • pick: keep the commit as-is
  • reword: keep the commit, edit the message
  • squash: meld into the previous commit, combine messages
  • fixup: meld into the previous commit, discard this message
  • drop: delete the commit entirely
  • edit: pause the rebase at this commit for amending

Common workflow — squash WIP commits before merge:

pick d4e5f6a feat: add user model
fixup 7a8b9c0 wip: debugging auth
fixup 1d2e3f4 fix: auth actually works now
pick 5a6b7c8 feat: add user endpoints

Rebase Strategies

# Rebase and auto-squash fixup! commits
git rebase -i --autosquash main

# Create a fixup commit (will auto-squash later)
git commit --fixup=<sha>

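# Rebase, preferring your branch's changes on conflict
# (during a rebase the roles flip: "ours" is the branch you rebase onto,
#  "theirs" is your commits being replayed)
git rebase -X theirs main

# Rebase, preferring main's changes on conflict
git rebase -X ours main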
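The --fixup/--autosquash pair can be exercised without opening an editor. A minimal sketch, assuming bash and a throwaway repository; setting `GIT_SEQUENCE_EDITOR=true` is a scripting trick that accepts the generated todo list unchanged, and the commit names are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
echo v1 > app.txt && git add app.txt && git commit -q -m "initial"

git checkout -q -b feature/user
echo model > model.txt && git add model.txt && git commit -q -m "feat: add user model"
target=$(git rev-parse HEAD)
echo endpoints > api.txt && git add api.txt && git commit -q -m "feat: add user endpoints"

# A later fix that belongs in the model commit, recorded as "fixup! feat: add user model"
echo "model v2" > model.txt && git add model.txt && git commit -q --fixup="$target"

# --autosquash moves the fixup! commit under its target and marks it "fixup";
# GIT_SEQUENCE_EDITOR=true accepts that todo list without opening an editor
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline main..HEAD
```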

Cherry-Pick

Apply a specific commit from another branch:

# Apply commit abc123 to current branch
git cherry-pick abc123

# Cherry-pick without committing (stage changes only)
git cherry-pick --no-commit abc123

# Cherry-pick a range of commits (abc123 itself is excluded; use abc123^..def456 to include it)
git cherry-pick abc123..def456

# If there's a conflict during cherry-pick
git cherry-pick --continue   # after resolving
git cherry-pick --abort       # give up

Real use case: a hotfix was committed to a release branch and needs to go to main too.
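Because the range follows `git log` semantics, `abc123..def456` leaves out `abc123` itself. A sketch that shows the off-by-one, assuming bash and a throwaway repository with hypothetical `fix N` commits on a release branch:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
echo base > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b release
for n in 1 2 3; do
  echo "fix $n" > "fix$n.txt" && git add "fix$n.txt" && git commit -q -m "fix $n"
done
first=$(git rev-parse HEAD~2)   # "fix 1"
last=$(git rev-parse HEAD)      # "fix 3"

# A..B excludes A: only "fix 2" and "fix 3" are applied
git checkout -q -b exclusive main
git cherry-pick "$first..$last" >/dev/null
exclusive_count=$(git rev-list --count main..HEAD)

# A^..B starts from A's parent, so "fix 1" is included too
git checkout -q -b inclusive main
git cherry-pick "$first^..$last" >/dev/null
inclusive_count=$(git rev-list --count main..HEAD)

echo "A..B picked $exclusive_count commits; A^..B picked $inclusive_count"
```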

Reflog — Your Safety Net

The reflog records every time HEAD moves. It is your primary recovery tool.

# Show reflog (every HEAD movement)
git reflog
# a1b2c3d HEAD@{0}: commit: feat: add caching layer
# d4e5f6a HEAD@{1}: rebase (finish): returning to refs/heads/main
# 7a8b9c0 HEAD@{2}: rebase (pick): feat: add user model
# 1d2e3f4 HEAD@{3}: rebase (start): checkout origin/main
# 5a6b7c8 HEAD@{4}: commit: wip: still debugging

# Recover a commit lost during rebase
git checkout 5a6b7c8
# or
git branch recovery-branch 5a6b7c8

# Show reflog for a specific branch
git reflog show feature/auth

# Reflog entries expire after 90 days (default)
# Unreachable entries expire after 30 days

The reflog is local only. It is never pushed. It is your personal undo history.
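A minimal recovery drill in a throwaway repository (bash assumed, commit messages hypothetical): lose two commits with `git reset --hard`, find them again in the reflog, and undo with `ORIG_HEAD`:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git commit -q --allow-empty -m "three"
tip=$(git rev-parse HEAD)

# "Lose" two commits by moving the branch back
git reset -q --hard HEAD~2

# The reflog remembers where HEAD was before the reset
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')

# reset also stored the pre-reset tip in ORIG_HEAD, so this undoes it
git reset -q --hard ORIG_HEAD
```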

Bisect — Automated Bug Hunting

Binary search through commits to find which one introduced a bug:

# Start bisecting
git bisect start

# Mark the current commit as bad
git bisect bad

# Mark a known good commit
git bisect good v1.4.0

# Git checks out a commit halfway between good and bad
# Test it, then mark:
git bisect good   # if this commit is fine
git bisect bad    # if this commit has the bug

# Git narrows the range. Repeat until it finds the first bad commit.
# Bisecting: 3 revisions left to test after this (roughly 2 steps)

# When done
git bisect reset

Automated bisect with a test script:

# Git runs the script at each step. Exit 0 = good, 125 = skip (commit can't be tested), any other code from 1 to 127 = bad.
git bisect start HEAD v1.4.0
git bisect run ./test-for-bug.sh
# Git reports the first bad commit automatically

Example test script:

#!/bin/bash
# test-for-bug.sh
# A failed build says nothing about the bug, so tell bisect to skip this commit
make build || exit 125
./run-tests.sh --filter "test_user_login"

This is invaluable for regressions in large codebases. 1000 commits between good and bad? Bisect finds the answer in ~10 steps.
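To see `git bisect run` end to end without a real build, this sketch fakes a regression: ten commits in a throwaway repository, with a hypothetical bug (the word BUG in app.txt) arriving in commit 6. The inline test command stands in for test-for-bug.sh:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
for n in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$n" -ge 6 ]; then
    echo "version $n BUG" > app.txt
  else
    echo "version $n" > app.txt
  fi
  git add app.txt
  git commit -q -m "commit $n"
done
good=$(git rev-list --max-parents=0 HEAD)   # the root commit, known good

git bisect start HEAD "$good" >/dev/null
# Exit 0 (good) when BUG is absent, 1 (bad) when it is present
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit; read it before resetting
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```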

Stash

Temporarily shelve changes without committing:

# Stash current changes (tracked files only)
git stash

# Stash with a descriptive message
git stash push -m "halfway through auth refactor"

# Stash including untracked files
git stash push -u -m "include new test files"

# Stash only specific files
git stash push -m "just the config" -- config.yaml src/config.py

# List stashes
git stash list
# stash@{0}: On main: halfway through auth refactor
# stash@{1}: On main: WIP debugging

# Apply most recent stash (keep it in stash list)
git stash apply

# Apply and remove from stash list
git stash pop

# Apply a specific stash
git stash apply stash@{1}

# Show what a stash contains
git stash show -p stash@{0}

# Create a branch from a stash
git stash branch new-feature stash@{0}

# Drop a specific stash
git stash drop stash@{1}

# Drop all stashes
git stash clear
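The difference between plain `git stash` and `git stash push -u` is easiest to see with one modified tracked file and one new file. A minimal sketch, assuming bash and a throwaway repository with hypothetical file names:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
echo original > tracked.txt && git add tracked.txt && git commit -q -m "initial"

echo edited > tracked.txt    # modified tracked file
echo draft > new-test.txt    # untracked file

# Without -u, only the tracked edit is shelved; the untracked file stays put
git stash push -q -m "tracked only"
untracked_left_behind=$([ -f new-test.txt ] && echo yes || echo no)

# With -u, untracked files are shelved too, leaving a clean working tree
git stash push -q -u -m "include new test files"
untracked_after_u=$([ -f new-test.txt ] && echo yes || echo no)

git stash list
# Restore the most recent stash and drop it from the list
git stash pop -q
```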

Worktrees

Multiple working directories from a single repository:

# Create a worktree for a different branch
git worktree add ../project-hotfix hotfix/urgent-fix

# List active worktrees
git worktree list
# /home/dev/project        a1b2c3d [main]
# /home/dev/project-hotfix d4e5f6a [hotfix/urgent-fix]

# Create a worktree with a new branch
git worktree add -b feature/new-api ../project-api

# Remove a worktree
git worktree remove ../project-hotfix

# Prune stale worktree metadata
git worktree prune

Use case: you are deep in a feature branch, a P1 bug comes in, and you need to context-switch without stashing. Create a worktree, fix the bug, push, delete the worktree, continue your feature work.
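The hotfix workflow above, sketched in a throwaway directory (bash assumed; the project and branch names mirror the examples but are hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)
git init -q -b main "$base/project"
cd "$base/project"
echo app > app.txt && git add app.txt && git commit -q -m "initial"
git branch hotfix/urgent-fix

# Check out the hotfix branch in a sibling directory; this checkout stays untouched
git worktree add -q ../project-hotfix hotfix/urgent-fix
( cd ../project-hotfix && echo patched > app.txt && git commit -q -am "fix: urgent patch" )

# Both worktrees share one object database and one set of refs
git log -1 --format=%s hotfix/urgent-fix
git worktree list

# Clean up once the fix is pushed
git worktree remove ../project-hotfix
```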

Submodules vs Subtrees

Submodules

A pointer to a specific commit in another repository:

# Add a submodule
git submodule add https://github.com/org/shared-lib.git lib/shared

# Clone a repo with submodules
git clone --recurse-submodules https://github.com/org/main-repo.git

# Update submodules to latest remote commit
git submodule update --remote

# Initialize submodules after a regular clone
git submodule init
git submodule update

Submodules are widely used but have sharp edges: detached HEAD by default, easy to commit stale pointers, contributors must remember --recurse-submodules.

Subtrees

Merge another repo's code directly into your tree:

# Add a subtree
git subtree add --prefix=lib/shared https://github.com/org/shared-lib.git main --squash

# Pull updates
git subtree pull --prefix=lib/shared https://github.com/org/shared-lib.git main --squash

# Push changes back upstream
git subtree push --prefix=lib/shared https://github.com/org/shared-lib.git main

Subtrees are simpler for consumers (no submodule init needed, code is right there), but harder to push changes upstream. Choose subtrees when the dependency is mostly read-only.

Hooks

Scripts that run at specific points in the Git workflow. Stored in .git/hooks/ (local, not committed) or managed via tools like Husky, pre-commit, or lefthook.

Hook                 When It Runs                           Common Use
pre-commit           Before the commit is created           Lint, format, run fast tests
prepare-commit-msg   After default message, before editor   Add ticket number from branch name
commit-msg           After the message is written           Enforce Conventional Commits format
pre-push             Before pushing to a remote             Run full test suite
post-merge           After a merge completes                Reinstall dependencies if lockfile changed
pre-rebase           Before a rebase starts                 Warn if rebasing a published branch

Example pre-commit hook:

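#!/bin/bash
# .git/hooks/pre-commit
# Run linter on staged (added, copied, modified) Python files only.
# Note: ruff reads the working-tree copy, and unquoted $STAGED splits on
# whitespace, so this assumes filenames without spaces.
STAGED=$(git diff --cached --name-only --diff-filter=ACM -- '*.py')
if [ -n "$STAGED" ]; then
    ruff check $STAGED || exit 1
fi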

Example commit-msg hook enforcing conventional commits:

#!/bin/bash
# .git/hooks/commit-msg
MSG=$(head -1 "$1")
if ! echo "$MSG" | grep -qE '^(feat|fix|docs|chore|refactor|test|ci|style|perf|build)(\(.+\))?: .{1,72}$'; then
    echo "ERROR: Commit message must follow Conventional Commits format"
    echo "Example: feat(auth): add JWT token refresh"
    exit 1
fi
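Hooks are just executables, so the commit-msg hook can be tested in a scratch repository before rolling it out. A minimal sketch, assuming bash; it pins `core.hooksPath` to `.git/hooks` so a global hooks path (as some hook managers set) does not bypass the script:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
git config core.hooksPath .git/hooks

# Install the commit-msg hook from above
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/bash
MSG=$(head -1 "$1")
if ! echo "$MSG" | grep -qE '^(feat|fix|docs|chore|refactor|test|ci|style|perf|build)(\(.+\))?: .{1,72}$'; then
    echo "ERROR: Commit message must follow Conventional Commits format"
    exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

# A non-conforming message is rejected: the hook's non-zero exit aborts the commit
if git commit -q --allow-empty -m "updated stuff" >/dev/null 2>&1; then
  bad_rejected=no
else
  bad_rejected=yes
fi

# A conforming message goes through
git commit -q --allow-empty -m "feat(auth): add JWT token refresh"
```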

.gitattributes

Controls per-path settings for merge, diff, line endings, and LFS:

# Force LF line endings for these files (even on Windows)
*.sh    text eol=lf
*.py    text eol=lf

# Treat as binary (no diff, no merge)
*.png   binary
*.jar   binary

# Custom merge driver for lockfiles (keep our version when both sides changed it)
package-lock.json merge=ours

# Use Git LFS for large files
*.psd   filter=lfs diff=lfs merge=lfs -text
*.zip   filter=lfs diff=lfs merge=lfs -text

# Custom diff driver for minified files (needs a diff.minified config entry, e.g. git config diff.minified.binary true)
*.min.js diff=minified

Setting up the merge=ours driver:

git config merge.ours.driver true
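A scratch-repository check of the `merge=ours` driver, assuming bash; the lockfile contents are placeholders. Both branches edit package-lock.json, which would normally conflict:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
echo 'package-lock.json merge=ours' > .gitattributes
echo '{"v": 0}' > package-lock.json
git add .gitattributes package-lock.json && git commit -q -m "initial"

# Register the driver: "true" exits 0 and leaves our version of the file untouched
git config merge.ours.driver true

git checkout -q -b feature
echo '{"v": "feature"}' > package-lock.json && git commit -q -am "feature lockfile"
git checkout -q main
echo '{"v": "main"}' > package-lock.json && git commit -q -am "main lockfile"

# Both sides changed the file; the driver resolves it by keeping main's copy
git merge -q --no-edit feature
cat package-lock.json
```

The driver only runs when both sides changed the file; if only the other branch touched it, Git takes that version as usual. This per-file driver is unrelated to the `-s ours` merge strategy, which ignores the other branch's changes to every file.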

.gitignore Patterns

# Standard ignores
*.pyc
__pycache__/
.env
.env.*
node_modules/

# Negate (un-ignore) a specific file
!.env.example

# Ignore directory anywhere in tree
**/build/

# Ignore only at repo root
/dist/

# Ignore files in a directory but not the directory itself
logs/*
!logs/.gitkeep

# Character ranges: .o and .a files
# (comments must sit on their own line; a trailing "# ..." becomes part of the pattern)
*.[oa]

# Check why a file is ignored
git check-ignore -v path/to/file.pyc
# .gitignore:2:*.pyc    path/to/file.pyc

# List all ignored untracked files (--ignored needs --others or --cached)
git ls-files --others --ignored --exclude-standard
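The negation and character-range rules can be checked with `git check-ignore -q`, which exits 0 for an ignored path and 1 otherwise. A minimal sketch in a throwaway repository (bash assumed, file names hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
git init -q -b main
cat > .gitignore <<'EOF'
# Ignore log files but keep the directory placeholder
logs/*
!logs/.gitkeep
# Character range: .o and .a files
*.[oa]
EOF
mkdir logs
touch logs/app.log logs/.gitkeep main.o libfoo.a main.c

# Record whether each path is ignored
git check-ignore -q logs/app.log && app_log=ignored || app_log=kept
git check-ignore -q logs/.gitkeep && gitkeep=ignored || gitkeep=kept
git check-ignore -q libfoo.a && lib=ignored || lib=kept
git check-ignore -q main.c && src=ignored || src=kept

# Untracked files that are currently ignored
git ls-files --others --ignored --exclude-standard
```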

Sparse Checkout and Partial Clone

For monorepos where you only need a subset of the tree:

# Partial clone: download commits and trees, but not file contents (blobs)
git clone --filter=blob:none https://github.com/org/monorepo.git
# Blobs are fetched on demand when you checkout files

# Treeless clone: download only commits; trees and blobs are fetched on demand
git clone --filter=tree:0 https://github.com/org/monorepo.git

# Enable sparse checkout in cone mode (newer Git deprecates "init"; "set" alone enables it)
git sparse-checkout init --cone

# Check out only specific directories
git sparse-checkout set services/auth services/gateway shared/

# List what's checked out
git sparse-checkout list

# Disable sparse checkout (get everything back)
git sparse-checkout disable

Advanced Log

# Graph view with decoration
git log --graph --oneline --all --decorate

# Commits by author
git log --author="jane@example.com" --oneline

# Commits in date range
git log --since="2025-01-01" --until="2025-03-01" --oneline

# Search commit messages
git log --grep="auth" --oneline

# Pickaxe: find commits that change the number of occurrences of a string
git log -S "API_KEY" --oneline
# i.e. commits that added or removed "API_KEY" (moving it within a file doesn't count)

# Regex search in diffs
git log -G "def (create|update)_user" --oneline

# Follow file renames
git log --follow -- src/auth/middleware.py

# Show files changed in each commit
git log --stat --oneline

# Show only merge commits
git log --merges --oneline

# Show only non-merge commits
git log --no-merges --oneline

# Commits reachable from feature but not from main
git log main..feature/auth --oneline

# Commits in either branch but not both
git log main...feature/auth --oneline
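The two range operators are easy to mix up, so here is a small graph where both sides have moved: feature/auth has two commits, main has one. A sketch assuming bash and a throwaway repository; `git rev-list --count` just counts what each range selects:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "C1 initial setup"
git checkout -q -b feature/auth
git commit -q --allow-empty -m "C2 add auth middleware"
git commit -q --allow-empty -m "C3 add token refresh"
git checkout -q main
git commit -q --allow-empty -m "C4 hotfix on main"

# Two dots: reachable from feature/auth but not from main (C2, C3)
two_dot=$(git rev-list --count main..feature/auth)
# Three dots: reachable from either side but not both (C2, C3, C4)
three_dot=$(git rev-list --count main...feature/auth)

# --left-right marks each commit's side: < for main, > for feature/auth
git log --left-right --oneline main...feature/auth
```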

Advanced Diff

# Word-level diff (not line-level)
git diff --word-diff

# Show changed files only (summary)
git diff --stat
git diff --name-only
git diff --name-status

# Diff staged changes
git diff --cached

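# Diff between branch tips (same as "git diff main feature/auth")
git diff main..feature/auth

# Diff only what changed on feature/auth since it branched from main
git diff main...feature/auth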

# Diff a specific file between commits
git diff abc123..def456 -- src/auth.py

# Show diff stats between branches
git diff --stat main..feature/auth

# Ignore whitespace changes
git diff -w
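For `git diff` the dots mean something different than for `git log`: `A..B` compares the two tips, while `A...B` compares the merge base with B. A minimal sketch, assuming bash and a throwaway repository with hypothetical files, where main gains a hotfix after the branch point:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b feature/auth
echo auth > auth.py && git add auth.py && git commit -q -m "add auth"
git checkout -q main
echo fix > hotfix.py && git add hotfix.py && git commit -q -m "hotfix"

# Tip vs tip: main's hotfix shows up as a deletion (D hotfix.py)
git diff --name-status main..feature/auth
# Merge base vs feature/auth: only the branch's own work (A auth.py)
git diff --name-status main...feature/auth

two_dot=$(git diff --name-only main..feature/auth | tr '\n' ' ')
three_dot=$(git diff --name-only main...feature/auth | tr '\n' ' ')
```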

Blame and Annotate

# Who changed each line (and when)
git blame src/auth/middleware.py

# Blame a specific line range
git blame -L 20,40 src/auth/middleware.py

# Blame ignoring whitespace changes
git blame -w src/auth/middleware.py

# Also detect lines moved or copied from other files modified in the same commit
git blame -C src/auth/middleware.py

# Show email instead of name
git blame -e src/auth/middleware.py

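Blame only reports the last commit that touched each line. To find the commit that first introduced a line, combine it with the pickaxe; --reverse lists the oldest match first:

# Find the commit that introduced a specific line
git log -S "max_retries = 3" --reverse --oneline -- src/client.py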
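Blame and the pickaxe answer different questions. In this sketch (bash, throwaway repository, hypothetical src/client.py), line 2 is added in one commit and edited in a later one, so blame and `log -S` point at different commits; `git log -L` then shows the full history of that line:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
mkdir src
printf 'timeout = 30\n' > src/client.py
git add src && git commit -q -m "add client"
printf 'timeout = 30\nmax_retries = 3\n' > src/client.py
git commit -q -am "add retries"
printf 'timeout = 30\nmax_retries = 3  # per request\n' > src/client.py
git commit -q -am "comment retries"

# blame credits line 2 to the last commit that touched it
blamed=$(git blame -L 2,2 --porcelain src/client.py | sed -n 's/^summary //p')

# The pickaxe only lists commits that changed how often the string occurs
introduced=$(git log -S "max_retries = 3" --reverse --format=%s -- src/client.py)

# git log -L follows a line range back through every commit that changed it
git log -L 2,2:src/client.py --oneline
```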

git rev-parse

The Swiss army knife for resolving references:

# Current commit SHA
git rev-parse HEAD

# Short SHA
git rev-parse --short HEAD

# Repository root directory
git rev-parse --show-toplevel

# Is this inside a git repo?
git rev-parse --is-inside-work-tree

# Resolve a tag to a commit
git rev-parse v1.2.0^{commit}

# Parent of HEAD
git rev-parse HEAD~1

# Second parent of a merge commit
git rev-parse HEAD^2
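A merge commit's parents are ordered: `^1` is the branch you were on, `^2` is the branch you merged, and `~N` walks first parents only. A minimal sketch, assuming bash and a throwaway repository with hypothetical commit messages:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
git commit -q --allow-empty -m "C1 initial setup"
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
feature_tip=$(git rev-parse HEAD)
git checkout -q main
git commit -q --allow-empty -m "main work"
main_tip=$(git rev-parse HEAD)

git merge -q --no-edit feature

first_parent=$(git rev-parse HEAD^1)    # main's tip before the merge
second_parent=$(git rev-parse HEAD^2)   # the merged branch's tip
grandparent=$(git rev-parse HEAD~2)     # first parent of the first parent
```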

Signing Commits

GPG Signing

# Configure GPG key
git config --global user.signingkey ABC123DEF456
git config --global commit.gpgsign true

# Sign a single commit
git commit -S -m "feat: verified change"

# Verify a commit signature
git verify-commit HEAD

# Show signatures in log
git log --show-signature

SSH Signing (Git 2.34+)

# Configure SSH signing
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true

# Allowed signers file (for verification)
git config --global gpg.ssh.allowedSignersFile ~/.config/git/allowed_signers

# Format: email key-type key-data
# ~/.config/git/allowed_signers
# jane@example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5...

Maintenance and GC

# Run garbage collection
git gc

# Aggressive GC (repack everything, slower)
git gc --aggressive

# List unreachable objects (candidates gc may prune once they are old enough)
git fsck --unreachable

# Prune unreachable loose objects older than 2 weeks (git gc's default grace period)
git prune --expire=2.weeks.ago

# Enable background maintenance (Git 2.29+)
git maintenance start
# Schedules tasks such as prefetch, commit-graph, loose-objects, and incremental-repack

# Check repo health
git fsck --full

# See pack file stats
git count-objects -v

Packfiles

Git periodically packs loose objects into packfiles for efficiency:

# See current packs
ls .git/objects/pack/

# Repack manually
git repack -a -d

# Verify pack integrity
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
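Loose objects versus packfiles can be observed with `git count-objects -v`. A minimal sketch in a throwaway repository (bash assumed): three commits create nine loose objects (three blobs, three trees, three commits), and `git gc` packs them:

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q -b main
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

# New objects start out loose, one compressed file each under .git/objects/xx/
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# gc moves reachable objects into a packfile and deletes the loose copies
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose before gc: $loose_before, after: $loose_after, packs: $packs"
ls .git/objects/pack/
```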

Quick Reference

Task                 Command
Object type          git cat-file -t <sha>
Interactive rebase   git rebase -i HEAD~N
Cherry-pick          git cherry-pick <sha>
Reflog               git reflog
Bisect               git bisect start && git bisect bad && git bisect good <sha>
Stash with message   git stash push -m "msg"
New worktree         git worktree add <path> <branch>
Pickaxe search       git log -S "string"
Word diff            git diff --word-diff
Blame line range     git blame -L 20,40 file
Sign commit          git commit -S -m "msg"
Sparse checkout      git sparse-checkout set dir1/ dir2/
Partial clone        git clone --filter=blob:none <url>
Repo root            git rev-parse --show-toplevel

Git Recovery — What Experienced People Know

  • Run git reflog before panicking. Your work is almost certainly still there.
  • git reset --hard, git checkout -- <path>, git restore <path>, and git clean -f discard uncommitted changes with no reflog to fall back on. Committed work is almost always recoverable.
  • ORIG_HEAD is set by rebase, merge, and reset. git reset --hard ORIG_HEAD is instant undo.
  • Commit early, commit often. Uncommitted work is the only work git cannot recover.
  • Never force-push to shared branches. Use --force-with-lease as a safety net.
  • Before any risky operation, create a backup branch: git branch backup-before-rebase. Costs nothing.
  • git log --all --oneline --graph --decorate shows the full commit graph with branch pointers.
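  • Orphaned commits stay in the reflog for 30 days by default (gc.reflogExpireUnreachable; entries still reachable from the branch last 90 days, gc.reflogExpire). After that, git gc prunes them once they are older than gc.pruneExpire (2 weeks by default). Until then, everything is recoverable.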
  • git bisect run with an automated test finds regressions in log2(N) steps.
