Git Internals: The Content-Addressable Filesystem


Topics: Git object model, content-addressable storage, SHA-1/SHA-256, plumbing commands, packfiles, reflog, merge internals, rebase internals, garbage collection, DAG model, worktrees
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None; we start from an empty directory

The Mission

Build a Git commit from scratch. Not with git commit, but with the raw plumbing commands that git commit calls under the hood. You'll create a blob, assemble a tree, forge a commit object, and watch it appear in git log without ever running git add or git commit.

By the end, the .git directory stops being a black box. You'll know exactly where your data lives, why git reflog can save you after a catastrophic rebase, how packfiles shrink a 4 GB Linux kernel repo to something your laptop can clone, and what actually happens during a three-way merge.

The payoff: when something goes wrong — and it will — you won't be guessing. You'll know where the data is and how to get it back.

Name Origin: Linus Torvalds created Git in approximately ten days in April 2005 after BitKeeper revoked its free license for the Linux kernel. The name "git" is British slang for an unpleasant, difficult person. Linus said: "I'm an egotistical bastard, and I name all my projects after myself. First Linux, now git." His original README also offers a backronym: "global information tracker" when it works, and "goddamn idiotic truckload of sh*t" when it breaks. (Verified via the Git FAQ and Linus's original README in the Git source tree.)

Part 1: The Four Objects

Everything in Git is an object. Four types. That's it.

Object	What it stores	How to think about it
blob	Raw file contents — no filename, no permissions	A file without a name
tree	Directory listing: maps names → blobs and subtrees	A directory without a path
commit	Pointer to a tree + parent(s) + author + message	A snapshot with context
tag	Pointer to a commit + tagger + message + optional GPG sig	An annotated bookmark

Every object is identified by the SHA-1 hash of its contents. Two files with identical bytes produce the same blob hash — Git stores them once. This is content-addressable storage: the address (hash) is derived from the content, not assigned by a counter or a timestamp.

# Prove it: identical content → identical hash
echo "hello" | git hash-object --stdin
# → ce013625030ba8dba906f756967f9e9ca394464a

echo "hello" | git hash-object --stdin
# → ce013625030ba8dba906f756967f9e9ca394464a  (same hash, always)

echo "Hello" | git hash-object --stdin
# → e965047ad7c57865823c7d992b1d046ea66edf78  (different — case matters)

Under the Hood: The hash is computed as SHA-1("blob <size>\0<content>"). The object type and byte length are prepended, separated by a space, then a null byte, then the raw content. This header prevents collisions between a blob and a tree that happen to contain the same bytes. You can verify this with printf 'blob 6\0hello\n' | sha1sum.
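The header format can be checked without Git doing the hashing. The following is a minimal sketch, assuming GNU coreutils `sha1sum` is installed (on macOS, `shasum` is the usual stand-in); the variable names are illustrative only.

```shell
# Recompute a blob ID by hand and compare it with git's answer.
content='hello'                                        # sample content (illustrative)
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')   # byte length, newline included
manual=$({ printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1)
from_git=$(printf '%s\n' "$content" | git hash-object --stdin)
echo "manual: $manual"
echo "git:    $from_git"
# Both lines print ce013625030ba8dba906f756967f9e9ca394464a
```

Computing the size with `wc -c` rather than hard-coding it keeps the header correct if you change the sample content.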

Trivia: Git uses SHA-1 for integrity checking, not for cryptographic security. After Google and CWI Amsterdam demonstrated a practical SHA-1 collision in 2017 (the SHAttered attack), the Git project began a slow transition to SHA-256. You can create a SHA-256 repository with git init --object-format=sha256; the option arrived as experimental in Git 2.29 and stopped being labeled experimental in Git 2.42. The ecosystem (GitHub, GitLab, CI tools) still overwhelmingly uses SHA-1, so the migration is happening slowly.

Part 2: The .git Directory Tour

Run git init and look at what you get. Every file has a purpose.

mkdir /tmp/git-lab && cd /tmp/git-lab
git init
find .git -maxdepth 2 -type f | sort

.git/HEAD
.git/config
.git/description
.git/hooks/applypatch-msg.sample
.git/hooks/commit-msg.sample
.git/hooks/pre-commit.sample
.git/hooks/pre-push.sample
...
.git/info/exclude

Here's the map:

Path	Purpose
`HEAD`	Points to the current branch (e.g., `ref: refs/heads/main`)
`objects/`	Every blob, tree, commit, and tag — the entire content store
`refs/heads/`	Branch tips — each file contains one commit SHA
`refs/tags/`	Tag pointers
`refs/remotes/`	Remote tracking branches (after first fetch)
`index`	The staging area — a binary file mapping paths to blob SHAs
`config`	Repository-level configuration
`hooks/`	Client-side hook scripts (pre-commit, pre-push, etc.)
`info/exclude`	Personal ignore rules (like `.gitignore` but not committed)
`description`	Used by GitWeb — you can ignore this
`packed-refs`	Compact storage for refs after `git gc`
`logs/`	Reflog entries — the safety net

Mental Model: A Git repository is a key-value store (objects/) plus a set of named pointers (refs/ and HEAD). Branches and tags are those pointers; the index and the reflog are bookkeeping around them. None of it holds your content: it only tells Git where to start reading the object graph. When someone says "I deleted my branch," they deleted a 41-byte pointer file (40 hex digits plus a newline). The commit objects are still there.
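To see the pointer-file idea on disk, here is a small sketch in a throwaway repository. It assumes the default files-based ref storage and a SHA-1 repository; the identity `Lab User` and the temporary directory are placeholders.

```shell
# A branch is a small text file; deleting it leaves the commit object behind.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main          # make the branch name predictable
git config user.name 'Lab User'                # placeholder identity
git config user.email 'lab@example.com'
git commit -q --allow-empty -m 'first'

cat .git/refs/heads/main                       # one SHA-1 plus a newline
ref_bytes=$(wc -c < .git/refs/heads/main | tr -d ' ')
echo "$ref_bytes bytes"                        # 40 hex digits + newline

tip=$(git rev-parse main)
git checkout -q --detach                       # step off the branch so it can be deleted
git branch -q -D main                          # removes only the pointer
obj_type=$(git cat-file -t "$tip")
echo "$obj_type"                               # the object is still in the store
```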

Part 3: Building a Commit by Hand

This is the core exercise. You'll use Git's plumbing commands — the low-level tools that the friendly porcelain commands (git add, git commit) call internally.

Step 1: Create a blob

# Write content into the object store
echo "# My Project" | git hash-object -w --stdin
# → 7b6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3

Flag	What it does
`-w`	Actually write the object (without it, just prints the hash)
`--stdin`	Read from stdin instead of a file

Verify the object exists:

# The hash is split: first 2 chars = directory, rest = filename
ls .git/objects/7b/
# → 6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3

# Read it back
git cat-file -p 7b6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3
# → # My Project

# Check the type
git cat-file -t 7b6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3
# → blob

Under the Hood: The object file is zlib-compressed. Git doesn't store raw bytes on disk — it compresses them first. That's why cat .git/objects/7b/6d3d... gives you garbage. You must use git cat-file to read objects, because it handles the decompression and header parsing.

Step 2: Build a tree

A tree maps filenames to blob hashes. You can't just write a tree from scratch easily — you need to populate the index first, then write it.

# Create a file and add it to the index
echo "# My Project" > README.md
git update-index --add --cacheinfo 100644 \
  $(git hash-object -w README.md) README.md

# Create another file
echo "print('hello')" > main.py
git update-index --add --cacheinfo 100644 \
  $(git hash-object -w main.py) main.py

# Write the index as a tree object
TREE=$(git write-tree)
echo "Tree hash: $TREE"

Now inspect the tree:

git cat-file -p $TREE
# → 100644 blob <hash>    README.md
# → 100644 blob <hash>    main.py

File mode	Meaning
`100644`	Regular file
`100755`	Executable file
`040000`	Subdirectory (tree)
`120000`	Symbolic link
`160000`	Submodule (gitlink)
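The modes become easier to read once a tree contains a subdirectory. This sketch registers one blob under two hypothetical paths (`run.sh` and `src/copy.sh`) using the comma-separated form of `--cacheinfo`; no working-tree files are needed.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
blob=$(echo 'echo hi' | git hash-object -w --stdin)

# Register the same blob under two paths with different modes.
git update-index --add --cacheinfo 100755,"$blob",run.sh
git update-index --add --cacheinfo 100644,"$blob",src/copy.sh

root=$(git write-tree)
git cat-file -p "$root"
# → 100755 blob <hash>	run.sh
# → 040000 tree <hash>	src

object_count=$(git count-objects | cut -d' ' -f1)
echo "$object_count loose objects"   # one blob, the root tree, and the src tree
```

Two paths point at the same content, yet only one blob is written: the tree entries carry the names and modes, the blob carries the bytes.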

Step 3: Create a commit

COMMIT=$(echo "Initial commit — built by hand" | \
  git commit-tree $TREE)
echo "Commit hash: $COMMIT"

Inspect it:

git cat-file -p $COMMIT
# → tree <tree-hash>
# → author Your Name <you@example.com> 1711108800 +0000
# → committer Your Name <you@example.com> 1711108800 +0000
# →
# → Initial commit — built by hand

No parent — this is a root commit. To make it visible to git log:

# Point main at our hand-built commit
# (if git init named your first branch something else, check with: git symbolic-ref HEAD)
git update-ref refs/heads/main $COMMIT

# Now git log works
git log --oneline
# → abc1234 Initial commit — built by hand

You just did what git add + git commit does, in five explicit steps:

1. git hash-object -w → created blobs
2. git update-index → populated the staging area
3. git write-tree → turned the staging area into a tree object
4. git commit-tree → wrapped the tree in a commit
5. git update-ref → pointed the branch at the new commit
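A root commit has no parent. To chain a second hand-built commit onto the first, `git commit-tree` takes `-p <parent>`. The sketch below uses a throwaway repository, a placeholder identity, and a hypothetical file name `notes.txt`.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'

# Root commit: no -p
blob1=$(echo 'v1' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob1",notes.txt
first=$(echo 'first' | git commit-tree "$(git write-tree)")

# Child commit: same path, new blob, parent named explicitly
blob2=$(echo 'v2' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob2",notes.txt
second=$(echo 'second' | git commit-tree "$(git write-tree)" -p "$first")

git update-ref refs/heads/main "$second"
git log --format='%s' main
# → second
# → first
```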

Interview Bridge: "Walk me through what happens when you run git commit" is a classic interview question. The answer is exactly what you just did: stage → write-tree → commit-tree → update branch ref. Knowing the plumbing commands proves you understand the abstraction, not just the CLI.

Flashcard Check — Part 1

Question	Answer
What are the four Git object types?	Blob, tree, commit, tag
What does a blob store?	Raw file contents — no filename, no permissions
Where does a branch pointer live on disk?	`.git/refs/heads/<branch-name>` (a text file with one SHA)
What does `git write-tree` do?	Writes the current index (staging area) as a tree object
Why is identical file content stored only once?	Content-addressable storage: the hash IS the address, so identical bytes produce the same hash
What is the hash input format for a blob?	`blob <byte-length>\0<content>` — hashed with SHA-1

Part 4: The DAG — How Commits Connect

Git's commit history is a directed acyclic graph (DAG). Each commit points to its parent(s). The graph can branch and merge but never cycle — you can't be your own ancestor.

A simple linear history:

    A ← B ← C ← D  (main)

A branch and merge:

    A ← B ← C ← F  (main)
            ↑   ↓
            D ← E  (feature)

An octopus merge (3+ parents):

    A ← B ← G  (main)
            │
    C ←─────┤
    D ←─────┘

Every arrow means "this commit's parent field points to that commit." A merge commit has two (or more) parents. A root commit has zero.

Mental Model: Think of the DAG as a river system. Water flows downstream (parent → child). Branches are tributaries. Merges are confluences. Tags are signs posted on the riverbank. git log walks upstream from wherever you're standing. git reflog is a security camera that recorded everywhere you've stood.

Why this matters: when you "lose" a commit, it still exists in the DAG. You just don't have a ref (branch, tag, HEAD) pointing to it. The reflog remembers where every ref used to point, which is how you find your way back.
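Ancestry in the DAG can be queried directly. This sketch builds a three-commit linear history in a throwaway repository (placeholder identity, empty commits) and asks Git which commit is an ancestor of which.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done

a=$(git rev-parse HEAD~2)
c=$(git rev-parse HEAD)
git merge-base --is-ancestor "$a" "$c" && a_before_c=yes || a_before_c=no
git merge-base --is-ancestor "$c" "$a" && c_before_a=yes || c_before_a=no
echo "A is an ancestor of C: $a_before_c"   # yes
echo "C is an ancestor of A: $c_before_a"   # no: parent links only point backwards
git log --format='%h parent=%p %s'          # each line names the commit's parent
```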

Part 5: How Merge Actually Works

When you run git merge feature, Git doesn't "combine two branches." It follows a specific algorithm.

Fast-forward merge

If main hasn't moved since feature branched off, Git just moves the pointer:

Before:
    A ← B  (main)
         ← C ← D  (feature)

After fast-forward:
    A ← B ← C ← D  (main, feature)

No new commit. The branch pointer slides forward. This is why fast-forward merges produce linear history — nothing actually merged.
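A quick way to confirm that nothing new is created: count the commits before and after. The sketch assumes a throwaway repository with empty commits named after the diagram.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
git commit -q --allow-empty -m 'B'
git checkout -q -b feature
git commit -q --allow-empty -m 'C'
git commit -q --allow-empty -m 'D'
git checkout -q main

before=$(git rev-list --all --count)
git merge -q --ff-only feature              # fails instead of merging if a fast-forward is impossible
after=$(git rev-list --all --count)
echo "commits before: $before, after: $after"
git rev-parse main feature                  # both names now resolve to the same SHA
```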

Three-way merge

If both branches have new commits, Git needs to merge. The algorithm:

1. Find the merge base: the most recent common ancestor of both branch tips. You can ask for it yourself with git merge-base main feature.
2. Diff each branch against the base. What did main change? What did feature change?
3. Combine the diffs. If both sides changed the same lines differently, that's a conflict. Otherwise, apply both sets of changes cleanly.
4. Create a merge commit with two parents.

# Find the merge base yourself
git merge-base main feature
# → abc123  (the common ancestor)

# See what each branch changed relative to the base
git diff abc123..main --stat
git diff abc123..feature --stat
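The whole three-way flow fits in a short script. This is a sketch in a throwaway repository: `file.txt` holds nine numbered lines, `main` edits the first and `feature` edits the last, so the two diffs do not overlap.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'

seq 1 9 > file.txt
git add file.txt && git commit -q -m 'base'
base=$(git rev-parse HEAD)

git checkout -q -b feature
{ seq 1 8; echo feature-9; } > file.txt && git commit -q -am 'feature edits the last line'
git checkout -q main
{ echo main-1; seq 2 9; } > file.txt && git commit -q -am 'main edits the first line'

found_base=$(git merge-base main feature)   # step 1: the common ancestor
git diff --stat "$found_base" main          # step 2: what main changed
git diff --stat "$found_base" feature       #         what feature changed
git merge -q --no-edit feature              # steps 3 and 4: combine, then commit

parents=$(git log -1 --format=%p | wc -w | tr -d ' ')
echo "merge commit has $parents parents"
head -n 1 file.txt; tail -n 1 file.txt      # both edits are present
```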

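Under the Hood: The default merge strategy in modern Git is ort (Ostensibly Recursive's Twin), which replaced the recursive strategy as the default in Git 2.34. Both are three-way merges, and for ordinary merges they are designed to give the same result. ort is a from-scratch rewrite that is much faster on large merges, especially those involving many renames, and it fixes several corner cases in recursive. When there are multiple possible merge bases (a criss-cross merge), both strategies first merge the bases into a virtual ancestor. The octopus strategy handles merges of 3+ branches; Git selects it automatically when you name more than one branch to merge, and it refuses merges that need manual conflict resolution. Octopus merges do appear in the Linux kernel history.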

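# See which strategy Git used: it is reported when the merge completes
git merge --no-ff feature
# → Merge made by the 'ort' strategy.
git reflog -1
# → abc1234 HEAD@{0}: merge feature: Merge made by the 'ort' strategy.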

# Force a specific strategy
git merge -s ort feature
git merge -s recursive feature  # legacy, still works

Gotcha: A merge commit doesn't mean there was a conflict. It means the histories diverged and Git created a commit with two parents to join them. A clean merge (no conflicts) still creates a merge commit if you use --no-ff. Many teams require --no-ff merges so the branch topology is always visible in git log --graph.

Part 6: Rebase — The Cherry-Pick Chain

Rebase sounds exotic, but under the hood it's simple: cherry-pick each commit, one at a time, onto a new base.

Before rebase:
    A ← B ← E ← F  (main)
         ← C ← D  (feature)

After: git checkout feature && git rebase main
    A ← B ← E ← F  (main)
                  ← C' ← D'  (feature)

C' and D' are new commits. They have the same diffs as C and D, but different SHAs because their parent changed. Here's what Git actually does:

1. Find the merge base (B)
2. Collect the commits to replay (C, D)
3. Detach HEAD at main's tip (F)
4. Cherry-pick C onto F → creates C'
5. Cherry-pick D onto C' → creates D'
6. Move the feature ref to D' and reattach HEAD to feature

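The original C and D still exist. They're just unreachable from any branch. Reflog entries still point at them, and because those entries are no longer reachable from the branch tip they expire after 30 days by default (gc.reflogExpireUnreachable), versus 90 days for entries that are still reachable.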
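The "same diff, new SHA" claim can be observed with `git patch-id`, which hashes a diff while ignoring commit metadata. The sketch below replays a single commit C in a throwaway repository; file names and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
echo base > base.txt && git add base.txt && git commit -q -m 'B'
git checkout -q -b feature
echo c > c.txt && git add c.txt && git commit -q -m 'C'
git checkout -q main
echo e > e.txt && git add e.txt && git commit -q -m 'E'

old=$(git rev-parse feature)
git checkout -q feature
git rebase -q main
new=$(git rev-parse feature)

echo "C  = $old"
echo "C' = $new"
old_patch=$(git show "$old" | git patch-id --stable | cut -d' ' -f1)
new_patch=$(git show "$new" | git patch-id --stable | cut -d' ' -f1)
[ "$old_patch" = "$new_patch" ] && echo "same diff, different commit"
old_type=$(git cat-file -t "$old")          # the original C is still in the object store
echo "original C is still a $old_type"
```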

Remember: Rebase rewrites history. Every replayed commit gets a new SHA. If someone else has based work on the original commits, their history diverges from yours. This is why the golden rule exists: never rebase commits that have been pushed to a shared branch. On your own local branch? Rebase all you want.

Interactive rebase — the editor view

git rebase -i HEAD~4

pick a1b2c3d feat: add user model
pick d4e5f6a wip: debugging auth
pick 7a8b9c0 fix: auth actually works now
pick 1d2e3f4 feat: add user endpoints

The commands (pick, squash, fixup, reword, drop, edit) each translate to a cherry-pick variant. squash cherry-picks and amends into the previous commit. drop skips the cherry-pick entirely. It's all cherry-picks under the hood.
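The todo list is just a text file, so it can be edited by a script instead of by hand. This sketch points `GIT_SEQUENCE_EDITOR` at a `sed` command that turns the second `pick` into `fixup`. It assumes GNU `sed` (for `-i` without a suffix) and default rebase settings; the repository and file names are throwaway placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
for n in 1 2 3; do
  echo "line $n" >> log.txt
  git add log.txt
  git commit -q -m "commit $n"
done

# Rewrite the second todo line from "pick" to "fixup" without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/fixup/'" git rebase -q -i HEAD~2

git log --format='%s'
# → commit 2
# → commit 1
```

"commit 3" has been folded into "commit 2": the file still has all three lines, but the history has one commit fewer.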

Flashcard Check — Part 2

Question	Answer
What is the merge base?	The most recent common ancestor of two branch tips
What does `git merge-base A B` return?	The SHA of the common ancestor commit
What merge strategy does modern Git default to?	`ort` (replaced `recursive` in Git 2.34)
What does rebase actually do under the hood?	Cherry-picks each commit onto the new base, creating new commits with new SHAs
Why does rebase change commit hashes?	Because the parent changed — the hash includes the parent SHA
When is a fast-forward merge possible?	When the target branch has no new commits since the source branched off

Part 7: Packfiles and Deltification

Fresh objects are stored as individual "loose" files in .git/objects/. This works, but it's wasteful — you might have 100 versions of a file that differ by one line each, and each version is stored as a complete compressed blob.

Git solves this with packfiles. When you run git gc or git repack, Git packs loose objects into a .pack file with an accompanying .idx index. Push, fetch, and clone also send objects over the wire as packfiles.

# See your current pack situation
git count-objects -v
# → count: 42          (loose objects)
# → packs: 1           (packfiles)
# → size-pack: 1234    (packfile size in KB)

# Force a repack
git gc
ls .git/objects/pack/
# → pack-abc123.idx
# → pack-abc123.pack

Inside a packfile, Git uses deltification — storing similar objects as a base object plus a binary delta. The algorithm:

1. Sort objects by type, then by filename, then by size
2. For each object, try to express it as a delta of a nearby object in that order (a sliding window, 10 objects by default)
3. Keep the smallest representation (full object or delta)

The result: a 4 GB Linux kernel repository with 1.2 million commits compresses into a packfile small enough to clone over a residential internet connection.

# Peek inside a packfile
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
# Shows each object, its type, size, and whether it's a delta

# Output columns: SHA type size size-in-pack offset [delta-depth base-SHA]
# Objects with a base-SHA are stored as deltas
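To see deltas appear, a repository needs several similar versions of a reasonably sized file. This sketch creates 21 versions of a hypothetical `data.txt` in a throwaway repository, packs them, and counts the delta entries by their field count.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name 'Lab User'; git config user.email 'lab@example.com'
seq 1 300 > data.txt
git add data.txt && git commit -q -m 'v0'
for i in $(seq 1 20); do
  echo "change $i" >> data.txt
  git commit -q -am "v$i"
done

git gc -q
# Delta entries carry two extra columns, so they have 7 fields instead of 5.
deltas=$(git verify-pack -v .git/objects/pack/pack-*.idx | awk 'NF == 7' | wc -l | tr -d ' ')
echo "deltified objects: $deltas"
git count-objects -v                        # "count" is the number of loose objects left
```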

Trivia: Deltification works across file history AND across different files. If README.md and CONTRIBUTING.md share most of their content, Git might store one as a delta of the other, even though they're different files. Filenames are only a sorting hint for picking candidates; what decides the outcome is byte similarity.

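Under the Hood: When you git clone, the server runs git pack-objects to create a custom packfile containing exactly the objects you need. The "Counting objects" and "Compressing objects" lines (prefixed with remote:) are the server building that pack; "Receiving objects" and "Resolving deltas" are your client downloading and indexing it. The pack is assembled on the fly, reusing already-compressed data from the server's existing packs where it can, rather than being a pre-built file sitting on disk.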

Part 8: The Reflog as Safety Net

The reflog is a local, per-ref log of every time a ref (HEAD, branch tip) changes position. It's stored in .git/logs/.

git reflog
# → abc1234 HEAD@{0}: commit: add authentication
# → def5678 HEAD@{1}: rebase (finish): returning to refs/heads/main
# → 789abcd HEAD@{2}: checkout: moving from feature to main
# → fedcba9 HEAD@{3}: commit: wip — still debugging

Every entry records: the new SHA, the old SHA, the operation, and a timestamp. This is how you recover from almost anything:

# Undo a bad rebase (pick the entry from just before the rebase started)
git reset --hard HEAD@{3}

# Recover a deleted branch
git reflog | grep "feature/auth"
git checkout -b feature/auth abc1234

# See reflog for a specific branch
git reflog show main
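Here is the recovery pattern end to end, as a sketch in a throwaway repository with a placeholder identity: lose a commit with `git reset --hard`, find it in the reflog, and move the branch back.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
git commit -q --allow-empty -m 'keep me'
git commit -q --allow-empty -m 'precious work'
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1          # "oops": main no longer reaches the second commit
git reflog -3                       # HEAD@{1} is the commit we just abandoned
git reset -q --hard 'HEAD@{1}'      # move main back to it
recovered=$(git rev-parse HEAD)

# Raw reflog line: <old-sha> <new-sha> <identity> <timestamp> <tz> TAB <message>
tail -n 1 .git/logs/HEAD
```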

Gotcha: The reflog is local only. It is never pushed or shared. If you lose commits on your machine, the reflog on your machine is the only one that can help. If you lose commits on a remote, you need to find someone whose local clone (or reflog) still has them.

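Expiry: Reflog entries that are still reachable from the ref's current tip expire after 90 days. Entries that are no longer reachable from it (for example, commits dropped by a rebase or reset) expire after 30 days. After expiry, git gc can prune the objects they referenced, once those objects are also older than the 2-week gc.pruneExpire grace period.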

# Check expiry settings
git config gc.reflogExpire        # prints nothing if unset; built-in default: 90 days
git config gc.reflogExpireUnreachable  # prints nothing if unset; built-in default: 30 days

# Never expire (for a critical repo you want maximum safety)
git config gc.reflogExpire never
git config gc.reflogExpireUnreachable never

Part 9: Garbage Collection and git fsck

Git accumulates cruft — unreachable objects from rebases, amended commits, abandoned branches. Garbage collection cleans this up.

# Run GC (safe — respects reflog expiry)
git gc

# See what would be cleaned up
git fsck --unreachable
# → unreachable blob abc123...
# → unreachable commit def456...

# Aggressive GC — repacks everything, slower but smaller
git gc --aggressive

# DANGER: prune unreachable objects RIGHT NOW, skipping the 2-week grace period
# (objects still referenced by a reflog entry are kept until that entry expires)
git gc --prune=now
# Only use this if you're SURE you don't need to recover anything
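How pruning and the reflog interact is easiest to see by experiment. In this sketch (throwaway repository, placeholder identity) an amended commit leaves its predecessor reachable only through the reflog, and the script checks whether the old object survives each step.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name 'Lab User'; git config user.email 'lab@example.com'
git commit -q --allow-empty -m 'first'
git commit -q --allow-empty -m 'second'
old=$(git rev-parse HEAD)
git commit -q --allow-empty --amend -m 'second, amended'   # $old is now unreachable from any branch

git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then after_gc=kept; else after_gc=pruned; fi
echo "after gc --prune=now: $after_gc"                     # kept: a reflog entry still names it

git reflog expire --expire=now --all                       # drop the reflog entries too
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then after_expire=kept; else after_expire=pruned; fi
echo "after expiring the reflog: $after_expire"            # pruned
```

The second half is the genuinely destructive combination: once the reflog entries are gone, nothing protects the old commit.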

git fsck (file system check) validates the integrity of every object. It can find:

- Dangling objects: blobs/commits not reachable from any ref
- Missing objects: referenced but not in the object store (corruption)
- Broken links: a commit points to a tree that doesn't exist

# Full integrity check
git fsck --full

# Find dangling commits (useful after a bad rebase)
git fsck --lost-found
ls .git/lost-found/commit/
# These are commits that no branch points to — potentially recoverable work

War Story: In 2020, a kernel developer accidentally pushed a commit that broke the build across multiple architectures. Using git bisect, the team identified the exact breaking commit within hours, reverted it, and the kernel's integrity was maintained. git fsck was used to verify repository integrity after the recovery. The process worked exactly as Linus had designed it fifteen years earlier. — verified via the linux-kernel mailing list archives.

Part 10: Worktrees — Parallel Working Directories

Sometimes you need to work on two branches simultaneously — a hotfix and a feature, or reviewing a PR while your own branch is mid-rebase. Stashing works, but worktrees are cleaner.

A worktree is a second (or third, or fourth) working directory that shares the same .git object store. Same repo, different checkout.

# Create a worktree for a hotfix
git worktree add ../my-project-hotfix hotfix/urgent-fix

# List active worktrees
git worktree list
# → /home/dev/my-project          abc1234 [main]
# → /home/dev/my-project-hotfix   def5678 [hotfix/urgent-fix]

# Work in the hotfix worktree
cd ../my-project-hotfix
# edit, commit, push — then come back

# Clean up when done
git worktree remove ../my-project-hotfix

Why this matters: worktrees share the object store. A commit made in one worktree is immediately visible (as an object) in the other. No duplicate downloads, no separate clones.

Gotcha: By default you can't check out the same branch in two worktrees simultaneously. Git refuses because a commit made in one worktree would move the branch out from under the other, leaving its index and files out of sync. If you need to test the same branch in two places, create a temporary branch in one worktree or check it out detached with git worktree add --detach.
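Both properties (shared objects, one checkout per branch) can be checked in a scratch directory. The paths under `$top` and the branch name `hotfix` are placeholders for this sketch.

```shell
top=$(mktemp -d)
git init -q "$top/main" && cd "$top/main" || exit 1
git config user.name 'Lab User'; git config user.email 'lab@example.com'
git commit -q --allow-empty -m 'base'
git branch hotfix

git worktree add "$top/hotfix" hotfix >/dev/null 2>&1
git -C "$top/hotfix" commit -q --allow-empty -m 'urgent fix'
fix=$(git -C "$top/hotfix" rev-parse HEAD)

# The new commit is readable from the first working directory: same object store.
seen_as=$(git cat-file -t "$fix")
echo "object type seen from main worktree: $seen_as"

# A second checkout of the same branch is refused by default.
if git worktree add "$top/hotfix-2" hotfix >/dev/null 2>&1; then second=allowed; else second=refused; fi
echo "second worktree on hotfix: $second"

# A detached checkout of the same commit is fine.
git worktree add --detach "$top/hotfix-test" hotfix >/dev/null 2>&1 && detached=created
echo "detached worktree: $detached"
```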

Flashcard Check — Part 3

Question	Answer
What triggers Git to create packfiles?	`git gc`, `git push`, `git clone`, or `git repack`
What is deltification?	Storing similar objects as a base object plus a binary diff (delta)
How long do reflog entries last for unreachable commits?	30 days by default (`gc.reflogExpireUnreachable`)
What does `git fsck --lost-found` do?	Finds dangling commits/blobs and copies them to `.git/lost-found/`
What does `git gc --prune=now` do that `git gc` doesn't?	Immediately prunes all unreachable objects, ignoring the default 2-week grace period
Can two worktrees check out the same branch?	Not by default: Git refuses so that one worktree can't move the branch out from under the other

Part 11: The Index (Staging Area) — Deeper Than You Think

The index (.git/index) is a binary file that sits between your working directory and the object store. When you run git add, you're writing to the index. When you run git commit, Git reads the index.

# See what's in the index
git ls-files --stage
# → 100644 abc123def456... 0    README.md
# → 100644 789abcdef012... 0    src/main.py

The columns: file mode, blob SHA, stage number, path.

The stage number is normally 0. During a merge conflict, Git writes three versions:

Stage	What it is
1	Common ancestor version
2	"Ours" (current branch)
3	"Theirs" (branch being merged)

# During a conflict, see all three stages
git ls-files --stage --unmerged
# → 100644 abc123... 1    config.yaml  (ancestor)
# → 100644 def456... 2    config.yaml  (ours)
# → 100644 789abc... 3    config.yaml  (theirs)

This is the raw data that git mergetool presents visually. Understanding it means you can resolve conflicts with plumbing commands when the porcelain tools fail.
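The stage entries can be read and resolved without any merge tool. This sketch provokes a conflict on a hypothetical `config.yaml` in a throwaway repository, reads each stage with the `:<stage>:<path>` syntax, and resolves by taking "theirs".

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
echo 'port: 80'   > config.yaml && git add config.yaml && git commit -q -m 'base'
git checkout -q -b feature
echo 'port: 8080' > config.yaml && git commit -q -am 'feature port'
git checkout -q main
echo 'port: 443'  > config.yaml && git commit -q -am 'main port'

git merge feature >/dev/null 2>&1 || echo "merge stopped on a conflict"
unmerged=$(git ls-files --unmerged | wc -l | tr -d ' ')   # three entries: stages 1, 2, 3

ancestor=$(git cat-file -p :1:config.yaml)
ours=$(git cat-file -p :2:config.yaml)
theirs=$(git cat-file -p :3:config.yaml)
echo "ancestor=$ancestor | ours=$ours | theirs=$theirs"

# Resolve by hand: write out "theirs" and stage it.
git cat-file -p :3:config.yaml > config.yaml
git add config.yaml
git ls-files --stage config.yaml   # back to a single stage-0 entry
```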

Trivia: The staging area was Linus Torvalds's most controversial design decision. Many developers coming from SVN found it confusing and unnecessary — why not just commit everything that changed? But the index enables git add -p (partial staging), which lets you split a messy working directory into clean, logical commits. Today it's considered one of Git's best features. The controversy took years to settle.

Part 12: Putting It All Together — The Full Commit Lifecycle

Here's everything that happens when you type git commit -m "fix auth bug":

1. git commit reads .git/index (the staging area)
2. Creates a tree object from the index → writes to .git/objects/
3. Creates a commit object:
   - Points to the tree from step 2
   - Points to the current HEAD as parent
   - Records author, committer, timestamp, message
   - Writes to .git/objects/
4. Updates .git/refs/heads/<branch> to point to the new commit
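5. Leaves .git/HEAD alone when you're on a branch (it still reads ref: refs/heads/<branch>); only a detached HEAD is rewritten with the new SHA
6. Appends an entry to .git/logs/HEAD and .git/logs/refs/heads/<branch> (the reflog)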
7. Runs post-commit hooks (if any)

That's it. Seven steps. No network calls, no server, and no coordination beyond short-lived local lock files. This is why Git is fast: a commit is just writing a couple of small files to disk and updating a pointer.
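The file-level effects of a commit can be observed by snapshotting the relevant files before and after. This sketch assumes the default files-based ref storage; the repository, identity, and `auth.txt` are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Lab User'; git config user.email 'lab@example.com'
git commit -q --allow-empty -m 'first'

head_before=$(cat .git/HEAD)
ref_before=$(cat .git/refs/heads/main)
log_before=$(wc -l < .git/logs/HEAD | tr -d ' ')

echo 'fix' > auth.txt && git add auth.txt
git commit -q -m 'fix auth bug'

head_after=$(cat .git/HEAD)
ref_after=$(cat .git/refs/heads/main)
log_after=$(wc -l < .git/logs/HEAD | tr -d ' ')

echo "HEAD file:  $head_before -> $head_after"      # unchanged symbolic ref
echo "branch ref: $ref_before -> $ref_after"        # new commit SHA
echo "reflog entries: $log_before -> $log_after"    # one line appended
```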

Mental Model: Every Git operation is either (a) writing objects to the store, (b) updating a ref to point to a different object, or (c) both. git add writes blobs. git commit writes a tree and a commit, then updates a ref. git branch creates a ref. git tag creates a tag object and a ref. git merge writes a commit with two parents and updates a ref. Once you see this pattern, Git stops being a collection of 150 commands and becomes variations on two operations.

Exercises

Exercise 1: Build a commit from scratch (5 minutes)

In an empty repository, create a commit using only plumbing commands. No git add, no git commit.

Hint

The sequence: `git hash-object -w` → `git update-index` → `git write-tree` → `git commit-tree` → `git update-ref`.

Solution

mkdir /tmp/plumbing-lab && cd /tmp/plumbing-lab && git init
git symbolic-ref HEAD refs/heads/main   # make sure HEAD points at main, whatever init chose
echo "Hello from plumbing" > hello.txt
BLOB=$(git hash-object -w hello.txt)
git update-index --add --cacheinfo 100644 "$BLOB" hello.txt
TREE=$(git write-tree)
COMMIT=$(echo "Hand-built commit" | git commit-tree "$TREE")
git update-ref refs/heads/main "$COMMIT"
git log --oneline

Exercise 2: Trace a merge base (10 minutes)

Create a repo with two divergent branches. Find the merge base manually using git merge-base. Then verify by looking at git log --graph --all --oneline.

Hint

Create a few commits on `main`, branch, add commits to both `main` and the branch, then use `git merge-base main feature`.

Exercise 3: Recover a "deleted" branch (5 minutes)

Create a branch, make two commits on it, delete the branch with git branch -D, then recover it using only the reflog.

Solution

git checkout -b experiment
echo "experiment 1" > exp.txt && git add exp.txt && git commit -m "exp commit 1"
echo "experiment 2" >> exp.txt && git add exp.txt && git commit -m "exp commit 2"
git checkout main
git branch -D experiment
# "Deleted branch experiment (was abc1234)."

# Recover:
git reflog | grep experiment
# Or just use the SHA from the deletion message:
git checkout -b experiment abc1234
git log --oneline experiment

Exercise 4: Inspect a packfile (10 minutes)

Run git gc on a repository with at least 20 commits. Then use git verify-pack -v to find a deltified object. Use git cat-file -p to read it and confirm Git reconstructs the full content transparently.

Hint

In `git verify-pack` output, deltified objects have a sixth column (delta chain depth) and a seventh column (the base object SHA). Pick one and `git cat-file -p` it.

Exercise 5: The conflict stage numbers (15 minutes)

Create a merge conflict intentionally. While the conflict is active (before resolving), run git ls-files --stage --unmerged and identify the ancestor, ours, and theirs versions. Use git cat-file -p to read each one.

Cheat Sheet

Task	Command
Hash content without storing	`echo "text" | git hash-object --stdin`
Hash and store	`echo "text" | git hash-object -w --stdin`
Read any object	`git cat-file -p <sha>`
Object type	`git cat-file -t <sha>`
Object size	`git cat-file -s <sha>`
Write staging area as tree	`git write-tree`
Create commit from tree	`echo "msg" | git commit-tree <tree> -p <parent>`
Update a branch ref	`git update-ref refs/heads/<branch> <sha>`
Show staging area contents	`git ls-files --stage`
Find merge base	`git merge-base <branch1> <branch2>`
Show all reflog entries	`git reflog`
Reflog for specific branch	`git reflog show <branch>`
Find lost objects	`git fsck --lost-found`
Repository integrity check	`git fsck --full`
Pack statistics	`git count-objects -v`
Inspect packfile contents	`git verify-pack -v .git/objects/pack/pack-*.idx`
Create worktree	`git worktree add <path> <branch>`
Force repack	`git repack -a -d`

Takeaways

Git is four object types and a pile of pointers. Blobs, trees, commits, tags — stored by their content hash. Branches, HEAD, and tags are just named pointers into this graph. Everything else is variations on "write objects, move pointers."
Content-addressable storage means deduplication is free. Identical files across a thousand commits are stored once. This is why Git repos are smaller than you'd expect.
The reflog is your 30–90 day safety net. Almost nothing is truly lost until garbage collection runs. Make git reflog your first command after any disaster.
Packfiles are why Git scales. Deltification compresses similar objects (even across different files) into compact binary diffs. A million-commit repo fits on your laptop.
Merge finds a common ancestor, then combines two diffs. The three-way merge algorithm is the same whether Git uses recursive, ort, or octopus. Understanding the merge base is understanding the merge.
Rebase is just cherry-pick in a loop. Every replayed commit gets a new SHA because its parent changed. The originals survive in the reflog. Never rebase shared history.

Related Lessons

The Git Disaster Recovery Guide — reflog rescues, force push recovery, and bisect for bug hunting
What Happens When You git push to CI — follows a push through pack negotiation, SSH transport, and CI triggers
GitOps — The Repo Is the Truth — using Git as the single source of truth for infrastructure state
