Git Internals: The Content-Addressable Filesystem
Topics: Git object model, content-addressable storage, SHA-1/SHA-256, plumbing commands, packfiles, reflog, merge internals, rebase internals, garbage collection, DAG model, worktrees
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None. We start from an empty directory.
The Mission
Build a Git commit from scratch. Not with git commit — with the raw plumbing commands
that git commit calls under the hood. You'll create a blob, assemble a tree, forge a
commit object, and watch it appear in git log without ever touching a porcelain command.
By the end, the .git directory stops being a black box. You'll know exactly where your
data lives, why git reflog can save you after a catastrophic rebase, how packfiles shrink
a 4 GB Linux kernel repo to something your laptop can clone, and what actually happens during
a three-way merge.
The payoff: when something goes wrong — and it will — you won't be guessing. You'll know where the data is and how to get it back.
Name Origin: Linus Torvalds created Git in approximately ten days in April 2005 after BitKeeper revoked its free license for the Linux kernel. The name "git" is British slang for an unpleasant, difficult person. Linus said: "I'm an egotistical bastard, and I name all my projects after myself. First Linux, now git." He also offered a backronym: "global information tracker" when it works, and "goddamn idiotic truckload of sh*t" when it breaks. (Verified via the Git FAQ and Linus's original README in the Git source tree.)
Part 1: The Four Objects
Everything in Git is an object. Four types. That's it.
| Object | What it stores | How to think about it |
|---|---|---|
| blob | Raw file contents — no filename, no permissions | A file without a name |
| tree | Directory listing: maps names → blobs and subtrees | A directory without a path |
| commit | Pointer to a tree + parent(s) + author + message | A snapshot with context |
| tag | Pointer to a commit + tagger + message + optional GPG sig | An annotated bookmark |
Every object is identified by the SHA-1 hash of its contents. Two files with identical bytes produce the same blob hash — Git stores them once. This is content-addressable storage: the address (hash) is derived from the content, not assigned by a counter or a timestamp.
# Prove it: identical content → identical hash
echo "hello" | git hash-object --stdin
# → ce013625030ba8dba906f756967f9e9ca394464a
echo "hello" | git hash-object --stdin
# → ce013625030ba8dba906f756967f9e9ca394464a (same hash, always)
echo "Hello" | git hash-object --stdin
# → e965047ad7c57865823c7d992b1d046ea66edf78 (different — case matters)
Under the Hood: The hash is computed as SHA-1("blob <size>\0<content>"). The object type and byte length are prepended, separated by a space, followed by a null byte, then the raw content. Because the type is part of the hashed header, a blob and a tree that happen to contain the same bytes still get different IDs. You can verify this with printf 'blob 6\0hello\n' | sha1sum.
Trivia: Git uses SHA-1 for integrity checking, not for cryptographic security. After researchers at Google and CWI Amsterdam demonstrated a practical SHA-1 collision in 2017 (the SHAttered attack), the Git project began a slow transition to SHA-256. git init --object-format=sha256 has been available since Git 2.29 and stopped being labeled experimental in Git 2.42, but the ecosystem (GitHub, GitLab, CI tools) still overwhelmingly uses SHA-1. The migration is happening, slowly.
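The header-plus-content recipe can be reproduced without Git at all. The sketch below assumes a POSIX shell with sha1sum on the PATH; blob_id is a hypothetical helper name used for illustration, not a Git command.

```shell
# blob_id: hypothetical helper that mimics `git hash-object <file>` in a SHA-1 repo.
# It hashes the header "blob <size>\0" followed by the file's raw bytes.
blob_id() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
blob_id "$tmp"
# → ce013625030ba8dba906f756967f9e9ca394464a
```

The result matches the hash shown earlier for echo "hello" | git hash-object --stdin, which is the point: the object ID depends only on the type, the length, and the bytes.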
Part 2: The .git Directory Tour
Run git init and look at what you get. Every file has a purpose.
.git/HEAD
.git/config
.git/description
.git/hooks/applypatch-msg.sample
.git/hooks/commit-msg.sample
.git/hooks/pre-commit.sample
.git/hooks/pre-push.sample
...
.git/info/exclude
Here's the map:
| Path | Purpose |
|---|---|
| HEAD | Points to the current branch (e.g., ref: refs/heads/main) |
| objects/ | Every blob, tree, commit, and tag: the entire content store |
| refs/heads/ | Branch tips; each file contains one commit SHA |
| refs/tags/ | Tag pointers |
| refs/remotes/ | Remote-tracking branches (after first fetch) |
| index | The staging area: a binary file mapping paths to blob SHAs |
| config | Repository-level configuration |
| hooks/ | Client-side hook scripts (pre-commit, pre-push, etc.) |
| info/exclude | Personal ignore rules (like .gitignore but not committed) |
| description | Used by GitWeb; you can ignore this |
| packed-refs | Compact storage for refs after git gc |
| logs/ | Reflog entries: the safety net |
Mental Model: A Git repository is a key-value store (objects/) plus a set of named pointers (refs/). Everything else (branches, tags, HEAD, the staging area) is just metadata that tells Git where to start reading the object graph. When someone says "I deleted my branch," they deleted a 41-byte pointer file. The commit objects are still there.
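To see the "pointer file" claim concretely, here is a minimal sketch run in a throwaway repository. It assumes a SHA-1 repository using the default loose-ref storage; the temp directory and the demo identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# An empty tree is enough to hang a commit on
TREE=$(git mktree < /dev/null)
COMMIT=$(echo "root" | git commit-tree "$TREE")
git update-ref refs/heads/main "$COMMIT"

# The branch is a plain text file: 40 hex characters plus a newline
cat .git/refs/heads/main
SIZE=$(wc -c < .git/refs/heads/main | tr -d ' ')
echo "ref file size: $SIZE bytes"

# Deleting the branch removes only that pointer; the commit object survives
git update-ref -d refs/heads/main
git cat-file -t "$COMMIT"
# → commit
```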
Part 3: Building a Commit by Hand
This is the core exercise. You'll use Git's plumbing commands — the low-level tools that
the friendly porcelain commands (git add, git commit) call internally.
Step 1: Create a blob
# Write content into the object store
echo "# My Project" | git hash-object -w --stdin
# → 7b6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3
| Flag | What it does |
|---|---|
| -w | Actually write the object (without it, Git just prints the hash) |
| --stdin | Read from stdin instead of a file |
Verify the object exists:
# The hash is split: first 2 chars = directory, rest = filename
ls .git/objects/7b/
# → 6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3
# Read it back
git cat-file -p 7b6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3
# → # My Project
# Check the type
git cat-file -t 7b6d3dfc10d0acfb6ce6e5b07e4a40e2e1c8a0c3
# → blob
Under the Hood: The object file is zlib-compressed. Git doesn't store raw bytes on disk; it compresses them first. That's why cat .git/objects/7b/6d3d... gives you garbage. Use git cat-file to read objects, because it handles the decompression and header parsing.
Step 2: Build a tree
A tree maps filenames to blob hashes. You can't just write a tree from scratch easily — you need to populate the index first, then write it.
# Create a file and add it to the index
echo "# My Project" > README.md
git update-index --add --cacheinfo 100644 \
$(git hash-object -w README.md) README.md
# Create another file
echo "print('hello')" > main.py
git update-index --add --cacheinfo 100644 \
$(git hash-object -w main.py) main.py
# Write the index as a tree object
TREE=$(git write-tree)
echo "Tree hash: $TREE"
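Now inspect the tree with git cat-file -p $TREE (git ls-tree $TREE prints the same listing). Each entry shows a file mode, an object type, a hash, and a name. The modes you will see:
| File mode | Meaning |
|---|---|
| 100644 | Regular file |
| 100755 | Executable file |
| 040000 | Subdirectory (tree) |
| 120000 | Symbolic link |
| 160000 | Submodule (gitlink) |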
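As a quick check of Step 2, the following sketch rebuilds the same two-file tree in a scratch repository and lists it. The temp directory is a placeholder; the comma-separated --cacheinfo form is equivalent to the three-argument form used above.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .

echo "# My Project" > README.md
echo "print('hello')" > main.py
for f in README.md main.py; do
  # Write the blob, then register <mode>,<blob>,<path> in the index
  git update-index --add --cacheinfo 100644,"$(git hash-object -w "$f")","$f"
done
TREE=$(git write-tree)

git cat-file -t "$TREE"     # → tree
git ls-tree "$TREE"         # one line per entry: <mode> <type> <hash> <name>
```

Note that the tree records names and modes while the blobs record only bytes, which is why renaming a file never creates a new blob.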
Step 3: Create a commit
COMMIT=$(echo "Initial commit — built by hand" | \
git commit-tree $TREE)
echo "Commit hash: $COMMIT"
Inspect it:
git cat-file -p $COMMIT
# → tree <tree-hash>
# → author Your Name <you@example.com> 1711108800 +0000
# → committer Your Name <you@example.com> 1711108800 +0000
# →
# → Initial commit — built by hand
No parent — this is a root commit. To make it visible to git log:
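# Point main at our hand-built commit
# (if HEAD points at another branch such as master, first run: git symbolic-ref HEAD refs/heads/main)
git update-ref refs/heads/main $COMMIT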
# Now git log works
git log --oneline
# → abc1234 Initial commit — built by hand
You just did what git add + git commit does, in five explicit steps:
1. git hash-object -w → created blobs
2. git update-index → populated the staging area
3. git write-tree → turned the staging area into a tree object
4. git commit-tree → wrapped the tree in a commit
5. git update-ref → pointed the branch at the new commit
Interview Bridge: "Walk me through what happens when you run git commit" is a classic interview question. The answer is exactly what you just did: stage → write-tree → commit-tree → update branch ref. Knowing the plumbing commands proves you understand the abstraction, not just the CLI.
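A root commit has no parent, so it hides one detail: every later commit names its predecessor with -p. The sketch below chains a second hand-built commit onto the first; the file name notes.txt and the demo identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# First (root) commit
echo "v1" > notes.txt
git update-index --add --cacheinfo 100644,"$(git hash-object -w notes.txt)",notes.txt
FIRST=$(echo "first" | git commit-tree "$(git write-tree)")
git update-ref refs/heads/main "$FIRST"

# Second commit: same steps, plus -p to record the parent
echo "v2" > notes.txt
git update-index --add --cacheinfo 100644,"$(git hash-object -w notes.txt)",notes.txt
SECOND=$(echo "second" | git commit-tree "$(git write-tree)" -p "$FIRST")
git update-ref refs/heads/main "$SECOND"

git log --oneline            # two commits, newest first
git rev-parse main^          # prints the first commit's hash
```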
Flashcard Check: Part 1
| Question | Answer |
|---|---|
| What are the four Git object types? | Blob, tree, commit, tag |
| What does a blob store? | Raw file contents: no filename, no permissions |
| Where does a branch pointer live on disk? | .git/refs/heads/<branch-name> (a text file with one SHA) |
| What does git write-tree do? | Writes the current index (staging area) as a tree object |
| Why is identical file content stored only once? | Content-addressable storage: the hash IS the address, so identical bytes produce the same hash |
| What is the hash input format for a blob? | blob <byte-length>\0<content>, hashed with SHA-1 |
Part 4: The DAG (How Commits Connect)
Git's commit history is a directed acyclic graph (DAG). Each commit points to its parent(s). The graph can branch and merge but never cycle — you can't be your own ancestor.
A simple linear history:
A ← B ← C ← D   (main)
A branch and merge:
A ← B ← C ← F   (main)
    ↑       ↑
    D ←──── E   (feature)
An octopus merge (3+ parents):
A ← B ←───── G   (main)
    ↑        │
    ├─ C ←───┤
    └─ D ←───┘
Every arrow means "this commit's parent field points to that commit." A merge commit has
two (or more) parents. A root commit has zero.
Mental Model: Think of the DAG as a river system. Water flows downstream (parent → child). Branches are tributaries. Merges are confluences. Tags are signs posted on the riverbank. git log walks upstream from wherever you're standing. git reflog is a security camera that recorded everywhere you've stood.
Why this matters: when you "lose" a commit, it still exists in the DAG. You just don't have a ref (branch, tag, HEAD) pointing to it. The reflog remembers where every ref used to point, which is how you find your way back.
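The parent pointers are ordinary header lines inside each commit object. As an illustration, this sketch builds a tiny diamond-shaped graph (A, two children B and C, and a merge M) entirely with commit-tree; the commit messages and demo identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

TREE=$(git mktree < /dev/null)          # empty tree, reused by every commit here
A=$(echo "A" | git commit-tree "$TREE")
B=$(echo "B" | git commit-tree "$TREE" -p "$A")
C=$(echo "C" | git commit-tree "$TREE" -p "$A")
M=$(echo "M" | git commit-tree "$TREE" -p "$B" -p "$C")

# Each output line is: <commit> <parent>...
git rev-list --parents "$M"
# The raw commit object carries one "parent" header per edge
git cat-file -p "$M" | grep '^parent'
```

None of these commits is on a branch, yet the graph is fully formed: refs are only entry points into it.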
Part 5: How Merge Actually Works
When you run git merge feature, Git doesn't "combine two branches." It follows a specific
algorithm.
Fast-forward merge
If main hasn't moved since feature branched off, Git just moves the main pointer forward to feature's tip.
No new commit is created. The branch pointer slides forward, which is why fast-forward merges produce linear history: nothing was actually merged.
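A small experiment makes the "nothing was merged" point visible. This sketch uses empty commits in a scratch repository (the branch names follow the text; the demo identity is a placeholder) and counts commits before and after the merge.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git checkout -q -b feature
git commit -q --allow-empty -m "B"
git checkout -q main

BEFORE=$(git rev-list --count --all)
git merge -q --ff-only feature        # refuses to run unless a fast-forward is possible
AFTER=$(git rev-list --count --all)

git rev-parse main feature            # both names now resolve to the same commit
echo "commits before: $BEFORE, after: $AFTER"
# → commits before: 2, after: 2
```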
Three-way merge
If both branches have new commits, Git needs to merge. The algorithm:
- Find the merge base: the most recent common ancestor of both branch tips. Git uses git merge-base main feature to find this.
- Diff each branch against the base: what did main change? What did feature change?
- Combine the diffs: if both sides changed the same lines differently, that's a conflict. Otherwise, apply both sets of changes cleanly.
- Create a merge commit with two parents.
# Find the merge base yourself
git merge-base main feature
# → abc123 (the common ancestor)
# See what each branch changed relative to the base
git diff abc123..main --stat
git diff abc123..feature --stat
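The "combine the diffs" step can be tried on a single file with git merge-file, which performs a three-way merge on plain files. This is a sketch of the file-level idea only, not of how git merge drives it internally; the file names and contents are made up for the example.

```shell
work=$(mktemp -d) && cd "$work"
printf 'one\ntwo\nthree\nfour\nfive\n' > base.txt
printf 'ONE (ours)\ntwo\nthree\nfour\nfive\n' > ours.txt
printf 'one\ntwo\nthree\nfour\nFIVE (theirs)\n' > theirs.txt

# Arguments are <current> <base> <other>; -p prints the result instead of overwriting ours.txt
git merge-file -p ours.txt base.txt theirs.txt > merged.txt
cat merged.txt
# → ONE (ours)
# → two
# → three
# → four
# → FIVE (theirs)
```

Each side changed a different region relative to the base, so both edits land cleanly. Had both sides rewritten the same line, the output would contain conflict markers instead.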
Under the Hood: The default merge strategy in modern Git is ort (Ostensibly Recursive's Twin), which replaced the recursive strategy as the default in Git 2.34. ort is a from-scratch rewrite intended as a drop-in replacement for recursive that runs faster, especially on merges involving many renames. Both handle criss-cross merges (situations with multiple possible merge bases) by first merging the bases into a virtual ancestor. The octopus strategy handles merges of 3+ branches in a single commit; such merges show up occasionally in Linux kernel history.
# See which strategy Git used: a completed merge reports it
git merge --no-ff feature
# → Merge made by the 'ort' strategy.
# Force a specific strategy
git merge -s ort feature
git merge -s recursive feature # legacy name; recent Git versions may map it to ort
Gotcha: A merge commit doesn't mean there was a conflict. It means the histories diverged and Git created a commit with two parents to join them. A clean merge of diverged branches still creates a merge commit, and --no-ff forces one even when a fast-forward was possible. Many teams require --no-ff merges so the branch topology is always visible in git log --graph.
Part 6: Rebase (The Cherry-Pick Chain)
Rebase sounds exotic, but under the hood it's simple: cherry-pick each commit, one at a time, onto a new base.
Before rebase:
A ← B ← E ← F   (main)
    ↑
    C ← D       (feature)
After: git checkout feature && git rebase main
A ← B ← E ← F   (main)
            ↑
            C' ← D'   (feature)
C' and D' are new commits. They have the same diffs as C and D, but different
SHAs because their parent changed. Here's what Git actually does:
- Find the merge base (B)
- Collect the commits to replay (C, D)
- Check out main's tip (F) as a detached HEAD
- Cherry-pick C onto F → creates C'
- Cherry-pick D onto C' → creates D'
- Move the feature ref to D' and check it out
The original C and D still exist — they're just unreachable (no ref points to them).
The reflog remembers them for 30 days (unreachable) to 90 days (reachable).
Remember: Rebase rewrites history. Every replayed commit gets a new SHA. If someone else has based work on the original commits, their history diverges from yours. This is why the golden rule exists: never rebase commits that have been pushed to a shared branch. On your own local branch? Rebase all you want.
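The replay steps above can be carried out by hand, which is a useful way to convince yourself there is no magic. The sketch below is an approximation of what git rebase main does for a conflict-free branch, not its actual implementation; file names and the demo identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt && git add base.txt && git commit -q -m "B"
git checkout -q -b feature
echo c > c.txt && git add c.txt && git commit -q -m "C"
echo d > d.txt && git add d.txt && git commit -q -m "D"
git checkout -q main
echo f > f.txt && git add f.txt && git commit -q -m "F"

OLD_TIP=$(git rev-parse feature)

# Replay by hand: detach at main's tip, cherry-pick C and D, then move the branch
git checkout -q --detach main
git cherry-pick main..feature > /dev/null
git checkout -q -B feature      # point feature at the replayed D' and switch to it

NEW_TIP=$(git rev-parse feature)
echo "old tip: $OLD_TIP"
echo "new tip: $NEW_TIP"
git cat-file -t "$OLD_TIP"      # → commit (the original D still exists)
```

The two tips differ even though the patches are identical, because each replayed commit records a different parent.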
Interactive rebase: the editor view
pick a1b2c3d feat: add user model
pick d4e5f6a wip: debugging auth
pick 7a8b9c0 fix: auth actually works now
pick 1d2e3f4 feat: add user endpoints
The commands (pick, squash, fixup, reword, drop, edit) each translate to a
cherry-pick variant. squash cherry-picks and amends into the previous commit. drop
skips the cherry-pick entirely. It's all cherry-picks under the hood.
Flashcard Check: Part 2
| Question | Answer |
|---|---|
| What is the merge base? | The most recent common ancestor of two branch tips |
| What does git merge-base A B return? | The SHA of the common ancestor commit |
| What merge strategy does modern Git default to? | ort (replaced recursive in Git 2.34) |
| What does rebase actually do under the hood? | Cherry-picks each commit onto the new base, creating new commits with new SHAs |
| Why does rebase change commit hashes? | Because the parent changed, and the hash includes the parent SHA |
| When is a fast-forward merge possible? | When the target branch has no new commits since the source branched off |
Part 7: Packfiles and Deltification
Fresh objects are stored as individual "loose" files in .git/objects/. This works, but
it's wasteful — you might have 100 versions of a file that differ by one line each, and
each version is stored as a complete compressed blob.
Git solves this with packfiles. When you run git gc or git repack, Git packs loose objects into a .pack file with an accompanying .idx index. Pushes, fetches, and clones also transfer objects as packs built for that exchange.
# See your current pack situation
git count-objects -v
# → count: 42 (loose objects)
# → packs: 1 (packfiles)
# → size-pack: 1234 (packfile size in KB)
# Force a repack
git gc
ls .git/objects/pack/
# → pack-abc123.idx
# → pack-abc123.pack
Inside a packfile, Git uses deltification — storing similar objects as a base object plus a binary delta. The algorithm:
- Sort objects by type, then by filename, then by size
- For each object, try to express it as a delta against nearby objects in that sorted order (a sliding window, 10 objects by default)
- Keep the smallest representation (full object or delta)
The result: a 4 GB Linux kernel repository with 1.2 million commits compresses into a packfile small enough to clone over a residential internet connection.
# Peek inside a packfile
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
# Shows each object, its type, size, and whether it's a delta
# Output columns: SHA type size size-in-pack offset [delta-depth base-SHA]
# Objects with a base-SHA are stored as deltas
Trivia: Deltification works across file history AND across different files. If README.md and CONTRIBUTING.md share most of their content, Git might store one as a delta of the other, even though they're different files. Filenames only influence which objects get compared first; the delta itself depends on byte similarity.
Under the Hood: When you git clone, the server runs git pack-objects to create a custom packfile containing exactly the objects you need. The "Counting objects" and "Compressing objects" lines come from that server-side step; "Receiving objects" and "Resolving deltas" are your client downloading and indexing the result. The pack is assembled on the fly for each request, though servers typically reuse already-compressed data from their existing packs.
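To watch loose objects turn into a pack, here is a sketch that creates twenty small commits and then runs git gc. The file name log.txt and the demo identity are placeholders; whether a given object ends up deltified depends on the data, so the last command may print few or no lines.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Twenty commits, each appending one line to the same file
i=1
while [ "$i" -le 20 ]; do
  echo "line $i" >> log.txt
  git add log.txt && git commit -q -m "commit $i"
  i=$((i + 1))
done

git count-objects -v | grep -E '^(count|packs):'           # loose objects before packing
git gc -q
git count-objects -v | grep -E '^(count|in-pack|packs):'   # everything now lives in one pack

# Deltified entries have seven columns (depth and base SHA come last)
git verify-pack -v .git/objects/pack/pack-*.idx | awk 'NF == 7' | head -5
```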
Part 8: The Reflog as Safety Net
The reflog is a local, per-ref log of every time a ref (HEAD, branch tip) changes position.
It's stored in .git/logs/.
git reflog
# → abc1234 HEAD@{0}: commit: add authentication
# → def5678 HEAD@{1}: rebase (finish): returning to refs/heads/main
# → 789abcd HEAD@{2}: checkout: moving from feature to main
# → fedcba9 HEAD@{3}: commit: wip — still debugging
Every entry records: the new SHA, the old SHA, the operation, and a timestamp. This is how you recover from almost anything:
# Undo a bad rebase
git reset --hard HEAD@{3}
# Recover a deleted branch
git reflog | grep "feature/auth"
git checkout -b feature/auth abc1234
# See reflog for a specific branch
git reflog show main
Gotcha: The reflog is local only. It is never pushed or shared. If you lose commits on your machine, the reflog on your machine is the only one that can help. If you lose commits on a remote, you need to find someone whose local clone (or reflog) still has them.
Expiry: Reflog entries for reachable commits (still pointed to by a branch or tag)
expire after 90 days. Entries for unreachable commits expire after 30 days. After expiry,
git gc can prune the objects they reference.
# Check expiry settings (prints nothing if unset; the defaults then apply)
git config gc.reflogExpire # default: 90 days
git config gc.reflogExpireUnreachable # default: 30 days
# Never expire (for a critical repo you want maximum safety)
git config gc.reflogExpire never
git config gc.reflogExpireUnreachable never
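A complete lose-and-recover cycle fits in a few lines. This sketch throws away a commit with git reset --hard and brings it back through the reflog; the commit messages and demo identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "keep me"
git commit -q --allow-empty -m "important work"
LOST=$(git rev-parse HEAD)

git reset -q --hard HEAD~1          # "oops": the branch no longer reaches $LOST
git reflog -n 3                     # HEAD@{1} still records where HEAD was

git reset -q --hard 'HEAD@{1}'      # move the branch back
git log --oneline -n 1              # shows "important work" again
```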
Part 9: Garbage Collection and git fsck
Git accumulates cruft — unreachable objects from rebases, amended commits, abandoned branches. Garbage collection cleans this up.
# Run GC (safe — respects reflog expiry)
git gc
# See what would be cleaned up
git fsck --unreachable
# → unreachable blob abc123...
# → unreachable commit def456...
# Aggressive GC — repacks everything, slower but smaller
git gc --aggressive
# DANGER: prune unreachable objects RIGHT NOW (skips the 2-week grace period)
git gc --prune=now
# Objects still referenced by a reflog entry count as reachable and are kept.
# Only use this if you're SURE you don't need to recover anything
git fsck (file system check) validates the integrity of every object. It can find:
- Dangling objects — blobs/commits not reachable from any ref
- Missing objects — referenced but not in the object store (corruption)
- Broken links — a commit points to a tree that doesn't exist
# Full integrity check
git fsck --full
# Find dangling commits (useful after a bad rebase)
git fsck --lost-found
ls .git/lost-found/commit/
# These are commits that no branch points to — potentially recoverable work
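The following sketch shows both halves of this section on one object: git fsck finding a dangling commit, and git gc --prune=now deleting it. The orphan is created with commit-tree so that no ref or reflog entry ever points at it; names and identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "on main"

# Create a commit that no ref or reflog entry points to
ORPHAN=$(echo "orphan" | git commit-tree "$(git write-tree)")

git fsck --lost-found 2>/dev/null | grep "$ORPHAN"   # → dangling commit <sha>
ls .git/lost-found/commit/

# Grace period skipped: the unreachable object is deleted immediately
git gc -q --prune=now
git cat-file -e "$ORPHAN" 2>/dev/null || echo "orphan commit is gone"
```

Had the orphan been created by a rebase or a reset, its reflog entry would have kept it alive through this gc.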
War Story: In 2020, a kernel developer accidentally pushed a commit that broke the build across multiple architectures. Using git bisect, the team identified the exact breaking commit within hours, reverted it, and the kernel's integrity was maintained. git fsck was used to verify repository integrity after the recovery. The process worked exactly as Linus had designed it fifteen years earlier. (Verified via the linux-kernel mailing list archives.)
Part 10: Worktrees (Parallel Working Directories)
Sometimes you need to work on two branches simultaneously — a hotfix and a feature, or reviewing a PR while your own branch is mid-rebase. Stashing works, but worktrees are cleaner.
A worktree is a second (or third, or fourth) working directory that shares the same .git
object store. Same repo, different checkout.
# Create a worktree for a hotfix
git worktree add ../my-project-hotfix hotfix/urgent-fix
# List active worktrees
git worktree list
# → /home/dev/my-project abc1234 [main]
# → /home/dev/my-project-hotfix def5678 [hotfix/urgent-fix]
# Work in the hotfix worktree
cd ../my-project-hotfix
# edit, commit, push — then come back
# Clean up when done
git worktree remove ../my-project-hotfix
Why this matters: worktrees share the object store. A commit made in one worktree is immediately visible (as an object) in the other. No duplicate downloads, no separate clones.
Gotcha: By default you can't check out the same branch in two worktrees simultaneously. Git refuses because a commit made in one worktree would move the branch out from under the other, leaving its index and working files out of sync with the branch tip. If you need to test the same branch in two places, create a temporary branch in one worktree (or use git worktree add --detach).
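Here is a minimal sketch of the shared object store in action, using two throwaway directories. The directory names main-tree and hotfix-tree, the branch name hotfix, and the demo identity are placeholders.

```shell
base=$(mktemp -d) && cd "$base"
git init -q main-tree && cd main-tree
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "initial"

# Second working directory on a new branch, sharing this repository's object store
git worktree add -b hotfix ../hotfix-tree > /dev/null 2>&1
git worktree list

# A commit made in the hotfix worktree...
(cd ../hotfix-tree && git commit -q --allow-empty -m "urgent fix")
# ...is immediately readable from the main worktree
git log --oneline -n 1 hotfix

git worktree remove ../hotfix-tree
```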
Flashcard Check: Part 3
| Question | Answer |
|---|---|
| What triggers Git to create packfiles? | git gc, git push, git clone, or git repack |
| What is deltification? | Storing similar objects as a base object plus a binary diff (delta) |
| How long do reflog entries last for unreachable commits? | 30 days by default (gc.reflogExpireUnreachable) |
| What does git fsck --lost-found do? | Finds dangling commits/blobs and copies them to .git/lost-found/ |
| What does git gc --prune=now do that git gc doesn't? | Immediately prunes all unreachable objects, ignoring the default 2-week grace period |
| Can two worktrees check out the same branch? | Not by default; Git refuses so one worktree can't move the branch out from under the other |
Part 11: The Index (Staging Area), Deeper Than You Think
The index (.git/index) is a binary file that sits between your working directory and the
object store. When you run git add, you're writing to the index. When you run git commit,
Git reads the index.
# See what's in the index
git ls-files --stage
# → 100644 abc123def456... 0 README.md
# → 100644 789abcdef012... 0 src/main.py
The columns: file mode, blob SHA, stage number, path.
The stage number is normally 0. During a merge conflict, Git writes three versions:
| Stage | What it is |
|---|---|
| 1 | Common ancestor version |
| 2 | "Ours" (current branch) |
| 3 | "Theirs" (branch being merged) |
# During a conflict, see all three stages
git ls-files --stage --unmerged
# → 100644 abc123... 1 config.yaml (ancestor)
# → 100644 def456... 2 config.yaml (ours)
# → 100644 789abc... 3 config.yaml (theirs)
This is the raw data that git mergetool presents visually. Understanding it means you can
resolve conflicts with plumbing commands when the porcelain tools fail.
Trivia: The staging area was Linus Torvalds's most controversial design decision. Many developers coming from SVN found it confusing and unnecessary: why not just commit everything that changed? But the index enables git add -p (partial staging), which lets you split a messy working directory into clean, logical commits. Today it's considered one of Git's best features. The controversy took years to settle.
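The three conflict stages are easy to produce on purpose. This sketch creates a one-line conflict in a scratch repository and reads each stage back with the :<stage>:<path> syntax; the file contents and demo identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "ancestor" > config.yaml && git add config.yaml && git commit -q -m "base"
git checkout -q -b feature
echo "theirs" > config.yaml && git commit -q -am "feature change"
git checkout -q main
echo "ours" > config.yaml && git commit -q -am "main change"

git merge feature > /dev/null 2>&1 || echo "merge stopped on a conflict"

git ls-files --unmerged            # three entries for config.yaml: stages 1, 2, 3
git show :1:config.yaml            # → ancestor
git show :2:config.yaml            # → ours
git show :3:config.yaml            # → theirs
```

Resolving by hand then amounts to writing the content you want and running git add config.yaml, which collapses the three entries back into a single stage-0 entry.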
Part 12: Putting It All Together (The Full Commit Lifecycle)
Here's everything that happens when you type git commit -m "fix auth bug":
1. git commit reads .git/index (the staging area)
2. Creates a tree object from the index → writes to .git/objects/
3. Creates a commit object:
- Points to the tree from step 2
- Points to the current HEAD as parent
- Records author, committer, timestamp, message
- Writes to .git/objects/
4. Updates .git/refs/heads/<branch> to point to the new commit
5. Updates .git/HEAD (if it changed — usually it didn't)
6. Appends an entry to .git/logs/HEAD (the reflog)
7. Runs post-commit hooks (if any)
That's it. Seven steps. No network calls, no server, no central lock (Git only takes short-lived local lock files). This is why Git is fast: a commit is just writing a couple of small files to disk and updating a pointer.
Mental Model: Every Git operation is either (a) writing objects to the store, (b) updating a ref to point to a different object, or (c) both. git add writes blobs. git commit writes a tree and a commit, then updates a ref. git branch creates a ref. git tag -a creates a tag object and a ref. git merge writes a commit with two parents and updates a ref. Once you see this pattern, Git stops being a collection of 150 commands and becomes variations on two operations.
Exercises
Exercise 1: Build a commit from scratch (5 minutes)
In an empty repository, create a commit using only plumbing commands. No git add, no
git commit.
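Hint
The sequence: `git hash-object -w` → `git update-index` → `git write-tree` → `git commit-tree` → `git update-ref`.
Solution
mkdir /tmp/plumbing-lab && cd /tmp/plumbing-lab && git init
git symbolic-ref HEAD refs/heads/main # make sure HEAD points at main, whatever the default branch name is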
echo "Hello from plumbing" > hello.txt
BLOB=$(git hash-object -w hello.txt)
git update-index --add --cacheinfo 100644 $BLOB hello.txt
TREE=$(git write-tree)
COMMIT=$(echo "Hand-built commit" | git commit-tree $TREE)
git update-ref refs/heads/main $COMMIT
git log --oneline
Exercise 2: Trace a merge base (10 minutes)
Create a repo with two divergent branches. Find the merge base manually using
git merge-base. Then verify by looking at git log --graph --all --oneline.
Hint
Create a few commits on `main`, branch, add commits to both `main` and the branch, then use `git merge-base main feature`.
Exercise 3: Recover a "deleted" branch (5 minutes)
Create a branch, make two commits on it, delete the branch with git branch -D, then
recover it using only the reflog.
Solution
git checkout -b experiment
echo "experiment 1" > exp.txt && git add exp.txt && git commit -m "exp commit 1"
echo "experiment 2" >> exp.txt && git add exp.txt && git commit -m "exp commit 2"
git checkout main
git branch -D experiment
# "Deleted branch experiment (was abc1234)."
# Recover:
git reflog | grep experiment
# Or just use the SHA from the deletion message:
git checkout -b experiment abc1234
git log --oneline experiment
Exercise 4: Inspect a packfile (10 minutes)
Run git gc on a repository with at least 20 commits. Then use git verify-pack -v to
find a deltified object. Use git cat-file -p to read it and confirm Git reconstructs
the full content transparently.
Hint
In `git verify-pack` output, deltified objects have a sixth column (delta chain depth) and a seventh column (the base object SHA). Pick one and `git cat-file -p` it.
Exercise 5: The conflict stage numbers (15 minutes)
Create a merge conflict intentionally. While the conflict is active (before resolving),
run git ls-files --stage --unmerged and identify the ancestor, ours, and theirs versions.
Use git cat-file -p to read each one.
Cheat Sheet
| Task | Command |
|---|---|
| Hash content without storing | echo "text" \| git hash-object --stdin |
| Hash and store | echo "text" \| git hash-object -w --stdin |
| Read any object | git cat-file -p <sha> |
| Object type | git cat-file -t <sha> |
| Object size | git cat-file -s <sha> |
| Write staging area as tree | git write-tree |
| Create commit from tree | echo "msg" \| git commit-tree <tree> -p <parent> |
| Update a branch ref | git update-ref refs/heads/<branch> <sha> |
| Show staging area contents | git ls-files --stage |
| Find merge base | git merge-base <branch1> <branch2> |
| Show all reflog entries | git reflog |
| Reflog for specific branch | git reflog show <branch> |
| Find lost objects | git fsck --lost-found |
| Repository integrity check | git fsck --full |
| Pack statistics | git count-objects -v |
| Inspect packfile contents | git verify-pack -v .git/objects/pack/pack-*.idx |
| Create worktree | git worktree add <path> <branch> |
| Force repack | git repack -a -d |
Takeaways
- Git is four object types and a pile of pointers. Blobs, trees, commits, and tags are all stored by their content hash. Branches, HEAD, and tags are just named pointers into this graph. Everything else is variations on "write objects, move pointers."
- Content-addressable storage means deduplication is free. Identical files across a thousand commits are stored once. This is why Git repos are smaller than you'd expect.
- The reflog is your 30–90 day safety net. Almost nothing is truly lost until garbage collection runs. Make git reflog your first command after any disaster.
- Packfiles are why Git scales. Deltification compresses similar objects (even across different files) into compact binary diffs. A million-commit repo fits on your laptop.
- Merge finds a common ancestor, then combines two diffs. The three-way merge algorithm is the same whether Git uses recursive, ort, or octopus. Understanding the merge base is understanding the merge.
- Rebase is just cherry-pick in a loop. Every replayed commit gets a new SHA because its parent changed. The originals survive in the reflog. Never rebase shared history.
Related Lessons
- The Git Disaster Recovery Guide — reflog rescues, force push recovery, and bisect for bug hunting
- What Happens When You git push to CI — follows a push through pack negotiation, SSH transport, and CI triggers
- GitOps — The Repo Is the Truth — using Git as the single source of truth for infrastructure state