- linux
- l2
- deep-dive
- filesystem
- linux-fundamentals

Portal | Level: L2: Operations | Topics: Filesystems & Storage, Linux Fundamentals | Domain: Linux
# Linux Filesystem Internals

## Scope
This document explains the filesystem stack as it matters to Linux administrators and DevOps engineers:
- VFS
- dentries and inodes
- pathname lookup
- page cache
- journaling
- ext4/XFS/Btrfs mental models
- writeback and fsync
- mounts and namespace effects
- common performance and integrity issues
Reference anchors:

- https://docs.kernel.org/filesystems/index.html
- https://docs.kernel.org/filesystems/path-lookup.html
## Big Picture

Applications think they are doing this: open a path, read or write some bytes, close it.

Linux is actually doing something more like this:

```
syscall
  -> VFS
  -> pathname lookup
  -> dentry/inode resolution
  -> permissions and mount checks
  -> page cache interaction
  -> filesystem-specific code
  -> block layer / storage device
```
The filesystem stack is where names become objects and where object operations become storage operations.
## VFS: The Abstraction Layer
The Virtual Filesystem Switch (VFS) is the generic layer that provides a common interface across many filesystems.
That is why userland can use the same syscalls on:

- ext4
- XFS
- tmpfs
- NFS
- overlayfs
- procfs
- many others
The VFS defines common object models and operation hooks.
This is the abstraction that keeps Linux from needing per-filesystem syscalls.
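The set of filesystem types the running kernel has registered with the VFS is visible at runtime. A minimal, Linux-only Python sketch reading `/proc/filesystems`:

```python
# /proc/filesystems lists every filesystem type registered with the VFS
# in the running kernel; the "nodev" marker flags types that have no
# backing block device (proc, tmpfs, overlay, ...).
with open("/proc/filesystems") as f:
    registered = [line.split()[-1] for line in f if line.strip()]

print(len(registered), "filesystem types registered, e.g.:", registered[:5])
```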
## Core Objects

### Inode

Represents a filesystem object's metadata and identity:

- mode
- ownership
- timestamps
- size
- block mapping metadata
- operation vectors
An inode is not the filename.
### Dentry
Represents a directory entry / name-to-object association and pathname lookup cache state.
This is a huge conceptual point: names and objects are related but not identical.
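A hard link makes the distinction concrete: two names, one inode. A small Python sketch (confined to a temporary directory, so it is safe to run):

```python
import os
import tempfile

# A hard link gives one inode a second name: both names resolve to the
# same object, illustrating that the filename is not the inode.
with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "data.txt")
    alias = os.path.join(d, "alias.txt")
    with open(original, "w") as f:
        f.write("hello")
    os.link(original, alias)  # second directory entry, same inode

    st_a, st_b = os.stat(original), os.stat(alias)
    assert st_a.st_ino == st_b.st_ino  # one inode, two names
    assert st_a.st_nlink == 2          # link count reflects both entries
    print("inode", st_a.st_ino, "has", st_a.st_nlink, "names")
```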
### File object

Represents an open instance with per-open state:

- file offset
- open flags
- credentials snapshot aspects
- operation hooks
Multiple file descriptors can refer to the same underlying inode via distinct file objects/open contexts.
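This can be observed directly: two separate `open()` calls get independent offsets, while `dup()` shares one open file description. A minimal Python sketch:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "f.txt")
    with open(path, "w") as f:
        f.write("abcdef")

    # Two independent open() calls -> two file objects, each with its
    # own offset, both backed by the same inode.
    a = open(path, "r")
    b = open(path, "r")
    assert os.fstat(a.fileno()).st_ino == os.fstat(b.fileno()).st_ino
    assert a.read(3) == "abc"
    assert b.read(3) == "abc"  # b's offset did not move when a read
    a.close()
    b.close()

    # dup(), by contrast, shares ONE open file description, so the
    # offset is shared between the two descriptors.
    c = os.open(path, os.O_RDONLY)
    dup = os.dup(c)
    os.read(c, 3)                       # advances the shared offset
    assert os.read(dup, 3) == b"def"    # dup sees the advanced offset
    os.close(c)
    os.close(dup)
```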
## Pathname Lookup
When you open /var/log/app.log, Linux does not magically teleport there.
It walks the path component by component.
Conceptually:
1. start from root or cwd
2. lookup var
3. lookup log
4. lookup app.log
5. validate permissions and mount transitions
6. resolve symlinks according to rules/flags
7. reach target dentry/inode
The dcache exists because doing that work cold every time would be expensive.
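The walk above can be sketched in userspace. This toy resolver (an illustration only; it ignores symlinks, mount crossings, and the dcache) mirrors the component-by-component structure:

```python
import os
import tempfile

def walk_path(path):
    """Toy resolver mirroring the kernel's component-by-component walk.
    Ignores symlinks, mount crossings, and all caching."""
    assert os.path.isabs(path)
    resolved = "/"
    for component in path.strip("/").split("/"):
        # each step requires search (execute) permission on the directory
        if not os.access(resolved, os.X_OK):
            raise PermissionError(resolved)
        resolved = os.path.join(resolved, component)
        os.lstat(resolved)  # raises FileNotFoundError for a missing entry
    return resolved

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "var", "log"))
    target = os.path.join(d, "var", "log", "app.log")
    open(target, "w").close()
    assert walk_path(target) == target
```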
## Dentry Cache (dcache)
The dcache stores pathname lookup results and related metadata.
It speeds up:

- repeated opens
- path walks
- metadata-heavy workloads
Negative dentries also matter: they cache failed lookups, which helps repeated "file does not exist" cases.
This is one reason filesystem performance is not just about disks; metadata caching matters a lot.
## Page Cache and I/O
File reads and writes often interact with the page cache first.
### Read path

If data is already cached:

- return from memory

If not:

- page fault or read path fetches from storage into cache
- user gets data
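The page cache's system-wide footprint is visible in `/proc/meminfo`. A Linux-only Python sketch (the `Cached` field is mostly page cache; `Dirty` is cached data not yet written back):

```python
# Linux-only: parse a field from /proc/meminfo, values are in kB.
def meminfo_kb(field):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

print("page cache:", meminfo_kb("Cached"), "kB")
print("dirty (awaiting writeback):", meminfo_kb("Dirty"), "kB")
```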
### Write path
Often writes first dirty the page cache. Writeback later flushes to stable storage.
That means:
write() success does not automatically mean data is durable on disk.
That is what fsync() and related durability semantics are about.
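A minimal Python sketch of the distinction: `os.write()` returning only says the data reached the kernel, while `os.fsync()` asks for it to be pushed to stable storage before returning:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # write() success means the data reached the page cache,
    # not that it survived to disk.
    n = os.write(fd, b"critical record\n")
    assert n == 16

    # fsync() requests that the file's data plus the metadata needed
    # to retrieve it be made durable before the call returns.
    os.fsync(fd)
finally:
    os.close(fd)
    os.unlink(path)
```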
## Journaling
Journaling filesystems track metadata updates in a log/journal to improve crash consistency.
Important subtlety: journaling usually protects metadata first, not necessarily all user data in the naive sense.
This is why durability questions require care.
Know the difference between:

- write acknowledged to page cache
- metadata journaled
- data flushed
- barriers/cache flushes honored
- truly durable after power loss
## ext4, XFS, Btrfs - Mental Models
### ext4
General-purpose, widely used, journaling filesystem. Good default answer for "normal Linux server filesystem."
### XFS
Strong for large filesystems, parallelism, and big-file workloads. Common in enterprise Linux.
### Btrfs
Copy-on-write filesystem with snapshots and checksumming. Very featureful, but operational tradeoffs must be understood.
You do not need to be a kernel maintainer. You do need to understand that filesystems make different tradeoffs in:

- metadata design
- CoW behavior
- fragmentation
- recovery model
- snapshotting
- tooling expectations
## Mounts and Namespace Effects
A mount is not just "the disk exists." It is a namespace attachment.
Important consequences:

- the same filesystem can be visible in different namespace arrangements
- mount options change behavior
- bind mounts and overlayfs alter visibility without changing underlying data
This matters enormously in containers and Kubernetes.
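What a process sees is defined by its mount namespace, and `/proc/self/mountinfo` shows exactly that view; a container typically sees a different list than the host. A Linux-only Python sketch pulling the mount point and filesystem type from each line:

```python
# Linux-only: /proc/self/mountinfo shows mounts as THIS process's mount
# namespace sees them. Per the mountinfo format, field 5 is the mount
# point, optional fields run until a standalone "-", and the filesystem
# type follows that separator.
mounts = []
with open("/proc/self/mountinfo") as f:
    for line in f:
        fields = line.split()
        mount_point = fields[4]
        sep = fields.index("-")
        fstype = fields[sep + 1]
        mounts.append((mount_point, fstype))

for mp, fstype in mounts[:5]:
    print(fstype, "mounted on", mp)
```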
## fsync() and Durability
One of the most misunderstood topics.
`write()`:

- often means only that the data reached kernel buffers/page cache

`fsync()`:

- asks for durability of the file's data plus the metadata needed to retrieve it

`fdatasync()`:

- similar, but may skip metadata that is not required to read the data back (such as timestamps)
Real-world lesson: storage layers, controller caches, filesystems, journals, barriers, and mount options all affect what "safe" really means.
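The classic crash-safe update pattern pulls these pieces together: write a temp file, `fsync()` it, `rename()` it over the target, then `fsync()` the directory so the new name itself is durable. A Python sketch:

```python
import os
import tempfile

def atomic_replace(path, data):
    """Crash-safe file replacement: temp file + fsync + rename + dir fsync."""
    dirpath = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirpath)
    try:
        os.write(fd, data)
        os.fsync(fd)         # file data + metadata durable
    finally:
        os.close(fd)
    os.rename(tmp, path)     # atomic within one filesystem
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)        # make the new directory entry durable
    finally:
        os.close(dfd)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "config.json")
    atomic_replace(target, b"{}")
    assert open(target, "rb").read() == b"{}"
```

Without the final directory `fsync()`, a crash after the rename can leave the new name absent even though the file's bytes were flushed.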
## Common Performance Pain

### Small-file metadata storms
Path lookup, inode work, journal churn.
### Writeback stalls
Dirty pages accumulate, then the system pays.
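The thresholds that govern this are the `vm.dirty_*` sysctls: background writeback starts at one level, and writers are throttled above a higher one. A Linux-only Python sketch reading them via `/proc/sys`:

```python
# Linux-only: read writeback thresholds from /proc/sys.
# Note: if vm.dirty_bytes / vm.dirty_background_bytes are set instead,
# the corresponding *_ratio values read as 0.
def read_sysctl(path):
    with open("/proc/sys/" + path) as f:
        return int(f.read())

bg = read_sysctl("vm/dirty_background_ratio")  # background flusher threshold
fg = read_sysctl("vm/dirty_ratio")             # writers throttled above this
print(f"background writeback at {bg}%, writer throttling at {fg}%")
```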
### Fragmentation / CoW side effects
Particularly relevant in some workloads/filesystems.
### Slow storage hidden behind page cache
Looks fine until cache misses or flush pressure.
### Remote filesystem semantics

NFS and clustered filesystems differ from local filesystems in caching, locking, and consistency behavior, so local intuitions do not transfer directly.
## Useful Commands
For deep work:
- strace - trace the syscalls a process issues
- blktrace - trace requests at the block layer
- fio - generate controlled I/O workloads for benchmarking
- perf - profile CPU and kernel hotspots
- eBPF tracing - flexible in-kernel instrumentation (bcc, bpftrace)
## Interview-Level Things to Explain
You should be able to explain:
- what VFS does
- difference between inode and dentry
- how pathname lookup works
- why page cache matters
- why `write()` is not the same as a durable commit
- what journaling buys you
- broad tradeoff differences among ext4/XFS/Btrfs
## Fast Mental Model
The Linux filesystem stack translates human pathnames into object operations through VFS, caches metadata and file data aggressively, and coordinates crash-consistency and durability through filesystem-specific policies layered on top of the block device.
## Wiki Navigation

### Prerequisites
- Linux Ops (Topic Pack, L0)
### Related Content
- Case Study: Disk Full Root Services Down (Case Study, L1) — Filesystems & Storage, Linux Fundamentals
- Case Study: Runaway Logs Fill Disk (Case Study, L1) — Filesystems & Storage, Linux Fundamentals
- Deep Dive: Linux Performance Debugging (deep_dive, L2) — Filesystems & Storage, Linux Fundamentals
- Disk & Storage Ops (Topic Pack, L1) — Filesystems & Storage, Linux Fundamentals
- Kernel Troubleshooting (Topic Pack, L3) — Filesystems & Storage, Linux Fundamentals
- /proc Filesystem (Topic Pack, L2) — Linux Fundamentals
- Advanced Bash for Ops (Topic Pack, L1) — Linux Fundamentals
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Linux Fundamentals
- Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Linux Fundamentals
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption (Case Study, L2) — Linux Fundamentals
### Pages that link here
- /proc Filesystem
- Disk & Storage Ops
- Disk & Storage Ops Primer
- Disk Full Root - Services Down
- Inodes - Primer
- Kernel Troubleshooting
- Kernel Troubleshooting - Primer
- Linux Performance Debugging
- Mounts & Filesystems - Street-Level Ops
- NVMe Drive Disappeared After Reboot
- Primer
- Runbook: Disk Full
- Runbook: PostgreSQL Disk Space Critical
- Storage Operations - Primer
- Symptoms