
Linux Filesystem Internals

Scope

This document explains the filesystem stack as it matters to Linux administrators and DevOps engineers:

  • VFS
  • dentries and inodes
  • pathname lookup
  • page cache
  • journaling
  • ext4/XFS/Btrfs mental models
  • writeback and fsync
  • mounts and namespace effects
  • common performance and integrity issues

Reference anchors:

  • https://docs.kernel.org/filesystems/index.html
  • https://docs.kernel.org/filesystems/path-lookup.html


Big Picture

Applications think they are doing this:

open -> read/write -> close

Linux is actually doing something more like this:

syscall
  -> VFS
  -> pathname lookup
  -> dentry/inode resolution
  -> permissions and mount checks
  -> page cache interaction
  -> filesystem-specific code
  -> block layer / storage device

The filesystem stack is where names become objects and where object operations become storage operations.


VFS: The Abstraction Layer

The Virtual Filesystem Switch (VFS) is the generic layer that provides a common interface across many filesystems.

That is why userland can use the same syscalls on:

  • ext4
  • XFS
  • tmpfs
  • NFS
  • overlayfs
  • procfs
  • many others

The VFS defines common object models and operation hooks.

This is the abstraction that keeps Linux from needing per-filesystem syscalls.
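A minimal sketch of what this buys you: the same open/read/close calls work on a regular file and on a device node, even though entirely different filesystem code services each one underneath. (This assumes a Linux-like system where /dev/null exists.)

```python
import os
import tempfile

# The same open/read/close syscalls work regardless of which filesystem
# implements the object -- the VFS dispatches to the right code underneath.

# A regular file (ext4/XFS/tmpfs/... -- whatever backs the temp directory):
tmp_fd, tmp_path = tempfile.mkstemp()
os.write(tmp_fd, b"hello")
os.close(tmp_fd)

fd = os.open(tmp_path, os.O_RDONLY)
regular_data = os.read(fd, 16)     # b'hello'
os.close(fd)
os.unlink(tmp_path)

# A device node on devtmpfs -- same syscalls, different driver underneath:
fd = os.open("/dev/null", os.O_RDONLY)
devnull_data = os.read(fd, 16)     # b'' -- /dev/null always reads as empty
os.close(fd)

print(regular_data, devnull_data)
```

Userland never needed to know which filesystem it was talking to; that is the whole point of the switch.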


Core Objects

Inode

Represents a filesystem object's metadata and identity:

  • mode
  • ownership
  • timestamps
  • size
  • block mapping metadata
  • operation vectors

An inode is not the filename.
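Hard links make this concrete: two names can resolve to one inode, and removing one name does not destroy the object. A small sketch using the standard library:

```python
import os
import tempfile

# Two names (hard links) pointing at one inode: the filename is a
# dentry-level concept; the inode is the object the names resolve to.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")

with open(a, "w") as f:
    f.write("data")

os.link(a, b)                      # second name for the same inode

same_inode = os.stat(a).st_ino == os.stat(b).st_ino   # True
link_count = os.stat(a).st_nlink                      # 2 names, one inode

os.unlink(a)                       # removes one name; the inode survives
with open(b) as f:
    still_readable = f.read()      # "data"
os.unlink(b)
os.rmdir(d)

print(same_inode, link_count, still_readable)
```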

Dentry

Represents a directory entry / name-to-object association and pathname lookup cache state.

This is a huge conceptual point: names and objects are related but not identical.

File object

Represents an open instance with per-open state:

  • file offset
  • open flags
  • credentials snapshot aspects
  • operation hooks

Multiple file descriptors can refer to the same underlying inode via distinct file objects/open contexts.
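The offset behavior makes the distinction visible: independent opens get independent file objects (and offsets), while dup() produces a second descriptor sharing one file object. A sketch:

```python
import os
import tempfile

# Two independent opens of one inode get separate file objects, hence
# separate offsets; dup() shares a single file object, hence one offset.
fd0, path = tempfile.mkstemp()
os.write(fd0, b"abcdef")
os.close(fd0)

fd1 = os.open(path, os.O_RDONLY)   # open #1 -> file object A
fd2 = os.open(path, os.O_RDONLY)   # open #2 -> file object B
first = os.read(fd1, 3)            # b'abc' -- advances A's offset only
second = os.read(fd2, 3)           # b'abc' -- B has its own offset

fd3 = os.dup(fd1)                  # same file object A, shared offset
third = os.read(fd3, 3)            # b'def' -- continues where fd1 left off

for fd in (fd1, fd2, fd3):
    os.close(fd)
os.unlink(path)
print(first, second, third)
```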


Pathname Lookup

When you open /var/log/app.log, Linux does not magically teleport there. It walks the path component by component.

Conceptually:

  1. start from root or cwd
  2. lookup var
  3. lookup log
  4. lookup app.log
  5. validate permissions and mount transitions
  6. resolve symlinks according to rules/flags
  7. reach target dentry/inode

The dcache exists because doing that work cold every time would be expensive.
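You can imitate the component-by-component walk from userspace with openat()-style calls (the dir_fd parameter in Python). A sketch over a throwaway directory tree, showing that the step-wise walk lands on the same inode a whole-path open would:

```python
import os
import tempfile

# The kernel walks pathnames one component at a time. openat()-style calls
# (dir_fd in Python's os.open) let userspace imitate that walk explicitly.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "var", "log"))

fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)   # start directory
for component in ("var", "log"):                   # one lookup per component
    next_fd = os.open(component, os.O_RDONLY | os.O_DIRECTORY, dir_fd=fd)
    os.close(fd)
    fd = next_fd

# fd now refers to the same inode a whole-path open would reach:
direct = os.open(os.path.join(root, "var", "log"), os.O_RDONLY | os.O_DIRECTORY)
same = os.fstat(fd).st_ino == os.fstat(direct).st_ino   # True
os.close(fd)
os.close(direct)
print(same)
```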


Dentry Cache (dcache)

The dcache stores pathname lookup results and related metadata.

It speeds up:

  • repeated opens
  • path walks
  • metadata-heavy workloads

Negative dentries also matter: they cache failed lookups, which helps repeated "file does not exist" cases.

This is one reason filesystem performance is not just about disks; metadata caching matters a lot.


Page Cache and I/O

File reads and writes often interact with the page cache first.

Read path

If data is already cached:

  • return from memory

If not:

  • the page fault or read path fetches from storage into the cache
  • the user gets the data

Write path

Often writes first dirty the page cache. Writeback later flushes to stable storage.

That means: write() success does not automatically mean data is durable on disk.

That is what fsync() and related durability semantics are about.
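The contract looks like this in code: write() returning success only means the data reached the page cache; fsync() is the explicit request to push it to stable storage before returning. A sketch:

```python
import os
import tempfile

# write() success == data reached kernel buffers/page cache.
# fsync() == block until the file's data and metadata are on stable storage
# (modulo device write caches and whether barriers/flushes are honored).
fd, path = tempfile.mkstemp()
written = os.write(fd, b"important record\n")   # dirties the page cache; 17
os.fsync(fd)                                    # now request durability
os.close(fd)
os.unlink(path)
print(written)
```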


Journaling

Journaling filesystems track metadata updates in a log/journal to improve crash consistency.

Important subtlety: journaling usually protects metadata first, not necessarily all user data in the naive sense.

This is why durability questions require care.

Know the difference between:

  • write acknowledged to page cache
  • metadata journaled
  • data flushed
  • barriers/cache flushes honored
  • truly durable after power loss


ext4, XFS, Btrfs - Mental Models

ext4

General-purpose, widely used, journaling filesystem. Good default answer for "normal Linux server filesystem."

XFS

Strong for large filesystems, parallelism, and big-file workloads. Common in enterprise Linux.

Btrfs

Copy-on-write filesystem with snapshots and checksumming. Very featureful, but operational tradeoffs must be understood.

You do not need to be a kernel maintainer. You do need to understand that filesystems make different tradeoffs in:

  • metadata design
  • CoW behavior
  • fragmentation
  • recovery model
  • snapshotting
  • tooling expectations


Mounts and Namespace Effects

A mount is not just "the disk exists." It is a namespace attachment.

Important consequences:

  • the same filesystem can be visible in different namespace arrangements
  • mount options change behavior
  • bind mounts and overlayfs alter visibility without changing underlying data

This matters enormously in containers and Kubernetes.


fsync() and Durability

One of the most misunderstood topics.

write():

  • success means the data reached kernel buffers/page cache, not necessarily the disk

fsync():

  • asks for durability of the file's data plus the metadata needed to retrieve it

fdatasync():

  • like fsync(), but may skip flushing metadata not needed to retrieve the data (e.g. timestamps)

Real-world lesson: storage layers, controller caches, filesystems, journals, barriers, and mount options all affect what "safe" really means.
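A pattern worth knowing cold is the durable atomic replace: write a temp file, fsync it, rename over the target, then fsync the parent directory so the new name itself survives a crash. A sketch (error handling omitted; the function name is illustrative):

```python
import os
import tempfile

def durable_replace(path, data):
    # Write a temp file in the same directory so rename stays atomic
    # (rename is only atomic within one filesystem).
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    try:
        os.write(fd, data)
        os.fsync(fd)                 # data + file metadata durable
    finally:
        os.close(fd)
    os.rename(tmp, path)             # atomically swap in the new contents
    dfd = os.open(d, os.O_RDONLY)    # directory fd for the final fsync
    try:
        os.fsync(dfd)                # make the rename (the new name) durable
    finally:
        os.close(dfd)

target = os.path.join(tempfile.mkdtemp(), "state.json")
durable_replace(target, b'{"v": 1}')
print(open(target, "rb").read())   # b'{"v": 1}'
```

Skipping the final directory fsync is a classic bug: the file contents are durable, but after a crash the directory may still point at the old name.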


Common Performance Pain

Small-file metadata storms

Path lookup, inode work, journal churn.

Writeback stalls

Dirty pages accumulate, then the system pays.

Fragmentation / CoW side effects

Particularly relevant in some workloads/filesystems.

Slow storage hidden behind page cache

Looks fine until cache misses or flush pressure.

Remote filesystem semantics

NFS and clustered filesystems have different caching, locking, and consistency semantics than local filesystems.


Useful Commands

mount                     # list mounted filesystems (legacy view)
findmnt                   # mount tree with sources and options
lsblk -f                  # block devices with filesystem types and UUIDs
df -hT                    # usage per mount, with filesystem type
stat /path/to/file        # inode number, link count, timestamps
xfs_info /mountpoint      # XFS geometry and options
tune2fs -l /dev/...       # ext4 superblock parameters
btrfs filesystem show     # Btrfs devices and usage
iostat -xz 1              # per-device I/O latency and utilization
vmstat 1                  # memory and writeback pressure at a glance

For deep work:

  • strace
  • blktrace
  • fio
  • perf
  • eBPF tracing


Interview-Level Things to Explain

You should be able to explain:

  • what VFS does
  • difference between inode and dentry
  • how pathname lookup works
  • why page cache matters
  • why write() is not the same as durable commit
  • what journaling buys you
  • broad tradeoff differences among ext4/XFS/Btrfs

Fast Mental Model

The Linux filesystem stack translates human pathnames into object operations through VFS, caches metadata and file data aggressively, and coordinates crash-consistency and durability through filesystem-specific policies layered on top of the block device.
