Solution

Triage

  1. Read the soft lockup messages and stack traces:
    dmesg -T | grep -A 20 "soft lockup"
    
  2. Check THP status:
    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag
    
  3. Check dirty page settings:
    sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs
    
  4. Check I/O and CPU:
    iostat -xz 1 5
    mpstat -P ALL 1 5
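
The THP sysfs files list every available mode and wrap the active one in square brackets (for example, "always [madvise] never"), which is easy to misread during an incident. As a minimal sketch, the helper below prints only the active mode; thp_active_mode is a hypothetical name chosen for illustration, not an existing tool.

```shell
# Print the active mode from a THP sysfs file,
# e.g. "always [madvise] never" -> "madvise".
thp_active_mode() {
    # $1: path to a sysfs file whose current value is wrapped in [brackets]
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

THP_DIR=/sys/kernel/mm/transparent_hugepage
for f in enabled defrag; do
    # Skip quietly on kernels built without THP support.
    if [ -r "$THP_DIR/$f" ]; then
        echo "$f: $(thp_active_mode "$THP_DIR/$f")"
    fi
done
```

Recording this output before changing anything also gives you the values to restore if you need to roll back.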
    

Root Cause

The dmesg stack trace shows the CPU stuck in compact_zone_order(), part of the kernel's memory compaction path. Compaction is being driven by transparent hugepages (THP), which are enabled by default on this system.

When PostgreSQL performs heavy writes, the page cache fills with dirty pages and free memory becomes fragmented. With THP's defrag mode set to always, a THP allocation that cannot find a free 2 MB region stalls the allocating task while the kernel reclaims and compacts memory. Under heavy load, this compaction can hold a CPU in kernel mode for tens of seconds, triggering the soft lockup watchdog.

Contributing factors:

  • THP enabled with defrag=always
  • High vm.dirty_ratio (40%) causing large dirty page flushes
  • Kernel 5.4.0 base has known THP compaction stall issues fixed in later patches
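
One way to corroborate the compaction diagnosis is to watch the compaction counters in /proc/vmstat while the workload runs. The sketch below assumes a kernel built with compaction support, so that the compact_stall counter is present; vmstat_counter and compact_stall_delta are hypothetical helper names.

```shell
# Print the value of one counter from a vmstat-format file
# (one "name value" pair per line).
vmstat_counter() {
    # $1: counter name, $2: file to read (defaults to /proc/vmstat)
    awk -v name="$1" '$1 == name { print $2 }' "${2:-/proc/vmstat}"
}

# Print how much compact_stall grew over an interval in seconds (default 5).
# A missing counter is treated as 0.
compact_stall_delta() {
    before=$(vmstat_counter compact_stall)
    sleep "${1:-5}"
    after=$(vmstat_counter compact_stall)
    echo $(( ${after:-0} - ${before:-0} ))
}
```

Running compact_stall_delta 10 during a write-heavy period shows how many times tasks stalled in direct compaction over those ten seconds. A steadily growing number is consistent with the trace above; a flat one suggests looking elsewhere.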

Fix

Immediate (stop the lockups):

  1. Disable THP (recommended for all database servers):

    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    

  2. Persist across reboots by adding the same commands to /etc/rc.local or by creating a systemd unit:

    # /etc/systemd/system/disable-thp.service
    [Unit]
    Description=Disable Transparent Huge Pages
    Before=postgresql.service
    
    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag'
    
    [Install]
    WantedBy=multi-user.target
    

  3. Tune dirty page settings to reduce flush storms:

    sysctl -w vm.dirty_ratio=10
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_expire_centisecs=500
    
    Persist in /etc/sysctl.d/99-database.conf.
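
A minimal sketch of persisting the step 3 values, assuming the drop-in path named above and root privileges. write_dirty_conf is a hypothetical helper; the optional argument exists only so the file can be written somewhere harmless first and inspected.

```shell
# Write the dirty-page settings from step 3 to a sysctl drop-in file.
write_dirty_conf() {
    # $1: destination file (defaults to /etc/sysctl.d/99-database.conf)
    cat > "${1:-/etc/sysctl.d/99-database.conf}" <<'EOF'
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 500
EOF
}
```

After calling write_dirty_conf as root, sysctl --system reloads every drop-in so the values apply without a reboot. The unit from step 2 likewise needs to be registered once before it runs at boot:

    systemctl daemon-reload
    systemctl enable --now disable-thp.service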

Medium-term:

  1. Update the kernel to the latest 5.4.x LTS patch. Assuming Ubuntu 20.04 on the GA kernel, the linux-image-generic metapackage tracks the newest 5.4.0 build:
    apt-get update && apt-get install linux-image-generic
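
Installing a newer kernel package does not change the running kernel until the next reboot. As a rough check, assuming kernel images are installed as /boot/vmlinuz-<version> (the usual Debian and Ubuntu layout) and GNU sort is available, the sketch below compares the running kernel with the newest installed one. newest_installed_kernel is a hypothetical helper name.

```shell
# Print the highest kernel version that has a vmlinuz image in a boot directory.
newest_installed_kernel() {
    # $1: boot directory (defaults to /boot)
    ls "${1:-/boot}"/vmlinuz-* 2>/dev/null \
        | sed 's/.*vmlinuz-//' \
        | sort -V \
        | tail -n 1
}

running=$(uname -r)
newest=$(newest_installed_kernel)
if [ -n "$newest" ] && [ "$running" != "$newest" ]; then
    echo "running $running, newest installed $newest: reboot pending"
else
    echo "running $running"
fi
```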
    

Rollback / Safety

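  • Disabling THP is low-risk and takes effect without a reboot. To roll back, write the values recorded during triage back to the same sysfs files. Disabling THP is widely recommended for database workloads such as PostgreSQL, MongoDB, and Redis.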
  • Dirty page tuning may increase I/O frequency but in smaller, less disruptive batches.
  • Kernel updates require a reboot. Schedule during a maintenance window with database failover.

Common Traps

  • Increasing kernel.watchdog_thresh to suppress the warning. This hides the symptom without fixing the cause. The CPU is still stuck.
  • Assuming soft lockups are hardware failures. While they can indicate hardware issues, software causes (THP, I/O stalls, spinlock contention) are far more common.
  • Not disabling THP on database servers. This is one of the most common performance and stability issues, and many database vendors document the recommendation.
  • Confusing soft lockup with hard lockup. Soft lockup = a CPU running kernel code for longer than the watchdog threshold without letting other tasks run. Hard lockup = a CPU not responding to interrupts, detected by the NMI watchdog (typically interrupts disabled for too long, or a hardware fault).
  • Not reading the stack trace. The stack trace tells you exactly where the CPU is stuck. compact_zone = THP. blk_mq_ = I/O. rwsem_ = lock contention.
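
The symbol-to-subsystem mapping in the last bullet can be sketched as a small filter. It is only a first-pass hint based on the three prefixes listed above, not a substitute for reading the full trace; classify_lockup is a hypothetical helper name.

```shell
# Read a stack trace on stdin and print a coarse guess at the subsystem involved.
# Patterns are checked in order, so the first match wins.
classify_lockup() {
    trace=$(cat)
    case "$trace" in
        *compact_zone*) echo "memory compaction (THP)" ;;
        *blk_mq_*)      echo "block I/O" ;;
        *rwsem_*)       echo "lock contention" ;;
        *)              echo "unclassified" ;;
    esac
}
```

For example, piping the triage output through it gives a quick label for the incident notes:

    dmesg -T | grep -A 20 "soft lockup" | classify_lockup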