Symptoms

  • The database server db-primary-01 intermittently becomes unresponsive for 10-30 seconds.
  • During these pauses, SSH sessions freeze and database queries time out.
  • dmesg shows "BUG: soft lockup - CPU#2 stuck for 22s!" messages.
  • The soft lockup messages appear 2-3 times per day, correlated with heavy write workloads.
  • The server has 8 CPUs and 64 GB of RAM, running a PostgreSQL database.
  • The kernel version is 5.4.0 (not the latest patch level for this series).
  • Monitoring shows CPU#2 spends extended periods in kernel mode during the lockup events.
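
To confirm the pattern described above (which CPU is affected and how often), the soft lockup lines can be pulled out of saved dmesg output and tallied per CPU. The following is a minimal sketch, not a definitive tool: it assumes the log lines contain the message text exactly as quoted in the symptoms, and the function names parse_soft_lockups and lockups_per_cpu are hypothetical helpers introduced here for illustration.

```python
import re
from collections import Counter

# Matches the message text quoted in the symptoms. Anything before it on the
# line (for example a timestamp) and anything after it is ignored.
SOFT_LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def parse_soft_lockups(log_text):
    """Return a list of (cpu, stuck_seconds) tuples, one per soft lockup line."""
    events = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            events.append((int(match.group(1)), int(match.group(2))))
    return events


def lockups_per_cpu(events):
    """Count soft lockup events per CPU number."""
    return Counter(cpu for cpu, _ in events)
```

One way to use it is to save the kernel log to a file, pass the file contents to parse_soft_lockups, and print the resulting counter. If nearly every event lands on CPU#2, that supports the monitoring observation that the stall is tied to that one CPU rather than spread across all eight.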