Skip to content

Questions to Determine

  • What kernel function is shown in the stack trace of the soft lockup?
  • Is the lockup related to I/O, memory management, or a specific kernel driver?
  • Is the kernel version known to have bugs that cause soft lockups?
  • What does the stack trace in dmesg reveal about where the CPU is stuck?
  • Are the lockups correlated with specific workloads (writes, fsync, memory allocation)?
  • Is the storage subsystem (RAID controller, NVMe, SAN) experiencing issues?
  • Are any kernel parameters (vm.dirty_ratio, vm.dirty_background_ratio) misconfigured?
  • Is transparent hugepages (THP) enabled and could compaction be causing stalls?