Solution

Triage

  1. Check the mount options:
    mount | grep nfs
    grep shared-data /proc/mounts
    
  2. Check for stuck processes:
    ps aux | awk '$8 ~ /D/ {print}'
    
  3. Test NFS server connectivity:
    ping -c 3 nfs.internal
    rpcinfo -p nfs.internal
    showmount -e nfs.internal
    
  4. Check NFS client state:
    nfsstat -c
    cat /proc/net/rpc/nfs
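
Any triage command that touches the mount path can itself hang. A minimal sketch of two helpers that avoid this, assuming GNU coreutils timeout and procps ps are installed; the function names probe_mount and list_dstate are illustrative and not part of any standard tool:

```shell
# probe_mount PATH [SECONDS]
# Runs stat on PATH under a time limit so the calling shell never blocks.
# Prints "responsive", "unresponsive", or "error" (e.g. path does not exist).
probe_mount() {
    path=$1
    secs=${2:-5}
    rc=0
    # -k 2: if stat ignores SIGTERM, send SIGKILL two seconds later
    timeout -k 2 "$secs" stat -- "$path" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)       echo "responsive" ;;
        124|137) echo "unresponsive" ;;   # timed out, or had to be killed
        *)       echo "error" ;;
    esac
}

# list_dstate
# Filters `ps -eo pid,stat,wchan:32,comm` output read from stdin,
# keeping only rows whose STAT column starts with D.
list_dstate() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Example usage:
#   probe_mount /mnt/shared-data 5
#   ps -eo pid,stat,wchan:32,comm | list_dstate
```

The WCHAN column shows the kernel function each blocked process is waiting in, which helps confirm that the wait is NFS-related.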
    

Root Cause

The NFS mount at /mnt/shared-data uses the default hard mount option. With a hard mount, when the NFS server becomes unreachable, the NFS client retries the request indefinitely. Any process that performs a filesystem operation on the mount point enters uninterruptible sleep (D state) while the kernel retries the NFS RPC.

The NFS server nfs.internal is unreachable because of a network switch failure. The hard mount ensures no data corruption (requests will eventually succeed when the server returns), but it blocks all processes that touch the mount path.
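
To confirm which retry behavior is actually in effect, the options field in /proc/mounts can be inspected without touching the mount itself. A small sketch, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options); nfs_retry_mode is a hypothetical helper name:

```shell
# nfs_retry_mode MOUNTS_FILE MOUNT_POINT
# Prints "hard", "soft", or "softerr" for the first NFS entry at MOUNT_POINT.
# Returns non-zero if no NFS entry is found for that mount point.
nfs_retry_mode() {
    mounts_file=$1
    mount_point=$2
    awk -v mp="$mount_point" '
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            mode = "hard"   # hard is the default when no soft option is listed
            for (i = 1; i <= n; i++)
                if (opts[i] == "soft" || opts[i] == "softerr")
                    mode = opts[i]
            print mode
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}

# Example usage:
#   nfs_retry_mode /proc/mounts /mnt/shared-data
```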

Fix

Immediate (unblock the system):

  1. Try a force unmount:

    umount -f /mnt/shared-data
    
    This may fail if processes have open file handles.

  2. If force unmount fails, use lazy unmount:

    umount -l /mnt/shared-data
    
    This detaches the mount from the filesystem namespace. New processes will no longer see or hang on the mount point. Already-stuck processes remain in D state until the server returns or they are killed by a reboot.

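  3. Kill stuck processes if possible (note: a process in fully uninterruptible D state ignores all signals, including SIGKILL; on Linux 2.6.25 and later, pending NFS operations can usually be interrupted by SIGKILL, but some waits still block until the kernel completes the I/O):

    # Identify stuck PIDs and the kernel function each is waiting in
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
    kill -9 <PID>  # Has no effect if the wait is not killable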
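
Steps 1 and 2 above can be combined into a single fallback. A sketch under the assumption that it runs as root; unmount_stuck is an illustrative name, not an existing command:

```shell
# unmount_stuck MOUNT_POINT
# Tries a force unmount first and falls back to a lazy unmount.
# Prints which method succeeded; returns non-zero if both fail.
unmount_stuck() {
    mount_point=$1
    if umount -f "$mount_point" 2>/dev/null; then
        echo "force-unmounted $mount_point"
    elif umount -l "$mount_point"; then
        echo "lazy-unmounted $mount_point"
    else
        echo "could not unmount $mount_point" >&2
        return 1
    fi
}

# Example usage:
#   unmount_stuck /mnt/shared-data
```

Trying the force unmount first keeps the cleaner outcome when it is available; the lazy unmount only detaches the path and leaves already-blocked processes as they are.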
    

When the NFS server is restored:

  1. Remount with safer options:

    mount -t nfs -o soft,timeo=30,retrans=3,nosuid nfs.internal:/exports/shared-data /mnt/shared-data
    

  2. Update /etc/fstab:

    nfs.internal:/exports/shared-data  /mnt/shared-data  nfs  soft,timeo=30,retrans=3,nosuid,_netdev  0  0
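
Note that timeo is given in tenths of a second, so timeo=30 is a 3-second initial timeout. After editing /etc/fstab, it can help to confirm that the entry carries the intended options before the next reboot. A sketch of one way to check, assuming the standard fstab field order; fstab_missing_opts is a hypothetical helper name:

```shell
# fstab_missing_opts FSTAB_FILE MOUNT_POINT OPTION...
# Prints the requested options that are absent from the entry (space-separated).
# Returns 0 if none are missing, 1 if some are, 2 if the entry is not found.
fstab_missing_opts() {
    fstab=$1
    mount_point=$2
    shift 2
    opts=$(awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $4; exit }' "$fstab")
    if [ -z "$opts" ]; then
        echo "no entry for $mount_point in $fstab" >&2
        return 2
    fi
    missing=
    for want in "$@"; do
        case ",$opts," in
            *",$want,"*) ;;                    # option present
            *) missing="$missing $want" ;;     # option absent
        esac
    done
    echo "${missing# }"
    [ -z "$missing" ]
}

# Example usage:
#   fstab_missing_opts /etc/fstab /mnt/shared-data soft timeo=30 retrans=3 _netdev
```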
    

Rollback / Safety

  • Lazy unmount makes the mount invisible but does not free kernel resources until all open files are closed.
  • soft mounts return EIO to applications once the retries are exhausted while the server is unreachable. Applications must handle these errors, and a write that times out may leave data incomplete.
  • If data integrity is critical, keep hard and tune timeo and retrans rather than switching to soft (or use softreval on newer kernels). Do not rely on intr; it is ignored on kernels after 2.6.25.
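
Because a soft mount surfaces errors instead of blocking, callers need an explicit failure path. A minimal sketch of a retrying read in shell; read_with_retry, the attempt count, and the delay are all assumptions for illustration, and whether retrying is appropriate depends on the application:

```shell
# read_with_retry FILE [ATTEMPTS] [DELAY_SECONDS]
# Copies FILE to stdout only after a complete successful read, so a read
# that fails partway (for example with EIO) never emits partial data.
read_with_retry() {
    file=$1
    attempts=${2:-3}
    delay=${3:-5}
    buffer=$(mktemp) || return 1
    i=1
    while [ "$i" -le "$attempts" ]; do
        if cat -- "$file" > "$buffer" 2>/dev/null; then
            cat -- "$buffer"
            rm -f -- "$buffer"
            return 0
        fi
        echo "read of $file failed (attempt $i of $attempts)" >&2
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    rm -f -- "$buffer"
    return 1
}

# Example usage:
#   read_with_retry /mnt/shared-data/config.json 3 5 || echo "giving up" >&2
```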

Common Traps

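  • Expecting kill -9 to always clear a D-state process. A process in fully uninterruptible sleep ignores signals until the I/O completes. On kernels 2.6.25 and later, pending NFS operations can usually be interrupted by SIGKILL, so it is worth trying, but it is not guaranteed to work.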
  • Using df -h to diagnose. The df command itself will hang because it stats all filesystems, including the stuck NFS mount. Use df -h -x nfs -x nfs4 to exclude both NFS filesystem types (-x is the short form of --exclude-type).
  • Assuming force unmount always works. umount -f often fails when processes have open files. umount -l is the pragmatic solution.
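  • Not using _netdev in fstab. _netdev marks the filesystem as requiring the network, so the mount is deferred until networking is up. Without it, the mount may be attempted too early and stall the boot. Adding nofail also lets the boot continue if the NFS server is unavailable.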
  • Forgetting that intr is deprecated. On Linux kernels after 2.6.25, intr is ignored and only SIGKILL can interrupt a pending NFS operation. Use a soft mount if operations must time out on their own.