Virtualization - Street-Level Ops¶
Quick Diagnosis Commands¶
# ── Host Capabilities ──
virt-host-validate # Pre-flight check: KVM, IOMMU, etc.
grep -cE '(vmx|svm)' /proc/cpuinfo # Count logical CPUs (threads) with virt extensions
lsmod | grep kvm # Verify KVM module loaded
cat /proc/sys/net/bridge/bridge-nf-call-iptables # 1 = bridged VM traffic traverses host iptables (a common cause of silently dropped packets)
# ── VM Status ──
virsh list --all # All VMs, all states
virsh dominfo web01 # CPU, memory, state, autostart
virsh domstats web01 # Detailed runtime stats
virsh vcpuinfo web01 # vCPU to pCPU mapping and time
# ── Resource Usage ──
virt-top # top-like view for VMs
virsh dommemstat web01 # Memory stats (balloon, RSS, etc.)
virsh domblkstat web01 vda # Block device I/O stats
virsh domifstat web01 vnet0 # Network I/O stats
# ── Storage ──
virsh vol-list default # List volumes in default pool
qemu-img info /var/lib/libvirt/images/web01.qcow2 # Image details + actual size
virsh domblklist web01 # Block devices attached to VM
du -sh /var/lib/libvirt/images/* # Actual disk usage (thin provisioning!)
# ── Networking ──
virsh net-list --all # List virtual networks
virsh domiflist web01 # VM's network interfaces
brctl show # Bridge details (or: bridge link show)
virsh domifaddr web01 # VM's IP addresses (requires guest agent)
Gotcha: VM Won't Start — "Cannot Access Storage File"¶
You moved or renamed a disk image and now virsh start fails.
# Check what path the VM expects
virsh domblklist web01
# Target Source
# vda /var/lib/libvirt/images/web01.qcow2
# If the file is elsewhere, edit the XML
virsh edit web01
# Change the <source file="/old/path"/> to the correct path
# Or detach and reattach
virsh detach-disk web01 vda --config
virsh attach-disk web01 /new/path/web01.qcow2 vda --config --subdriver qcow2
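If virt-xml (shipped with the virt-install package) is available, the disk path can be updated in one step instead of hand-editing XML. A sketch, assuming the image was moved to /new/path:

```shell
# check_image() sanity-checks the relocated file before touching the XML:
# it must exist and start with the qcow2 magic bytes ("QFI\xfb").
check_image() {
  f="$1"
  [ -f "$f" ] || { echo "missing: $f"; return 1; }
  head -c 3 "$f" | grep -q 'QFI' || { echo "not qcow2: $f"; return 1; }
  echo "ok: $f"
}

# virt-xml edits the first matching <disk> element in place.
if check_image /new/path/web01.qcow2; then
  virt-xml web01 --edit --disk path=/new/path/web01.qcow2
fi
```

The magic-byte check catches the classic mistake of pointing the XML at a raw image while the driver still says qcow2.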
Gotcha: Console Access Shows Nothing¶
virsh console web01 connects but shows a blank screen.
# The guest needs a serial console configured. Add to guest's kernel cmdline:
# console=ttyS0,115200
# For RHEL/CentOS (grubby):
# Inside the guest:
grubby --update-kernel=ALL --args="console=ttyS0,115200"
# Ensure the VM XML has a serial device:
virsh edit web01
# Look for:
# <serial type='pty'>
# <target port='0'/>
# </serial>
# <console type='pty'>
# <target type='serial' port='0'/>
# </console>
# Escape from virsh console: Ctrl + ]
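A quick guest-side verification sketch: grubby edits only take effect after a reboot, so check that the console argument actually reached the running kernel.

```shell
# has_serial_console() succeeds when the given cmdline file mentions ttyS0.
has_serial_console() {
  grep -q 'console=ttyS0' "$1"
}

if has_serial_console /proc/cmdline; then
  echo "serial console configured"
else
  echo "console=ttyS0 missing from cmdline; re-run grubby and reboot"
fi
```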
Pattern: Automated VM Provisioning with virt-install¶
# Minimal CentOS Stream 9 with kickstart
virt-install \
--name db01 \
--ram 8192 \
--vcpus 4 \
--cpu host-passthrough \
--disk path=/var/lib/libvirt/images/db01.qcow2,size=100,format=qcow2,bus=virtio \
--os-variant centos-stream9 \
--network bridge=br0,model=virtio \
--location http://mirror.example.com/centos-stream/9/BaseOS/x86_64/os/ \
--initrd-inject=/root/ks.cfg \
--extra-args "inst.ks=file:/ks.cfg console=ttyS0,115200" \
--graphics none \
--noautoconsole
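The command above injects /root/ks.cfg into the initrd; a minimal sketch of that file (example values -- adjust partitioning, user, and package set for real use):

```
# /root/ks.cfg -- minimal unattended install (example values)
text
lang en_US.UTF-8
keyboard us
timezone UTC
rootpw --lock
user --name=ops --groups=wheel
network --bootproto=dhcp
bootloader --location=mbr
zerombr
clearpart --all --initlabel
autopart
reboot
%packages
@core
%end
```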
# Cloud image with cloud-init (faster)
# Download cloud image first, then:
cp /var/lib/libvirt/boot/CentOS-Stream-GenericCloud-9.qcow2 /var/lib/libvirt/images/app01.qcow2
qemu-img resize /var/lib/libvirt/images/app01.qcow2 40G
# Create cloud-init ISO
cat > /tmp/meta-data <<METAEOF
instance-id: app01
local-hostname: app01
METAEOF
cat > /tmp/user-data <<USEREOF
#cloud-config
users:
  - name: ops
    ssh_authorized_keys:
      - ssh-rsa AAAA...
    sudo: ALL=(ALL) NOPASSWD:ALL
USEREOF
genisoimage -output /var/lib/libvirt/images/app01-cidata.iso \
-volid cidata -joliet -rock /tmp/user-data /tmp/meta-data
virt-install \
--name app01 \
--ram 2048 --vcpus 2 \
--disk path=/var/lib/libvirt/images/app01.qcow2 \
--disk path=/var/lib/libvirt/images/app01-cidata.iso,device=cdrom \
--os-variant centos-stream9 \
--network bridge=br0,model=virtio \
--import --noautoconsole
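Once --import boots the image, cloud-init takes a minute to bring up networking. A sketch that polls until the guest reports an address (domifaddr needs the libvirt network's DHCP lease tracking or the guest agent):

```shell
# first_ipv4() pulls the first IPv4 address out of `virsh domifaddr` output,
# stripping the /prefix suffix.
first_ipv4() {
  awk '/ipv4/ { sub(/\/.*/, "", $4); print $4; exit }'
}

# Poll for up to 5 minutes (guarded so the loop is a no-op without virsh).
if command -v virsh >/dev/null; then
  for _ in $(seq 1 60); do
    ip=$(virsh domifaddr app01 2>/dev/null | first_ipv4)
    [ -n "$ip" ] && { echo "app01 is up at $ip"; break; }
    sleep 5
  done
fi
```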
Pattern: Snapshot Workflow for Upgrades¶
Gotcha: External snapshots add an overlay to the qcow2 backing chain, and every layer adds I/O overhead because reads must traverse the chain; a chain several layers deep can degrade disk performance noticeably. (Internal snapshots avoid the chain but still bloat the image and slow I/O.) Always delete snapshots after successful upgrades -- they are safety nets, not long-term backups.
# Before upgrade
virsh snapshot-create-as web01 \
--name "pre-kernel-upgrade-$(date +%Y%m%d)" \
--description "Before kernel 5.14.x upgrade" \
--atomic
# Verify snapshot exists
virsh snapshot-list web01
# Do the upgrade... if it goes wrong:
virsh snapshot-revert web01 --snapshotname "pre-kernel-upgrade-20260315"
# If upgrade succeeds, CLEAN UP the snapshot (this is critical)
virsh snapshot-delete web01 --snapshotname "pre-kernel-upgrade-20260315"
# Check snapshot chain depth (performance degrades with depth)
qemu-img info --backing-chain /var/lib/libvirt/images/web01.qcow2
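The chain check above can be wired into monitoring. A sketch that counts layers via qemu-img's JSON output (the path and the depth threshold of 3 are examples):

```shell
# Each layer in the backing chain appears once as a "filename" key in the
# JSON array, so counting those lines gives the chain depth.
count_layers() { grep -c '"filename"'; }

depth=$(qemu-img info --backing-chain --output=json \
          /var/lib/libvirt/images/web01.qcow2 2>/dev/null | count_layers)
if [ "${depth:-0}" -gt 3 ]; then
  echo "WARN: backing chain is $depth layers deep -- consider virsh blockcommit"
fi
```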
Gotcha: Thin-Provisioned Disk Shows 40GB but Host Has 10GB Free¶
Default trap: qcow2 images only grow, never shrink -- even if the guest deletes files. The guest frees blocks, but the host file keeps its size. To reclaim space: enable discard='unmap' on the disk driver, then run fstrim -a inside the guest to punch holes in the image.
qcow2 thin provisioning means the image file grows as data is written. A 200GB virtual disk might only use 5GB initially, but it can silently fill the host filesystem.
# Check virtual size vs actual size
qemu-img info /var/lib/libvirt/images/web01.qcow2
# virtual size: 200 GiB
# disk size: 4.7 GiB
# Monitor actual usage
du -sh /var/lib/libvirt/images/*.qcow2
# Set up an alert when the storage pool gets low
# Check pool capacity:
virsh pool-info default
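A cron-able sketch of that alert, parsing pool-info output (the pool name and the 20 GiB threshold are examples; assumes pool-info reports the value in GiB):

```shell
# pool_avail_gib() reads `virsh pool-info` on stdin and prints the integer
# part of the "Available:" line (pool-info prints e.g. "Available: 123.45 GiB").
pool_avail_gib() {
  awk '/^Available:/ { printf "%d\n", $2 }'
}

avail=$(virsh pool-info default 2>/dev/null | pool_avail_gib)
if [ -n "$avail" ] && [ "$avail" -lt 20 ]; then
  echo "ALERT: default pool has only ${avail} GiB free"
fi
```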
Pattern: Performance Tuning Checklist¶
# 1. Use VirtIO for all devices
virsh dumpxml web01 | grep -E "(bus='|model=')"
# Should see: bus='virtio', model='virtio'
# 2. CPU mode: host-passthrough (exposes real CPU features to guest)
virsh dumpxml web01 | grep -A2 '<cpu'
# <cpu mode='host-passthrough'/>
# 3. Huge pages (reduces TLB misses for large-memory VMs)
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# Then in VM XML:
# <memoryBacking><hugepages/></memoryBacking>
# 4. I/O threads for disk (separates I/O from vCPU threads)
# In VM XML:
# <iothreads>2</iothreads>
# <disk ...>
# <driver ... iothread='1'/>
# </disk>
# 5. Disable unnecessary emulated devices
# Remove: tablet, USB, sound card if not needed
# In VM XML, remove <input type='tablet'/> etc.
# 6. Enable discard/TRIM for qcow2
# <driver ... discard='unmap'/>
# Guest can then fstrim to reclaim space
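The checklist's first item can be automated. A crude audit sketch (note: bus='scsi' backed by a virtio-scsi controller is fine, which this simple grep cannot tell apart, so it only flags clearly emulated devices):

```shell
# audit_virtio() reads domain XML on stdin and prints any device lines using
# emulated disk buses or NIC models; prints "all virtio" when none are found.
audit_virtio() {
  grep -E "bus='(ide|sata)'|model='(e1000|rtl8139)'" || echo "all virtio"
}

virsh dumpxml web01 2>/dev/null | audit_virtio
```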
Pattern: Live Migration Pre-flight¶
Under the hood: Live migration works by iteratively copying dirty memory pages to the destination while the VM keeps running. If the VM is writing memory faster than the network can copy it (write-heavy databases, large in-memory caches), migration never converges. The --auto-converge flag throttles the guest CPU so the dirty rate drops below the copy rate, and --timeout suspends the guest if migration still hasn't finished after the given number of seconds.
# 1. Verify connectivity
virsh -c qemu+ssh://dest-host/system list
# 2. Check CPU compatibility
virsh capabilities > /tmp/src-caps.xml # on source
virsh capabilities > /tmp/dst-caps.xml # on destination
# Compare CPU models — destination must support source features
# 3. Verify shared storage (if using shared storage migration)
# On both hosts, same path must resolve to same storage:
ls -la /var/lib/libvirt/images/web01.qcow2 # on both hosts
# 4. Check that no snapshots with memory state exist
virsh snapshot-list web01
# 5. Perform the migration (abort cleanly on any error)
virsh migrate --live --persistent --verbose web01 qemu+ssh://dest-host/system --abort-on-error
# 6. Monitor migration progress
virsh domjobinfo web01
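If domjobinfo shows the remaining memory bouncing around instead of shrinking, the migration isn't converging. A sketch that retries with the throttling flags described above (the 600-second timeout is an example):

```shell
# converging() compares two successive "Memory remaining" samples (MiB):
# if the number isn't shrinking, the guest dirties pages faster than the
# network copies them. Sample values below are placeholders.
converging() { [ "$2" -lt "$1" ]; }

if command -v virsh >/dev/null && ! converging 4096 4100; then
  # --auto-converge progressively throttles guest vCPUs; --timeout 600 with
  # --timeout-suspend pauses the guest outright as a last resort.
  virsh migrate --live --persistent --verbose \
    --auto-converge --timeout 600 --timeout-suspend \
    web01 qemu+ssh://dest-host/system
fi
```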
Gotcha: VM Autostart Not Set After Creation¶
You create a VM, it runs fine, the host reboots, and the VM doesn't come back.
# Check autostart
virsh dominfo web01 | grep Autostart
# Autostart: disable
# Enable
virsh autostart web01
# Verify
virsh dominfo web01 | grep Autostart
# Autostart: enable
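To catch this fleet-wide, a sketch that enables autostart on every defined VM that lacks it:

```shell
# needs_autostart() reads `virsh dominfo` on stdin and succeeds when the
# domain currently has autostart disabled.
needs_autostart() { grep -q 'Autostart:[[:space:]]*disable'; }

if command -v virsh >/dev/null; then
  for vm in $(virsh list --all --name); do
    if virsh dominfo "$vm" | needs_autostart; then
      virsh autostart "$vm"
    fi
  done
fi
```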
Pattern: Emergency VM Recovery¶
When virsh commands fail and the VM is stuck:
# 1. Find the QEMU process directly
ps aux | grep qemu | grep web01
# 2. If virsh destroy doesn't work, send SIGTERM to the process
kill $(pgrep -f "guest=web01")
# 3. If still stuck, SIGKILL
kill -9 $(pgrep -f "guest=web01")
# 4. Clean up stale state
virsh managedsave-remove web01 2>/dev/null
rm -f /var/run/libvirt/qemu/web01.pid
# 5. Restart libvirtd if it's confused
systemctl restart libvirtd
# 6. Try starting the VM again
virsh start web01
Gotcha: Nested Virtualization Performance¶
Running VMs inside VMs (common in CI/CD, testing) works but has significant overhead:
# Enable nested virt (Intel)
cat /sys/module/kvm_intel/parameters/nested # Should be Y
# If N:
modprobe -r kvm_intel
modprobe kvm_intel nested=1
echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf
# Nested VMs must use host-passthrough CPU mode
# <cpu mode='host-passthrough'/>
# Expect 10-30% performance overhead for nested workloads
# Never use nested virt for production databases or latency-sensitive services
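Inside the L1 guest, a quick check that the virt extensions were actually passed through before attempting to start an L2 VM (a sketch; the same grep covers AMD's svm):

```shell
# vmx_threads() counts logical CPUs whose flags include vmx or svm.
vmx_threads() { grep -cE '(vmx|svm)' "$1"; }

n=$(vmx_threads /proc/cpuinfo) || n=0
if [ "$n" -gt 0 ]; then
  echo "nested virt available on $n threads"
else
  echo "no vmx/svm flags: check the host's kvm_intel nested=1 and the guest's CPU mode"
fi
```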