VMware

25 cards: 🟢 8 easy | 🟡 12 medium | 🔴 5 hard

🟢 Easy (8)

1. What is ESXi and what type of hypervisor is it?

ESXi is VMware's bare-metal (Type 1) hypervisor. It installs directly on server hardware with a minimal footprint (~150MB). It runs a custom VMkernel, not a full Linux OS. The Direct Console User Interface (DCUI) provides emergency host configuration.

Remember: "ESXi = bare metal, no host OS underneath."

2. What is vCenter and what does it provide?

vCenter is the centralized management plane for ESXi hosts. It provides: inventory management (hosts, clusters, datacenters), vMotion (live migration), DRS (automatic load balancing), HA (auto-restart VMs on failure), RBAC, templates, and clones. It runs as the vCenter Server Appliance (VCSA) on Photon OS.

3. What is vMotion?

vMotion live-migrates a running VM from one ESXi host to another with near-zero downtime. It iteratively copies memory pages to the destination, briefly stuns the VM (<1 second), then resumes it on the new host. Requirements: shared storage (or Storage vMotion), compatible CPUs (EVC mode), and a dedicated vMotion network (10G+ recommended).
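
The iterative copy described above can be illustrated with a toy model. This is a simplified sketch, not VMware's implementation: it assumes a constant dirty rate and link speed, and the function name, parameters, and the 64 MB stun threshold are all hypothetical.

```python
def precopy_rounds(mem_mb, dirty_mb_per_s, link_mb_per_s,
                   stun_threshold_mb=64, max_rounds=30):
    """Simulate iterative pre-copy; return (rounds, remaining_mb).

    Each pass copies whatever is still outstanding. While that pass runs,
    the guest dirties more pages, which become the next pass's workload.
    """
    if dirty_mb_per_s >= link_mb_per_s:
        # The guest dirties memory faster than the link can move it.
        raise ValueError("dirty rate >= link rate: pre-copy cannot converge")
    remaining = float(mem_mb)
    rounds = 0
    while remaining > stun_threshold_mb and rounds < max_rounds:
        copy_time_s = remaining / link_mb_per_s
        remaining = dirty_mb_per_s * copy_time_s  # dirtied during this pass
        rounds += 1
    return rounds, remaining


rounds, left = precopy_rounds(mem_mb=16384, dirty_mb_per_s=100, link_mb_per_s=1000)
print(f"{rounds} passes, about {left:.1f} MB left to copy during the stun")
```

The model shows why a fast, dedicated vMotion network matters: the remaining set shrinks by the ratio of dirty rate to link rate on every pass, so a slow link means more passes and a longer final stun.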

4. Why is VMware Tools (open-vm-tools) essential inside guest VMs?

Without VMware Tools: no graceful shutdown (only hard power-off), no memory ballooning, no quiesced snapshots, no heartbeat monitoring for HA, degraded network/disk performance, and inaccurate metrics. On Linux, install the open-vm-tools package. On Windows, install from the mounted ISO.

5. What does vSphere HA do?

vSphere HA monitors host heartbeats (via network and datastore). If a host fails, HA automatically restarts the failed host's VMs on surviving cluster members. Admission control reserves capacity to ensure there are enough resources to restart VMs after a failure.

6. What does DRS (Distributed Resource Scheduler) do?

DRS automatically balances VM workloads across cluster hosts by recommending or performing vMotion migrations. Automation levels: Manual (suggestions only), Partially Automated (initial placement only), Fully Automated (ongoing rebalancing). Affinity/anti-affinity rules control which VMs should or should not be co-located.
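
To make the anti-affinity idea concrete, here is a small sketch that checks a placement against "keep these VMs apart" rules. It does not call any vSphere API; the data shapes, rule name, and VM/host names are invented for illustration.

```python
from collections import defaultdict


def anti_affinity_violations(placement, rules):
    """Find anti-affinity rules broken by the current placement.

    placement: dict mapping VM name -> host name
    rules:     dict mapping rule name -> set of VMs that must not share a host
    Returns {rule: {host: [vms sharing that host]}} for violated rules only.
    """
    violations = {}
    for rule, vms in rules.items():
        by_host = defaultdict(list)
        for vm in sorted(vms):
            if vm in placement:
                by_host[placement[vm]].append(vm)
        shared = {host: names for host, names in by_host.items() if len(names) > 1}
        if shared:
            violations[rule] = shared
    return violations


placement = {"db1": "esx1", "db2": "esx1", "web1": "esx2"}
rules = {"separate-dbs": {"db1", "db2"}}
print(anti_affinity_violations(placement, rules))
```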

7. What is VMFS?

VMFS (Virtual Machine File System) is VMware's clustered filesystem for block storage (FC and iSCSI). VMFS 6 supports volumes up to 64TB. It uses on-disk locking (ATS) to allow multiple ESXi hosts to access the same datastore safely. VMFS is the traditional and most common storage option for ESXi.

8. What are the two main CLI tools on an ESXi host?

esxcli manages host configuration: hardware, network, storage, software, and system settings. vim-cmd manages the VM lifecycle: listing VMs (vmsvc/getallvms), powering on/off, and creating/deleting snapshots. Both are run over SSH or from the local ESXi Shell on the host console.

🟡 Medium (12)

1. What is EVC mode and why is it critical for clusters?

Enhanced vMotion Compatibility (EVC) masks CPU features down to a common baseline that every host in the cluster supports. Without EVC, vMotion between hosts with different CPU generations can fail, which is often discovered only when a host has to be evacuated for maintenance. (An HA restart is a cold power-on, so it does not depend on EVC.) Set EVC when creating the cluster; enabling or lowering it later requires powering off any VMs that use CPU features above the new baseline.
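
The "lowest common denominator" idea is essentially a set intersection. The sketch below is only an analogy: real EVC uses named per-generation baselines rather than arbitrary intersections, and the host names and feature sets here are made up.

```python
def evc_baseline(host_features):
    """Return the CPU features supported by every host.

    host_features: dict mapping host name -> set of CPU feature flags.
    """
    feature_sets = [set(flags) for flags in host_features.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def masked_features(host_features):
    """Return, per host, the features that would be hidden from VMs."""
    baseline = evc_baseline(host_features)
    return {host: set(flags) - baseline for host, flags in host_features.items()}


hosts = {
    "esx1": {"sse4_2", "avx", "avx2"},             # older CPU generation
    "esx2": {"sse4_2", "avx", "avx2", "avx512f"},  # newer CPU generation
}
print(sorted(evc_baseline(hosts)))
print(masked_features(hosts))
```

A VM that never sees the masked features can move to any host in the cluster, which is the trade-off EVC makes: newer hosts give up some features in exchange for mobility.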

2. How does vSAN work and what are its minimum requirements?

vSAN aggregates local disks across ESXi hosts into a distributed datastore. Each host contributes disk groups (1 cache SSD + 1-7 capacity devices). Storage policies define redundancy: FTT=1 (tolerate 1 failure, minimum 3 hosts, RAID-1) or FTT=1 with RAID-5 (minimum 4 hosts, erasure coding). vSAN eliminates the need for external SAN hardware.
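
A quick way to compare the two FTT=1 policies is to compute host count and raw capacity side by side. The host minimums come from the card above; the capacity multipliers (2x for mirroring, about 1.33x for 3+1 erasure coding) are the usual figures and are an addition here, as are the function and key names.

```python
# FTT=1 policies from the card, plus assumed raw-capacity multipliers.
POLICIES = {
    "RAID-1": {"min_hosts": 3, "capacity_multiplier": 2.0},    # two full copies
    "RAID-5": {"min_hosts": 4, "capacity_multiplier": 4 / 3},  # 3 data + 1 parity
}


def check_policy(raid, hosts, vm_gb):
    """Report whether the cluster is big enough and how much raw space is used."""
    policy = POLICIES[raid]
    return {
        "enough_hosts": hosts >= policy["min_hosts"],
        "raw_gb_needed": round(vm_gb * policy["capacity_multiplier"], 1),
    }


for raid in POLICIES:
    print(raid, check_policy(raid, hosts=3, vm_gb=100))
```

On a 3-host cluster only RAID-1 is satisfiable; RAID-5 saves capacity but needs the fourth host.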

3. What is the difference between a Standard vSwitch and a Distributed vSwitch?

Standard vSwitch (vSS): per-host configuration, simple but doesn't scale, because each host must be configured manually. Distributed vSwitch (vDS): centrally managed from vCenter, consistent policy across all hosts, supports NetFlow, port mirroring, LACP, and Network I/O Control. vDS is required for NSX.

4. What are VMkernel ports and how should they be separated?

VMkernel ports carry management, vMotion, vSAN, and NFS traffic. Each type should be on its own VLAN with a dedicated VMkernel adapter. Sharing vMotion with management traffic can saturate the link and cause management connectivity issues during migrations. vSAN requires its own VMkernel port. Jumbo frames (MTU 9000) are commonly recommended for vSAN and NFS, but the MTU must be consistent end-to-end.

5. Why are snapshots not a substitute for backups?

Snapshots create delta VMDKs that capture every write after the snapshot point. The chain degrades I/O performance as it grows, resides on the same storage as the VM (no offsite copy), and can fill the datastore if left running. They are meant for short-term rollback (hours), not long-term protection. Use a real backup tool that creates temporary snapshots during its backup window.
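
Since snapshots are meant to live for hours, a simple age check catches forgotten ones. This sketch works on plain tuples rather than a vSphere API; the 24-hour cutoff and the sample VM and snapshot names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def stale_snapshots(snapshots, now, max_age=timedelta(hours=24)):
    """Return (vm, snapshot, age) for snapshots older than max_age.

    snapshots: iterable of (vm_name, snapshot_name, created_at) tuples.
    """
    return [
        (vm, name, now - created)
        for vm, name, created in snapshots
        if now - created > max_age
    ]


now = datetime(2024, 1, 2, 12, 0)
inventory = [
    ("web1", "pre-patch", datetime(2024, 1, 2, 9, 0)),       # 3 hours old
    ("db1", "before-upgrade", datetime(2023, 12, 20, 8, 0)),  # forgotten
]
for vm, name, age in stale_snapshots(inventory, now):
    print(f"{vm}: snapshot '{name}' is {age.days} days old")
```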

6. What are the key performance metrics in esxtop?

%RDY (CPU Ready): >5% means the VM is waiting for physical CPU; the host is overcommitted. %CSTP (Co-Stop): >3% means scheduling delay from too many vCPUs; reduce the vCPU count. %SWPWT (Swap Wait): >0 means the host is swapping VM memory to disk; add RAM. MCTLSZ (Balloon): the amount of memory being reclaimed via the balloon driver. Use esxtop -b for batch mode collection.
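
The thresholds above lend themselves to a small checker. The sketch assumes the metric values have already been extracted into a dict (it does not parse esxtop batch output), and the function name and advice strings are illustrative.

```python
# Rule-of-thumb thresholds from the card above: metric -> (limit, advice).
THRESHOLDS = {
    "%RDY": (5.0, "VM is waiting for physical CPU; host is overcommitted"),
    "%CSTP": (3.0, "co-stop delay; consider fewer vCPUs"),
    "%SWPWT": (0.0, "host is swapping VM memory; add RAM"),
}


def flag_metrics(sample):
    """Return {metric: advice} for every value in `sample` above its threshold."""
    findings = {}
    for metric, (limit, advice) in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            findings[metric] = advice
    return findings


sample = {"%RDY": 7.2, "%CSTP": 1.0, "%SWPWT": 0.0}
for metric, advice in flag_metrics(sample).items():
    print(f"{metric}: {advice}")
```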

7. Why does over-provisioning vCPUs hurt VM performance?

Every vCPU has to be scheduled onto a physical core, and the ESXi scheduler limits how far the vCPUs of one VM may drift apart in progress (relaxed co-scheduling). The more vCPUs a VM has, the harder it is to find enough free cores to keep them in step, so even idle vCPUs add scheduling latency (%CSTP and %RDY). A VM with 4 properly sized vCPUs often outperforms one with 16. Start small, monitor actual usage, and scale up only when needed.

8. What is the correct maintenance mode workflow for an ESXi host?

1. Enter maintenance mode in vCenter (DRS migrates VMs automatically).
2. Verify all VMs migrated (esxcli vm process list should be empty).
3. Apply patches or firmware.
4. Reboot if required.
5. Exit maintenance mode.

For vSAN clusters: take only one host at a time, and wait for data resync to finish before proceeding to the next.
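
The workflow above can be sketched as a loop that handles one host at a time. Everything here is hypothetical scaffolding: `DryRunOps` only records calls, and a real version would replace its methods with vCenter automation (for example via PowerCLI or govc).

```python
class DryRunOps:
    """Hypothetical stand-in that records calls instead of touching hosts."""

    def __init__(self, stuck_vms=None):
        self.log = []
        self.stuck_vms = stuck_vms or {}  # host -> VMs that failed to migrate

    def enter_maintenance(self, host):
        self.log.append(("enter", host))

    def running_vms(self, host):
        return self.stuck_vms.get(host, [])

    def patch_and_reboot(self, host):
        self.log.append(("patch", host))

    def exit_maintenance(self, host):
        self.log.append(("exit", host))

    def wait_for_resync(self):
        self.log.append(("resync", None))


def rolling_maintenance(hosts, ops, vsan=False):
    """Patch hosts strictly one at a time, following the steps above."""
    for host in hosts:
        ops.enter_maintenance(host)        # step 1
        leftover = ops.running_vms(host)   # step 2: must be empty
        if leftover:
            raise RuntimeError(f"{host} still runs VMs: {leftover}")
        ops.patch_and_reboot(host)         # steps 3 and 4
        ops.exit_maintenance(host)         # step 5
        if vsan:
            ops.wait_for_resync()          # vSAN: resync before the next host


ops = DryRunOps()
rolling_maintenance(["esx1", "esx2"], ops, vsan=True)
print(ops.log)
```

Stopping on the first host that still runs VMs is a deliberate choice: patching a host that failed to evacuate would take those VMs down.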

9. How do templates and Content Libraries work for VM provisioning?

A template is a VM converted to a read-only image for repeatable deployments. Content Libraries distribute templates, ISOs, and OVAs across vCenters using a publish/subscribe model. Guest Customization Specs automate hostname, IP, and domain join on first boot. Combine with Terraform or PowerCLI for fully automated provisioning.

10. What is the risk of thin provisioning without monitoring?

Thin-provisioned disks allocate storage on demand, so a 500GB disk may use only 50GB initially. But the sum of all thin disks can exceed physical datastore capacity. When the datastore fills, all VMs on it can freeze at once. Set alarms at 75% and 85% usage. Track the over-commit ratio (total provisioned / actual capacity).
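
The alarm thresholds and the over-commit ratio above reduce to two divisions. A minimal sketch, assuming the three capacity figures are already known; the function name and the level labels are illustrative.

```python
def datastore_status(capacity_gb, used_gb, provisioned_gb, warn=0.75, crit=0.85):
    """Compute usage, over-commit ratio, and an alarm level for one datastore."""
    usage = used_gb / capacity_gb
    overcommit = provisioned_gb / capacity_gb  # total provisioned / actual capacity
    if usage >= crit:
        level = "critical"
    elif usage >= warn:
        level = "warning"
    else:
        level = "ok"
    return {"usage": usage, "overcommit_ratio": overcommit, "level": level}


# 1 TB datastore, 800 GB written, 2.5 TB promised to thin disks.
print(datastore_status(capacity_gb=1000, used_gb=800, provisioned_gb=2500))
```

An over-commit ratio above 1.0 is not a problem by itself; it becomes one when usage keeps climbing and nobody is watching the alarms.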

11. How do you diagnose and fix a "file locked" error when powering on a VM?

Check which host holds the lock by running vmkfstools -D against the locked file. Look for stale .lck directories and .vswp files. Verify the VM is not running on another host. If the host that held the lock crashed, the lock auto-expires in ~15 seconds. Remove stale .lck directories only after confirming no host is running the VM.

12. What tools are available for VMware infrastructure as code?

Terraform (vsphere provider): provision VMs, networks, storage. Ansible (community.vmware collection): configuration management. Packer (vsphere-iso builder): build VM templates from ISO. PowerCLI: PowerShell-based automation. govc: lightweight Go-based CLI for scripting. All connect via the vCenter API.

🔴 Hard (5)

1. What is a PSOD and how do you respond to one?

A PSOD (Purple Screen of Death) is an ESXi host crash; all VMs on that host go down. Common causes: faulty hardware (RAM, NIC, HBA), driver bugs, NFS disconnections. HA restarts the VMs on surviving hosts. After the host reboots: collect the core dump (esxcli system coredump file list), check vmkernel.log, and generate a support bundle (vm-support). Then update ESXi patches, firmware, and drivers to versions listed on the Hardware Compatibility List (HCL).

2. What happens when HA admission control is disabled?

Without admission control, HA does not reserve capacity for failover. If a host fails, surviving hosts may not have enough CPU/memory to restart all VMs. You get a false sense of protection: HA is "enabled" but cannot actually recover all workloads. Always enable admission control with a policy like "tolerate 1 host failure." With equally sized hosts this reserves roughly 1/N of an N-host cluster (about 25-50% for clusters of 2 to 4 hosts), and that reserved capacity is the point.
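
The reserved share follows directly from the cluster size. This sketch assumes equally sized hosts and a percentage-style policy; the function name is hypothetical.

```python
def reserved_fraction(num_hosts, failures_to_tolerate=1):
    """Fraction of cluster capacity held back so failed hosts' VMs can restart."""
    if not 0 < failures_to_tolerate < num_hosts:
        raise ValueError("must tolerate at least 1 and fewer than num_hosts failures")
    return failures_to_tolerate / num_hosts


for n in (2, 3, 4, 8):
    print(f"{n} hosts: reserve {reserved_fraction(n):.0%} of capacity")
```

The overhead shrinks as the cluster grows, which is one reason larger clusters use their hardware more efficiently under the same failure tolerance.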

3. Why is vSAN quorum critical and what happens when it's lost?

vSAN with FTT=1 (RAID-1) tolerates 1 host failure. With 3 hosts, putting 2 in maintenance mode simultaneously loses quorum, and all data becomes inaccessible. Only put one host in maintenance at a time and wait for data resync to complete before proceeding. Use 4+ host clusters if you need rolling maintenance windows without risk.
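
The quorum rule can be shown with a simplified vote count. The sketch assumes an FTT=1 RAID-1 object made of two replicas plus a witness, one vote each, and treats the object as accessible only while more than half of the votes are reachable. Component and host names are invented.

```python
def object_accessible(component_hosts, hosts_down):
    """Return True if more than half of the object's components are reachable.

    component_hosts: dict mapping component name -> host holding it
    hosts_down:      set of hosts that are failed or in maintenance mode
    """
    total = len(component_hosts)
    available = sum(1 for host in component_hosts.values() if host not in hosts_down)
    return available * 2 > total  # strict majority of votes


components = {"replica-a": "esx1", "replica-b": "esx2", "witness": "esx3"}
print(object_accessible(components, {"esx1"}))          # one host down
print(object_accessible(components, {"esx1", "esx2"}))  # two hosts down
```

With three hosts there is exactly one component per host, so losing any two leaves a single vote out of three and the object goes offline.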

4. Why are VM resource limits dangerous in production?

A CPU or memory limit caps a VM's resource consumption even when the host has spare capacity. Limits are often set "temporarily" for testing and forgotten, causing mysterious performance degradation months later. Use reservations (guaranteed minimums) and shares (relative priority) instead. Audit for limits: Get-VM | Get-VMResourceConfiguration | Where-Object {$_.CpuLimitMhz -ne -1}.

5. How did the Broadcom acquisition change VMware licensing?

Broadcom eliminated perpetual licenses (subscription-only), discontinued free ESXi, and consolidated SKUs into VMware Cloud Foundation (VCF) and VMware vSphere Foundation (VVF). This significantly increased costs for many customers and drove evaluation of alternatives: Proxmox, KVM/oVirt, Nutanix, and cloud migration. Understanding the licensing model is critical for capacity planning and budgeting.