
Quiz: Out-of-Band Management


11 questions

L1 (7 questions)

1. Server is unresponsive to SSH. How do you access it?

Show answer Use OOB management (iDRAC/iLO/IPMI):
1. Web console for virtual KVM.
2. ipmitool -I lanplus power status / power cycle.
3. Serial over LAN if network KVM fails.

2. What is iDRAC and how is it different from IPMI?

Show answer iDRAC is Dell's BMC implementation. IPMI is the vendor-neutral protocol/standard. iDRAC supports IPMI but adds features: web GUI, virtual media, Lifecycle Controller, Redfish API. iLO is HP's equivalent.

3. How do you use ipmitool to check server health and power cycle remotely?

Show answer ipmitool -I lanplus -H <bmc-ip> -U admin -P <password> chassis status (power state); sensor list (temps, fans, voltages); sel list (system event log — hardware errors); chassis power cycle (hard reboot); sol activate (serial-over-LAN console). Always use lanplus (encrypted) instead of plain lan. The SEL is the first place to check for hardware faults.
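A minimal sketch of the triage sequence (the BMC address and credentials are placeholders):

```shell
# Is the host even powered on?
ipmitool -I lanplus -H <bmc-ip> -U admin -P <password> chassis status

# Hardware faults land in the System Event Log — check it before rebooting.
ipmitool -I lanplus -H <bmc-ip> -U admin -P <password> sel elist

# Temperatures, fan speeds, voltages.
ipmitool -I lanplus -H <bmc-ip> -U admin -P <password> sensor list

# Attach the serial console (leave this running in one terminal)...
ipmitool -I lanplus -H <bmc-ip> -U admin -P <password> sol activate

# ...and power cycle from a second terminal to watch the boot.
ipmitool -I lanplus -H <bmc-ip> -U admin -P <password> chassis power cycle
```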

4. A server fails to PXE boot. BIOS shows network as first boot device but it falls through to the hard disk. What do you check on the DHCP/TFTP side?

Show answer PXE boot sequence: NIC sends DHCP Discover → DHCP server responds with IP + next-server (TFTP IP) + filename (boot file). Failure points:
1. DHCP server not responding to the NIC's MAC — check DHCP logs, verify the MAC is in the allowed pool or reservation.
2. DHCP response missing next-server or filename options — check dhcpd.conf for the PXE-specific options (option 66, option 67).
3. TFTP server unreachable or file missing — verify the boot file path exists on the TFTP server and permissions allow read.
4. BIOS vs UEFI mismatch — BIOS PXE expects pxelinux.0, UEFI expects grubx64.efi. If the wrong file is served, the boot silently fails.
Debug: run tcpdump on the DHCP/TFTP server to see whether the NIC's requests arrive and what responses are sent.
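The failure points above map to a few lines of ISC dhcpd configuration; a minimal fragment (addresses and subnet are illustrative):

```
# dhcpd.conf: client architecture is reported in DHCP option 93 (RFC 4578)
option arch code 93 = unsigned integer 16;

subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.200;
  next-server 10.0.0.10;          # option 66: TFTP server address
  if option arch = 00:07 {
    filename "grubx64.efi";       # UEFI x64 client
  } else {
    filename "pxelinux.0";        # legacy BIOS client
  }
}
```

Serving the boot file conditionally on option 93 is what prevents the BIOS/UEFI mismatch in point 4.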

5. You need console access to a server that is stuck in GRUB during boot. Serial-over-LAN (SOL) is configured but you see no GRUB output. How do you enable GRUB to output over the serial console?

Show answer GRUB needs explicit serial console configuration. Edit /etc/default/grub:
1. Set GRUB_TERMINAL='serial console' (enables both serial and VGA).
2. Set GRUB_SERIAL_COMMAND='serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1'.
3. Add 'console=tty0 console=ttyS0,115200n8' to GRUB_CMDLINE_LINUX to redirect kernel output to serial.
4. Run grub2-mkconfig -o /boot/grub2/grub.cfg to regenerate the config.
The SOL connection must match the baud rate (115200). Common mistake: configuring only the kernel console but not GRUB itself — you see kernel output but can't interact with the GRUB menu. For UEFI systems, the serial port number may differ (ttyS1 on some Dell servers).
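Putting the four steps together, /etc/default/grub ends up looking like this (baud rate assumed to match the SOL configuration):

```
# /etc/default/grub — mirror both GRUB and kernel output to the first serial port
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
```

Regenerate with grub2-mkconfig -o /boot/grub2/grub.cfg and reboot; the GRUB menu should now appear in the SOL session.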

6. A compromised application on a server can reach the BMC management interface at 10.0.0.50. What can an attacker do, and how should BMC network access be restricted?

Show answer With BMC access, an attacker can:
1. Power cycle the server (denial of service).
2. Access the serial console (see boot output, potentially interact with OS login).
3. Mount virtual media (boot from attacker-controlled ISO, install rootkit).
4. Read hardware sensor data and system event logs (information disclosure).
5. Update firmware (persistent compromise surviving OS reinstall).
Restrict access:
1. Place BMCs on a dedicated, isolated management VLAN that application servers cannot reach.
2. Firewall rules: only management jump hosts can reach BMC IPs.
3. Change default BMC credentials.
4. Disable unused BMC services (SNMP, SSH, IPMI-over-LAN if not needed).
5. Use IPMI 2.0 with encryption.
6. Monitor BMC access logs.
BMC compromise is especially dangerous because it survives OS reinstalls and operates below the OS security boundary.
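The firewall restriction in points 1-2 can be sketched as an nftables rule set on the router between the networks (subnets are illustrative: 10.0.0.0/24 = BMC VLAN, 10.10.0.0/28 = jump hosts):

```
table inet mgmt {
  chain forward {
    type filter hook forward priority filter; policy accept;

    # Jump hosts may reach the BMC VLAN; everything else is dropped.
    ip daddr 10.0.0.0/24 ip saddr 10.10.0.0/28 accept
    ip daddr 10.0.0.0/24 drop
  }
}
```

With this in place, the compromised application server at the start of the question gets no route to 10.0.0.50 at all.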

7. You configure a server's iDRAC IP address using racadm from the host OS. After a hard power cycle, the iDRAC is unreachable. What went wrong?

Show answer racadm from the host OS modifies the iDRAC configuration via the host-to-BMC interface (typically IPMI over the system bus). Some racadm changes require a BMC reset to take effect, and a hard power cycle (pulling power) may interrupt the BMC configuration write before it is committed to non-volatile storage. The IP configuration is lost because it was in volatile memory during the transition. Prevention:
1. After racadm set, always run 'racadm racreset' to ensure the BMC applies and persists the configuration.
2. Wait for the BMC to come back online and verify the IP with 'racadm getniccfg'.
3. For initial setup, configure the BMC IP via BIOS/UEFI setup screen (F2 during boot) which writes directly to non-volatile storage.
4. Avoid hard power cycles during BMC configuration changes.
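The safe sequence, as a sketch (iDRAC7+ attribute names; older firmware uses the racadm config syntax instead, and the addresses are placeholders):

```shell
# Set the iDRAC network configuration from the host OS...
racadm set iDRAC.IPv4.Address 10.0.0.50
racadm set iDRAC.IPv4.Netmask 255.255.255.0
racadm set iDRAC.IPv4.Gateway 10.0.0.1

# ...then reset the BMC so the change is applied and persisted.
racadm racreset

# After the BMC comes back (typically a minute or two), verify:
racadm getniccfg
```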

L2 (4 questions)

1. What is Redfish and why is it replacing IPMI?

Show answer Redfish is a RESTful API standard for server management (JSON over HTTPS). Replaces IPMI's binary protocol. Easier to automate, better security (TLS, RBAC), richer schema. Supported by Dell, HP, Lenovo, Supermicro.
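Because it is plain JSON over HTTPS, a Redfish BMC can be driven with curl; a sketch using the standard DMTF paths (<bmc-ip> and credentials are placeholders, -k skips certificate validation and belongs only in a lab):

```shell
# Enumerate the system members — the member name varies by vendor
# (e.g. System.Embedded.1 on Dell iDRAC).
curl -sk -u admin:password https://<bmc-ip>/redfish/v1/Systems

# Request a graceful restart of one member:
curl -sk -u admin:password -X POST \
  -H 'Content-Type: application/json' \
  -d '{"ResetType": "GracefulRestart"}' \
  'https://<bmc-ip>/redfish/v1/Systems/<member>/Actions/ComputerSystem.Reset'
```

Compare this with the equivalent ipmitool incantation: the Redfish version needs no special client, works through standard HTTPS tooling, and authenticates with per-user credentials over TLS.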

2. Why are default BMC credentials a critical security risk and how do you mitigate it?

Show answer BMC/iDRAC/iLO default credentials (admin/admin, root/calvin) give full remote hardware control: power, console, virtual media (boot from attacker ISO), firmware flash. BMCs are often on a management VLAN but still reachable. Mitigations:
1. Change all defaults before racking.
2. Isolate BMC on a dedicated management network.
3. Disable IPMI-over-LAN if not needed.
4. Enable HTTPS-only for web console.
5. Audit BMC firmware versions.
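Changing the defaults can be done in-band before the server ever reaches the rack; a sketch using ipmitool from the host OS (user ID 2 is the usual default admin slot, but IDs vary by vendor):

```shell
# List local BMC users on channel 1 to find the default account's ID.
ipmitool user list 1

# Replace the default password for that user ID.
ipmitool user set password 2 '<new-strong-password>'
```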

3. Your RAID controller's battery-backed write cache shows 'degraded' status. A firmware update for the RAID controller is scheduled. What is the risk, and what should you do before proceeding?

Show answer Critical risk: the RAID controller firmware update may require a controller reset or reboot. If the write-back cache has uncommitted data and the battery is degraded, that data could be lost during the reset — the battery can't preserve cache contents through a power interruption. Before proceeding:
1. Switch the write policy from write-back to write-through (megacli, storcli, or vendor tool) — this bypasses the cache entirely, writes go directly to disk. Performance drops but data is safe.
2. Wait for all pending cache flushes to complete.
3. Verify the cache is empty.
4. Proceed with the firmware update.
5. After update, replace the battery/capacitor.
6. Switch back to write-back only after the new battery is fully charged.
Never update RAID firmware with a dirty cache and a degraded battery.
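With a Broadcom/LSI controller the policy switch in step 1 looks like this (StorCLI sketch; controller and volume IDs are illustrative):

```shell
# Force write-through on all virtual drives of controller 0 —
# writes now bypass the cache and go straight to disk.
storcli /c0/vall set wrcache=wt

# Check battery/cachevault state and confirm no dirty cache remains.
storcli /c0/bbu show
storcli /c0 show all

# ...firmware update and battery replacement happen here...

# Only after the new battery is fully charged:
storcli /c0/vall set wrcache=wb
```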

4. Your Kickstart installation fails halfway through due to network instability. When you retry, it starts from scratch. How do you make unattended provisioning more resilient?

Show answer 1. Use a local mirror: set up a local HTTP/NFS repository on the same subnet so package installation does not depend on WAN connectivity.
2. Two-phase provisioning: Kickstart installs only the minimal OS (base packages from local media/PXE), then a post-install script (Ansible, cloud-init, or a custom agent) handles application configuration with retry logic.
3. Cache packages: configure a caching proxy (Squid, Nexus) between the provisioning server and external repos.
4. Idempotent post-install: ensure the post-install script can be re-run safely — check state before applying changes.
5. Pre-validate network: the Kickstart %pre section can test connectivity before proceeding.
6. Store provisioning state in BMC or IPMI OEM fields so a retry knows where the previous attempt stopped.
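The idempotency idea in point 4 reduces to a guard-and-marker pattern; a minimal sketch (file names are illustrative):

```shell
#!/bin/sh
# Idempotent post-install step: work is skipped when the completion
# marker exists, so a retry after a failed run is a safe no-op.
provision_step() {
    STATE=/tmp/provision.step1.done
    if [ -f "$STATE" ]; then
        echo "step1 already done, skipping"
        return 0
    fi
    echo "app configured" > /tmp/app.conf   # the actual provisioning work
    touch "$STATE"                          # record completion last
}

provision_step   # first run does the work
provision_step   # re-run after a failed Kickstart changes nothing
```

Writing the marker last matters: if the step dies midway, no marker exists and the retry repeats the work rather than skipping a half-finished state.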