Level: L2: Operations | Topics: PXE / Provisioning, Out-of-Band Management, Server Hardware | Domain: Datacenter & Hardware
Bare-Metal Provisioning - Primer¶
Comprehensive Reference
For the full operational guide with detailed PXE configs, Kickstart/Preseed templates, and decommissioning checklists, see Bare-Metal Provisioning Guide.
Why This Matters¶
Cloud abstracts away hardware. Bare metal doesn't. When you're responsible for 1,500 servers across 22 racks, provisioning isn't clicking "Launch Instance" — it's PXE boot chains, BIOS settings, IPMI commands, OS imaging, post-install validation, and firmware management. These skills are rare, highly valued, and the difference between a datacenter that hums and one that burns.
The Provisioning Lifecycle¶
Rack & Cable → OOB Setup → BIOS Config → PXE Boot → OS Install → Post-Install → Validation → Production
     ↑                                                                                            │
     └──────────────────────── Decommission / Reprovision ←───────────────────────────────────────┘
Every step can fail. Automation makes each step repeatable. The goal is zero-touch provisioning: rack a server, plug in power and network, and it bootstraps itself to a production-ready state.
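A minimal sketch of how the stages might be chained so that a failure stops the run and names the stage that broke. The `stage_*` functions and `provision_host` are hypothetical placeholders, not part of any tool mentioned on this page; each stub would wrap the real work (BMC calls, BIOS jobs, waiting for the installer to phone home).

```shell
#!/usr/bin/env bash
# Sketch only: stage names mirror the lifecycle diagram; the stage_* bodies
# are hypothetical stubs to be replaced with real automation.

STAGES=(oob_setup bios_config pxe_boot os_install post_install validation)

stage_oob_setup()    { :; }
stage_bios_config()  { :; }
stage_pxe_boot()     { :; }
stage_os_install()   { :; }
stage_post_install() { :; }
stage_validation()   { :; }

# Run every stage in order for one host; stop at the first failure.
provision_host() {
    local host="$1" stage
    for stage in "${STAGES[@]}"; do
        if ! "stage_${stage}" "${host}"; then
            echo "${host}: FAILED at ${stage}" >&2
            return 1
        fi
        echo "${host}: ${stage} ok"
    done
    echo "${host}: ready for production"
}
```

Calling `provision_host web-01` prints one line per completed stage. Because every stage is a separate function, a failed host can be resumed or retried from the stage that broke.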
PXE Boot¶
How PXE Works¶
Server powers on
  → NIC firmware sends DHCP DISCOVER
  → DHCP server responds with:
      - IP address
      - next-server (TFTP server IP)
      - filename (bootloader path, e.g., pxelinux.0 or ipxe.efi)
  → Server downloads bootloader via TFTP
  → Bootloader loads kernel + initrd
  → Kernel starts installer (Kickstart, Preseed, Autoinstall)
  → Installer partitions, installs packages, runs post-scripts
  → Server reboots into production OS
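Which bootloader the DHCP server names depends on the client's firmware, which the NIC reports in DHCP option 93 (client system architecture). A minimal sketch of that decision as a lookup, assuming the two bootloader paths used on this page. The function name is hypothetical, and only the values most often seen on x86 servers are handled: 0 for legacy BIOS, and 7 or 9 for 64-bit UEFI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a DHCP option 93 value (decimal) to a bootloader path.
bootloader_for_arch() {
    local arch="$1"
    case "${arch}" in
        0)   echo "pxelinux.0" ;;          # legacy BIOS
        7|9) echo "ipxe/snponly.efi" ;;    # 64-bit x86 UEFI
        *)   echo "unsupported client architecture: ${arch}" >&2
             return 1 ;;
    esac
}
```

Rejecting unknown values is deliberate: handing a BIOS bootloader to a UEFI client (or the reverse) fails at boot time with little diagnostic output.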
DHCP Configuration for PXE¶
# ISC DHCP example
# Declare option 93 (client architecture) so "option arch" can be tested below
option arch code 93 = unsigned integer 16;
subnet 10.0.1.0 netmask 255.255.255.0 {
    range 10.0.1.100 10.0.1.200;
    option routers 10.0.1.1;
    next-server 10.0.1.10;             # TFTP server
    # UEFI vs BIOS boot
    if option arch = 00:07 {
        filename "ipxe/snponly.efi";   # UEFI (x86-64)
    } else {
        filename "pxelinux.0";         # Legacy BIOS
    }
}
host web-01 {
    hardware ethernet aa:bb:cc:dd:ee:ff;
    fixed-address 10.0.1.50;
    option host-name "web-01";
}
iPXE Chainloading¶
Modern setups chainload from PXE to iPXE for HTTP-based boot (faster than TFTP):
#!ipxe
dhcp
set base-url http://provision.internal/images
kernel ${base-url}/vmlinuz \
    initrd=initrd.img \
    inst.ks=http://provision.internal/kickstart/${mac:hexhyp}.ks \
    ip=dhcp \
    console=tty0 console=ttyS1,115200
initrd ${base-url}/initrd.img
boot
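iPXE expands `${mac:hexhyp}` to the MAC address in lowercase hex with hyphens (for example `aa-bb-cc-dd-ee-ff`), so the kickstart files on the HTTP server have to be named the same way. A small sketch that reproduces that formatting, so the expected URL for a host can be checked before it boots. The function names are illustrative and not part of iPXE; the base URL is the one used in the script above.

```shell
#!/usr/bin/env bash
# Sketch: reproduce iPXE's ${mac:hexhyp} formatting so the expected
# kickstart URL for a host can be checked before it boots.
# Function names are illustrative, not part of iPXE.

KS_BASE_URL="http://provision.internal/kickstart"

mac_to_hexhyp() {
    local mac
    # Lowercase the hex digits and strip the common separators (: . -)
    mac="$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr -d ':.-')"
    if [[ ! "${mac}" =~ ^[0-9a-f]{12}$ ]]; then
        echo "invalid MAC address: $1" >&2
        return 1
    fi
    # Re-insert a hyphen between each pair of hex digits
    echo "${mac:0:2}-${mac:2:2}-${mac:4:2}-${mac:6:2}-${mac:8:2}-${mac:10:2}"
}

ks_url_for_mac() {
    local name
    name="$(mac_to_hexhyp "$1")" || return 1
    echo "${KS_BASE_URL}/${name}.ks"
}
```

One possible use is a pre-flight check such as `curl -fsI "$(ks_url_for_mac AA:BB:CC:DD:EE:FF)"`, which fails if the provisioning server does not serve a kickstart file under the name iPXE will request.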
Kickstart (RHEL/CentOS)¶
Kickstart automates the entire OS installation:
# Minimal Kickstart for a datacenter server
# (no "install" command: it is deprecated in RHEL/CentOS 8, where installing is the default)
url --url="http://provision.internal/repos/centos-8/"
lang en_US.UTF-8
keyboard us
timezone UTC --utc
rootpw --iscrypted $6$rounds=4096$salt$hash...
# Network
network --bootproto=dhcp --device=eth0 --onboot=yes --hostname=TEMPLATE
# Disk — wipe everything, use LVM
zerombr
clearpart --all --initlabel
autopart --type=lvm --fstype=xfs
# Packages (--nobase is no longer accepted by the RHEL/CentOS 8 installer)
%packages
@core
openssh-server
chrony
lvm2
%end
# Post-install
%post --log=/root/ks-post.log
#!/bin/bash
set -euo pipefail
# Register with config management
# (-f makes curl fail on HTTP errors instead of piping an error page to bash)
curl -fsS http://provision.internal/bootstrap.sh | bash
# Set up SSH keys
mkdir -p /root/.ssh
curl -fsS http://provision.internal/keys/ops-team.pub >> /root/.ssh/authorized_keys
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys
# Phone home: signal provisioning server
curl -fsS -X POST http://provision.internal/api/complete \
    -d "mac=$(cat /sys/class/net/eth0/address)&status=success"
%end
Per-Host Kickstart Generation¶
Generate host-specific kickstart files from templates:
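# Template uses %%HOSTNAME%%, %%IP%%, %%GATEWAY%%
# inventory.csv columns: name,mac,ip,gateway
while IFS=',' read -r name mac ip gw; do
    # Name the file the way iPXE's ${mac:hexhyp} requests it: aa-bb-cc-dd-ee-ff.ks
    ks_name="$(tr 'A-F:' 'a-f-' <<< "${mac}")"
    sed -e "s/%%HOSTNAME%%/${name}/g" \
        -e "s/%%IP%%/${ip}/g" \
        -e "s/%%GATEWAY%%/${gw}/g" \
        kickstart.template > "/var/www/kickstart/${ks_name}.ks"
done < inventory.csv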
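Because `sed` leaves any `%%TOKEN%%` it has no rule for untouched, a misspelled placeholder in the template ends up in the generated file without any error. A minimal sketch of a check that scans the output directory for leftovers. The function name is hypothetical, and the token pattern assumes placeholders are uppercase letters and underscores, as in the template above.

```shell
#!/usr/bin/env bash
# Sketch: fail if any generated kickstart still contains a %%TOKEN%%.

check_rendered() {
    local dir="$1" leftovers
    # -r recurse, -n show line numbers, -E extended regex
    leftovers="$(grep -rnE '%%[A-Z_]+%%' "${dir}" || true)"
    if [[ -n "${leftovers}" ]]; then
        echo "unreplaced placeholders found:" >&2
        echo "${leftovers}" >&2
        return 1
    fi
    echo "all kickstart files in ${dir} fully rendered"
}
```

Running `check_rendered /var/www/kickstart` right after generation catches the problem before a server boots into an installer with a literal `%%IP%%` in its network line.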
ONIE (Open Network Install Environment)¶
ONIE is the PXE equivalent for network switches (Cumulus Linux, SONiC):
Switch powers on → ONIE boots → ONIE discovers installer via (in order):
  1. Local file systems (e.g., USB drive)
  2. DHCP option (default-url)
  3. HTTP discovery (default onie-installer file names on the DHCP-provided server)
  4. TFTP fallback
  → Downloads and runs NOS installer
  → Switch reboots into Cumulus/SONiC
DHCP for ONIE¶
class "onie-switch" {
    match if substring(option vendor-class-identifier, 0, 4) = "onie";
    option default-url "http://provision.internal/onie/onie-installer-cumulus.bin";
}
IPMI / BMC Operations¶
Out-of-band management for remote hardware control:
# Power operations
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret chassis power status
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret chassis power cycle
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret chassis power on
# Set boot device for next boot (PXE)
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret chassis bootdev pxe
# Serial-over-LAN (remote console)
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret sol activate
# Sensor readings (temperature, fans, voltage)
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret sdr list
# System Event Log
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret sel list
ipmitool -I lanplus -H 10.0.99.50 -U admin -P secret sel clear
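The commands above pass the password with `-P`, which exposes it in the process list and shell history. `ipmitool` can instead read it from the `IPMI_PASSWORD` environment variable when given `-E`. The sketch below wraps that and chains the two steps a reinstall needs: a one-time PXE boot override followed by a power cycle. The function names and the `DRY_RUN` switch are illustrative, and `options=efiboot` is only wanted on hosts that boot in UEFI mode.

```shell
#!/usr/bin/env bash
# Sketch: trigger a reinstall via IPMI. Names here are illustrative.
# Expects IPMI_PASSWORD in the environment (read by ipmitool -E).

IPMI_USER="${IPMI_USER:-admin}"
DRY_RUN="${DRY_RUN:-0}"

# Run (or, with DRY_RUN=1, just print) one ipmitool command against a BMC.
ipmi() {
    local bmc="$1"; shift
    local cmd=(ipmitool -I lanplus -H "${bmc}" -U "${IPMI_USER}" -E "$@")
    if [[ "${DRY_RUN}" == "1" ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Set a one-time PXE boot, then power cycle so the override takes effect.
pxe_reinstall() {
    local bmc="$1" firmware="${2:-bios}"
    if [[ "${firmware}" == "uefi" ]]; then
        ipmi "${bmc}" chassis bootdev pxe options=efiboot || return 1
    else
        ipmi "${bmc}" chassis bootdev pxe || return 1
    fi
    ipmi "${bmc}" chassis power cycle
}
```

With `DRY_RUN=1`, `pxe_reinstall 10.0.99.50 uefi` only prints the two commands it would run, which is a cheap way to review a batch before touching hardware.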
Redfish API (Modern BMC)¶
Redfish is the REST-based successor to IPMI:
# Get system info (the system ID varies by vendor; "1" is not universal)
curl -sk https://10.0.99.50/redfish/v1/Systems/1 \
    -u admin:secret | jq '.PowerState, .Model, .SerialNumber'
# Forced restart (no graceful OS shutdown)
curl -sk -X POST https://10.0.99.50/redfish/v1/Systems/1/Actions/ComputerSystem.Reset \
    -u admin:secret \
    -H 'Content-Type: application/json' \
    -d '{"ResetType": "ForceRestart"}'
# Set one-time PXE boot
curl -sk -X PATCH https://10.0.99.50/redfish/v1/Systems/1 \
    -u admin:secret \
    -H 'Content-Type: application/json' \
    -d '{"Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}}'
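The `/redfish/v1/Systems/1` path in these examples is not universal: the member ID differs between vendors, and the Systems collection lists the real paths. A sketch that looks up the first system instead of hard-coding it, then sends the same one-time PXE override. It uses `curl` and `jq` as above; the function names and the `BMC_CRED` variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: avoid hard-coding the system path. Function names are illustrative.

BMC_CRED="${BMC_CRED:-admin:secret}"   # user:password, as in the examples above

# Print the @odata.id of the first member of the Systems collection.
redfish_system_path() {
    local bmc="$1"
    curl -sk -u "${BMC_CRED}" "https://${bmc}/redfish/v1/Systems" \
        | jq -er '.Members[0]["@odata.id"]'
}

# Build the JSON body for a boot-source override.
boot_override_body() {
    local target="$1" enabled="${2:-Once}"
    printf '{"Boot": {"BootSourceOverrideTarget": "%s", "BootSourceOverrideEnabled": "%s"}}\n' \
        "${target}" "${enabled}"
}

# Set a one-time PXE boot on whatever system the BMC exposes.
redfish_pxe_once() {
    local bmc="$1" path
    path="$(redfish_system_path "${bmc}")" || return 1
    curl -sk -X PATCH "https://${bmc}${path}" \
        -u "${BMC_CRED}" \
        -H 'Content-Type: application/json' \
        -d "$(boot_override_body Pxe Once)"
}
```

Taking the first member is enough for a single-node server; a multi-node chassis would need to pick the member explicitly.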
BIOS/UEFI Configuration at Scale¶
Dell servers use racadm for remote BIOS configuration:
# Show the current system profile settings
racadm -r 10.0.99.50 -u admin -p secret get BIOS.SysProfileSettings
# Set performance profile
racadm -r 10.0.99.50 -u admin -p secret set BIOS.SysProfileSettings.SysProfile PerfOptimized
# Disable hyperthreading
racadm -r 10.0.99.50 -u admin -p secret set BIOS.ProcSettings.LogicalProc Disabled
# Apply and schedule reboot
racadm -r 10.0.99.50 -u admin -p secret jobqueue create BIOS.Setup.1-1 -r pwrcycle
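Doing this at scale usually means pushing the same settings to a list of BMCs. The sketch below loops over a file of iDRAC addresses, stages the performance profile and queues the BIOS job on each, using only the `racadm` subcommands shown above. The file format (one address per line, `#` for comments), the function names, and the `DRY_RUN` switch are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: push one BIOS baseline to many iDRACs. Names are illustrative.

RACADM_USER="${RACADM_USER:-admin}"
RACADM_PASS="${RACADM_PASS:-secret}"
DRY_RUN="${DRY_RUN:-0}"

# Run (or, with DRY_RUN=1, just print) one racadm command against a BMC.
rac() {
    local bmc="$1"; shift
    if [[ "${DRY_RUN}" == "1" ]]; then
        # Mask the password so dry-run output is safe to log
        echo "racadm -r ${bmc} -u ${RACADM_USER} -p *** $*"
    else
        # stdin from /dev/null so the command cannot consume the host list
        racadm -r "${bmc}" -u "${RACADM_USER}" -p "${RACADM_PASS}" "$@" </dev/null
    fi
}

# Stage the profile and queue the BIOS job on every BMC listed in a file.
apply_bios_baseline() {
    local hosts_file="$1" bmc failed=0
    while IFS= read -r bmc || [[ -n "${bmc}" ]]; do
        [[ -z "${bmc}" || "${bmc}" == \#* ]] && continue
        if rac "${bmc}" set BIOS.SysProfileSettings.SysProfile PerfOptimized \
           && rac "${bmc}" jobqueue create BIOS.Setup.1-1 -r pwrcycle; then
            echo "${bmc}: BIOS job queued"
        else
            echo "${bmc}: FAILED" >&2
            failed=$((failed + 1))
        fi
    done < "${hosts_file}"
    [[ "${failed}" -eq 0 ]]
}
```

A failure on one BMC is counted and reported but does not stop the loop, so a single unreachable iDRAC does not block the rest of the rack.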
Post-Install Validation¶
Verify the server is correctly provisioned before handing it to production:
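#!/usr/bin/env bash
set -euo pipefail
ERRORS=0
# Run a command quietly and record PASS/FAIL without aborting the script
check() {
    if ! "$@" &>/dev/null; then
        echo "FAIL: $*" >&2
        # Plain assignment: (( ERRORS++ )) returns non-zero when ERRORS is 0,
        # which would abort the script under set -e on the first failure
        ERRORS=$((ERRORS + 1))
    else
        echo "PASS: $*"
    fi
}
# Hardware checks
check test "$(nproc)" -ge 16
check test "$(free -g | awk '/Mem:/{print $2}')" -ge 62
# Pipelines go inside bash -c so that check sees the pipeline's result
check bash -c 'lsblk -d -o NAME | grep -q sda'
# OS checks
check systemctl is-active sshd
check systemctl is-active chronyd
check bash -c "timedatectl | grep -q 'synchronized: yes'"
# Network checks
check ping -c1 -W2 gateway.internal
check curl -sf http://provision.internal/health
# dig exits 0 even when the name does not resolve, so require non-empty output
check bash -c 'dig +short provision.internal | grep -q .'
# Security checks
check test "$(getenforce)" = "Enforcing"
check test ! -f /root/.ssh/id_rsa # No private keys left behind
echo "Validation complete: ${ERRORS} failures"
# Exit 1 on any failure (a raw count would wrap around at 256)
exit $(( ERRORS > 0 ? 1 : 0 ))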
Provisioning Architecture¶
┌─────────────┐    ┌──────────────┐     ┌──────────────┐
│  DHCP/TFTP  │    │ HTTP Server  │     │ Config Mgmt  │
│  (dnsmasq)  │    │   (nginx)    │     │  (Ansible)   │
└──────┬──────┘    └──────┬───────┘     └──────┬───────┘
       │                  │                    │
       ▼                  ▼                    ▼
┌──────────────────────────────────────────────────────┐
│                 Provisioning Network                 │
│             (isolated VLAN, OOB access)              │
└──────────────────────────────────────────────────────┘
       │                  │                    │
   ┌───▼───┐          ┌───▼───┐            ┌───▼───┐
   │ Srv 1 │          │ Srv 2 │            │ Srv N │
   │ (BMC) │          │ (BMC) │            │ (BMC) │
   └───────┘          └───────┘            └───────┘
Key components:
- Provisioning VLAN: Isolated network for PXE traffic
- DHCP + TFTP: Serves IP addresses and bootloaders
- HTTP server: Hosts OS images, kickstart files, scripts
- Config management: Post-install configuration (Ansible, Puppet)
- CMDB/Inventory: Tracks what's provisioned and where
Wiki Navigation¶
Prerequisites¶
- Datacenter & Server Hardware (Topic Pack, L1)