Bare-Metal Provisioning¶
Reference guide for out-of-band management, automated OS deployment, and server lifecycle operations on Dell PowerEdge hardware.
Mental Model¶
[Running workload ] Kubernetes / VMs / applications
|
[OS + Config Mgmt ] Ubuntu/RHEL + Ansible post-install
|
[OS Install ] Kickstart / Preseed / Cloud-init (automated)
|
[PXE Boot Chain ] DHCP -> TFTP -> bootloader -> installer
|
[Out-of-Band Mgmt ] iDRAC sets boot order, power cycles, mounts ISO
|
[Network Boot NIC ] PXE-capable NIC on the server (usually NIC1)
Provisioning starts from the bottom: iDRAC powers on the server, the NIC PXE boots, DHCP hands out an IP and boot file, TFTP serves the bootloader, and the installer takes over. After OS install, Ansible handles configuration and eventually joins the node to a cluster.
Out-of-Band Management Fundamentals¶
IPMI vs Redfish¶
| Feature | IPMI (legacy) | Redfish (modern) |
|---|---|---|
| Protocol | UDP/623, binary | HTTPS/REST + JSON |
| Authentication | Shared secret, weak crypto | TLS + session tokens |
| Discoverability | Minimal | Self-describing (OData schema) |
| Scriptability | ipmitool (cryptic syntax) | Any HTTP client (curl, Python) |
| Event model | SNMP traps, PET | Redfish EventService + SSE |
| Dell support | All generations | iDRAC8+ (full on iDRAC9) |
| Recommendation | Avoid for new deployments | Preferred for all automation |
IPMI Quick Reference (legacy systems)¶
# Install ipmitool
apt install ipmitool # Debian/Ubuntu
dnf install ipmitool # RHEL/Fedora
# Check power status
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis power status
# Power on
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis power on
# Power cycle
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis power cycle
# Set next boot to PXE (one-time)
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis bootdev pxe
# Get sensor readings
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sdr list
# Get system event log
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sel elist
# Serial over LAN (remote console)
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sol activate
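The commands above pass the password with `-P`, which leaves it visible in the process list. A minimal sketch of a wrapper that uses ipmitool's `-E` flag instead, which reads the password from the `IPMI_PASSWORD` environment variable. The function names `build_ipmi_cmd` and `run_ipmi` are illustrative and not part of any library.

```python
import os
import subprocess


def build_ipmi_cmd(host, user, *args):
    """Return the ipmitool argv for a lanplus session; -E reads IPMI_PASSWORD."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def run_ipmi(host, user, password, *args, timeout=30):
    """Run ipmitool and return its stdout; raises CalledProcessError on failure."""
    # The password travels in the child's environment, not on its command line.
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        build_ipmi_cmd(host, user, *args),
        env=env,
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.strip()
```

With ipmitool installed, `run_ipmi("10.0.10.101", "root", secret, "chassis", "power", "status")` would be the equivalent of the first command in the list.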
Redfish Deep-Dive¶
Redfish is a DMTF standard (not Dell-proprietary). The key resource tree:
/redfish/v1/
  /Systems/System.Embedded.1        <- The physical server
    /Bios                           <- BIOS attributes
    /Processors                     <- CPU inventory
    /Memory                         <- DIMM inventory
    /Storage                        <- RAID controllers + drives
    /EthernetInterfaces             <- NIC ports
    /Actions/ComputerSystem.Reset   <- Power operations
  /Managers/iDRAC.Embedded.1        <- The BMC itself
    /LogServices/Sel                <- System event log
    /LogServices/Lclog              <- Lifecycle log
    /EthernetInterfaces             <- iDRAC NIC config
    /Actions/Manager.Reset          <- Reboot iDRAC
  /UpdateService                    <- Firmware update operations
    /FirmwareInventory              <- Installed firmware versions
  /Chassis/System.Embedded.1        <- Physical chassis
    /Power                          <- PSU info, power consumption
    /Thermal                        <- Temps, fan speeds
  /AccountService                   <- iDRAC user management
  /EventService                     <- Alert subscriptions
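Because Redfish is self-describing, the IDs in the tree above (such as `System.Embedded.1`) do not need to be hard-coded: each collection lists its members under `Members`, with an `@odata.id` link per entry. A minimal sketch of extracting those links from a collection payload. The helper name `member_paths` and the sample payload are illustrative.

```python
def member_paths(collection):
    """Return the @odata.id of every member in a Redfish collection payload."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


# Shape of a typical response to GET /redfish/v1/Systems (sample data)
sample = {
    "@odata.id": "/redfish/v1/Systems",
    "Members": [{"@odata.id": "/redfish/v1/Systems/System.Embedded.1"}],
    "Members@odata.count": 1,
}

print(member_paths(sample))
```

Following these links from the service root keeps the same script usable against BMCs from other vendors, where the member IDs differ.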
Python Redfish Automation (Essential Pattern)¶
import requests, urllib3

# Lab-only: iDRAC ships with a self-signed certificate, so verification is off here
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
s = requests.Session()
s.auth = ("root", "password")
s.verify = False
base = "https://10.0.10.101/redfish/v1"

# Get system summary (on iDRAC, SKU carries the Dell service tag)
info = s.get(f"{base}/Systems/System.Embedded.1").json()
print(f"{info['Model']} | Service Tag: {info.get('SKU')} | BIOS: {info['BiosVersion']}")

# Set one-time PXE boot + power cycle; fail loudly if the BMC rejects either call
r = s.patch(f"{base}/Systems/System.Embedded.1", json={
    "Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}
})
r.raise_for_status()
r = s.post(f"{base}/Systems/System.Embedded.1/Actions/ComputerSystem.Reset",
           json={"ResetType": "ForceRestart"})
r.raise_for_status()
For a full Redfish client class, see Dell Server Management.
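The pattern above always sends `ForceRestart`, which a BMC may reject when the host is powered off. A hedged sketch of choosing the reset type from what the system resource itself advertises: the `ResetType@Redfish.AllowableValues` annotation under `Actions` is part of the Redfish schema, while the helper name `pick_reset_type` and its preference order are assumptions made for illustration.

```python
def pick_reset_type(system, preferred=("ForceRestart", "PowerCycle", "GracefulRestart")):
    """Choose a ResetType the BMC advertises for the system's current power state."""
    action = system.get("Actions", {}).get("#ComputerSystem.Reset", {})
    allowed = action.get("ResetType@Redfish.AllowableValues", [])
    # Assumption: a powered-off host needs "On" rather than a restart.
    candidates = ("On",) if system.get("PowerState") == "Off" else tuple(preferred)
    for candidate in candidates:
        if candidate in allowed:
            return candidate
    raise ValueError(f"none of {candidates} in advertised values {allowed}")


# Sample data shaped like a GET /redfish/v1/Systems/System.Embedded.1 response
sample_system = {
    "PowerState": "Off",
    "Actions": {
        "#ComputerSystem.Reset": {
            "ResetType@Redfish.AllowableValues": ["On", "ForceOff", "ForceRestart"],
        }
    },
}
```

The returned value would replace the hard-coded `"ForceRestart"` in the POST body.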
Ansible + iDRAC Integration¶
---
# Provision a bare-metal server: set PXE boot, power cycle, wait for OS
- name: Bare-metal provision via iDRAC
  hosts: idrac_hosts
  gather_facts: false
  connection: local
  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"
  tasks:
    - name: Set one-time PXE boot
      dellemc.openmanage.idrac_boot:
        idrac_ip: "{{ inventory_hostname }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        boot_source_override_target: pxe
        boot_source_override_enabled: once
        # Leave the restart to the next task
        reset_type: none

    # idrac_reset restarts the iDRAC itself; host power goes through redfish_powerstate
    - name: Power cycle server to trigger PXE
      dellemc.openmanage.redfish_powerstate:
        baseuri: "{{ inventory_hostname }}"
        username: "{{ idrac_user }}"
        password: "{{ idrac_password }}"
        reset_type: ForceRestart

    - name: Wait for OS to come up (SSH)
      ansible.builtin.wait_for:
        host: "{{ hostvars[inventory_hostname].os_ip }}"
        port: 22
        delay: 120
        timeout: 1800
      delegate_to: localhost
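The final task is a TCP poll: keep trying port 22 until it accepts a connection or the timeout expires. For scripts that drive Redfish directly rather than through Ansible, a rough standard-library equivalent is sketched here. The function name `wait_for_port` and its defaults are illustrative.

```python
import socket
import time


def wait_for_port(host, port, timeout=1800.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the installer is still running.
            time.sleep(interval)
    return False
```

An open port only shows that sshd is listening. It does not show that the post-install steps have finished, which is why the kickstart and autoinstall examples also call back to the provisioning server.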
PXE Boot Chain¶
How PXE Works¶
Server powers on
|
v
NIC sends DHCP DISCOVER (with PXE option 60)
|
v
DHCP server responds with:
- IP address for the server
- Option 66: TFTP server address (next-server)
- Option 67: Boot filename (pxelinux.0 or grubx64.efi)
|
v
Server downloads bootloader via TFTP
|
v
Bootloader loads kernel + initrd
|
v
Kernel boots, starts installer
|
v
Installer fetches kickstart/preseed/autoinstall from HTTP server
|
v
Automated OS installation completes
|
v
Server reboots into installed OS
|
v
Cloud-init / first-boot script calls home to Ansible
DHCP Configuration (ISC DHCP)¶
# /etc/dhcp/dhcpd.conf
subnet 10.0.10.0 netmask 255.255.255.0 {
  range 10.0.10.100 10.0.10.200;
  option routers 10.0.10.1;
  option domain-name-servers 10.0.10.1;

  # PXE boot settings
  next-server 10.0.10.5;  # TFTP server

  # UEFI vs Legacy boot detection
  class "pxeclients-uefi" {
    match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00007";
    filename "grubx64.efi";
  }
  class "pxeclients-legacy" {
    match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00000";
    filename "pxelinux.0";
  }
}
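The two classes match on the first 20 characters of DHCP option 60, which a PXE client sends in the form `PXEClient:Arch:<5-digit type>:UNDI:<version>`. The same decision can be sketched in Python, for example to test which boot file a captured DHCP request would receive. The mapping for `00000` and `00007` comes from the config above. The `00009` entry is an assumption added for firmware that reports the EFI x86-64 type instead, and the function name is illustrative.

```python
BOOT_FILES = {
    "00000": "pxelinux.0",   # legacy BIOS (from the dhcpd.conf above)
    "00007": "grubx64.efi",  # UEFI (from the dhcpd.conf above)
    "00009": "grubx64.efi",  # assumption: UEFI x86-64 firmware reporting type 9
}


def boot_file_for(vendor_class):
    """Map a DHCP option 60 value to a boot filename, or None if not a PXE client."""
    parts = vendor_class.split(":")
    if len(parts) < 3 or parts[0] != "PXEClient" or parts[1] != "Arch":
        return None
    return BOOT_FILES.get(parts[2])


print(boot_file_for("PXEClient:Arch:00007:UNDI:003016"))
```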
TFTP Directory Structure¶
/srv/tftp/
  pxelinux.0                  # Legacy BIOS bootloader
  grubx64.efi                 # UEFI bootloader
  pxelinux.cfg/
    default                   # Default boot config
    01-aa-bb-cc-dd-ee-ff      # Per-MAC config (lowercase, dash-separated)
  images/
    ubuntu-22.04/
      vmlinuz                 # Kernel
      initrd                  # Initial ramdisk
    rhel-9/
      vmlinuz
      initrd.img
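The per-MAC filename in `pxelinux.cfg/` is the ARP hardware type for Ethernet (`01`) followed by the MAC address in lowercase with dashes. A small sketch for deriving it from a MAC in any common notation. The function name `pxelinux_cfg_name` is illustrative.

```python
import re


def pxelinux_cfg_name(mac):
    """Return the pxelinux.cfg filename for a MAC given in any common notation."""
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets = [hex_digits[i:i + 2] for i in range(0, 12, 2)]
    return "01-" + "-".join(octets)


print(pxelinux_cfg_name("AA:BB:CC:DD:EE:FF"))  # 01-aa-bb-cc-dd-ee-ff
```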
PXE Boot Menu (pxelinux.cfg/default)¶
DEFAULT menu.c32
TIMEOUT 100
PROMPT 0
MENU TITLE PXE Boot Menu
LABEL ubuntu-22.04
MENU LABEL Ubuntu 22.04 LTS (Automated)
KERNEL images/ubuntu-22.04/vmlinuz
INITRD images/ubuntu-22.04/initrd
APPEND autoinstall ds=nocloud-net;s=http://10.0.10.5/autoinstall/ubuntu/ ip=dhcp ---
LABEL rhel-9
MENU LABEL RHEL 9 (Kickstart)
KERNEL images/rhel-9/vmlinuz
INITRD images/rhel-9/initrd.img
APPEND inst.ks=http://10.0.10.5/kickstart/rhel9.ks ip=dhcp
LABEL local
MENU LABEL Boot from local disk
LOCALBOOT 0
Kickstart Example (RHEL/Rocky)¶
# /var/www/html/kickstart/rhel9.ks
#version=RHEL9
url --url="http://10.0.10.5/repo/rhel-9/"
text
lang en_US.UTF-8
keyboard us
timezone UTC --utc
rootpw --lock
user --name=deploy --groups=wheel --lock
sshkey --username=deploy "ssh-ed25519 AAAA... deploy@provisioning"
# Network
network --bootproto=dhcp --device=eno1 --activate --onboot=yes --hostname=changeme
# Disk: wipe all, RAID 1 for /boot/efi and /boot, LVM for the rest
zerombr
clearpart --all --initlabel
# RAID members: raid.01/raid.02 back /boot/efi, raid.11/raid.12 back /boot
part raid.01 --size=600 --ondisk=sda
part raid.02 --size=600 --ondisk=sdb
part raid.11 --size=1024 --ondisk=sda
part raid.12 --size=1024 --ondisk=sdb
part pv.01 --size=1 --grow --ondisk=sda
part pv.02 --size=1 --grow --ondisk=sdb
raid /boot/efi --device=md0 --fstype=efi --level=1 raid.01 raid.02
raid /boot --device=md1 --fstype=xfs --level=1 raid.11 raid.12
# Note: vg_root spans both disks without mirroring
volgroup vg_root pv.01 pv.02
logvol / --vgname=vg_root --size=20480 --name=lv_root --fstype=xfs
logvol /var --vgname=vg_root --size=40960 --name=lv_var --fstype=xfs
logvol /tmp --vgname=vg_root --size=10240 --name=lv_tmp --fstype=xfs
logvol swap --vgname=vg_root --size=8192 --name=lv_swap
# Packages
%packages --ignoremissing
@^minimal-environment
openssh-server
python3
curl
chrony
%end
# Post-install
%post --log=/root/ks-post.log
# Enable SSH
systemctl enable sshd
# Persist the installer's DHCP hostname (hostnamectl needs a running systemd, which the %post chroot lacks; Ansible overrides this later)
hostname > /etc/hostname
# Signal provisioning server that install is complete
curl -s "http://10.0.10.5/api/provision/complete?mac=$(cat /sys/class/net/eno1/address)"
# Pull Ansible bootstrap
curl -s http://10.0.10.5/scripts/bootstrap-ansible.sh | bash
%end
# Reboot after install
reboot --eject
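The `%post` section reports completion by requesting `/api/provision/complete?mac=...` on the provisioning server. That endpoint is not defined in this guide, so the following is only a sketch of how the server side might extract and validate the MAC from the request path before marking the host as installed. The function name `mac_from_callback` is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

MAC_PATTERN = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}")


def mac_from_callback(request_path):
    """Return the normalized MAC from a completion callback path, or None."""
    query = parse_qs(urlsplit(request_path).query)
    values = query.get("mac", [])
    if len(values) != 1:
        return None
    mac = values[0].strip().lower()
    # Reject anything that is not a plain colon-separated MAC address.
    return mac if MAC_PATTERN.fullmatch(mac) else None


print(mac_from_callback("/api/provision/complete?mac=AA:bb:cc:dd:ee:ff"))
```

Validating the value matters because the callback is unauthenticated: any host on the provisioning network can send it.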
Ubuntu Autoinstall Example (Essential Structure)¶
#cloud-config
# /var/www/html/autoinstall/ubuntu/user-data (the #cloud-config header must stay on the first line)
autoinstall:
  version: 1
  identity: { hostname: changeme, username: deploy, password: "!" }
  ssh: { install-server: true, authorized-keys: ["ssh-ed25519 AAAA..."], allow-pw: false }
  network: { network: { version: 2, ethernets: { eno1: { dhcp4: true } } } }
  storage: { layout: { name: lvm, sizing-policy: all } }
  packages: [python3, curl, chrony]
  late-commands:
    - curtin in-target -- systemctl enable ssh
    - curtin in-target -- bash -c 'curl -s http://10.0.10.5/scripts/bootstrap-ansible.sh | bash'
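The `ds=nocloud-net` seed URL used in the PXE menu has to serve a `meta-data` file alongside `user-data`, and cloud-init only treats `user-data` as cloud-config when `#cloud-config` is its first line. A minimal sketch that writes both files into a seed directory and enforces the header. The helper name `write_seed` and the instance ID value are illustrative.

```python
from pathlib import Path


def write_seed(directory, user_data, instance_id):
    """Write user-data and meta-data for a nocloud seed; return the file names."""
    if not user_data.startswith("#cloud-config"):
        raise ValueError("user-data must start with #cloud-config on line 1")
    seed = Path(directory)
    seed.mkdir(parents=True, exist_ok=True)
    (seed / "user-data").write_text(user_data)
    # meta-data may be minimal, but the file has to exist at the seed URL.
    (seed / "meta-data").write_text(f"instance-id: {instance_id}\n")
    return sorted(path.name for path in seed.iterdir())
```

For the layout in this guide, the directory would be `/var/www/html/autoinstall/ubuntu/`.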
Provisioning at Scale¶
MAAS vs Foreman vs Ironic¶
| Feature | MAAS (Canonical) | Foreman + Katello | OpenStack Ironic |
|---|---|---|---|
| Primary use case | Bare-metal cloud | Lifecycle management | OpenStack bare-metal service |
| Discovery | PXE enlistment | PXE + smart proxy | Inspector (PXE-based) |
| OS support | Ubuntu (best), CentOS, RHEL | RHEL, CentOS, Ubuntu, SLES | Any (via deploy images) |
| Provisioning method | Curtin + cloud-init | Kickstart/Preseed + Puppet | IPA (Ironic Python Agent) |
| Config management | Cloud-init, Ansible (post) | Puppet, Ansible, Salt | None (external) |
| Network management | VLAN, fabric, subnet, DHCP | Smart proxy DHCP/DNS/TFTP | Neutron integration |
| Dell integration | IPMI power control | BMC plugin + IPMI | IPMI, Redfish drivers |
| RAID configuration | Curtin storage config | Partition tables | RAID via IPA cleaning |
| Firmware management | Not built-in | Katello content views | Not built-in |
| API | REST + CLI | REST + Hammer CLI | REST (OpenStack API) |
| Complexity | Low-Medium | Medium-High | High (needs OpenStack) |
| Best for | Ubuntu-first, cloud model | RHEL-first, enterprise | Already running OpenStack |
End-to-End Provisioning Workflow¶
New server arrives
|
v
1. Rack & cable (power A+B, iDRAC dedicated NIC, data NICs)
|
v
2. iDRAC obtains an address via DHCP, or is assigned a static IP manually
|
v
3. Automation configures iDRAC:
- Set hostname, DNS, NTP
- Create admin account, disable default root
- Enable alerts (SNMP/email/syslog)
- Apply golden BIOS config (see dell-server-management.md)
- Configure RAID (OS mirror + data array)
|
v
4. Set one-time PXE boot + power cycle via Redfish
|
v
5. PXE boot -> automated OS install (kickstart/autoinstall)
|
v
6. Post-install callback triggers Ansible:
- Base OS hardening (CIS benchmark)
- Install monitoring agent (node_exporter, promtail)
- Configure networking (bonds, VLANs, routes)
- Install container runtime (containerd)
- Join Kubernetes cluster (k3s/kubeadm)
|
v
7. Node appears in cluster, ready for workloads
|
v
8. Update CMDB/inventory database with:
- Service tag, model, serial
- Rack location (room/row/rack/U)
- IP addresses (iDRAC, OS management, data)
- Warranty expiration
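The workflow above is strictly sequential: each step depends on the previous one, and a failure should stop the run and record where it stopped. One way to sketch that orchestration, assuming each step is wrapped in a callable that raises on failure. The runner and the step names in the usage note are hypothetical and not taken from any provisioning tool.

```python
def run_workflow(steps, context):
    """Run (name, callable) steps in order and stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            step(context)
        except Exception as exc:
            # Report where the run stopped so it can be resumed or rolled back.
            return {"ok": False, "failed": name, "error": str(exc), "completed": completed}
        completed.append(name)
    return {"ok": True, "failed": None, "error": None, "completed": completed}
```

A run might be declared as `run_workflow([("configure_idrac", configure_idrac), ("pxe_boot", pxe_boot), ("update_cmdb", update_cmdb)], {"idrac_ip": "10.0.10.101"})`, where each function is supplied by the caller.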
Bootstrap Ansible Playbook (post-OS-install)¶
---
- name: Bootstrap newly provisioned server
  hosts: new_servers
  become: true
  vars:
    k3s_version: "v1.31.0+k3s1"
    k3s_server_url: "https://10.0.10.10:6443"
    k3s_token: "{{ vault_k3s_token }}"
    # chrony config path and service name differ between Debian and RHEL families
    chrony_conf: "{{ '/etc/chrony/chrony.conf' if ansible_os_family == 'Debian' else '/etc/chrony.conf' }}"
    chrony_service: "{{ 'chrony' if ansible_os_family == 'Debian' else 'chronyd' }}"
  tasks:
    - name: Set hostname
      ansible.builtin.hostname:
        name: "{{ inventory_hostname }}"

    - name: Configure NTP
      ansible.builtin.copy:
        content: |
          server ntp.example.com iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
        dest: "{{ chrony_conf }}"
      notify: restart chrony

    - name: Install base packages
      ansible.builtin.package:
        name:
          - curl
          - jq
          - htop
          - iotop
          - sysstat
          - net-tools
          - lvm2
          - chrony
        state: present

    - name: Install node_exporter
      ansible.builtin.include_role:
        name: prometheus.prometheus.node_exporter

    - name: Install promtail for log shipping
      ansible.builtin.include_role:
        name: grafana.grafana.promtail

    - name: Join k3s cluster as agent
      ansible.builtin.shell: |
        curl -sfL https://get.k3s.io | \
        K3S_URL="{{ k3s_server_url }}" \
        K3S_TOKEN="{{ k3s_token }}" \
        INSTALL_K3S_VERSION="{{ k3s_version }}" \
        sh -
      args:
        creates: /usr/local/bin/k3s
      # Keep the join token out of task output and logs
      no_log: true

  handlers:
    - name: restart chrony
      ansible.builtin.service:
        name: "{{ chrony_service }}"
        state: restarted
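The playbook targets a `new_servers` group, which has to be populated from wherever the post-install callbacks are recorded. A minimal sketch of building the JSON that an Ansible dynamic inventory script prints for `--list`, assuming the callback store can be reduced to a hostname-to-IP mapping. The function name and sample hosts are illustrative.

```python
import json


def build_inventory(servers):
    """Build Ansible dynamic-inventory JSON from a {hostname: ip} mapping."""
    hosts = sorted(servers)
    return {
        "new_servers": {"hosts": hosts},
        # _meta.hostvars spares Ansible a separate --host call per server.
        "_meta": {"hostvars": {h: {"ansible_host": servers[h]} for h in hosts}},
    }


if __name__ == "__main__":
    # Sample data; a real script would read the provisioning database here.
    print(json.dumps(build_inventory({"node-02": "10.0.10.152", "node-01": "10.0.10.151"}), indent=2))
```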
Server Decommissioning¶
Decommissioning Checklist¶
Server to be decommissioned: ________________
Service Tag: ________________
Rack Location: ________________
Pre-decommission:
[ ] All workloads migrated/drained (kubectl drain)
[ ] Node removed from cluster (kubectl delete node)
[ ] Monitoring alerts silenced/removed
[ ] DNS records removed
[ ] DHCP reservation released
[ ] Backup any local data if needed
[ ] Document reason for decommission
Disk wipe (NIST 800-88):
[ ] Clear: Overwrite the whole device (sufficient for non-classified)
Command: shred -n 1 -z /dev/sdX (one random pass, then a final zero pass)
[ ] Purge: Secure erase via firmware (for SSDs, use ATA Secure Erase)
Command: hdparm --security-set-pass p /dev/sdX && hdparm --security-erase p /dev/sdX
[ ] Destroy: Physical destruction (for classified data)
[ ] Wipe method documented and signed off
iDRAC reset:
[ ] Reset iDRAC to factory defaults
racadm racresetcfg
[ ] Clear Lifecycle Controller logs
racadm lcl_dataclear
Physical:
[ ] Disconnect power cables (both A and B feeds)
[ ] Disconnect network cables
[ ] Label cables before disconnecting (photo + written record)
[ ] Remove from rack
[ ] Remove asset tag / barcode
[ ] Update rack elevation diagram
Inventory/CMDB:
[ ] Update CMDB status to "Decommissioned"
[ ] Record decommission date
[ ] Update warranty tracking
[ ] If recycling: record vendor handoff and certificate of destruction
[ ] If remarketing: record buyer/destination
Sign-off:
[ ] Decommission performed by: ________________
[ ] Date: ________________
[ ] Verified by: ________________
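The disk wipe section of the checklist branches on two facts: whether the data is classified and whether the drive is an SSD. A small sketch of that decision, using the Linux sysfs `queue/rotational` flag (`1` for spinning disks, `0` for SSDs) to detect the media type. The function names are illustrative, and the mapping simply restates the checklist rather than NIST 800-88 itself.

```python
from pathlib import Path


def is_rotational(device, sysfs_root="/sys/block"):
    """Return True if the kernel reports the block device as a spinning disk."""
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    return flag.read_text().strip() == "1"


def sanitization_method(rotational, classified):
    """Pick the checklist's wipe method: destroy, purge (SSD), or clear (HDD)."""
    if classified:
        return "destroy"
    return "clear" if rotational else "purge"
```

For example, `sanitization_method(is_rotational("sda"), classified=False)` returns `"purge"` on an SSD, pointing at the firmware erase rather than an overwrite.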
Automated Disk Wipe Script¶
#!/usr/bin/env bash
# disk-wipe.sh: NIST 800-88 Clear method (one random pass + one zero pass)
# WARNING: This permanently destroys all data on the target disk.
set -euo pipefail
DISK="${1:?Usage: disk-wipe.sh /dev/sdX}"
if [[ ! -b "$DISK" ]]; then
echo "Error: $DISK is not a block device" >&2
exit 1
fi
# Safety: refuse to wipe if any partition is mounted
if mount | grep -q "^${DISK}"; then
echo "Error: $DISK has mounted partitions. Unmount first." >&2
exit 1
fi
echo "WARNING: About to wipe $DISK"
echo "Model: $(cat /sys/block/$(basename "$DISK")/device/model 2>/dev/null || echo unknown)"
echo "Size: $(lsblk -dno SIZE "$DISK")"
echo ""
read -rp "Type YES to proceed: " confirm
if [[ "$confirm" != "YES" ]]; then
echo "Aborted."
exit 1
fi
echo "Pass 1: Random overwrite..."
dd if=/dev/urandom of="$DISK" bs=4M status=progress 2>&1 || true
echo "Pass 2: Zero overwrite..."
dd if=/dev/zero of="$DISK" bs=4M status=progress 2>&1 || true
echo "Verifying (spot check first 1MB)..."
# Strip NUL bytes and count what is left: any remainder is non-zero data
NONZERO=$(dd if="$DISK" bs=1M count=1 2>/dev/null | tr -d '\0' | wc -c)
if [[ "$NONZERO" -gt 0 ]]; then
echo "WARNING: Non-zero bytes found in first 1MB. Wipe may be incomplete."
else
echo "Verification passed: first 1MB is zeroed."
fi
echo "Wipe complete: $DISK"
echo "Date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "Method: NIST 800-88 Clear (random + zero)"
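The script only verifies the first 1MB, which says little about the rest of the device. A hedged sketch of a wider spot check that reads chunks at random offsets and confirms each one is all zeros. It works on any file-like path, including a block device opened read-only. The function name `sample_is_zero` and its defaults are illustrative. Sampling can miss isolated non-zero regions, so it complements a full read-back rather than replacing it.

```python
import os
import random


def sample_is_zero(path, samples=64, chunk=4096, seed=None):
    """Read `samples` random chunks from path; True if every byte read is zero."""
    rng = random.Random(seed)
    with open(path, "rb") as dev:
        size = dev.seek(0, os.SEEK_END)
        if size == 0:
            return True
        for _ in range(samples):
            # Keep the whole chunk inside the device when it is large enough.
            offset = rng.randrange(0, max(1, size - chunk + 1))
            dev.seek(offset)
            if any(dev.read(chunk)):
                return False
    return True
```

Running it as root against the wiped device, for example `sample_is_zero("/dev/sdX", samples=1024)`, gives a quick sanity check before sign-off.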