
Bare-Metal Provisioning

Reference guide for out-of-band management, automated OS deployment, and server lifecycle operations on Dell PowerEdge hardware.

Mental Model

[Running workload  ]  Kubernetes / VMs / applications
|
[OS + Config Mgmt  ]  Ubuntu/RHEL + Ansible post-install
|
[OS Install        ]  Kickstart / Preseed / Cloud-init (automated)
|
[PXE Boot Chain    ]  DHCP -> TFTP -> bootloader -> installer
|
[Out-of-Band Mgmt  ]  iDRAC sets boot order, power cycles, mounts ISO
|
[Network Boot NIC  ]  PXE-capable NIC on the server (usually NIC1)

Provisioning starts from the bottom: iDRAC powers on the server, the NIC PXE boots, DHCP hands out an IP and boot file, TFTP serves the bootloader, and the installer takes over. After OS install, Ansible handles configuration and eventually joins the node to a cluster.


Out-of-Band Management Fundamentals

IPMI vs Redfish

Feature         | IPMI (legacy)               | Redfish (modern)
----------------+-----------------------------+--------------------------------
Protocol        | UDP/623, binary             | HTTPS/REST + JSON
Authentication  | Shared secret, weak crypto  | TLS + session tokens
Discoverability | Minimal                     | Self-describing (OData schema)
Scriptability   | ipmitool (cryptic syntax)   | Any HTTP client (curl, Python)
Event model     | SNMP traps, PET             | Redfish EventService + SSE
Dell support    | All generations             | iDRAC8+ (full on iDRAC9)
Recommendation  | Avoid for new deployments   | Preferred for all automation

IPMI Quick Reference (legacy systems)

# Install ipmitool
apt install ipmitool   # Debian/Ubuntu
dnf install ipmitool   # RHEL/Fedora

# Check power status
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis power status

# Power on
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis power on

# Power cycle
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis power cycle

# Set next boot to PXE (one-time)
ipmitool -I lanplus -H 10.0.10.101 -U root -P password chassis bootdev pxe

# Get sensor readings
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sdr list

# Get system event log
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sel elist

# Serial over LAN (remote console)
ipmitool -I lanplus -H 10.0.10.101 -U root -P password sol activate
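
The commands above pass the password with -P, which leaves it visible in the process list and shell history. A minimal Python sketch of a wrapper that uses ipmitool's -E flag instead (the password is read from the IPMI_PASSWORD environment variable). The function names `ipmitool_argv` and `run_ipmi` are illustrative, not part of any library.

```python
import os
import subprocess
from typing import List


def ipmitool_argv(host: str, user: str, *command: str) -> List[str]:
    """Build an ipmitool command line; -E makes ipmitool read IPMI_PASSWORD."""
    if not command:
        raise ValueError("an ipmitool subcommand is required")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *command]


def run_ipmi(host: str, user: str, password: str, *command: str,
             timeout: float = 30.0) -> str:
    """Run one ipmitool command and return its stdout (raises on failure)."""
    env = dict(os.environ, IPMI_PASSWORD=password)  # password never hits argv
    result = subprocess.run(
        ipmitool_argv(host, user, *command),
        env=env, capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()
```

For example, `run_ipmi("10.0.10.101", "root", secret, "chassis", "power", "status")` would run the same power-status query as above, assuming ipmitool is installed on the machine running the script.
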

Redfish Deep-Dive

Redfish is a DMTF standard (not Dell-proprietary). The key resource tree:

/redfish/v1/
  /Systems/System.Embedded.1          <- The physical server
    /Bios                             <- BIOS attributes
    /Processors                       <- CPU inventory
    /Memory                           <- DIMM inventory
    /Storage                          <- RAID controllers + drives
    /EthernetInterfaces               <- NIC ports
    /Actions/ComputerSystem.Reset     <- Power operations
  /Managers/iDRAC.Embedded.1          <- The BMC itself
    /LogServices/Sel                  <- System event log
    /LogServices/Lclog                <- Lifecycle log
    /EthernetInterfaces               <- iDRAC NIC config
    /Actions/Manager.Reset            <- Reboot iDRAC
  /UpdateService                      <- Firmware update operations
    /FirmwareInventory                <- Installed firmware versions
  /Chassis/System.Embedded.1          <- Physical chassis
    /Power                            <- PSU info, power consumption
    /Thermal                          <- Temps, fan speeds
  /AccountService                     <- iDRAC user management
  /EventService                       <- Alert subscriptions

Python Redfish Automation (Essential Pattern)

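import requests, urllib3

# Lab shortcut: iDRAC ships with a self-signed certificate. In production,
# point s.verify at a CA bundle instead of disabling verification.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

s = requests.Session()
s.auth = ("root", "password")
s.verify = False
base = "https://10.0.10.101/redfish/v1"
system = f"{base}/Systems/System.Embedded.1"

# Get system summary (on Dell, the SKU property holds the service tag)
r = s.get(system, timeout=30)
r.raise_for_status()
info = r.json()
print(f"{info['Model']} | Service Tag: {info.get('SKU')} | BIOS: {info['BiosVersion']}")

# Set one-time PXE boot, then restart the host
r = s.patch(system, json={
    "Boot": {"BootSourceOverrideTarget": "Pxe", "BootSourceOverrideEnabled": "Once"}
}, timeout=30)
r.raise_for_status()
# ForceRestart assumes the host is already powered on; use "On" if it is off
r = s.post(f"{system}/Actions/ComputerSystem.Reset",
           json={"ResetType": "ForceRestart"}, timeout=30)
r.raise_for_status()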

For a full Redfish client class, see Dell Server Management.
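
Redfish enum values are case-sensitive, and a restart request may be rejected when the host is powered off. The following is a small sketch of two helpers that guard against both. The helper names are illustrative, and the target set is only a subset of the schema values; a BMC advertises what it actually supports in `BootSourceOverrideTarget@Redfish.AllowableValues`.

```python
BOOT_TARGETS = frozenset(
    {"None", "Pxe", "Cd", "Usb", "Hdd", "BiosSetup", "UefiTarget", "UefiHttp"}
)
BOOT_ENABLED = frozenset({"Disabled", "Once", "Continuous"})


def boot_override(target: str = "Pxe", enabled: str = "Once") -> dict:
    """Build the PATCH body for a boot source override, validating the enums."""
    if target not in BOOT_TARGETS:
        raise ValueError(f"unsupported BootSourceOverrideTarget: {target!r}")
    if enabled not in BOOT_ENABLED:
        raise ValueError(f"unsupported BootSourceOverrideEnabled: {enabled!r}")
    return {"Boot": {"BootSourceOverrideTarget": target,
                     "BootSourceOverrideEnabled": enabled}}


def reset_type_for(power_state: str) -> str:
    """Pick a ResetType from the system's PowerState: power on if off, else restart."""
    return "On" if power_state == "Off" else "ForceRestart"
```

The PowerState value comes from the same system resource fetched for the summary, so `reset_type_for(info["PowerState"])` can feed the ResetType of the reset action.
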

Ansible + iDRAC Integration

---
# Provision a bare-metal server: set PXE boot, power cycle, wait for OS
- name: Bare-metal provision via iDRAC
  hosts: idrac_hosts
  gather_facts: false
  connection: local

  vars:
    idrac_user: "{{ vault_idrac_user }}"
    idrac_password: "{{ vault_idrac_password }}"

  tasks:
    - name: Set one-time PXE boot
      dellemc.openmanage.idrac_boot:
        idrac_ip: "{{ inventory_hostname }}"
        idrac_user: "{{ idrac_user }}"
        idrac_password: "{{ idrac_password }}"
        boot_source_override_target: pxe
        boot_source_override_enabled: once
        reset_type: none    # do not reboot here; the next task does it

    # idrac_reset reboots the iDRAC itself, not the host, so use redfish_powerstate
    - name: Power cycle server to trigger PXE
      dellemc.openmanage.redfish_powerstate:
        baseuri: "{{ inventory_hostname }}"
        username: "{{ idrac_user }}"
        password: "{{ idrac_password }}"
        reset_type: ForceRestart

    - name: Wait for OS to come up (SSH)
      ansible.builtin.wait_for:
        host: "{{ hostvars[inventory_hostname].os_ip }}"
        port: 22
        delay: 120
        timeout: 1800
      delegate_to: localhost

PXE Boot Chain

How PXE Works

Server powers on
  |
  v
NIC sends DHCP DISCOVER (with PXE option 60)
  |
  v
DHCP server responds with:
  - IP address for the server
  - Option 66: TFTP server address (next-server)
  - Option 67: Boot filename (pxelinux.0 or grubx64.efi)
  |
  v
Server downloads bootloader via TFTP
  |
  v
Bootloader loads kernel + initrd
  |
  v
Kernel boots, starts installer
  |
  v
Installer fetches kickstart/preseed/autoinstall from HTTP server
  |
  v
Automated OS installation completes
  |
  v
Server reboots into installed OS
  |
  v
Cloud-init / first-boot script calls home to Ansible

DHCP Configuration (ISC DHCP)

# /etc/dhcp/dhcpd.conf
subnet 10.0.10.0 netmask 255.255.255.0 {
  range 10.0.10.100 10.0.10.200;
  option routers 10.0.10.1;
  option domain-name-servers 10.0.10.1;

  # PXE boot settings
  next-server 10.0.10.5;           # TFTP server

  # UEFI vs Legacy boot detection
  class "pxeclients-uefi" {
    match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00007";
    filename "grubx64.efi";
  }
  class "pxeclients-legacy" {
    match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00000";
    filename "pxelinux.0";
  }
}
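
The two class matches above key off the architecture code embedded in DHCP option 60. The same decision can be sketched in Python, which is handy for testing a boot-file policy before touching dhcpd.conf. Mapping code 00009 to the UEFI loader is an assumption beyond the config above (some firmware reports it for x64 UEFI), and `boot_filename` is an illustrative name.

```python
import re
from typing import Optional

# Architecture code -> boot file. 0 and 7 mirror the dhcpd classes above;
# 9 is an added assumption for firmware that reports x64 UEFI that way.
BOOT_FILES = {0: "pxelinux.0", 7: "grubx64.efi", 9: "grubx64.efi"}

_VCI = re.compile(r"^PXEClient:Arch:(\d{5})")


def boot_filename(vendor_class_id: str) -> Optional[str]:
    """Map a PXE vendor-class-identifier to a boot file, or None if not PXE."""
    match = _VCI.match(vendor_class_id)
    if match is None:
        return None
    return BOOT_FILES.get(int(match.group(1)))
```

A return value of None covers both non-PXE clients and PXE clients with an architecture that has no boot file configured.
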

TFTP Directory Structure

/srv/tftp/
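  pxelinux.0                    # Legacy BIOS bootloader
  ldlinux.c32                   # Required by pxelinux.0 (Syslinux 5+)
  menu.c32                      # Menu module loaded by pxelinux.cfg/default
  libutil.c32                   # Required by menu.c32 (Syslinux 5+)
  grubx64.efi                   # UEFI bootloader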
  pxelinux.cfg/
    default                     # Default boot config
    01-aa-bb-cc-dd-ee-ff        # Per-MAC config (lowercase, dash-separated)
  images/
    ubuntu-22.04/
      vmlinuz                   # Kernel
      initrd                    # Initial ramdisk
    rhel-9/
      vmlinuz
      initrd.img
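
PXELINUX looks for a per-MAC file first, then for files named after the client IP in uppercase hex (dropping one digit at a time), and finally for default. A short sketch of generating those names, useful when a provisioning script writes per-host configs. Function names are illustrative, and the client-UUID lookup that PXELINUX tries before the MAC is left out.

```python
import ipaddress
import re
from typing import List


def pxelinux_config_name(mac: str) -> str:
    """Return the per-MAC config name: '01-' (Ethernet) + lowercase, dash-separated MAC."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return "01-" + "-".join(digits[i:i + 2] for i in range(0, 12, 2))


def pxelinux_search_order(mac: str, ip: str) -> List[str]:
    """List the config file names PXELINUX tries, most specific first."""
    hex_ip = ipaddress.IPv4Address(ip).packed.hex().upper()
    names = [pxelinux_config_name(mac)]
    names += [hex_ip[:n] for n in range(8, 0, -1)]   # 0A000A65, 0A000A6, ...
    names.append("default")
    return names
```

For a client with MAC aa:bb:cc:dd:ee:ff at 10.0.10.101, the first names tried are 01-aa-bb-cc-dd-ee-ff and 0A000A65.
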

PXE Boot Menu (pxelinux.cfg/default)

DEFAULT menu.c32
TIMEOUT 100
PROMPT 0

MENU TITLE PXE Boot Menu

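# Note: the Ubuntu live-server installer also has to fetch its ISO, typically
# through an extra url=<ISO URL> kernel argument, which is not shown here.
LABEL ubuntu-22.04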
  MENU LABEL Ubuntu 22.04 LTS (Automated)
  KERNEL images/ubuntu-22.04/vmlinuz
  INITRD images/ubuntu-22.04/initrd
  APPEND autoinstall ds=nocloud-net;s=http://10.0.10.5/autoinstall/ubuntu/ ip=dhcp ---

LABEL rhel-9
  MENU LABEL RHEL 9 (Kickstart)
  KERNEL images/rhel-9/vmlinuz
  INITRD images/rhel-9/initrd.img
  APPEND inst.ks=http://10.0.10.5/kickstart/rhel9.ks ip=dhcp

LABEL local
  MENU LABEL Boot from local disk
  LOCALBOOT 0

Kickstart Example (RHEL/Rocky)

# /var/www/html/kickstart/rhel9.ks
#version=RHEL9
url --url="http://10.0.10.5/repo/rhel-9/"
text
lang en_US.UTF-8
keyboard us
timezone UTC --utc
rootpw --lock
user --name=deploy --groups=wheel --lock
sshkey --username=deploy "ssh-ed25519 AAAA... deploy@provisioning"

# Network
network --bootproto=dhcp --device=eno1 --activate --onboot=yes --hostname=changeme

# Disk: wipe all, RAID 1 for /boot/efi and /boot, LVM across both disks for the rest
# (sda/sdb naming is assumed; the LVM volume group itself is not mirrored)
zerombr
clearpart --all --initlabel
part raid.01 --size=600  --ondisk=sda
part raid.02 --size=600  --ondisk=sdb
part raid.11 --size=1024 --ondisk=sda
part raid.12 --size=1024 --ondisk=sdb
part pv.01 --size=1 --grow --ondisk=sda
part pv.02 --size=1 --grow --ondisk=sdb
raid /boot/efi --device=boot_efi --fstype=efi --level=1 raid.01 raid.02
raid /boot     --device=boot     --fstype=xfs --level=1 raid.11 raid.12
volgroup vg_root pv.01 pv.02
logvol /     --vgname=vg_root --size=20480 --name=lv_root --fstype=xfs
logvol /var  --vgname=vg_root --size=40960 --name=lv_var  --fstype=xfs
logvol /tmp  --vgname=vg_root --size=10240 --name=lv_tmp  --fstype=xfs
logvol swap  --vgname=vg_root --size=8192  --name=lv_swap

# Packages
%packages --ignoremissing
@^minimal-environment
openssh-server
python3
curl
chrony
%end

# Post-install (runs chrooted into the installed system)
%post --log=/root/ks-post.log
# Enable SSH
systemctl enable sshd

# The hostname comes from the network command above and is later overridden by
# Ansible. hostnamectl is not usable here: systemd is not running in the chroot.

# Signal provisioning server that install is complete
curl -fsS "http://10.0.10.5/api/provision/complete?mac=$(cat /sys/class/net/eno1/address)"

# Pull Ansible bootstrap
curl -fsS http://10.0.10.5/scripts/bootstrap-ansible.sh | bash
%end

# Reboot after install
reboot --eject
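
The %post section calls /api/provision/complete with the installing NIC's MAC address. The document does not show the receiving side, so the following is only a sketch of how a provisioning service might parse and validate that request before marking a host as installed. `completed_mac` is a hypothetical helper.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# /sys/class/net/<nic>/address yields lowercase, colon-separated MACs
_MAC = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def completed_mac(request_path: str) -> Optional[str]:
    """Return the normalized MAC from a completion callback, or None if invalid."""
    parts = urlsplit(request_path)
    if parts.path != "/api/provision/complete":
        return None
    values = parse_qs(parts.query).get("mac", [])
    if len(values) != 1:                    # missing or repeated parameter
        return None
    mac = values[0].strip().lower()
    return mac if _MAC.fullmatch(mac) else None
```

Because the callback is unauthenticated HTTP, a real service would probably also check that the MAC belongs to a host it is currently provisioning.
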

Ubuntu Autoinstall Example (Essential Structure)

# /var/www/html/autoinstall/ubuntu/user-data
# (nocloud-net also requests meta-data from the same directory; an empty file is enough)
#cloud-config
autoinstall:
  version: 1
  identity: { hostname: changeme, username: deploy, password: "!" }
  ssh: { install-server: true, authorized-keys: ["ssh-ed25519 AAAA..."], allow-pw: false }
  network: { network: { version: 2, ethernets: { eno1: { dhcp4: true } } } }
  storage: { layout: { name: lvm, sizing-policy: all } }
  packages: [python3, curl, chrony]
  late-commands:
    - curtin in-target -- systemctl enable ssh
    - curtin in-target -- bash -c 'curl -s http://10.0.10.5/scripts/bootstrap-ansible.sh | bash'

Provisioning at Scale

MAAS vs Foreman vs Ironic

Feature             | MAAS (Canonical)            | Foreman + Katello          | OpenStack Ironic
--------------------+-----------------------------+----------------------------+-----------------------------
Primary use case    | Bare-metal cloud            | Lifecycle management       | OpenStack bare-metal service
Discovery           | PXE enlistment              | PXE + smart proxy          | Inspector (PXE-based)
OS support          | Ubuntu (best), CentOS, RHEL | RHEL, CentOS, Ubuntu, SLES | Any (via deploy images)
Provisioning method | Curtin + cloud-init         | Kickstart/Preseed + Puppet | IPA (Ironic Python Agent)
Config management   | Cloud-init, Ansible (post)  | Puppet, Ansible, Salt      | None (external)
Network management  | VLAN, fabric, subnet, DHCP  | Smart proxy DHCP/DNS/TFTP  | Neutron integration
Dell integration    | IPMI power control          | BMC plugin + IPMI          | IPMI, Redfish drivers
RAID configuration  | Curtin storage config       | Partition tables           | RAID via IPA cleaning
Firmware management | Not built-in                | Katello content views      | Not built-in
API                 | REST + CLI                  | REST + Hammer CLI          | REST (OpenStack API)
Complexity          | Low-Medium                  | Medium-High                | High (needs OpenStack)
Best for            | Ubuntu-first, cloud model   | RHEL-first, enterprise     | Already running OpenStack

End-to-End Provisioning Workflow

New server arrives
  |
  v
1. Rack & cable (power A+B, iDRAC dedicated NIC, data NICs)
  |
  v
2. iDRAC obtains an IP address via DHCP (auto-discovery) or is assigned one manually
  |
  v
3. Automation configures iDRAC:
   - Set hostname, DNS, NTP
   - Create admin account, disable default root
   - Enable alerts (SNMP/email/syslog)
   - Apply golden BIOS config (see dell-server-management.md)
   - Configure RAID (OS mirror + data array)
  |
  v
4. Set one-time PXE boot + power cycle via Redfish
  |
  v
5. PXE boot -> automated OS install (kickstart/autoinstall)
  |
  v
6. Post-install callback triggers Ansible:
   - Base OS hardening (CIS benchmark)
   - Install monitoring agent (node_exporter, promtail)
   - Configure networking (bonds, VLANs, routes)
   - Install container runtime (containerd)
   - Join Kubernetes cluster (k3s/kubeadm)
  |
  v
7. Node appears in cluster, ready for workloads
  |
  v
8. Update CMDB/inventory database with:
   - Service tag, model, serial
   - Rack location (room/row/rack/U)
   - IP addresses (iDRAC, OS management, data)
   - Warranty expiration

Bootstrap Ansible Playbook (post-OS-install)

---
- name: Bootstrap newly provisioned server
  hosts: new_servers
  become: true

  vars:
    k3s_version: "v1.31.0+k3s1"
    k3s_server_url: "https://10.0.10.10:6443"
    k3s_token: "{{ vault_k3s_token }}"

  tasks:
    - name: Set hostname
      ansible.builtin.hostname:
        name: "{{ inventory_hostname }}"

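    - name: Configure NTP
      ansible.builtin.copy:
        content: |
          server ntp.example.com iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
        # Debian/Ubuntu keep the file under /etc/chrony/; RHEL uses /etc/chrony.conf
        dest: "{{ '/etc/chrony/chrony.conf' if ansible_facts['os_family'] == 'Debian' else '/etc/chrony.conf' }}"
        mode: "0644"
      notify: restart chrony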

    - name: Install base packages
      ansible.builtin.package:
        name:
          - curl
          - jq
          - htop
          - iotop
          - sysstat
          - net-tools
          - lvm2
          - chrony
        state: present

    - name: Install node_exporter
      ansible.builtin.include_role:
        name: prometheus.prometheus.node_exporter

    - name: Install promtail for log shipping
      ansible.builtin.include_role:
        name: grafana.grafana.promtail

    - name: Join k3s cluster as agent
      ansible.builtin.shell: |
        curl -sfL https://get.k3s.io | \
          K3S_URL="{{ k3s_server_url }}" \
          K3S_TOKEN="{{ k3s_token }}" \
          INSTALL_K3S_VERSION="{{ k3s_version }}" \
          sh -
      args:
        creates: /usr/local/bin/k3s

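  handlers:
    - name: restart chrony
      ansible.builtin.service:
        # The unit is "chrony" on Debian/Ubuntu and "chronyd" on RHEL
        name: "{{ 'chrony' if ansible_facts['os_family'] == 'Debian' else 'chronyd' }}"
        state: restarted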

Server Decommissioning

Decommissioning Checklist

Server to be decommissioned: ________________
Service Tag: ________________
Rack Location: ________________

Pre-decommission:
  [ ] All workloads migrated/drained (kubectl drain)
  [ ] Node removed from cluster (kubectl delete node)
  [ ] Monitoring alerts silenced/removed
  [ ] DNS records removed
  [ ] DHCP reservation released
  [ ] Back up any local data if needed
  [ ] Document reason for decommission

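Disk wipe (NIST 800-88):
  [ ] Clear: Overwrite the whole device (sufficient for non-classified data on HDDs)
      Command: shred -v -n 1 -z /dev/sdX    # one random pass, then a final zero pass
  [ ] Purge: Secure erase via firmware (for SATA SSDs, use ATA Secure Erase; the drive must not be "frozen", check with hdparm -I)
      Command: hdparm --user-master u --security-set-pass p /dev/sdX && hdparm --user-master u --security-erase p /dev/sdX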
  [ ] Destroy: Physical destruction (for classified data)
  [ ] Wipe method documented and signed off

iDRAC reset:
  [ ] Reset iDRAC to factory defaults
      racadm racresetcfg
  [ ] Clear Lifecycle Controller logs
      racadm systemerase lcdata

Physical:
  [ ] Label cables before disconnecting (photo + written record)
  [ ] Disconnect power cables (both A and B feeds)
  [ ] Disconnect network cables
  [ ] Remove from rack
  [ ] Remove asset tag / barcode
  [ ] Update rack elevation diagram

Inventory/CMDB:
  [ ] Update CMDB status to "Decommissioned"
  [ ] Record decommission date
  [ ] Update warranty tracking
  [ ] If recycling: record vendor handoff and certificate of destruction
  [ ] If remarketing: record buyer/destination

Sign-off:
  [ ] Decommission performed by: ________________
  [ ] Date: ________________
  [ ] Verified by: ________________

Automated Disk Wipe Script

#!/usr/bin/env bash
# disk-wipe.sh — NIST 800-88 Clear method (single-pass + zero)
# WARNING: This permanently destroys all data on the target disk.
set -euo pipefail

DISK="${1:?Usage: disk-wipe.sh /dev/sdX}"

if [[ ! -b "$DISK" ]]; then
    echo "Error: $DISK is not a block device" >&2
    exit 1
fi

# Safety: refuse to wipe if the disk or anything on it (partitions, LVM, md, swap) is in use
MOUNTS="$(lsblk -nro MOUNTPOINT "$DISK")"
if [[ -n "${MOUNTS//[[:space:]]/}" ]]; then
    echo "Error: $DISK has mounted partitions. Unmount first." >&2
    exit 1
fi

echo "WARNING: About to wipe $DISK"
echo "Model: $(cat /sys/block/$(basename "$DISK")/device/model 2>/dev/null || echo unknown)"
echo "Size: $(lsblk -dno SIZE "$DISK")"
echo ""
read -rp "Type YES to proceed: " confirm
if [[ "$confirm" != "YES" ]]; then
    echo "Aborted."
    exit 1
fi

# dd exits non-zero when it reaches the end of the device ("No space left on
# device"), so its exit status is tolerated. This also masks real I/O errors.
echo "Pass 1: Random overwrite..."
dd if=/dev/urandom of="$DISK" bs=4M status=progress 2>&1 || true

echo "Pass 2: Zero overwrite..."
dd if=/dev/zero of="$DISK" bs=4M status=progress 2>&1 || true
sync

echo "Verifying (spot check first 1MB)..."
# cmp exits 0 only if the first 1 MiB of the disk matches /dev/zero byte for byte
if cmp -s -n 1048576 "$DISK" /dev/zero; then
    echo "Verification passed: first 1MB is zeroed."
else
    echo "WARNING: Non-zero bytes found in first 1MB. Wipe may be incomplete."
fi

echo "Wipe complete: $DISK"
echo "Date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "Method: NIST 800-88 Clear (random + zero)"
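
The script only checks the first 1MB. A slightly stronger spot check reads chunks at the start, the end, and a number of random offsets. The sketch below is one way to do that in Python; it is a sampling check rather than proof of a complete wipe, and `nonzero_offsets` is a name chosen for this example.

```python
import os
import random
from typing import List, Optional


def nonzero_offsets(path: str, samples: int = 64, chunk: int = 4096,
                    seed: Optional[int] = None) -> List[int]:
    """Return offsets of sampled chunks that contain any non-zero byte."""
    rng = random.Random(seed)
    bad: List[int] = []
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)      # also works for block devices
        if size == 0:
            return bad
        last = max(size - chunk, 0)
        offsets = {0, last}                  # always check both ends
        offsets.update(rng.randrange(0, last + 1) for _ in range(samples))
        for offset in sorted(offsets):
            dev.seek(offset)
            if any(dev.read(chunk)):         # any non-zero byte in the chunk
                bad.append(offset)
    return bad
```

An empty list means every sampled chunk was zero; running it against a block device such as /dev/sdX needs root, like the wipe itself.
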