Mellanox Switches

33 cards: 🟢 11 easy | 🟡 15 medium | 🔴 7 hard

🟢 Easy (11)

1. What company originally made Mellanox switches, and who owns them now?

Show answer Mellanox Technologies designed the switches. NVIDIA acquired Mellanox in 2020. The switch line continues under the "NVIDIA Networking" brand.

Fun fact: Mellanox acquired by NVIDIA (2020, $6.9B). Powers AI/HPC networking.

Number anchor: NVIDIA paid $6.9 billion for Mellanox in April 2020, the largest acquisition in NVIDIA's history at the time.

2. What is the current name of the Mellanox switch operating system?

Show answer Onyx (formerly MLNX-OS). The name MLNX-OS still appears in firmware filenames.

Name origin: Onyx is a dark gemstone, fitting for networking firmware. The name change from MLNX-OS happened around 2018.

3. What is the Onyx command to save the running configuration to persistent storage?

Show answer `configuration write`, not `write memory` or `copy running-config startup-config` as in Cisco IOS.

Remember: ONYX = Mellanox switch OS. Cumulus = Linux-based NOS. Both on Spectrum ASICs.

Gotcha: Rebooting without running `configuration write` first loses all unsaved changes. Muscle memory from other vendors, such as Cisco's `wr mem`, will fail you here.

4. Name the four generations of the Spectrum ASIC family in order.

Show answer Spectrum (SN2000), Spectrum-2 (SN3000), Spectrum-3 (SN4000), Spectrum-4 (SN5000). Each roughly doubles bandwidth density.

Remember: Spectrum ASICs: -1(100G), -2(200G), -3(400G), -4(800G). Each gen ~doubles.

Timeline: Spectrum (2016), Spectrum-2 (2019), Spectrum-3 (2020), Spectrum-4 (2022).

5. What maximum port speed does the Spectrum-3 (SN4000) ASIC support?

Show answer 400G per port.

Under the hood: 400G is achieved using 8 lanes of 50G PAM4 signaling. Spectrum-4 reaches 800G with 8 lanes of 100G PAM4.

Analogy: Think of port speed generations like highway lanes — each generation widens the highway while also making each lane faster.
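
Example: The lane arithmetic above can be checked with a minimal Python sketch. The function name `port_speed_gbps` is hypothetical, and the calculation uses nominal lane rates only, ignoring encoding overhead.

```python
def port_speed_gbps(lanes: int, lane_rate_gbps: int) -> int:
    """Nominal port speed: number of lanes times the nominal rate of each lane."""
    return lanes * lane_rate_gbps


# Spectrum-3 style port: 8 lanes of 50G PAM4
spectrum3_port = port_speed_gbps(lanes=8, lane_rate_gbps=50)

# Spectrum-4 style port: 8 lanes of 100G PAM4
spectrum4_port = port_speed_gbps(lanes=8, lane_rate_gbps=100)

print(spectrum3_port, spectrum4_port)
```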

6. What does WJH stand for on Mellanox switches?

Show answer What Just Happened — a hardware-level feature that logs every packet dropped by the ASIC with the drop reason.

Under the hood: WJH uses dedicated ASIC hardware counters to log drops with zero performance impact. Traditional packet capture can't catch packets the ASIC drops.

Debug clue: `show what-just-happened forwarding` shows the most recent drops with reason codes — faster than tcpdump for ASIC-level drops.

7. What is UFM in the Mellanox ecosystem?

Show answer Unified Fabric Manager — a centralized management platform for multi-switch fabrics providing topology discovery, health monitoring, firmware orchestration, and telemetry.

Analogy: UFM is to Mellanox switches what vCenter is to VMware ESXi hosts — a single pane of glass for multi-device management.

8. What does MLAG stand for on Mellanox switches?

Show answer Multi-Chassis Link Aggregation — allows two physical switches to present as one logical switch for link aggregation to downstream devices.

Analogy: MLAG is like two switches pretending to be one for downstream devices. The downstream device sees a single LAG partner.

Gotcha: MLAG requires an ISL (Inter-Switch Link) between peers; Onyx documentation calls this link the IPL (Inter-Peer Link). If the ISL fails, split-brain occurs.

9. What Onyx command shows all LLDP neighbor information?

Show answer `show lldp interfaces`

Remember: LLDP = Link Layer Discovery Protocol. It's vendor-neutral, unlike Cisco's CDP. Both discover neighbors, but LLDP works across vendors.

10. Which Mellanox switch model series is most commonly deployed as a ToR/leaf switch?

Show answer The SN2100 (16x100G) and SN2700 (32x100G) from the Spectrum (SN2000) family are the most common leaf/ToR choices.

Remember: Mellanox/NVIDIA = high-performance networking. InfiniBand for HPC, Spectrum for Ethernet.

Under the hood: Spectrum ASICs provide wire-speed L2/L3 with programmable pipeline.

Under the hood: SN2100 is popular as a leaf switch because 16x100G is enough port capacity for most racks. SN2700 (32x100G) is used when higher density is needed.

11. What Onyx command shows hardware temperature, fan speed, and PSU status?

Show answer `show environment`

Remember: "show environment = health check." Check it after any maintenance window or airflow changes.

🟡 Medium (15)

1. What is RoCEv2 and why does it need lossless Ethernet?

Show answer RDMA over Converged Ethernet v2 — it carries RDMA traffic over standard Ethernet using UDP. Because RDMA assumes a reliable transport, any packet drop causes expensive retransmissions, so PFC is used to make the Ethernet fabric lossless.

Remember: RDMA = bypass CPU for low latency. RoCE = RDMA over Ethernet. v2 uses UDP.

Name origin: RDMA = Remote Direct Memory Access. RoCE = RDMA over Converged Ethernet. v2 added UDP encapsulation for L3 routing.
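
Example: Because RoCEv2 rides on UDP, a capture tool can spot it from the UDP header alone. The sketch below assumes the IANA-assigned RoCEv2 destination port 4791; the function name `is_rocev2` and the sample header bytes are illustrative only.

```python
import struct

ROCEV2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2


def is_rocev2(udp_header: bytes) -> bool:
    """Return True if the 8-byte UDP header targets the RoCEv2 port."""
    if len(udp_header) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    # UDP header layout: source port, destination port, length, checksum
    _src, dst, _length, _checksum = struct.unpack("!HHHH", udp_header[:8])
    return dst == ROCEV2_UDP_PORT


# Hypothetical headers: one aimed at port 4791, one at DNS (53)
roce_header = struct.pack("!HHHH", 49152, 4791, 8, 0)
dns_header = struct.pack("!HHHH", 49152, 53, 8, 0)
print(is_rocev2(roce_header), is_rocev2(dns_header))
```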

2. What is Priority Flow Control (PFC) and what priority is typically used for RoCE?

Show answer PFC allows pausing traffic on a specific 802.1p priority class without stopping other classes. Priority 3 is the conventional choice for RoCE traffic.

Analogy: PFC is like a traffic light that only stops one lane (priority class) while letting others flow freely.
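
Example: To make "pausing one priority" concrete, here is a minimal sketch that builds a PFC frame as defined in IEEE 802.1Qbb: a MAC Control frame (EtherType 0x8808, opcode 0x0101) with a class-enable vector and eight pause timers. The source MAC and the helper name `build_pfc_frame` are placeholders, and the frame is only constructed here, not sent.

```python
import struct

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac: bytes, pause_quanta: dict[int, int]) -> bytes:
    """Build a PFC frame that pauses only the listed priorities.

    pause_quanta maps priority (0-7) to a pause time in quanta
    (one quantum is 512 bit times; 0 means resume).
    """
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be 0-7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority  # mark this class as affected
        timers[priority] = quanta
    header = PFC_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)
    # Pad to the 60-byte minimum Ethernet frame size (FCS not included)
    return (header + payload).ljust(60, b"\x00")


# Pause priority 3 (the conventional RoCE class) for the maximum time
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
print(frame.hex())
```

Only bit 3 of the class-enable vector is set, so traffic in the other seven classes keeps flowing.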

3. What role does ECN play alongside PFC in a lossless fabric?

Show answer ECN (Explicit Congestion Notification) marks packets when buffer utilization crosses a threshold, signaling senders to reduce rate. This prevents buffers from filling to the point where PFC must pause traffic, avoiding PFC storms.

Remember: DCB = lossless Ethernet. PFC=flow control, ECN=congestion notification. For RoCE.

Remember: "ECN = early warning, PFC = emergency brake." ECN tells senders to slow down before buffers fill. PFC stops traffic when buffers are about to overflow.
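
Example: The "early warning" behaviour can be sketched as a RED-style marking curve. This is an illustration under assumed numbers, not the Spectrum implementation: the threshold values, `max_probability`, and the function name are all hypothetical.

```python
def ecn_mark_probability(
    queue_depth: int,
    min_threshold: int,
    max_threshold: int,
    max_probability: float = 0.1,
) -> float:
    """Probability of ECN-marking a packet at the given queue depth."""
    if queue_depth <= min_threshold:
        return 0.0  # queue is shallow: no marking
    if queue_depth >= max_threshold:
        return 1.0  # queue is deep: mark every packet
    # Between the thresholds, ramp linearly up to max_probability
    span = max_threshold - min_threshold
    return max_probability * (queue_depth - min_threshold) / span


# Hypothetical thresholds in buffer cells
for depth in (50, 200, 400):
    print(depth, ecn_mark_probability(depth, min_threshold=100, max_threshold=300))
```

The design intent is that the ECN thresholds sit below the point where PFC would trigger, so senders slow down before any pause frame is needed.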

4. What is DCBX and why does trust mode matter for RoCE?

Show answer Data Center Bridging Capability Exchange — negotiates PFC, ETS, and priority settings between switch and NIC. Trust mode (L2 CoS vs DSCP) must match on both sides, or RDMA traffic lands in the wrong queue and is not protected by PFC.

Name origin: DCBX = Data Center Bridging Capability eXchange. It auto-negotiates lossless Ethernet parameters between switch and NIC.
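
Example: A trust-mode mismatch is easiest to see with numbers. The sketch below assumes a DSCP-to-priority mapping of `dscp >> 3`, which is a common default but should be verified on your platform; the function name and the sample markings are hypothetical.

```python
def classify_priority(trust_mode: str, pcp: int, dscp: int) -> int:
    """Return the switch priority (0-7) a packet lands in.

    trust_mode "L2" reads the 802.1p PCP bits from the VLAN tag.
    trust_mode "L3" reads the DSCP field (assumed mapping: dscp >> 3).
    """
    if trust_mode == "L2":
        return pcp
    if trust_mode == "L3":
        return dscp >> 3
    raise ValueError("trust_mode must be 'L2' or 'L3'")


# A NIC marks RoCE traffic with DSCP 26 and leaves PCP at 0
matched = classify_priority("L3", pcp=0, dscp=26)
mismatched = classify_priority("L2", pcp=0, dscp=26)
print(matched, mismatched)
```

With matching trust the packet lands in priority 3 and is protected by PFC; with the switch trusting L2 it falls into priority 0, where PFC is typically not enabled.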

5. Describe the Onyx firmware upgrade workflow (key commands in order).

Show answer 1) `image fetch scp://...` to download the image
2) `show images` to verify
3) `image install <img>` to install to the next partition
4) `image boot next` to set the boot partition
5) `configuration write` to save the config
6) `reload`

Gotcha: Always verify the new firmware version supports your optics and ASIC generation before upgrading. Incompatible firmware can brick ports.
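
Example: For automation, the upgrade steps from this card can be kept as an ordered command list. This sketch only builds the strings; the image URL and filename are placeholders, and how the commands reach the switch (SSH, REST, console) is left out.

```python
def upgrade_commands(image_url: str, image_file: str) -> list[str]:
    """Return the Onyx upgrade commands from this card, in order."""
    return [
        f"image fetch {image_url}",     # download the image
        "show images",                  # verify it arrived
        f"image install {image_file}",  # install to the next partition
        "image boot next",              # boot from that partition
        "configuration write",          # save config before rebooting
        "reload",
    ]


# Placeholder URL and filename, for illustration only
commands = upgrade_commands("scp://user@server/image.img", "image.img")
for command in commands:
    print(command)
```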

6. How does Onyx support firmware rollback?

Show answer Onyx has dual boot partitions. If a new firmware fails, you can select the previous partition at boot to roll back to the last known-good version.

Under the hood: Onyx uses an A/B partition scheme — firmware installs to the inactive partition. If boot fails, the bootloader falls back to the previous partition.

7. What is an ISL in MLAG and what happens if it fails?

Show answer The Inter-Switch Link connects MLAG peer switches. If the ISL fails, both switches become primary (split-brain), causing duplicate packets and MAC flapping downstream.

Remember: MLAG = two switches as one. Like LACP but across a switch pair.

Debug clue: If MLAG peers disagree on state, check `show mlag` and compare peer status. ISL failure is the #1 cause of MLAG split-brain.

8. Name three common WJH drop reasons and what they indicate.

Show answer BUFFER_CONGESTION: traffic burst exceeded buffer capacity. INGRESS_ACL: packet matched a deny ACL. BLACKHOLE_ROUTE: packet matched a null or missing route.

Debug clue: BUFFER_CONGESTION + no ECN marks = ECN is not configured. INGRESS_ACL drops = check `show access-list`. BLACKHOLE_ROUTE = check `show ip route`.
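
Example: The triage hints above fit naturally into a lookup table. This sketch only encodes what the card says; the dictionary and function names are hypothetical, and unknown reasons are passed through untouched.

```python
# Drop reason -> (meaning, suggested next check), taken from this card
WJH_TRIAGE = {
    "BUFFER_CONGESTION": (
        "traffic burst exceeded buffer capacity",
        "check whether ECN is configured",
    ),
    "INGRESS_ACL": (
        "packet matched a deny ACL",
        "show access-list",
    ),
    "BLACKHOLE_ROUTE": (
        "packet matched a null or missing route",
        "show ip route",
    ),
}


def triage(drop_reason: str) -> str:
    """Return a one-line hint for a WJH drop reason."""
    if drop_reason not in WJH_TRIAGE:
        return f"{drop_reason}: no hint recorded"
    meaning, next_check = WJH_TRIAGE[drop_reason]
    return f"{drop_reason}: {meaning}; next: {next_check}"


print(triage("INGRESS_ACL"))
```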

9. What DOM readings should you monitor on Mellanox switch transceivers?

Show answer Rx power (below -10 dBm is concerning), temperature (above 70°C is a warning), and Tx bias current (rising bias indicates optic degradation). Access via `show interfaces ethernet 1/1 transceiver diagnostics`.

Remember: "Rx power is king." Below -10 dBm means dirty fiber, bad splice, or failing optic. Above -1 dBm means the optic is overdriving the receiver.
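
Example: The thresholds on this card translate directly into a monitoring check. A minimal sketch, assuming the readings have already been collected; the function names and finding labels are hypothetical, and the limits are the rules of thumb quoted above rather than vendor specifications.

```python
def check_dom(rx_power_dbm: float, temperature_c: float) -> list[str]:
    """Flag DOM readings that cross the rule-of-thumb limits from this card."""
    findings = []
    if rx_power_dbm < -10.0:
        findings.append("rx_power_low")    # dirty fiber, bad splice, or failing optic
    elif rx_power_dbm > -1.0:
        findings.append("rx_power_high")   # receiver is being overdriven
    if temperature_c > 70.0:
        findings.append("temperature_high")
    return findings


def bias_is_rising(bias_samples_ma: list[float]) -> bool:
    """True if every Tx bias sample is higher than the one before it."""
    pairs = zip(bias_samples_ma, bias_samples_ma[1:])
    return len(bias_samples_ma) > 1 and all(later > earlier for earlier, later in pairs)


print(check_dom(rx_power_dbm=-12.3, temperature_c=72.0))
print(bias_is_rising([6.1, 6.4, 6.9]))
```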

10. How do you back up and restore configuration on an Onyx switch?

Show answer Backup: `configuration upload scp://user@server/path.cfg`
Restore: `configuration fetch scp://...`, then `configuration switch-to fetched-config.cfg`, then `configuration write`

Gotcha: Restoring a config from a different firmware version may fail silently. Always upgrade firmware first, then restore config.

11. How does the Spectrum shared buffer architecture differ from per-port buffers?

Show answer Spectrum uses a shared memory pool that any port can draw from, rather than fixed per-port allocations. This is better for bursty traffic but means one congested port can consume buffer space needed by other ports.

Gotcha: One elephant flow can consume shared buffers meant for other ports. Use buffer management profiles to set per-port limits.
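
Example: A toy model shows why a per-port limit matters in a shared pool. This is a deliberately simplified sketch (static cap, no draining), not the Spectrum buffer algorithm; the class name, port names, and cell counts are hypothetical.

```python
from typing import Optional


class SharedBuffer:
    """Shared pool of buffer cells with an optional static per-port cap."""

    def __init__(self, total_cells: int, per_port_cap: Optional[int] = None):
        self.total_cells = total_cells
        self.per_port_cap = per_port_cap
        self.used: dict[str, int] = {}

    def admit(self, port: str, cells: int) -> bool:
        """Try to buffer `cells` for `port`; return False if they must be dropped."""
        pool_use = sum(self.used.values())
        port_use = self.used.get(port, 0)
        if pool_use + cells > self.total_cells:
            return False  # pool exhausted
        if self.per_port_cap is not None and port_use + cells > self.per_port_cap:
            return False  # this port hit its own limit
        self.used[port] = port_use + cells
        return True


# No cap: an elephant flow on eth1 takes the whole pool and starves eth2
uncapped = SharedBuffer(total_cells=100)
print(uncapped.admit("eth1", 100), uncapped.admit("eth2", 10))

# With a cap: eth1 is limited to 60 cells, leaving room for eth2
capped = SharedBuffer(total_cells=100, per_port_cap=60)
print(capped.admit("eth1", 100), capped.admit("eth1", 60), capped.admit("eth2", 10))
```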

12. What is port breakout on a Mellanox switch and what is the oversubscription risk?

Show answer Breaking one high-speed port into multiple lower-speed ports (e.g., 1x100G into 4x25G). Total bandwidth does not increase — the existing bandwidth is split. With full breakout, effective oversubscription may increase.

Example: Breaking 1x100G into 4x25G is common for connecting 25G servers. Total bandwidth stays 100G: you're splitting, not multiplying.
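
Example: The same arithmetic as a minimal Python sketch, together with the usual oversubscription ratio (server-facing bandwidth divided by uplink bandwidth). The function names and the leaf layout in the usage lines are hypothetical.

```python
def breakout(port_gbps: int, ways: int) -> list[int]:
    """Split one port into `ways` equal sub-ports; total bandwidth is unchanged."""
    if port_gbps % ways != 0:
        raise ValueError("port speed must divide evenly across sub-ports")
    return [port_gbps // ways] * ways


def oversubscription_ratio(downlink_gbps: int, uplink_gbps: int) -> float:
    """Server-facing bandwidth divided by uplink bandwidth."""
    return downlink_gbps / uplink_gbps


sub_ports = breakout(100, 4)
print(sub_ports, sum(sub_ports))

# Hypothetical leaf: 12 ports broken out to 48x25G servers, 4x100G uplinks
downlink = sum(sum(breakout(100, 4)) for _ in range(12))
print(oversubscription_ratio(downlink, uplink_gbps=4 * 100))
```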

13. How do you configure syslog forwarding on an Onyx switch?

Show answer `logging host 10.0.0.100 port 514` and `logging level info`. Feed into a centralized logging system (ELK, Splunk, Loki).

Gotcha: Default syslog level may be too verbose for production. Start with `logging level warning` and increase if needed.

14. What is the PFC watchdog and why should you enable it?

Show answer PFC watchdog detects stuck PFC states (where a port is paused indefinitely). When detected, it disables PFC on that port to prevent a PFC storm from cascading across the fabric. Enable with `priority-flow-control watchdog enable`.

War story: Without PFC watchdog, a single misbehaving NIC can cascade PFC pauses across an entire datacenter fabric, causing minutes of downtime.

15. What features are included in the Mellanox/Onyx base license?

Show answer L2/L3 switching, BGP, OSPF, VXLAN, MLAG, RDMA/RoCE, WJH, REST API, and SNMP. The base license is more feature-rich than many competitors.

Fun fact: Unlike Cisco where BGP and OSPF often require separate licenses, Mellanox/NVIDIA includes L3 routing in the base license — a significant cost advantage.

🔴 Hard (7)

1. What three mechanisms must be configured together for a properly functioning RoCEv2 lossless fabric?

Show answer 1) PFC — to pause traffic before buffer overflow
2) ECN — to signal senders to back off before PFC activates
3) DCBX trust mode — to ensure traffic is classified into the correct priority queue on both switch and NIC

Remember: "PFC + ECN + DCBX = the lossless trio." Miss any one and RoCE performance degrades unpredictably.

2. Explain how a PFC storm propagates through a datacenter fabric.

Show answer A slow receiver cannot drain traffic fast enough, so the switch port feeding it starts to buffer. When that buffer fills, the switch sends PFC pause frames upstream. The upstream switch port then buffers, fills, and sends PFC pauses further upstream. This cascade can reach spine switches, pausing unrelated traffic across the entire fabric.

Debug clue: Check `show interfaces ethernet counters pfc` for pause frame counts. A port sending millions of pause frames per second is the storm source.
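
Example: The cascade can be reproduced with a toy tick-based simulation of a chain of switches. This is a sketch under simplified assumptions (one flow, one buffer per hop, a single XOFF threshold); the function name and all rates are hypothetical.

```python
from typing import Optional


def simulate_pfc_cascade(
    hops: int = 3,
    xoff: int = 10,
    in_rate: int = 4,
    drain_rate: int = 1,
    ticks: int = 50,
) -> list[Optional[int]]:
    """Return the tick at which each hop first asserts pause.

    Index 0 is the switch next to the slow receiver; the last index is
    the switch next to the sender.
    """
    buffers = [0] * hops
    first_pause: list[Optional[int]] = [None] * hops
    for tick in range(ticks):
        # A hop pauses its upstream neighbour once its buffer reaches XOFF
        paused = [depth >= xoff for depth in buffers]
        for hop, is_paused in enumerate(paused):
            if is_paused and first_pause[hop] is None:
                first_pause[hop] = tick
        # The slow receiver drains the hop closest to it
        buffers[0] -= min(buffers[0], drain_rate)
        # Each hop forwards downstream unless the downstream hop is pausing it
        for hop in range(1, hops):
            if not paused[hop - 1]:
                moved = min(buffers[hop], in_rate)
                buffers[hop] -= moved
                buffers[hop - 1] += moved
        # The sender injects into the farthest hop unless that hop is pausing it
        if not paused[-1]:
            buffers[-1] += in_rate
    return first_pause


pause_ticks = simulate_pfc_cascade()
print(pause_ticks)
```

The pause times increase with distance from the receiver: the problem starts at the edge and walks upstream one hop at a time.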

3. What is ISSU in the context of Mellanox MLAG, and what is the correct upgrade procedure?

Show answer In-Service Software Upgrade — upgrading firmware on one MLAG peer at a time without traffic disruption. Procedure: upgrade peer 1, verify MLAG reconverges and traffic is healthy, then upgrade peer 2. Both peers must end up on the same firmware version.

Remember: "ISSU = upgrade one peer at a time." Traffic flows through the surviving peer during each upgrade. Both peers must end on the same version.
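
Example: The "one peer at a time, verify in between" rule can be sketched as a small orchestration loop. The `upgrade` and `mlag_healthy` callables are hypothetical hooks you would supply yourself (for example, wrappers around SSH sessions); nothing here talks to a real switch.

```python
from typing import Callable


def rolling_mlag_upgrade(
    peers: list[str],
    upgrade: Callable[[str], None],
    mlag_healthy: Callable[[], bool],
) -> list[str]:
    """Upgrade MLAG peers one at a time, stopping if MLAG does not reconverge."""
    upgraded = []
    for peer in peers:
        upgrade(peer)
        if not mlag_healthy():
            # Never touch the second peer while the pair is unhealthy
            raise RuntimeError(f"MLAG not healthy after upgrading {peer}")
        upgraded.append(peer)
    return upgraded


# Stand-in hooks for illustration
log: list[str] = []
result = rolling_mlag_upgrade(
    ["peer-1", "peer-2"],
    upgrade=lambda peer: log.append(f"upgraded {peer}"),
    mlag_healthy=lambda: True,
)
print(result, log)
```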

4. What is the security risk of the Onyx REST API and how should it be secured in production?

Show answer The REST API provides full read-write access to switch configuration. If enabled without authentication (common in lab), it is a critical vulnerability. Secure with: HTTPS, strong authentication, management ACLs limiting source IPs, and regular access log auditing.

Gotcha: Some lab environments enable the REST API with default credentials. This is a critical vulnerability — any script can reconfigure the switch remotely.

5. Why might WJH show no drops even when packets are being lost, and how do you fix this?

Show answer WJH channels (forwarding, ACL) may not be enabled by default on some firmware versions. Enable with `what-just-happened forwarding enable` and `what-just-happened acl enable`. Also enable auto-export.

Remember: "No WJH output ≠ no drops." Always verify WJH channels are enabled: `what-just-happened forwarding enable`.

6. What happens when a Spectrum ASIC exceeds its thermal threshold, and how do you prevent it?

Show answer The switch silently thermal-throttles, reducing port speeds to lower heat. Monitor via `show environment` and SNMP. Alert when ASIC temperature exceeds 85°C. Ensure proper hot/cold aisle separation and adequate rack airflow.

Number anchor: Spectrum ASICs throttle around 105°C junction temperature. Alert at 85°C to catch airflow problems before performance degrades.
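
Example: The two numbers on this card map onto a simple alerting rule. A minimal sketch, assuming the temperature has already been polled; the function name and state labels are hypothetical, and the 105°C figure is approximate as noted above.

```python
ALERT_C = 85.0      # alert threshold from this card
THROTTLE_C = 105.0  # approximate junction temperature where throttling starts


def classify_asic_temp(temp_c: float) -> str:
    """Map an ASIC temperature reading to an alert state."""
    if temp_c >= THROTTLE_C:
        return "throttle-risk"
    if temp_c > ALERT_C:
        return "alert"
    return "ok"


for reading in (62.0, 91.5, 106.0):
    print(reading, classify_asic_temp(reading))
```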

7. What can happen with non-qualified optics on Mellanox switches?

Show answer Non-qualified optics may be silently degraded: speed-limited, DOM unavailable, or intermittent CRC errors. The switch validates transceiver vendor/part against a qualified list. Some firmware allows overriding the check but this is unsupported.

War story: Using cheap third-party optics saved $500 per port but caused intermittent CRC errors under load, resulting in packet loss that was invisible to monitoring until WJH was enabled.