Quiz: Mellanox Switches

10 questions

L1 (6 questions)

1. What is the Onyx CLI command to save the running configuration to persistent storage?

Show answer `configuration write`. This is Onyx's native syntax. Onyx also accepts `write memory` as an equivalent, but Cisco's `copy running-config startup-config` is not part of the Onyx CLI. *Common mistake:* Many people type `copy run start` from Cisco muscle memory, which fails on Onyx.

2. What does WJH (What Just Happened) do on a Mellanox switch?

Show answer WJH logs every packet dropped in hardware with the exact drop reason (e.g., ACL deny, buffer congestion, TTL expired, blackhole route). It is the single most useful debugging tool on the platform. *Common mistake:* WJH is not a packet capture tool — it only records drops, not successfully forwarded packets.

3. Name the four generations of the Mellanox Spectrum ASIC family and their maximum port speeds.

Show answer Spectrum (SN2000) — 100G. Spectrum-2 (SN3000) — 200G. Spectrum-3 (SN4000) — 400G. Spectrum-4 (SN5000) — 800G. Each generation roughly doubles bandwidth density.

4. What is the MLAG split-brain problem on Mellanox switches and what causes it?

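Show answer Split-brain occurs when the link between the MLAG peers fails and neither switch can tell whether its peer is still alive. Onyx calls this link the IPL (Inter-Peer Link); other platforms often call it the ISL (Inter-Switch Link). Both switches then act as primary and forward traffic independently, causing duplicate packets and MAC flapping downstream. Reduce the risk by building the link from redundant physical links and by running a backup heartbeat over the management network.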

5. How does Onyx firmware support rollback after a failed upgrade?

Show answer Onyx has dual boot partitions. After installing new firmware to the next partition and rebooting, if the new version fails, you can select the previous partition at boot to roll back to the last known-good version.
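The rollback flow described above can be sketched as a small state model. This is a toy illustration only, not Onyx code: the class name, method names, and version strings are hypothetical, and it assumes that new firmware is always written to the partition that is not currently running.

```python
class DualPartitionSwitch:
    """Toy model of a switch with two boot partitions (hypothetical names)."""

    def __init__(self, active_version):
        self.images = {1: active_version, 2: None}  # partition -> installed version
        self.active = 1      # partition currently running
        self.next_boot = 1   # partition selected for the next reboot

    @property
    def standby(self):
        return 2 if self.active == 1 else 1

    @property
    def running(self):
        return self.images[self.active]

    def install(self, version):
        # Assumption: installs never overwrite the running partition.
        self.images[self.standby] = version

    def boot_other(self):
        self.next_boot = self.standby

    def reload(self):
        self.active = self.next_boot


sw = DualPartitionSwitch("old-release")
sw.install("new-release")
sw.boot_other()
sw.reload()
after_upgrade = sw.running            # new firmware is running
kept_for_rollback = sw.images[sw.standby]  # old firmware is still on disk

sw.boot_other()                       # upgrade misbehaves: select the other partition
sw.reload()
after_rollback = sw.running
```

Because the old image stays on the standby partition, rolling back in this model is just another partition switch plus a reload; nothing is reinstalled.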

6. What is UFM in the Mellanox/NVIDIA Networking ecosystem?

Show answer Unified Fabric Manager — a centralized management platform that provides topology discovery, health monitoring, firmware orchestration, and telemetry aggregation across multi-switch fabrics. It runs as a separate appliance or VM, not on the switches themselves.

L2 (4 questions)

1. Why must DCBX trust mode match between the switch and NIC for RoCEv2 to work correctly?

Show answer If the switch trusts L2 CoS bits but the NIC marks DSCP (or vice versa), RDMA traffic is classified into the wrong priority queue. PFC does not protect it, causing drops under load and degraded RDMA performance. *Common mistake:* Some assume DCBX negotiation auto-resolves mismatches, but in practice a trust mode mismatch silently misclassifies traffic.
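A minimal sketch of why the mismatch misclassifies traffic. This is a toy model, not switch firmware: it assumes priority 3 is the lossless (PFC-enabled) priority, that the NIC marks RoCEv2 packets with DSCP 26 and leaves the VLAN PCP at 0, and that the DSCP-to-priority map uses the top three DSCP bits. All of these values are illustrative assumptions; check the actual mapping on your switch and NIC.

```python
LOSSLESS_PRIORITY = 3  # assumed priority with PFC enabled


def classify(trust_mode: str, pcp: int, dscp: int) -> int:
    """Return the switch priority a packet is mapped to.

    "L2" trusts the 3-bit PCP (CoS) field of the VLAN tag.
    "L3" trusts the 6-bit DSCP field; the top-three-bits map is an assumption.
    """
    if trust_mode == "L2":
        return pcp
    if trust_mode == "L3":
        return dscp >> 3
    raise ValueError(f"unknown trust mode: {trust_mode}")


def is_pfc_protected(trust_mode: str, pcp: int, dscp: int) -> bool:
    return classify(trust_mode, pcp, dscp) == LOSSLESS_PRIORITY


# The NIC marks DSCP only; PCP stays at its default of 0.
roce_packet = {"pcp": 0, "dscp": 26}

matched = is_pfc_protected("L3", **roce_packet)     # switch trusts DSCP
mismatched = is_pfc_protected("L2", **roce_packet)  # switch trusts PCP
print(matched, mismatched)
```

In the mismatched case the packet lands in priority 0, which is not paused by PFC, so it is dropped under congestion like any best-effort traffic.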

2. What is a PFC storm and how does the PFC watchdog prevent it?

Show answer A PFC storm occurs when a slow receiver causes PFC pause frames to cascade upstream through the fabric, eventually pausing unrelated traffic on spine switches. The PFC watchdog detects stuck PFC states and disables PFC on the affected port to break the cascade. *Common mistake:* PFC storms are often confused with broadcast storms, but they are a priority-based flow control issue specific to lossless Ethernet fabrics.
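The watchdog idea can be sketched as a counter over periodic polls. This is a simplified model under stated assumptions, not the switch's actual implementation: the polling scheme, the `threshold_polls` parameter, and the choice to model recovery as "disable PFC on the port" are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    pfc_enabled: bool = True
    paused_polls: int = 0  # consecutive polls in which the queue was paused


def watchdog_poll(state: QueueState, paused_now: bool, threshold_polls: int) -> bool:
    """Process one poll; return True if the watchdog trips on this poll."""
    if not state.pfc_enabled:
        return False  # already tripped, nothing left to do
    state.paused_polls = state.paused_polls + 1 if paused_now else 0
    if state.paused_polls >= threshold_polls:
        state.pfc_enabled = False  # stop honoring pause frames to break the cascade
        return True
    return False


def run(samples, threshold_polls=3):
    """Feed a sequence of paused/not-paused samples through the watchdog."""
    state = QueueState()
    tripped_at = None
    for i, paused in enumerate(samples):
        if watchdog_poll(state, paused, threshold_polls):
            tripped_at = i
    return state, tripped_at


# Short pauses are normal flow control and must not trip the watchdog.
transient_state, transient_trip = run([True, False, True, False])

# A queue that stays paused for three consecutive polls is treated as stuck.
stuck_state, stuck_trip = run([True, True, False, True, True, True, True])
```

Resetting the counter on any unpaused poll is what separates ordinary, brief PFC pauses from a stuck queue.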

3. What three mechanisms must be configured together for a properly functioning RoCEv2 lossless fabric on Mellanox switches?

Show answer
1. PFC (Priority Flow Control): pauses traffic before a buffer overflows.
2. ECN (Explicit Congestion Notification): signals senders to back off before PFC activates.
3. DCBX trust mode: ensures traffic is classified into the correct priority queue on both switch and NIC.
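The ordering between ECN and PFC comes down to where their buffer thresholds sit. The sketch below is a deliberately simplified model: the threshold values are hypothetical, real ECN marking is usually probabilistic between a minimum and maximum threshold, and the function name is illustrative rather than any switch API.

```python
def congestion_action(queue_kb: int, ecn_min_kb: int, pfc_xoff_kb: int) -> str:
    """Return what the switch does at a given queue depth (simplified)."""
    if ecn_min_kb >= pfc_xoff_kb:
        # ECN would never get a chance to slow senders before PFC pauses them.
        raise ValueError("ECN threshold must sit below the PFC XOFF threshold")
    if queue_kb >= pfc_xoff_kb:
        return "pfc_pause"
    if queue_kb >= ecn_min_kb:
        return "ecn_mark"
    return "forward"


# Hypothetical thresholds for one lossless traffic class.
ECN_MIN_KB = 150
PFC_XOFF_KB = 1000

actions = [congestion_action(depth, ECN_MIN_KB, PFC_XOFF_KB) for depth in (100, 300, 1200)]
print(actions)
```

With ECN below the PFC threshold, senders are asked to slow down first, and PFC remains a last-resort safety net rather than the primary congestion control.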

4. Why might `show what-just-happened` return no results even when packets are being dropped?

Show answer WJH channels (forwarding, ACL) may not be enabled by default on some firmware versions. You must explicitly enable them with `what-just-happened forwarding enable` and `what-just-happened acl enable`. *Common mistake:* Some assume WJH is always active because it is a hardware feature, but the reporting channels must be explicitly enabled in software.