Cisco ACI: APIC Certificate Bug (it’s a nasty time-waster)

My team ran into a nasty ACI bug (CSCva68310) that prevents you from adding nodes to an ACI fabric during setup. Here’s a quick write-up so that the next poor soul who spends WAY too much time struggling with fabric provisioning can hopefully get it fixed straightaway.

The Topology

The team unboxed a brand new trio of APIC-CLUSTER-L3 servers, ran the initial setup from the KVM, and connected them to a Nexus 93180-YC leaf. The 93180 was connected to a 9336 spine. Nothing complicated at all.

The Issue

No matter what the team did, the leaf and spine sat in “Inactive” mode and would not change state to Active during fabric discovery.

‘acidiag fnvread’ from the APIC CLI

From the GUI, here’s what it showed:

APIC GUI shows Status: Inactive

Finally, on the leaf, ‘show discoveryissues’ reported two errors: “Registration to all PM shards is not complete” and “Policy download is not complete”.
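If you're checking several switches, a quick script can flag these two messages in captured output. This is a minimal sketch that assumes nothing about the output layout beyond the two quoted strings appearing verbatim; how you capture the text (SSH session, copy/paste) is up to you, and the sample text below is hypothetical.

```python
# Minimal sketch: look for the two known discovery messages in captured
# `show discoveryissues` text. Only plain substring matches are used, so
# no particular output format is assumed.

KNOWN_MESSAGES = (
    "Registration to all PM shards is not complete",
    "Policy download is not complete",
)


def find_known_issues(output: str) -> list[str]:
    """Return the known discovery messages that appear in `output`."""
    return [msg for msg in KNOWN_MESSAGES if msg in output]


if __name__ == "__main__":
    # Hypothetical captured text, for illustration only.
    captured = """
    Registration to all PM shards is not complete
    Policy download is not complete
    """
    for msg in find_known_issues(captured):
        print("found:", msg)
```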

Since we only had one APIC member online and hadn’t even discovered the fabric yet, how could it be a shard issue?

After completely wiping the fabric (APICs with ‘acidiag touch clean’, ‘acidiag touch setup’, ‘acidiag reboot’, and the spine/leaf with ‘setup-clean-config.sh’), the issue persisted.

The Fault

After the fabric wipe, we saw that the fabric (out of the box) was throwing Fault Code F3031. The description of that fault code is:

  • “Failed to parse the subject line as a valid ACI fabric certificate AND Invalid Serial Number AND Invalid Product ID”
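If you'd rather confirm the fault from the APIC REST API than from the GUI, something like the following should work. It's a minimal sketch, assuming the standard aaaLogin endpoint and a faultInst class query with query-target-filter; the hostname and credentials are placeholders you supply, and certificate verification is switched off only because a freshly set up APIC usually serves a self-signed HTTPS certificate (don't do that outside a lab).

```python
import json
import ssl
import urllib.parse
import urllib.request


def fault_query_url(apic_host: str, code: str = "F3031") -> str:
    """Build a class query URL for faultInst objects with the given fault code."""
    params = urllib.parse.urlencode(
        {"query-target-filter": f'eq(faultInst.code,"{code}")'}
    )
    return f"https://{apic_host}/api/node/class/faultInst.json?{params}"


def fault_descriptions(response: dict) -> list[tuple[str, str]]:
    """Pull (dn, descr) pairs out of an APIC 'imdata' response body."""
    results = []
    for item in response.get("imdata", []):
        attrs = item.get("faultInst", {}).get("attributes", {})
        results.append((attrs.get("dn", ""), attrs.get("descr", "")))
    return results


def fetch_faults(apic_host: str, user: str, password: str, code: str = "F3031"):
    """Log in, then return (dn, descr) for every fault with `code` (lab use)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # lab only: self-signed APIC cert
    ctx.verify_mode = ssl.CERT_NONE
    body = json.dumps(
        {"aaaUser": {"attributes": {"name": user, "pwd": password}}}
    ).encode()
    login = urllib.request.Request(
        f"https://{apic_host}/api/aaaLogin.json",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(login, context=ctx) as resp:
        token = json.load(resp)["imdata"][0]["aaaLogin"]["attributes"]["token"]
    # The session token is sent back as the APIC-cookie cookie.
    query = urllib.request.Request(
        fault_query_url(apic_host, code),
        headers={"Cookie": f"APIC-cookie={token}"},
    )
    with urllib.request.urlopen(query, context=ctx) as resp:
        return fault_descriptions(json.load(resp))
```

Splitting URL construction and response parsing out of the network call keeps those two pieces easy to check without a live APIC.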

The catch is that the problem isn’t a self-signed certificate; it’s the Cisco Manufacturer Installed Certificate (MIC) that is put on the APIC at the factory. The only way to fix this is to call TAC and have them replace your MIC.

The problem certificate, installed at the factory. The Common Name is the serial number of that APIC.

From the bug notes:

  • Correct pattern: /serialNumber=PID: SN:/CN=
  • Incorrect pattern: /CN=/serialNumber=PID: SN:
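The only difference between the two patterns is the order of the fields: serialNumber must come before CN. Here's a minimal sketch of that check, assuming you already have the certificate subject as a slash-separated string like the ones in the bug notes; the example subjects below use hypothetical PID and serial values.

```python
def subject_fields(subject: str) -> list[tuple[str, str]]:
    """Split a subject like '/serialNumber=.../CN=...' into ordered
    (attribute, value) pairs."""
    pairs = []
    for part in subject.strip("/").split("/"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments without an '='
            pairs.append((key.strip(), value.strip()))
    return pairs


def has_correct_order(subject: str) -> bool:
    """True if serialNumber appears before CN (the 'correct pattern')."""
    keys = [key for key, _ in subject_fields(subject)]
    if "serialNumber" not in keys or "CN" not in keys:
        return False
    return keys.index("serialNumber") < keys.index("CN")


if __name__ == "__main__":
    # Hypothetical subjects, for illustration only.
    good = "/serialNumber=PID:EXAMPLE-PID SN:ABC12345/CN=ABC12345"
    bad = "/CN=ABC12345/serialNumber=PID:EXAMPLE-PID SN:ABC12345"
    print(has_correct_order(good), has_correct_order(bad))
```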

You can see your certificate subject, including the CN, in the fault description at the bottom. Ours was obviously in the incorrect order.

The Bug

Bug ID CSCva68310 matches this almost perfectly; however, the bug notes say your Fabric Authentication Policy has to be set to Strict (instead of Permissive). This was NOT the case for us: we hit the bug out of the box, with no security policy changes.

The Fix

Call Cisco TAC, have them reference the Bug ID, and tell them you need your MIC replaced. The certificate must be replaced on ALL APICs that have this fault; replacing it on just the primary APIC will not allow the remaining APICs to join the fabric.

Questions? Comments?

Please reach out if you have any questions or comments or need assistance with your ACI fabric (troubleshooting, analysis, audit, and automation, my favorite). And Cisco, please fix the cert format.

Data center/security/collab hack, CCIE #5026, focusing on automation, programmability, operational efficiency and getting rid of technical debt.
