Troubleshooting of Infrastructure Alarms

This chapter provides a description, severity, and troubleshooting procedure for each commonly encountered Cisco NCS 1010 infrastructure alarm and condition. When an alarm is raised, refer to its clearing procedure.

LICENSE-COMM-FAIL

Default Severity: Major(MJ), Non-Service-Affecting (NSA)

Logical Object: plat_sl_client

The LICENSE-COMM-FAIL alarm is raised when the device is not able to communicate with the Cisco license cloud server.

Clear LICENSE-COMM-FAIL Alarm

Procedure


This alarm is cleared when the communication with the Cisco cloud license server is restored.

If the alarm does not clear, contact your Cisco account representative or log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


DISASTER_RECOVERY_UNAVAILABLE_ALARM

Default Severity: Major(MJ), Non-Service-Affecting (NSA)

Logical Object: Instorch

The DISASTER_RECOVERY_UNAVAILABLE_ALARM is triggered when the chassis SSD image is corrupted or the system operates with uncommitted software.

Clear the Disaster Recovery Unavailable Alarm

Procedure


This alarm clears automatically after the upgrade from a lower release to a higher release. The upgrade process completes after running the install commit command. It syncs the image with the local repository every 12 hours. For more details about software upgrade, see the Upgrade Software section of the Cisco NCS 1010 System Setup and Software Installation Guide.

If the alarm does not clear, contact your Cisco account representative or log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


ESD_INIT_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The ESD_INIT_ERR_E alarm is raised when the Ethernet Switch Driver (ESD) initialization fails.

Clear the ESD_INIT_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


PORT_AUTO_TUNE_ERR_E

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: ESD

The PORT_AUTO_TUNE_ERR_E alarm is raised when the port auto-tuning fails.

Clear the PORT_AUTO_TUNE_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the port.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


PORT_INIT_ERR_E

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: ESD

The PORT_INIT_ERR_E alarm is raised when the port initialization fails.

Clear the PORT_INIT_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the port.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SPI_FLASH_CFG_INIT_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SPI_FLASH_CFG_INIT_ERR_E alarm is raised when there is an unsupported switch firmware version present.

Clear the SPI_FLASH_CFG_INIT_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the Aldrin. If the alarm does not clear automatically:

  • Restart the ESD process using the process restart esd location 0/rp0/cpu0 command.

  • Reload the rack using the reload location 0/rack command.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_ALL_PORTS_DOWN_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_ALL_PORTS_DOWN_ERR_E alarm is raised when all the switch ports are down.

Clear the SWITCH_ALL_PORTS_DOWN_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the ports.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_CFG_INIT_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_CFG_INIT_ERR_E alarm is raised when the initial switch configuration fails.

Clear the SWITCH_CFG_INIT_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_CRITICAL_PORT_FAILED_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_CRITICAL_PORT_FAILED_E alarm is raised when there is a critical port failure.

Clear the SWITCH_CRITICAL_PORT_FAILED_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the Aldrin. If the alarm does not clear automatically:

  • Restart the ESD process using the process restart esd location 0/rp0/cpu0 command.

  • Reload the rack using the reload location 0/rack command.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_DMA_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_DMA_ERR_E alarm is raised when the switch Direct Memory Access (DMA) engine fails.

Clear the SWITCH_DMA_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_EEPROM_INIT_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_EEPROM_INIT_ERR_E alarm is raised when the Switch EEPROM initialization fails.

Clear the SWITCH_EEPROM_INIT_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_FDB_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_FDB_ERR_E alarm is raised when the switch forwarding database (FDB) operation fails.

Clear the SWITCH_FDB_ERR_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_FDB_MAC_ADD_ERR_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_FDB_MAC_ADD_ERR_E alarm is raised when the switch firmware is unable to add a MAC address to its database.

SWITCH_FIRMWARE_BOOT_FAIL_E

Default Severity: Critical (CR), Non-Service-Affecting (NSA)

Logical Object: ESD

The SWITCH_FIRMWARE_BOOT_FAIL_E alarm is raised when the switch firmware boot fails.

Clear the SWITCH_FIRMWARE_BOOT_FAIL_E Alarm

Procedure


This alarm can be cleared when the ESD auto clears the alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_NOT_DISCOVERED_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_NOT_DISCOVERED_E alarm is raised when the switch is not discovered on the Peripheral Component Interconnect express (PCIe) bus.

Clear the SWITCH_NOT_DISCOVERED_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the switch.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SWITCH_RESET_RECOVERY_FAILED_E

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: ESD

The SWITCH_RESET_RECOVERY_FAILED_E alarm is raised when the Switch Reset operation does not recover the switch.

Clear the SWITCH_RESET_RECOVERY_FAILED_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by reloading the card using the reload cpu0/rp0 command.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


UNSTABLE_LINK_E

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: ESD

The UNSTABLE_LINK_E alarm is raised when there is an unstable link with high number of UP and DOWN state changes.

Clear the UNSTABLE_LINK_E Alarm

Procedure


Cisco IOS XR automatically detects and clears this alarm by resetting the port.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


FAN FAIL

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-ENVMON

The FAN FAIL alarm is raised when one of the two fans stops spinning or fails. If a fan stops working properly, the temperature can increase beyond the usual operating range, which might also trigger the TEMPERATURE alarm to activate.

Clear the FAN FAIL Alarm

Procedure


To clear this alarm, replace the faulty fan in the chassis.

If the alarm does not clear after replacing the faulty fan, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


FAN SPEED SENSOR 0: OUT OF TOLERANCE FAULT

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-ENVMON

The FAN SPEED SENSOR 0: OUT OF TOLERANCE FAULT alarm is raised when one or more fans in the fan tray are faulty.

Clear the FAN SPEED SENSOR 0: OUT OF TOLERANCE FAULT Alarm

Procedure


To clear this alarm, replace the faulty fans in the chassis.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


FAN-POWER-ERROR

Default Severity: Major (MJ), Non-Service-Affecting (NSA)

Logical Object: SPI-ENVMON

The FAN-POWER-ERROR alarm is raised when the power supply to the fan tray fails.

Clear the FAN-POWER-ERROR Alarm

Procedure


This alarm is cleared when:

  • The power supply to the fan tray is restored.

  • Online Insertion and Removal (OIR) of the fan tray is performed.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


FAN-TRAY-ABSENT

Default Severity: Major (MJ), Non-Service-Affecting (NSA)

Logical Object: SPI-ENVMON

The FAN-TRAY-ABSENT alarm is raised when one or more fan trays are absent or removed from the chassis.

Clear the FAN-TRAY-REMOVAL Alarm

Procedure


Insert the fan trays into the chassis.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


FPD IN NEED UPGD

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-FPD

The FPD IN NEED UPGD alarm is raised when a newer FPD version in the FPD package is available on the FPD boot disk and the its internal memory has an outdated FPD version. A FPD package is stored on the boot disk and contains all the FPD images for each FPD on the platform for that Cisco IOS XR version. The FPDs run from images stored in its internal memory and not from the images inside the FPD package.

Clear the FPD IN NEED UPGD Alarm

Procedure


This alarm is cleared when the correct FPD is upgraded using the upgrade hw-module location location-id fpd fpd name command. For more details, see the Upgrade FPDs Manually section of the Cisco NCS 1010 System Setup and Software Installation Guide.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


INSTALL IN PROGRESS

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-INSTALL

The INSTALL IN PROGRESS alarm is raised when the install operation is in progress or if the "install commit" is not performed after activating a new image or package.

Clear the INSTALL IN PROGRESS Alarm

Procedure


Step 1

1) Wait until the install operation is completed.

Step 2

2) Run the install commit command after the install activate command.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


NODE-UNPAIRED-FROM-BAND-PARTNER NODE Alarm

Default severity: Not Alarmed (NA), Non-Service-Affecting (NSA)

Logical Object: OTS Controller

The NODE-UNPAIRED-FROM-AND-PARTNER-NODE alarm is raised when:

  • The interlink management port is shut, and cable between C and L band is disconnected.

  • The partner band OLC configuration is removed from one end after the bidirectional connection is established, causing the connection to break in one of the directions.

  • The partner-band node is unavailable due to RP reload or power cycle events.

Clear NODE-UNPAIRED-FROM-BAND-PARTNER-NODE Alarm

This alarm gets cleared when:

  • The cable between C and L band is connected and the interlink management port is brought up.

  • The OLC partner band configuration is removed from the alarmed node.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).

OPTICAL-MOD-ABSENT

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: Phy1_mgmt

The Optical-Mod-Absent alarm is raised when:

  • line card is not inserted properly or is removed from the chassis.

  • Line card cold reload is performed.

Clear the Optical MOD Absent Alarm

To clear this alarm, perform the following steps:

SUMMARY STEPS

  1. Follow the procedure Remove and Replace Line Card to reinsert the line card and connect the fan.
  2. The alarm clears automatically once the LC reload is complete.

DETAILED STEPS


Step 1

Follow the procedure Remove and Replace Line Card to reinsert the line card and connect the fan.

Step 2

The alarm clears automatically once the LC reload is complete.


If the alarm does not clear, contact your Cisco account representative or log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).

OUT_OF_COMPLIANCE

Default Severity: Critical (CR), Service-Affecting (SA)

Logical Object: plat_sl_client

The OUT_OF_COMPLIANCE alarm is raised when one or more license entitlements is not in compliance. This state is seen when the license does not have an available license in the corresponding Virtual Account that the Cisco device is registered to, in the Cisco Smart Account.

Clear Out of Compliance Alarm

SUMMARY STEPS

  1. To clear this alarm, enter into a compliance by adding the correct number and type of licenses to the Smart Account.

DETAILED STEPS


To clear this alarm, enter into a compliance by adding the correct number and type of licenses to the Smart Account.

If the alarm does not clear, contact your Cisco account representative or log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


PID-MISMATCH

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-ENVMON

The PID-MISMATCH alarm is raised when one AC and one DC PSU are connected.

Clear the PID-MISMATCH Alarm

Procedure


To clear this alarm, ensure that both connected PSU’s are either AC or DC.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


POWER MODULE OUTPUT DISABLED

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-ENVMON

The POWER MODULE OUTPUT DISABLED alarm is raised power supply is not connected to the power module.

Clear the POWER MODULE OUTPUT DISABLED Alarm

Procedure


This alarm is automatically cleared when power supply is connected to the power module.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


POWER-MODULE-REDUNDANCY-LOST

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: SPI-ENVMON

The Power Group redundancy lost (POWER-MODULE-REDUNDANCY-LOST) alarm is raised if:

  • the Power Supply Unit (PSU) is faulty or removed.

  • the input PSU voltage goes beyond the working range of 180 to 264 volts for input high line (HL) and 90 to 140 volts for input low line (LL) nominal voltages.

Clear the POWER-MODULE-REDUNDANCY-LOST Alarm

Procedure


To clear this alarm:

  • Re-insert the power module and then connect the power supply to the module.

  • If the alarm does not clear after re-inserting, replace the power module.

  • Check the input voltage value of the PSU using the show environment power command.

  • If the input voltage is beyond the working range, check the power supplied to the PSU.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SIA_GRACE_PERIOD_REMAINING

Default Severity: Minor (MN), Non-Service-Affecting (NSA)

Logical Object: plat_sl_client

When the device enters an Out-of-Compliance (OOC) state, a grace period of 90 days begins. During this period, SIA license benefits can still be availed. The SIA_GRACE_PERIOD_REMAINING alarm is raised when a Software Innovation Access(SIA) upgrade is allowed during this grace period.

Clear SIA Grace Period Remaining

SUMMARY STEPS

  1. This alarm is cleared when Software Innovation Access(SIA) licenses are purchased.

DETAILED STEPS


This alarm is cleared when Software Innovation Access(SIA) licenses are purchased.

If the alarm does not clear, contact your Cisco account representative or log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


SIA_UPGRADE_BLOCKED

Default Severity: Major(MJ), Service-Affecting (SA)

Logical Object: plat_sl_client

The SIA_UPGRADE_BLOCKED alarm is raised when Software Innovation Access(SIA) grace period has expired.

Clear SIA Grace Period Remaining

Procedure


This alarm is cleared when the SIA licences are purchased.

If the alarm does not clear, contact your Cisco account representative or log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


TACACS Server Down

Default Severity: Major (MJ), Service-Affecting (SA)

Logical Object: Software

Release: This alarm was introduced in R24.3.1.

The <TACACS server details>TACACS server DOWN alarm is raised when the TACACS server is down.

Clear the TACACS Server Down Alarm

The TACACS Server Down alarm is cleared when the TACACS server is up.

To clear this alarm, perform these steps:

Procedure


Step 1

Verify the TACACS server group using the show tacacs server-groups command.

Step 2

Check the connection between the device and the TACACS+ server using the show tacacs details command.

Step 3

Remove the global and private server configuration.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


TEMPERATURE

Default Severity: Minor (MN), Non-Service-Affecting (NSA)

Logical Object: SPI-ENVMON

The TEMPERATURE alarm is raised when the temperature of a sensor exceeds the normal operating range because of any of the following reasons:

  • One or more fans stops working.

  • Inadequate airflow.

  • Environmental temperature of the room is abnormally high.

Clear the TEMPERATURE Alarm

Procedure

To clear this alarms:


To clear this alarms:

Step 1

Check the fan speed and temperature values using the show environment command.

Step 2

Check any fan tray or failure alarms using the show alarms brief system active.

Step 3

Ensure that:

  1. there are no airflow obstructions.

  2. fans are working fine.

  3. environmental temperature of the room is not abnormally high.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).


UPGRADE_LICENSE_GRACE_PERIOD_REMAINING

Default Severity: Minor (MN), Non-Service-Affecting (NSA)

Logical Object: plat_sl_client

The UPGRADE_LICENSE_GRACE_PERIOD_REMAINING alarm is raised when a software upgrade is allowed in the upgrade license grace period.

UPGRADE_LICENSE_GRACE_PERIOD_REMAINING

Default Severity: Minor (MN), Non-Service-Affecting (NSA)

Logical Object: plat_sl_client

The UPGRADE_LICENSE_GRACE_PERIOD_REMAINING alarm is raised when a software upgrade is allowed in the upgrade license grace period.

VOLTAGE

Default Severity: Minor (MN), Major (MJ), Critical (CR), Non-Service-Affecting (NSA)

Logical Object: SPI-ENVMON

The VOLTAGE alarm is raised when the voltage is out of the operating range.

Clear the VOLTAGE Alarm

Procedure

To clear this alarm:


To clear this alarm:

Step 1

Check if the input voltage is within the expected range.

Step 2

Check the component level voltage is within the operating range using the show environment voltage command.

If the alarm does not clear, log into the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information or call Cisco TAC (1 800 553-2447).