Troubleshooting issues with Cisco IPMI Extensions

Introduction

Familiarity with the IPMI 2.0 specification is assumed in this chapter.

To make debugging easier, existing and future Cisco servers may offer more sensors than the 255 that the 8-bit sensor number of the current IPMI specification can address. Therefore, certain B-series and C-series Cisco servers extend the sensor and sensor-related features of the IPMI 2.0 specification. This chapter describes these extensions so that IPMI tool users can use them effectively.

For sensors whose sensor number is less than or equal to 255, Cisco remains compliant with the IPMI specification. For sensors whose sensor number is greater than or equal to 256 (the extended sensor range, or ESR), Cisco provides equivalent sensor-related commands as Cisco OEM IPMI commands. A sensor in the extended sensor range is referred to as a Cisco extended sensor (CES). In addition, the IPMI specification does not require implementations to number sensors consecutively. Therefore, depending on the Cisco server and CIMC software version, some sensors may be placed in the ESR even though unused numbers remain in the IPMI range.

The open source programs IPMITool and OpenIPMI will not be modified to integrate Cisco's ESR functionality. However, both tools can issue raw IPMI commands, so you can write a wrapper around them that implements the ESR functionality.
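As a minimal sketch of such a wrapper, the Python helpers below build an `ipmitool raw` command line and parse its hex output. They assume that NF_STO and NF_SEN correspond to the standard IPMI Storage (0x0A) and Sensor/Event (0x04) network functions, that ipmitool prints the response data as whitespace-separated hex bytes with the completion code already removed (ipmitool reports a non-zero completion code as an error instead), and that the caller supplies connection options such as `-I lanplus -H ... -U ... -P ...`. The helper names are hypothetical.

```python
import subprocess
from typing import Sequence

# Standard IPMI network function codes (assumed mapping for the names used in this chapter).
NF_APP = 0x06  # Application: Get Device ID
NF_SEN = 0x04  # Sensor/Event
NF_STO = 0x0A  # Storage


def build_raw_args(netfn: int, cmd: int, data: bytes = b"",
                   conn_args: Sequence[str] = ()) -> list[str]:
    """Build an `ipmitool raw` argument list; conn_args holds -I/-H/-U/-P style options."""
    return (["ipmitool", *conn_args, "raw", f"0x{netfn:02x}", f"0x{cmd:02x}"]
            + [f"0x{b:02x}" for b in data])


def parse_raw_output(text: str) -> bytes:
    """Convert ipmitool's space- and newline-separated hex output into bytes."""
    return bytes(int(tok, 16) for tok in text.split())


def send_raw(netfn: int, cmd: int, data: bytes = b"",
             conn_args: Sequence[str] = ()) -> bytes:
    """Run ipmitool and return the response data bytes (completion code excluded).

    check=True raises CalledProcessError when ipmitool exits with an error,
    which is assumed to include a non-zero completion code.
    """
    result = subprocess.run(build_raw_args(netfn, cmd, data, conn_args),
                            capture_output=True, text=True, check=True)
    return parse_raw_output(result.stdout)
```

For example, `send_raw(NF_STO, 0xF5, conn_args=["-I", "lanplus", "-H", host, "-U", user, "-P", password])` would issue the Get ESR Capabilities command, where `host`, `user`, and `password` are placeholders for your own values.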

Cisco ESR Details

Both the SDR (Sensor Data Record) and SEL (System Event Log) formats are extended to accommodate a 32-bit sensor number instead of the standard IPMI 8-bit sensor number. The Cisco Extended Sensor Range Sensor Data Record (ESR-SDR) allows for more sensors, a larger record size, and a longer ID string. The ESR System Event Log (ESR-SEL) records are enlarged accordingly, and readings can be retrieved for sensors in the new range.

At its core, the operation of the CES and its associated SDRs and SELs remains consistent with the behavior defined in the standard IPMI specification, even though new formats are introduced. In other words, a Cisco extended sensor is expected to behave exactly like an IPMI sensor.

The Cisco Extended SEL (ESR-SEL) repository operates slightly differently from the standard IPMI SEL. To make debugging the server easier, this repository contains both the standard IPMI SEL events, reformatted into the ESR-SEL format, and the SEL events of the Cisco extended sensors. In short, the ESR-SEL can be regarded as a superset of the IPMI SEL.

On current UCS servers, a sensor named "SEL_FULLNESS" reports, as a percentage, the current usage of the IPMI SEL repository and generates a SEL event when the repository reaches a certain level of fullness. Because the ESR-SEL is a separate repository, an equivalent sensor named "CISCO_SEL_FULLNESS" is provided for it. The "SEL_FULLNESS" sensor always refers to the standard IPMI repository, and the "CISCO_SEL_FULLNESS" sensor always refers to the ESR-SEL repository; the sensor name is what distinguishes the two repositories. The "Clear-SEL-Event" and "SEL-Full-Event" SEL records also carry the name of the corresponding SEL usage sensor. By looking at the sensor name, you can therefore determine the current percentage of usage, when the SEL was last cleared, when the SEL became full, and which repository each of those events refers to.

When enabled, the UCS Manager SEL backup function backs up the SEL events from the server based on the SEL usage sensor. Before ESR, the only such sensor was SEL_FULLNESS. With ESR functionality, UCS Manager monitors both SEL usage sensors and backs up the events if either one reaches a certain usage level. After the backup, UCS Manager clears both repositories.

High Level Generic Algorithm

The following algorithm is provided to identify and use Cisco's Extended Sensors functionality. This algorithm can be safely applied to any Cisco and non-Cisco platform that implements IPMI.

  1. Issue a Get Device ID IPMI command to the server. If the manufacturer ID is 0x168B, the server is a Cisco B-series or C-series server; proceed to the next step. If the manufacturer ID is not 0x168B, stop here: continuing with this algorithm may lead to undefined behavior.
  2. Issue a Cisco Get ESR Capabilities IPMI command. Verify that all response bytes are returned and that the first six bytes are as defined. If an error code is returned, the first six bytes do not match, or not all bytes are returned, the Cisco B-series or C-series server does not support ESR; do not proceed any further.
  3. Check the remaining bytes of the Get ESR Capabilities response. If the ESR enable flag is not set, do not proceed any further.
  4. At this point, the Cisco B-series or C-series server is confirmed to support ESR, and any of the Cisco ESR IPMI commands may be issued.
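The detection steps above can be written as two pure checks over the response data, which keeps them testable without a server. This sketch assumes the completion code has already been checked and stripped (as `ipmitool raw` does), so the 3-byte manufacturer ID sits at data offsets 6 to 8 of the Get Device ID response (bytes 8 to 10 in the IPMI specification's numbering), and the Get ESR Capabilities data is 36 bytes long (response bytes 2 to 37 of the Get ESR Capabilities table later in this chapter). The function names are hypothetical.

```python
CISCO_MANUFACTURER_ID = 0x168B
ESR_SIGNATURE = b"CISCO"      # Get ESR Capabilities response bytes [2:6]
ESR_CAPS_DATA_LEN = 36        # response bytes [2:37], completion code excluded


def is_cisco_server(device_id_data: bytes) -> bool:
    """Step 1: compare the little-endian manufacturer ID from Get Device ID."""
    if len(device_id_data) < 9:
        return False
    # The manufacturer ID is a 20-bit IANA number stored LS byte first.
    manufacturer_id = int.from_bytes(device_id_data[6:9], "little") & 0x0FFFFF
    return manufacturer_id == CISCO_MANUFACTURER_ID


def esr_supported(caps_data: bytes) -> bool:
    """Steps 2 and 3: validate Get ESR Capabilities and test the ESR enable flag."""
    if len(caps_data) < ESR_CAPS_DATA_LEN:
        return False                    # not all bytes were returned
    if caps_data[0:5] != ESR_SIGNATURE:
        return False                    # ID0..ID4 do not spell "CISCO"
    return bool(caps_data[5] & 0x01)    # Flags (response byte 7), bit 0
```

Only when both functions return `True` would a tool go on to issue the other Cisco ESR commands.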

The following steps are recommended for retrieving sensor readings after support for ESR functionality has been established.

  1. Retrieve all standard IPMI SDRs per the IPMI specification.
  2. Retrieve all Cisco ESR Sensor Data Records (ESR-SDR) by issuing the Get ESR-SDR IPMI command.
  3. Retrieve the desired sensor reading by issuing the standard IPMI Get Sensor Reading command or by issuing the Get CES Reading IPMI command as described later in this document. The Get CES Reading command can also be issued to retrieve IPMI sensors.
  4. Use the corresponding SDR to decode the raw reading to a human readable format.

Byte Ordering

All multi-byte fields in the Cisco ESR commands are little-endian: the least significant byte is placed at the lowest index of the request or response field. This is consistent with IPMI. For example, suppose a request field called earth-age is four bytes long and occupies indexes 5 to 8, and the value to send is 1 billion, which is 0x3B9ACA00 in hexadecimal notation. Then index 5 of the request data is 0x00, index 6 is 0xCA, index 7 is 0x9A, and index 8 is 0x3B.
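In Python, `int.to_bytes` and `int.from_bytes` with `"little"` byte order produce and consume this layout directly. The sketch below places the example earth-age value into a request buffer; the buffer size and the `put_le32` helper are illustrative only.

```python
def put_le32(buf: bytearray, index: int, value: int) -> None:
    """Store a 4-byte little-endian value starting at a 1-based IPMI byte index."""
    buf[index - 1:index + 3] = value.to_bytes(4, "little")


request = bytearray(8)               # request bytes [1:8]
put_le32(request, 5, 1_000_000_000)  # earth-age occupies bytes [5:8]
print(request[4:8].hex(" "))         # 00 ca 9a 3b
assert int.from_bytes(request[4:8], "little") == 0x3B9ACA00
```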

Cisco ESR IPMI Command Definitions

Get ESR Capabilities Command

Command Name:

Get ESR Capabilities

Net Function:

NF_STO

Command Number:

0xF5

Request Bytes

Field Name

Description

None

Response Bytes

Field Name

Description

[1]

Completion Code

[2]

ID0

0x43: ASCII 'C'

[3]

ID1

0x49: ASCII 'I'

[4]

ID2

0x53: ASCII 'S'

[5]

ID3

0x43: ASCII 'C'

[6]

ID4

0x4F: ASCII 'O'

[7]

Flags

Bit 0: This is the ESR enable flag. If this bit is set, ESR functionality is supported. Bits [7:1]: Reserved. All 0's.

[8]

API Version

The current version is 1. This field defines the format of the request and response bytes for all Cisco ESR IPMI commands, so software can use it to determine which version of the Cisco ESR API the system implements. A document update, such as a typo fix, changes the document version (bytes 9 and 10) but does not necessarily change the API format, in which case software does not have to change.

[9]

Document Version, Minor

This is the minor version of this document to use for reference.

[10]

Document Version, Major

This is the major version of this document to use for reference. Together, the major and minor versions indicate which version of this document to use for reference. The revision is on the title page of this document.

[11:37]

Reserved

Should be all zeros.

Get ESR-SDR Repository Information Command

This command gets the repository information and is analogous to the Get SDR Repository Info IPMI command.

Command Name:

Get ESR-SDR Repository Info

Net Function:

NF_STO

Command Number:

0xF0

Request Bytes

Field Name

Description

None

Response Bytes

Field Name

Description

[1]

Completion Code

[2]

State

0x1: The repository is still initializing.

[3:6]

Starting Record ID

The record ID that marks the first ESR-SDR in the repository.

[7:10]

End Record ID

The record ID that marks the last ESR-SDR in the repository.

[11:14]

SDR Size

The size of the ESR-SDR Repository in bytes.
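A minimal decoder for this response is sketched below. It assumes the completion code has been removed, so the 13 remaining bytes unpack as one byte followed by three little-endian 32-bit fields. Only state 0x1 is defined in the table, so any other state value is passed through unchanged; the type and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class EsrSdrRepoInfo(NamedTuple):
    state: int            # response byte 2; 0x1 = still initializing
    first_record_id: int  # response bytes [3:6]
    last_record_id: int   # response bytes [7:10]
    size_bytes: int       # response bytes [11:14]

    @property
    def initializing(self) -> bool:
        return self.state == 0x1


def parse_esr_sdr_repo_info(data: bytes) -> EsrSdrRepoInfo:
    """Decode Get ESR-SDR Repository Info data (completion code already removed)."""
    return EsrSdrRepoInfo(*struct.unpack_from("<BIII", data))
```

A tool would typically wait and retry while `initializing` is true, then use `first_record_id` as the starting point for walking the ESR-SDR repository.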

Get Cisco SDR Record

Command Name:

Get Cisco SDR Record

Net Function

NF_STO

Command Number:

0xF1

Request Bytes

Field Name

Description

[1:4]

Record ID

The record ID of the SDR whose data is to be retrieved.

[5:6]

Offset

The offset into the record.

[7]

Read Bytes

The number of bytes to read. Should not exceed 33 bytes.

Response Bytes

Field Name

Description

[1]

Completion Code

0xCA: The requested offset and read length extend beyond the SDR's record length.

[2:5]

Next Record ID

The record ID of the next SDR. 0xFFFFFFFF indicates the last record has been reached.

[6:N]

SDR Data

The requested bytes of the SDR.

Get ESR-SDR Command

This command retrieves the ESR-SDR records from the ESR-SDR Repository. Equivalent to the IPMI Get SDR command.

Command Name:

Get ESR-SDR

Net Function:

NF_STO

Command Number:

0xF1

Request Bytes

Field Name

Description

[1:4]

Record ID

The record ID of the SDR whose data is to be retrieved.

[5:6]

Offset

The offset into the record.

[7]

Read Bytes

The number of bytes to read. The maximum value is 33.

Response Bytes

Field Name

Description

[1]

Completion Code

[2:5]

Next Record ID

The record ID of the record that follows. The value, 0xFFFFFFFF, indicates the last record has been reached.

[6:N]

Data

The data of the ESR-SDR record being retrieved.
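Because each request returns at most 33 bytes, a full record has to be read in chunks. The sketch below walks the whole repository with Get ESR-SDR. It assumes a hypothetical `send(netfn, cmd, request)` transport that returns the response data without the completion code (for example, a wrapper around `ipmitool raw`), that NF_STO is the standard IPMI Storage network function 0x0A, and, as in IPMI, that the Record Length field counts the bytes following the 10-byte header.

```python
import struct
from typing import Callable

NF_STO = 0x0A            # assumed: standard IPMI Storage network function
CMD_GET_ESR_SDR = 0xF1
MAX_READ = 33            # maximum Read Bytes value per request
HEADER_LEN = 10          # Record ID [1:4], SDR Version [5], Record Type [6], Record Length [7:10]
LAST_RECORD = 0xFFFFFFFF

# Hypothetical transport: send(netfn, cmd, request) -> response data without completion code.
Transport = Callable[[int, int, bytes], bytes]


def read_chunk(send: Transport, record_id: int, offset: int, count: int) -> tuple[int, bytes]:
    """Issue one Get ESR-SDR request and return (Next Record ID, data bytes)."""
    rsp = send(NF_STO, CMD_GET_ESR_SDR, struct.pack("<IHB", record_id, offset, count))
    if len(rsp) < 4 + count:
        raise RuntimeError(f"short read for record {record_id:#x} at offset {offset}")
    return struct.unpack_from("<I", rsp)[0], rsp[4:4 + count]


def read_all_esr_sdrs(send: Transport, first_record_id: int) -> list[bytes]:
    """Walk the ESR-SDR repository from the Starting Record ID until 0xFFFFFFFF."""
    records = []
    record_id = first_record_id
    while record_id != LAST_RECORD:
        next_id, header = read_chunk(send, record_id, 0, HEADER_LEN)
        # Assumption: Record Length counts the bytes after the 10-byte header.
        total = HEADER_LEN + struct.unpack_from("<I", header, 6)[0]
        record = bytearray(header)
        while len(record) < total:
            count = min(MAX_READ, total - len(record))
            _, chunk = read_chunk(send, record_id, len(record), count)
            record += chunk
        records.append(bytes(record))
        record_id = next_id
    return records
```

The Starting Record ID returned by Get ESR-SDR Repository Info is a natural value for `first_record_id`.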

Get CES Reading Command

This command retrieves the raw reading of the CES. Its equivalent command is the standard IPMI Get Sensor Reading command.

Command Name:

Get CES Reading

Net Function:

NF_SEN

Command Number:

0xF0

Request Bytes

Field Name

Description

[1:4]

CES number

The sensor number of the Cisco Extended Sensor

Response Bytes

Field Name

Description

[1]

Completion Code

[2]

Reading

See byte 2 of the standard IPMI Get Sensor Reading Command.

[3]

Sensor Status

See byte 3 of the standard IPMI Get Sensor Reading Command.

[4] Optional

Sensor Flags 1

See byte 4 of the standard IPMI Get Sensor Reading Command.

[5] Optional

Sensor Flags 2

See byte 5 of the standard IPMI Get Sensor Reading Command.
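The following sketch builds the Get CES Reading request and decodes its response. It assumes NF_SEN is the standard IPMI Sensor/Event network function 0x04 and that the completion code has already been removed. The status bit meanings (bit 6 clear means sensor scanning is disabled, bit 5 set means the reading is unavailable) are taken from byte 3 of the IPMI 2.0 Get Sensor Reading response and should be checked against the specification; the names are hypothetical.

```python
import struct
from dataclasses import dataclass

NF_SEN = 0x04              # assumed: standard IPMI Sensor/Event network function
CMD_GET_CES_READING = 0xF0


def ces_reading_request(ces_number: int) -> bytes:
    """Request bytes [1:4]: the 32-bit CES number, little-endian."""
    return struct.pack("<I", ces_number)


@dataclass
class CesReading:
    raw: int                # byte 2: raw reading, decoded later with the ESR-SDR
    scanning_enabled: bool  # byte 3, bit 6 (IPMI: 0 = sensor scanning disabled)
    unavailable: bool       # byte 3, bit 5 (IPMI: 1 = reading/state unavailable)
    flags: bytes            # optional bytes 4 and 5, empty if not returned


def parse_ces_reading(data: bytes) -> CesReading:
    """Decode Get CES Reading data (completion code already removed)."""
    raw, status = data[0], data[1]
    return CesReading(raw=raw,
                      scanning_enabled=bool(status & 0x40),
                      unavailable=bool(status & 0x20),
                      flags=bytes(data[2:4]))
```

The raw value is only meaningful after conversion with the matching ESR-SDR, as described in the High Level Generic Algorithm section.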

Get Cisco Extended SEL Repository Information

This command retrieves information about the ESR-SEL repository. Its equivalent command is the standard IPMI Get SEL Info command.

Command Name:

Get Cisco Extended SEL Repository Information

Net Function:

NF_STO

Command Number:

0xF2

Request Bytes

Field Name

Description

None

Response Bytes

Field Name

Description

[1]

Completion Code

[2:5]

Total Entries

Total number of Cisco Extended SEL entries in the repository.

[6:9]

Free Space

Number of free bytes in the ESR-SEL repository.

[10:13]

Add Timestamp

The timestamp of the latest Cisco SEL addition.

[14:17]

Erase Timestamp

The timestamp of the last Cisco SEL clear.

[18]

Flags

Bit 7: If set, SEL overflow occurred.

Bits[6:0]: Reserved. All zeros.
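A minimal decoder for this response is shown below, assuming the completion code has been removed so that response bytes 2 to 18 unpack as four little-endian 32-bit fields and one flags byte. Interpreting the timestamps as seconds since the Unix epoch in UTC is an assumption based on the ESR-SEL Record Format section; the names are hypothetical.

```python
import struct
from datetime import datetime, timezone
from typing import NamedTuple


class EsrSelInfo(NamedTuple):
    total_entries: int    # response bytes [2:5]
    free_bytes: int       # response bytes [6:9]
    add_timestamp: int    # response bytes [10:13]
    erase_timestamp: int  # response bytes [14:17]
    flags: int            # response byte 18

    @property
    def overflowed(self) -> bool:
        return bool(self.flags & 0x80)   # Flags bit 7: SEL overflow occurred


def parse_esr_sel_info(data: bytes) -> EsrSelInfo:
    """Decode response bytes [2:18] (completion code already removed)."""
    return EsrSelInfo(*struct.unpack_from("<IIIIB", data))


def to_datetime(ts: int) -> datetime:
    """Interpret a timestamp as seconds since the Unix epoch (assumed UTC)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```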

Get Cisco SEL Repository Info

Command Name:

Get Cisco SEL Repository Info

Net Function

NF_STO

Command Number:

0xF2

Request Bytes

Field Name

Description

None

Response Bytes

Field Name

Description

[1]

Completion Code

[2:5]

Total Entries

Number of Cisco SELs in repository

[6:9]

Free Space

Number of free bytes in repository

[10:13]

Add Timestamp

Last timestamp of Cisco SEL addition

[14:17]

Erase Timestamp

Last timestamp of Cisco SEL erase

[18]

SEL Flags

bit 7: If set, SEL overflow occurred.

Get Cisco Extended SEL Record Command

This command retrieves an entry from the SEL repository. Its equivalent command is the standard IPMI Get SEL Entry command.

Command Name:

Get Cisco Extended SEL Record Command

Net Function:

NF_SEN

Command Number:

0xF3

Request Bytes

Field Name

Description

[1:4]

Cisco SEL Record ID

The record ID of the Cisco Extended SEL to be retrieved.

Response Bytes

Field Name

Description

[1]

Completion Code

[2:5]

Next SEL Record ID

The record ID of the following SEL. The value, 0xFFFFFFFF, indicates the last SEL record has been reached.

[6:9]

SEL Record ID

The SEL ID that is being retrieved.

[10]

SEL Version

The version ID of the current SEL record. This field identifies how to interpret the remaining bytes in the SEL Record Data. See the ESR-SEL Record Format section for details.

[11:29]

SEL Record Data

The data of the Cisco Extended SEL.

Get Cisco SEL Entry

Command Name:

Get Cisco SEL Entry

Net Function:

NF_STO

Command Number:

0xF3

Request Bytes

Field Name

Description

[1:4]

SEL ID

SEL entry number to retrieve

Response Bytes

Field Name

Description

[1]

Completion Code

0xCA: SEL ID does not exist

[2:5]

Next SEL ID

SEL ID of the next SEL. 0xFFFFFFFF indicates the last SEL record has been reached.

[6:9]

SEL ID

The SEL ID that is being retrieved.

[10]

Version

Cisco SEL format version. The current version is 1, and the following bytes are defined for version 1.

[11]

SEL Type

See the equivalent in IPMI Get SEL Entry Response

[12:13]

Reserved

Should be zero.

[14:17]

Time stamp

Time stamp of SEL

[18:19]

Generator ID

See the equivalent in IPMI Get SEL Entry Response

[20]

EvMRev

See the equivalent in IPMI Get SEL Entry Response

[21]

Sensor Type

See the equivalent in IPMI Get SEL Entry Response

[22:25]

Sensor Number

The sensor number. This can be an IPMI sensor or a CES.

[26]

Event Attribute

See the equivalent in IPMI Get SEL Entry Response

[27:29]

Event Data

See the equivalent in IPMI Get SEL Entry Response
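The sketch below walks the ESR-SEL repository with Get Cisco SEL Entry, following Next SEL ID until 0xFFFFFFFF. It assumes a hypothetical `send(netfn, cmd, request)` transport that returns the response data without the completion code, and uses NF_STO (assumed to be the standard Storage network function 0x0A) as listed in the Get Cisco SEL Entry table; the Get Cisco Extended SEL Record Command table lists NF_SEN for the same command number, so confirm which one your server accepts. Starting at SEL ID 0 is also an assumption.

```python
import struct
from typing import Callable, Iterator

NF_STO = 0x0A                  # per the Get Cisco SEL Entry table; confirm on your server
CMD_GET_CISCO_SEL_ENTRY = 0xF3
LAST_ENTRY = 0xFFFFFFFF

# Hypothetical transport: send(netfn, cmd, request) -> response data without completion code.
Transport = Callable[[int, int, bytes], bytes]


def iter_esr_sel(send: Transport, first_id: int = 0) -> Iterator[bytes]:
    """Yield each 24-byte ESR-SEL record (response bytes [6:29])."""
    sel_id = first_id
    while sel_id != LAST_ENTRY:
        rsp = send(NF_STO, CMD_GET_CISCO_SEL_ENTRY, struct.pack("<I", sel_id))
        next_id = struct.unpack_from("<I", rsp)[0]   # response bytes [2:5]
        yield rsp[4:28]                               # SEL ID through Event Data
        sel_id = next_id
```

Each yielded record starts with the SEL ID, so its layout matches the tables in the ESR-SEL Record Format section.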

Clear Cisco Extended SEL Repository

This command clears all existing SEL events in the repository. Equivalent to the IPMI Clear SEL command.

Command Name:

Clear Cisco Extended SEL Repository

Net Function:

NF_SEN

Command Number:

0xF4

Request Bytes

Field Name

Description

None

Response Bytes

Field Name

Description

[1]

Completion Code

Get Cisco Sensor Reading

Command Name:

Get Cisco Sensor Reading

Net Function

NF_SEN

Command Number:

0xF0

Request Bytes

Field Name

Description

[1:4]

Sensor Number

The number of the sensor whose reading is to be retrieved.

Response Bytes

Field Name

Description

[1]

Completion Code

[2]

Reading

See byte 2 of IPMI Get Sensor Reading

[3]

Sensor Status

See byte 3 of IPMI Get Sensor Reading

[4] Optional

Sensor Flags 1

See byte 4 of IPMI Get Sensor Reading

[5] Optional

Sensor Flags 2

See byte 5 of IPMI Get Sensor Reading

Record Formats

SDR Format

Field Name | IPMI 2.0 SDR Type 1 Byte | Cisco SDR Byte | Description
Record ID | [1:2] | [1:4] | Begins with the record ID of the last IPMI-compliant SDR plus one.
SDR Version | 3 | 5 | 0x80 for Cisco SDRs.
Record Type | 4 | 6 | Fixed to 0xC1 for the Cisco Sensor Full Data Record.
Record Length | 5 | [7:10] |
Sensor Owner ID | 6 | 11 |
Sensor Owner LUN | 7 | 12 |
Sensor Number | 8 | [13:16] |
Entity ID | 9 | 17 |
Entity Instance | 10 | 18 |
Sensor Initialization | 11 | 19 |
Sensor Capabilities | 12 | 20 |
Sensor Type | 13 | 21 |
Event/Reading Type Code | 14 | 22 |
Assertion Event Mask/Lower Threshold Reading Mask | [15:16] | [23:24] |
Deassertion Event Mask/Upper Threshold Reading Mask | [17:18] | [25:26] |
Discrete Reading Mask/Settable Threshold Mask/Readable Threshold Mask | [19:20] | [27:28] |
Sensor Units 1 | 21 | 29 |
Sensor Units 2 | 22 | 30 |
Sensor Units 3 | 23 | 31 |
Linearization | 24 | 32 |
M | 25 | 33 |
M and Tolerance | 26 | 34 |
B | 27 | 35 |
B and Accuracy | 28 | 36 |
Accuracy, Accuracy Exponent and Sensor Direction | 29 | 37 |
R and B Exponents | 30 | 38 |
Analog Characteristic Flags | 31 | 39 |
Nominal Reading | 32 | 40 |
Normal Maximum | 33 | 41 |
Normal Minimum | 34 | 42 |
Sensor Maximum Reading | 35 | 43 |
Sensor Minimum Reading | 36 | 44 |
Upper Non-Recoverable Threshold | 37 | 45 |
Upper Critical Threshold | 38 | 46 |
Upper Non-Critical Threshold | 39 | 47 |
Lower Non-Recoverable Threshold | 40 | 48 |
Lower Critical Threshold | 41 | 49 |
Lower Non-Critical Threshold | 42 | 50 |
Positive-Going Threshold Hysteresis | 43 | 51 |
Negative-Going Threshold Hysteresis | 44 | 52 |
Reserved | [45:46] | N/A | Removed.
OEM | 47 | 53 |
ID String Type/Length Code | 48 | 54 | Bits [7:6] are as defined in the IPMI specification. Bits [5:0] are the ID string length.
ID String | [49:64] | [55:N] | Supports a maximum of 48 bytes, so the maximum value of N is 102.
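Because every Cisco SDR field after the sensor number keeps its IPMI meaning, the standard IPMI linear conversion y = (M·x + B·10^Bexp)·10^Rexp can be applied using the Cisco byte positions above. The sketch below extracts the needed fields from a full record and converts a raw reading. It assumes a linear sensor (Linearization = 0), an ASCII ID string, and the IPMI encodings of M, B, the exponents, and the analog data format (Sensor Units 1, bits [7:6]); the function names are hypothetical.

```python
import struct


def _twos(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def parse_cisco_full_sdr(rec: bytes) -> dict:
    """Extract conversion fields; Cisco SDR byte N from the table is rec[N - 1]."""
    if rec[5] != 0xC1:                                        # Record Type, byte 6
        raise ValueError("not a Cisco Sensor Full Data Record")
    name_len = rec[53] & 0x3F                                 # byte 54, bits [5:0]
    return {
        "sensor_number": struct.unpack_from("<I", rec, 12)[0],  # bytes [13:16]
        "analog_format": rec[28] >> 6,                          # Sensor Units 1, bits [7:6]
        "linearization": rec[31] & 0x7F,                        # byte 32
        "m": _twos(rec[32] | ((rec[33] & 0xC0) << 2), 10),      # bytes 33-34, 10-bit signed
        "b": _twos(rec[34] | ((rec[35] & 0xC0) << 2), 10),      # bytes 35-36, 10-bit signed
        "r_exp": _twos(rec[37] >> 4, 4),                        # byte 38, bits [7:4]
        "b_exp": _twos(rec[37] & 0x0F, 4),                      # byte 38, bits [3:0]
        "name": rec[54:54 + name_len].decode("ascii", "replace"),
    }


def convert_reading(raw: int, sdr: dict) -> float:
    """Apply y = (M*x + B*10^Bexp) * 10^Rexp to a raw 8-bit reading."""
    if sdr["linearization"] != 0:
        raise NotImplementedError("only linear sensors are handled in this sketch")
    fmt = sdr["analog_format"]
    if fmt == 0b11:
        raise ValueError("sensor does not return an analog reading")
    if fmt == 0b01:                       # one's complement
        x = raw - 0xFF if raw & 0x80 else raw
    elif fmt == 0b10:                     # two's complement
        x = _twos(raw, 8)
    else:                                 # unsigned
        x = raw
    return float((sdr["m"] * x + sdr["b"] * 10 ** sdr["b_exp"]) * 10 ** sdr["r_exp"])
```

The raw value passed to `convert_reading` is byte 2 of the Get CES Reading (or standard Get Sensor Reading) response.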

ESR-SEL Record Format

This section defines how the various standard IPMI SEL record type ranges are represented in the ESR-SEL record format. In general, the timestamp field holds the number of seconds since the epoch.

SEL Type 0x2 is the equivalent of the standard IPMI SEL Type 0x2 but with different indexes. The SEL record version and the SEL type field help identify this record type.

Byte Index

Field Name

Description

[1:4]

Record ID

The ID of this ESR-SEL record

[5]

Cisco SEL Record Version

Value is 0x1 for this definition.

[6]

SEL Type

Value is 0x2.

[7:8]

Reserved

Value is all 0.

[9:12]

Timestamp

Time stamp when the event was logged in the ESR-SEL repository.

[13:14]

Generator ID

Please refer to bytes 8 and 9 of the standard IPMI SEL Type 2.

[15]

EvMRev

Please refer to byte 10 of the standard IPMI SEL Type 2.

[16]

Sensor Type

Please refer to byte 11 of the standard IPMI SEL Type 2.

[17:20]

Sensor Number

The sensor number. This sensor number can be an IPMI sensor or a CES.

[21]

Event Attribute

Please refer to byte 13 of the standard IPMI SEL Type 2.

[22:24]

Event Data 1, 2 and 3

Please refer to byte 14 through 16 of the standard IPMI SEL Type 2.

SEL types in the OEM range 0xC0 to 0xDF (timestamped OEM records) use the following format.

Byte Index

Field Name

Description

[1:4]

Record ID

The ID of this ESR-SEL record

[5]

Cisco SEL Record Version

Value is 0x1 for this definition.

[6]

SEL Type

Value: 0xC0 to 0xDF

[7:8]

Reserved

Value is all 0.

[9:12]

Timestamp

Time stamp when the event was logged in the ESR-SEL repository.

[13:15]

Manufacturer ID

Please refer to bytes 8 to 10 of the OEM SEL Record in the standard IPMI specification.

[16:22]

OEM Defined

Please refer to bytes 11 to 16 of the OEM SEL Record in the standard IPMI specification.

[23:24]

Reserved

Returns all 0s.

Under the IPMI specification, SEL Type OEM Range 0xE0 to 0xFF is a non-time-stamped OEM SEL record. However, when this event converts into the ESR-SEL record format, it will be time stamped.

Byte Index

Field Name

Description

[1:4]

Record ID

The ID of this ESR-SEL record

[5]

Cisco SEL Record Version

Value is 0x1 for this definition.

[6]

SEL Type

Value: 0xE0 to 0xFF

[7]

OEM Defined Byte 1

Please refer to byte 4 of the OEM non-timestamped SEL event in the IPMI Specification.

[8]

Reserved

Value is 0.

[9:12]

Timestamp

Time stamp when the event was logged in the ESR-SEL repository.

[13:24]

OEM Defined Bytes 2 through 13

Please refer to bytes 5 to 16 of the OEM SEL Record format in the standard IPMI specification.
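In all three formats above, the record ID, record version, SEL type, and timestamp occupy the same positions ([1:4], [5], [6], and [9:12]), so a reader can dispatch on the SEL type before decoding the rest. The sketch below does that classification; the function name and the kind labels are hypothetical.

```python
import struct


def classify_esr_sel(rec: bytes) -> tuple[int, int, str]:
    """Return (record ID, timestamp, record kind) using the fields common to all formats."""
    record_id, version, sel_type = struct.unpack_from("<IBB", rec, 0)   # bytes [1:6]
    if version != 0x1:
        raise ValueError(f"unsupported Cisco SEL record version {version:#x}")
    timestamp = struct.unpack_from("<I", rec, 8)[0]                     # bytes [9:12]
    if sel_type == 0x02:
        kind = "system event"
    elif 0xC0 <= sel_type <= 0xDF:
        kind = "OEM timestamped"
    elif sel_type >= 0xE0:
        kind = "OEM non-timestamped"   # the ESR-SEL copy still carries a timestamp
    else:
        kind = "unknown"
    return record_id, timestamp, kind
```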

Recommended Solutions Based on IPMI Sensor Information

Overview

IPMI sensor information is available in the server event logs and in some error messages. This section presents some possible solutions for problems reported by IPMI sensors.

Power Sensors

Sensor Name

Recommended Action

P5V_STBY

P3V3_STBY

P1V1_SSB_STBY

P1V8_STBY

P1V0_STBY

P1V5_STBY

P0V75_STBY

P12V

P5V

P3V3

P1V5_SSB

P1V1_SSB

P1V8_SAS

P1V5_SAS

P1V0_SAS

P1V0A_SAS

P3V3_SAS

P12V_SAS

P0V75_SAS

P1V05_VTT_P1

P1V05_VTT_P2

P1V05_VTT_P3

P1V05_VTT_P4

P0V9_PVSA_P1

P0V9_PVSA_P2

P0V9_PVSA_P3

P0V9_PVSA_P4

P1V8_PLL_P1

P1V8_PLL_P2

P1V8_PLL_P3

P1V8_PLL_P4

P1V1_VCCP_P1

P1V1_VCCP_P2

P1V1_VCCP_P3

P1V1_VCCP_P4

P1V5_VCC_AB

P1V5_VCC_CD

P1V5_VCC_EF

P1V5_VCC_GH

P1V5_VCC_IJ

P1V5_VCC_KL

P1V5_VCC_MN

P1V5_VCC_OP

P0V75_DDR3VTT_AB

P0V75_DDR3VTT_CD

P0V75_DDR3VTT_EF

P0V75_DDR3VTT_GH

P0V75_DDR3VTT_IJ

P0V75_DDR3VTT_KL

P0V75_DDR3VTT_MN

P0V75_DDR3VTT_OP

If the status shown for the voltage to any of these sensors is FAIL or anything other than OK, the server needs to be returned to Cisco for a replacement. The CPU, DIMMs, and drives can be moved to the replacement server.

P3V_BAT_SCALED

Replace the motherboard battery if a failure is seen.

HP_MAIN_FET_FLT

HP_STBY_FET_FLT

HW_POWER_FLT

POWER_ON_FAIL

Failure of one of these sensors indicates a failure in the blade power supplies; the server will need to be replaced.

P12V_CUR_SENS

POWER_USAGE

If either of these sensors indicates a failure, reduce the load on the server. Check the power capping and budgeting options in UCS Manager.

VCCP_P1_CUR_SENS

VCCP_P2_CUR_SENS

VCCP_P3_CUR_SENS

VCCP_P4_CUR_SENS

PVSA_P1_CUR_SENS

PVSA_P2_CUR_SENS

PVSA_P3_CUR_SENS

PVSA_P4_CUR_SENS

VCCD_AB_CUR_SENS

VCCD_CD_CUR_SENS

VCCD_EF_CUR_SENS

VCCD_GH_CUR_SENS

VCCD_IJ_CUR_SENS

VCCD_KL_CUR_SENS

VCCD_MN_CUR_SENS

VCCD_OP_CUR_SENS

P1_CORE_VRHOT

P2_CORE_VRHOT

P3_CORE_VRHOT

P4_CORE_VRHOT

P1_MEM_VRHOT

P2_MEM_VRHOT

P3_MEM_VRHOT

P4_MEM_VRHOT

A failure on one or more of these sensors may be seen intermittently during CPU activity spikes. Reduce the CPU load if this is seen too often.

Device Detection Sensors

Sensor Name

Recommended Action

HDD0_PRS

HDD1_PRS

HDD2_PRS

HDD3_PRS

MEZZ1_PRS

MEZZ2_PRS

MLOM_PRS

TPM_CARD_PRS

P1_PRESENT

P2_PRESENT

P3_PRESENT

P4_PRESENT

DDR3_P1_A0_PRS

DDR3_P1_A1_PRS

DDR3_P1_A2_PRS

DDR3_P1_B0_PRS

DDR3_P1_B1_PRS

DDR3_P1_B2_PRS

DDR3_P1_C0_PRS

DDR3_P1_C1_PRS

DDR3_P1_C2_PRS

DDR3_P1_D0_PRS

DDR3_P1_D1_PRS

DDR3_P1_D2_PRS

DDR3_P2_E0_PRS

DDR3_P2_E1_PRS

DDR3_P2_E2_PRS

DDR3_P2_F0_PRS

DDR3_P2_F1_PRS

DDR3_P2_F2_PRS

DDR3_P2_G0_PRS

DDR3_P2_G1_PRS

DDR3_P2_G2_PRS

DDR3_P2_H0_PRS

DDR3_P2_H1_PRS

DDR3_P2_H2_PRS

DDR3_P3_I0_PRS

DDR3_P3_I1_PRS

DDR3_P3_I2_PRS

DDR3_P3_J0_PRS

DDR3_P3_J1_PRS

DDR3_P3_J2_PRS

DDR3_P3_K0_PRS

DDR3_P3_K1_PRS

DDR3_P3_K2_PRS

DDR3_P3_L0_PRS

DDR3_P3_L1_PRS

DDR3_P3_L2_PRS

DDR3_P4_M0_PRS

DDR3_P4_M1_PRS

DDR3_P4_M2_PRS

DDR3_P4_N0_PRS

DDR3_P4_N1_PRS

DDR3_P4_N2_PRS

DDR3_P4_O0_PRS

DDR3_P4_O1_PRS

DDR3_P4_O2_PRS

DDR3_P4_P0_PRS

DDR3_P4_P1_PRS

DDR3_P4_P2_PRS

MAIN_POWER_PRS

LSI_FLASH_PRSNT

BBU_PRES

All of these indicate the corresponding component was discovered successfully.

If an installed device fails discovery, try re-seating it in its socket, or replace it with a known working component of the same type.

POST Sensors

Sensor Name

Recommended Action

BIOS_POST_CMPLT

This sensor indicates BIOS POST has completed after the server powered up. Informational message, no further action is required.

BIOSPOST_TIMEOUT

POST took longer than expected and was unable to complete. Informational message, no further action is required.

BIST_FAIL

Indicates a host CPU self-test failure. Check the SEL to see which host CPU failed, contact Cisco TAC, and replace the CPU.

WILL_BOOT_FAULT

The server will probably fail discovery. Look for UCS Manager discovery problems.

Temperature Sensors

Sensor Name

Recommended Action

TEMP_SENS_FRONT

This is the intake temperature sensor. If this is too high, immediately verify that the ambient room temperature is within the desired range.

TEMP_SENS_REAR

This is the exhaust temperature sensor. If this is too high, verify that there are no obstructions to air intake or exhaust, and the air baffles in the server are installed as intended.

P1_TEMP_SENS

P2_TEMP_SENS

P3_TEMP_SENS

P4_TEMP_SENS

These sensors indicate overheating CPUs. The thermal paste might not have been applied correctly, or the heat sink might be damaged or not tightened properly.

If these are still too high after replacing the thermal paste and checking the heat sink, also check that there are no obstructions to air intake or exhaust, and the air baffles in the server are installed as intended. If this condition has persisted for too long, you may need to replace the CPU.

DDR3_P1_A0_TMP

DDR3_P1_A1_TMP

DDR3_P1_A2_TMP

DDR3_P1_B0_TMP

DDR3_P1_B1_TMP

DDR3_P1_B2_TMP

DDR3_P1_C0_TMP

DDR3_P1_C1_TMP

DDR3_P1_C2_TMP

DDR3_P1_D0_TMP

DDR3_P1_D1_TMP

DDR3_P1_D2_TMP

DDR3_P2_E0_TMP

DDR3_P2_E1_TMP

DDR3_P2_E2_TMP

DDR3_P2_F0_TMP

DDR3_P2_F1_TMP

DDR3_P2_F2_TMP

DDR3_P2_G0_TMP

DDR3_P2_G1_TMP

DDR3_P2_G2_TMP

DDR3_P2_H0_TMP

DDR3_P2_H1_TMP

DDR3_P2_H2_TMP

DDR3_P3_I0_TMP

DDR3_P3_I1_TMP

DDR3_P3_I2_TMP

DDR3_P3_J0_TMP

DDR3_P3_J1_TMP

DDR3_P3_J2_TMP

DDR3_P3_K0_TMP

DDR3_P3_K1_TMP

DDR3_P3_K2_TMP

DDR3_P3_L0_TMP

DDR3_P3_L1_TMP

DDR3_P3_L2_TMP

DDR3_P4_M0_TMP

DDR3_P4_M1_TMP

DDR3_P4_M2_TMP

DDR3_P4_N0_TMP

DDR3_P4_N1_TMP

DDR3_P4_N2_TMP

DDR3_P4_O0_TMP

DDR3_P4_O1_TMP

DDR3_P4_O2_TMP

DDR3_P4_P0_TMP

DDR3_P4_P1_TMP

DDR3_P4_P2_TMP

These sensors indicate overheating DIMMs. Check that there are no obstructions to air intake or exhaust, and the air baffles in the server are installed as intended.

If the problem persists, the overheating DIMMs may become damaged and need to be replaced.

P1_PROCHOT

P2_PROCHOT

P3_PROCHOT

P4_PROCHOT

These sensors indicate overheating CPUs. The thermal paste might not have been applied correctly, or the heat sink might be damaged or not tightened properly.

If these are still too high after replacing the thermal paste and checking the heat sink, also check that there are no obstructions to air intake or exhaust, and the air baffles in the server are installed as intended.

If the problem persists, you may need to replace the CPU.

This sensor also indicates the Intel Processor is trying to self-regulate its temperature by slowing its internal clock, which lowers its power draw and the heat it generates.

P1_THERMTRIP_N

P2_THERMTRIP_N

P3_THERMTRIP_N

P4_THERMTRIP_N

These sensors indicate overheating CPUs. The thermal paste might not have been applied correctly, or the heat sink might be damaged or not tightened properly.

If these are still too high after replacing the thermal paste and checking the heat sink, also check that there are no obstructions to air intake or exhaust, and the air baffles in the server are installed as intended.

If the problem persists, you may need to replace the CPU.

This sensor indicates the Intel Processor is trying to self-regulate its temperature and prevent overheating damage by shutting down. Most likely this is seen after the processor has tried to self-regulate its temperature by slowing its internal clock, which lowers its power draw and the heat it generates.

Supercap Sensors

Sensor Name

Recommended Action

LSI_SCAP_FAULT

This sensor indicates the supercap needs to be replaced.

BBU_PRES

This sensor indicates the presence of a supercap. Informational, no action is required.

BBU_TEMP

This sensor reports the temperature of the supercap in degrees Celsius. Informational; no action is required unless overheating is indicated. If the supercap is overheating, power down the server.

BBU_PRED_FAIL

This sensor indicates the supercap is about to fail and should be replaced.

BBU_FAULT

BBU_REPLACE_REQD

A failure has occurred in the supercap. Replace the supercap immediately.

BBU_DEGRADED

The supercap needs attention. The LSI firmware will take care of this automatically and no action is required.

BBU_CAPACITANCE

Measures and reports the supercap charge state as a percentage of its design value.

Standard IPMI Sensors

Sensor Name

Recommended Action

SEL_FULLNESS

Percentage of the standard IPMI SEL repository that is in use. Informational only; no action is required.

CISCO_SEL_FULLNESS

Percentage of the Cisco extended SEL (ESR-SEL) repository that is in use. Informational only; no action is required.

Preventing Problems With IPMI Settings After Downgrade

Problem: IPMI settings fail.

Possible Cause: By default, IPMI over LAN is disabled in CIMC version 2.2(2*) and later. If the system is downgraded, for example to 2.2(1d), IPMI over LAN remains disabled.

To prevent problems that sometimes occur after downgrading, enable IPMI over LAN in Cisco UCS Manager before the downgrade by following the steps at: http://www.cisco.com/web/about/security/intelligence/IPMI_security.html#host.