It takes time and effort to debug hardware and software to get a product right, but some bugs are hard to reproduce or only show up over time. It appears that some Intel Atom C2000 series processors for microservers may stop working after about 18 months because their clock signals stop functioning, and the likelihood of failure increases over time.
This is documented in the Intel Atom Processor C2000 Product Family Specification Update, where erratum AVR54 explains the issue:
AVR54. System May Experience Inability to Boot or May Cease Operation
Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.
Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.
Status: For the steppings affected, see Table 1, “Errata Summary Table” on page 9.
The table on page 9 shows that stepping B0 suffers from this problem. The issue affects existing motherboards and servers based on the Atom C2000, and companies such as Cisco will provide replacements:
Recently, Cisco became aware of an issue related to a component manufactured by one supplier that affects some Cisco products. In some units, we have seen the clock signal component degrade over time. Although the Cisco products with this component are currently performing normally, we expect product failures to increase over the years, beginning after the unit has been in operation for approximately 18 months. Once the component has failed, the system will stop functioning, will not boot, and is not recoverable. This component is also used by other companies.
We have identified all Cisco products that have this component and worked with the supplier to quickly put a fix in place. All products shipping currently do not have this issue. To support our customers and partners, Cisco will proactively provide replacement products under warranty or covered by any valid services contract dated as of November 16, 2016, which have this component. Due to the age-based nature of the failure and the volume of replacements, we will be prioritizing orders based on the products’ time in operation.
The good news is that a new revision of the chip fixes the issue in new processors, but there's no fix for existing ones. If you own such a system and it has suddenly stopped working or become unstable, this erratum may be the cause. Whether or not it still works, you may also want to check whether you can get a replacement while it is still under warranty.
Thanks to Mike for the tip.
Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.