• @pivot_root
    26
    4 months ago

    Moore’s Law Is Dead shared an interesting video yesterday about these chips. Supposedly, leaks from his sources at Intel say that high voltages being pushed through the ring bus cause degradation. The leaks claim the ring bus shares a power rail with the P and E cores, meaning its voltage is driven by whatever the cores request.

    For context, the ring bus is responsible for communication between cores, peripherals, and the platform. This includes memory accesses, which means that if the ring bus misbehaves, everything can appear normal at first while the resulting errors only show up far down the line.
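    To make "appears normal but errors show up later" a bit more concrete, here is a minimal sketch of how silent corruption is usually caught in general: repeat a deterministic computation and compare the results. The function name `find_mismatches` and the buffer size are my own choices for illustration. This does not test the ring bus specifically, and a clean run proves nothing about whether a chip has degraded.

```python
import hashlib
import os


def find_mismatches(data: bytes, rounds: int = 20) -> list[int]:
    """Hash the same buffer repeatedly and return the rounds whose digest
    differs from the first one. On stable hardware the list is empty."""
    reference = hashlib.sha256(data).hexdigest()
    mismatches = []
    for i in range(1, rounds + 1):
        # Same input, same algorithm: any difference means something went
        # wrong between memory and the core, not in the program logic.
        if hashlib.sha256(data).hexdigest() != reference:
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    # 16 MiB of random data, an arbitrary size chosen for illustration.
    buf = os.urandom(16 * 1024 * 1024)
    print("mismatching rounds:", find_mismatches(buf))
```

    A mismatch here would only say that something on the path between memory and the cores misbehaved at least once. It cannot say which component was at fault or how far along any damage is.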

    Going beyond the video specifically, and considering what others have suggested as workarounds, it seems like ring bus degradation might be a decent candidate for the actual root cause of these issues.

    Some observations around chips degrading were:

    • High memory pressure exacerbates the issue.
    • Chips with more cores deteriorate faster.

    Some of the suggestions to work around the issue were:

    • Lower the memory speed.
    • Lower the voltage and clock speeds.
    • Disable E cores.

    All of those can be related to stress being put on the ring bus:

    • Higher voltage being put through the bus -> higher likelihood of physical damage
    • More memory pressure -> more usage of the bus, more opportunity for damage to accumulate
    • More cores -> more memory pressure
    • Slower memory speeds -> less maximum throughput -> less stress
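    As a rough illustration of the last point, theoretical peak memory bandwidth scales linearly with the transfer rate. The sketch below assumes a dual-channel setup with a 64-bit (8-byte) data path per channel, and the DDR5-5600 and DDR5-4800 speeds are example values I picked, not figures from the video. It only shows the ceiling on traffic, not how much actually crosses the ring bus.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float,
                        channels: int = 2,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    transfers_mt_s: memory speed in mega-transfers per second (e.g. 5600).
    channels: number of 64-bit memory channels (assumed 2 here).
    bytes_per_transfer: bytes moved per transfer per channel (64 bits = 8).
    """
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    for speed in (5600, 4800):
        print(f"DDR5-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s peak")
```

    Under those assumptions, dropping from 5600 to 4800 MT/s lowers the theoretical ceiling from about 89.6 GB/s to 76.8 GB/s, roughly 14% less traffic that memory could push at most.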

    I’m not claiming anything definitive, but I think my money is on this one.

    • Karna (OP)
      4
      4 months ago

      Thanks for the additional details.

      The scariest part of this whole problem is that there is no way for owners of 13th/14th gen CPUs to figure out to what extent their CPU is damaged. It’s like holding a ticking bomb without knowing when it will go off!

      • @pivot_root
        2
        4 months ago

        100%. Whatever Intel does at this point, I don’t trust it to be a fix so much as a mitigation or attempt to delay the inevitable until a few years after the warranty period.

        If it’s possible for people to return their 13th/14th gen processor and trade up for a 12th gen, that would be the safest solution.

    • @[email protected]
      2
      4 months ago

      I’ve heard speculation that this is exacerbated by a feature where the CPU increases the voltage to boost clocks when running single-core workloads at low temperatures. If that’s true, a lighter load or better cooling may actually be detrimental to the life of the processor.