Yes, you can have too many CPU cores - Ampere's 192-core chips break ARM64 Linux kernel in two-socket systems, company requests higher core count support

Lee Duna · 1 year ago

@[email protected] · edit-2 1 year ago

What kind of “everyday” server stuff is efficiently making use of ≈300 cores? It’s clearly some set of tasks that can be done independently of one another, but do you know more specifically what kind of things people need this many cores on a server for?

Traditionally, VMs would be the use case, but these days, at least in the Linux/cloud world, it’s mainly containers. Containers, and the whole ecosystem built around them (Kubernetes, OpenShift, etc.), simply eat up those cores, as they’re designed to scale horizontally and dynamically. See: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
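To give a rough idea of why horizontally scaled workloads soak up cores, here is a minimal sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler docs: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function and parameter names are my own for illustration, not a Kubernetes API, and the real controller does more (stabilization windows, min/max bounds, not-ready pods).

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,  # default tolerance per the Kubernetes docs
) -> int:
    """Simplified HPA-style replica calculation (illustrative names only)."""
    ratio = current_metric / target_metric
    # Ratios close enough to 1.0 are ignored to avoid constant rescaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally, rounding up to a whole pod.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods averaging 200m CPU against a 100m target.
    print(desired_replicas(4, 200, 100))
```

Every extra replica needs CPU to land on somewhere, which is where a host with a few hundred cores comes in handy.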

Normally, you’d run a cluster of multiple servers to host such workloads, but imagine if all those resources were available on one physical host - it’d be a lot more efficient, since at the very least you’d avoid all that network overhead and delay. Of course, you’d still want at least a two-node cluster for HA, but the efficiency of a high-end node still rules.

@[email protected] · 1 year ago

Normally, you’d run a cluster of multiple servers to host such workloads, but imagine if all those resources were available on one physical host - it’d be a lot more efficient, since at the very least you’d avoid all that network overhead and delay.

Exactly! Imagine you have two services in a data center. If they have to communicate a lot with each other, you’d prefer them to be as close to each other as possible. Why? Because sending a request over a network costs far more than sending it to another process on the same host, in terms of both latency and bandwidth. There are, of course, downsides and other costs (for example, the individual cores handling the requests are much less powerful), so you have to tailor your hardware allocation to your workloads. In general, if you’re CPU-bound, you want more powerful CPUs (which means fewer cores per host for power reasons), and if you’re I/O-bound, you want to reduce network latency as much as possible.
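If you want a feel for the same-host side of that comparison, here is a small sketch that times a one-byte round trip over a local socket pair. It is only an approximation: both ends live in one process, the helper name `mean_round_trip_s` is made up for this example, and the result will vary by machine. Compare the number you get against a ping to another host in your own network.

```python
import socket
import time


def mean_round_trip_s(n: int = 1000) -> float:
    """Average time for a 1-byte request/reply over a local socket pair."""
    a, b = socket.socketpair()  # connected pair of sockets on this host
    try:
        start = time.perf_counter()
        for _ in range(n):
            a.sendall(b"x")  # "request"
            b.recv(1)
            b.sendall(b"x")  # "reply"
            a.recv(1)
        return (time.perf_counter() - start) / n
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(f"mean local round trip: {mean_round_trip_s() * 1e6:.1f} µs")
```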

Now imagine you have thousands of services. The network I/O can get pretty extreme. Plus, you sometimes have requirements such as encrypting any data that travels from one host to another. So if you can keep as many services as possible on a single host, you cut a lot of that overhead as well.

tl;dr: everything comes down to trade-offs and understanding the needs of your workloads, but in general, running 300 low-power cores probably indicates an I/O-bound application, and for that kind of workload it could be much more efficient and cost-effective.