Yes, you can have too many CPU cores - Ampere's 192-core chips break ARM64 Linux kernel in two-socket systems, company requests higher core count support

Lee Duna · 1 year ago

@[email protected] · edited 1 year ago

Since you seem to know a lot about this: I would think that at some point the sheer physical size of a device rules out a shared cache, just because the distance from a CPU core to the cache can’t be too big. Do you know when this comes into play, if it does? Also, having written some multithreaded computational software, I’ve found that there’s typically (for the stuff I do) a limit to how many cores I can efficiently make use of, before the overhead of creating and tearing down threads eats the advantage of sharing the work between cores. What kind of “everyday” server stuff is efficiently making use of ≈300 cores? It’s clearly some set of tasks that can be done independently of one another, but do you know more specifically what kind of things people need this many cores on a server for?

@[email protected] · edited 1 year ago

What kind of “everyday” server stuff is efficiently making use of ≈300 cores? It’s clearly some set of tasks that can be done independently of one another, but do you know more specifically what kind of things people need this many cores on a server for?

Traditionally VMs would be the use case, but these days, at least in the Linux/cloud world, it’s mainly containers. Containers, and the whole ecosystem that is built around them (such as Kubernetes/OpenShift etc) simply eat up those cores, as they’re designed to scale horizontally and dynamically. See: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale
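To make the "scale horizontally and dynamically" part a bit more concrete, here is a rough sketch of the core scaling rule described in the linked Kubernetes docs: desired replicas = ceil(current replicas × current metric / target metric). The function and parameter names are my own for illustration, and this leaves out the tolerance band, min/max replica limits, and stabilization behavior that the real autoscaler applies.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Simplified form of the Horizontal Pod Autoscaler scaling rule.

    Illustrative only: the real controller also applies a tolerance,
    min/max replica bounds, and stabilization windows.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    return math.ceil(current_replicas * ratio)


# 4 pods averaging 90% CPU against a 60% target: scale out.
scale_out = desired_replicas(4, 90.0, 60.0)

# 10 pods averaging 50% CPU against a 100% target: scale in.
scale_in = desired_replicas(10, 50.0, 100.0)
```

Every extra replica is another set of processes looking for a core to run on, which is why a busy cluster soaks up core counts like these.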

Normally, you’d run a cluster of multiple servers to host such workloads, but imagine if all those resources were available on one physical host - it’d be a lot more efficient, since at the very least, you’d be avoiding all that network overhead and delays. Of course, you’d still have at least a two-node cluster for HA, but the efficiency of a high-end node still rules.

@[email protected] · 1 year ago

Normally, you’d run a cluster of multiple servers to host such workloads, but imagine if all those resources were available on one physical host - it’d be a lot more efficient, since at the very least, you’d be avoiding all that network overhead and delays.

Exactly! Imagine you have two services in a data center. If they have to communicate a lot with each other, then you would prefer them as close to each other as possible. Why? Well, it’s because of the difference between sending a request over a network vs. just sending it to another process on the same host. The latter is much more efficient in terms of latency and bandwidth. There are, of course, downsides and other costs (like the fact that the cores that are handling the requests themselves are much less powerful), so you have to tailor your hardware allocation to your workloads. In general, if you’re CPU-bound, you would want more powerful CPUs (necessitating fewer cores per host for power reasons), and if you’re I/O-bound, you want to reduce network latency as much as possible.

Now imagine you have thousands of services. The network I/O can get pretty extreme. Plus, occasionally, you have requirements like the fact that any data traveling from one host to another must be encrypted. So if you can keep as many services as possible on a single host, you reduce a lot of that overhead as well.

tl;dr: everything comes down to trade-offs and understanding the needs of your workloads, but in general, running 300 low-power cores is probably indicative of an I/O-bound application, and for that kind of workload it could hypothetically be much more efficient and cost-effective.

@[email protected] · edited 1 year ago

Hi, I’m not the guy from the comment above, but I’d like to chip in :)

I don’t know much about the cache. I think I heard something about this, and the answer was basically that yes, more distance just makes it slower.

About the multithreading:

If the cost of creating threads is becoming an issue, look into the concept of thread pools. They are a neat way of reusing resources and ensuring you don’t try to have more parallelism than is actually possible.

Edit: if your work is CPU bound, so the cores are actually computing all the time and not waiting on IO or networking, the rule of thumb is to not let the number of threads exceed the number of cores.
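A minimal sketch of what that can look like, using Python’s standard concurrent.futures module. The task function sum_of_squares and the helper run_in_pool are made-up stand-ins, not anything from your code. One caveat: in standard CPython, the global interpreter lock keeps threads from running pure-Python computation in parallel, so for really CPU-bound Python work you would usually swap in ProcessPoolExecutor. The pool-sizing idea is the same either way.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n: int) -> int:
    """Hypothetical stand-in for one unit of CPU-bound work."""
    return sum(i * i for i in range(n))


def run_in_pool(inputs: list[int]) -> list[int]:
    # Rule of thumb for CPU-bound work: no more workers than cores.
    # os.cpu_count() can return None, so fall back to a single worker.
    workers = os.cpu_count() or 1
    # The pool creates its threads once and reuses them for every task,
    # instead of paying thread start-up cost per task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands out the tasks and returns results in input order.
        return list(pool.map(sum_of_squares, inputs))


results = run_in_pool([10, 100, 1000])
```

The point is that the number of threads stays fixed no matter how many tasks you submit, so the creation cost is paid once up front.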

As for use cases for servers with this many cores: shared computing, for example VM hosts. The number of VMs you can sensibly host on a server is limited by the number of cores you have. Depending on the kind of hypervisor you are using, you can share cores between VMs, but that’s going to make the VMs slower.

Another example of shared computing is HPC clusters, where many people schedule some kind of work, and the cluster allocates the resources, executes the task, and returns the results to you. Having more cores allows more of these tasks to run in parallel, effectively increasing the throughput of the cluster.