• @__dev
    English
    28 months ago

    “unified memory” is an Apple marketing term for what everyone’s been doing for well over a decade. Every single integrated GPU in existence shares memory between the CPU and GPU; that’s how they work. It has nothing to do with soldering the RAM.

    You’re right about the bandwidth, though: current socketed RAM standards have severe bandwidth limitations, which directly limit the performance of integrated GPUs. This again has little to do with being socketed though: LPCAMM supports up to 9.6GT/s, considerably faster than what ships with the latest Macs.

    This is why user-replaceable RAM and discrete GPUs are going to die out. The overhead and latency of copying all that data back and forth over the relatively slow PCIe bus is just not worth it.

    The only way discrete GPUs can possibly be outcompeted is if DDR starts competing with GDDR and/or HBM in terms of bandwidth, and there’s zero indication of that ever happening. Apple needs to put a whole 128GB of LPDDR in their system to be comparable (in bandwidth) to literally 10-year-old dedicated GPUs: the 780 Ti had over 300GB/s of memory bandwidth with a measly 3GB of capacity. DDR is simply not a good choice for GPUs.
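
    For reference, theoretical peak bandwidth is just bus width times transfer rate. The sketch below is a rough calculation, not a benchmark. The 384-bit, 7GT/s figures are the 780 Ti's published GDDR5 configuration; the 128-bit width for an LPCAMM module is an assumption made for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_gt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_gt_s


# GTX 780 Ti: 384-bit GDDR5 at 7 GT/s.
gtx_780_ti = peak_bandwidth_gb_s(384, 7.0)

# LPCAMM at 9.6 GT/s; the 128-bit module width is an assumption.
lpcamm = peak_bandwidth_gb_s(128, 9.6)

print(f"780 Ti: {gtx_780_ti:.1f} GB/s")
print(f"LPCAMM: {lpcamm:.1f} GB/s")
```

    Real-world throughput is lower than these peaks, so treat the numbers as upper bounds.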

    • @[email protected]
      English
      28 months ago

      “unified memory” is an Apple marketing term for what everyone’s been doing for well over a decade.

      Wrong. Unified memory (UMA) is not an Apple marketing term, it’s a description of a computer architecture that has been in use since at least the 1970s. For example, game consoles have always used UMA.

      Every single integrated GPU in existence shares memory between the CPU and GPU; that’s how they work.

      Again, wrong.

      While iGPUs have existed for PCs for a long time, they did not use a unified memory architecture. What they did was reserve a portion of the system RAM for the GPU. For example on a PC with 512MB RAM and an iGPU, 64MB may have been reserved for the GPU. The CPU then had access to 512-64 = 448MB. While they shared the same physical memory chips, they both had a separate address space. If you wanted to make a texture available to the GPU, it still had to be copied to the special reserved RAM space for the GPU and the CPU could not access that directly.

      With unified memory, both CPU and GPU share the same address space. Both can access the entire memory. No RAM is reserved purely for the GPU. If you want to make something available to the GPU, nothing needs to be copied, you just need to point to where it is in RAM. Likewise, anything done by the GPU is immediately accessible by the CPU.
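
      The difference between the two models can be sketched as a toy example. This is only an analogy using plain Python buffers, not a real GPU API: `gpu_reserved` and `gpu_view` are hypothetical names standing in for a carved-out GPU region and a GPU-side reference into shared memory.

```python
# Toy model only: Python buffers standing in for RAM, not a real GPU API.

# Carve-out model: the GPU owns a separate reserved region.
system_ram = bytearray(b"texture-data")
gpu_reserved = bytearray(len(system_ram))
gpu_reserved[:] = system_ram             # explicit copy into the GPU's region
gpu_reserved[0:1] = b"T"                 # the "GPU" modifies its own copy
cpu_sees_carve_out = bytes(system_ram)   # the CPU still sees the old bytes

# Unified model: CPU and GPU address the same bytes.
unified_ram = bytearray(b"texture-data")
gpu_view = memoryview(unified_ram)       # no copy, just a reference to the same memory
gpu_view[0:1] = b"T"                     # the "GPU" writes in place
cpu_sees_unified = bytes(unified_ram)    # the CPU sees the change immediately

print(cpu_sees_carve_out)  # b'texture-data'
print(cpu_sees_unified)    # b'Texture-data'
```

      In the first half, getting the GPU's result back to the CPU would take a second copy; in the second half there is nothing to copy in either direction.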

      Since there is one memory pool for both, you can use RAM more efficiently. If you have a discrete GPU with 16GB VRAM, and your app only needs 8GB VRAM, that other memory just sits there being useless. Alternatively, if your app needs 24GB VRAM, you can’t run it because your GPU only has 16GB, even if you have lots of system RAM available.

      With UMA you can use all the RAM you have for whatever you need it for. On an M2 Ultra with 192GB RAM you can use almost all of that for the GPU (minus a little bit that’s used for the OS and any running apps). Even on a tricked-out PC with a 4090 you can’t run anything that needs more than 24GB VRAM. Want to run something where the GPU needs 180GB of memory? No problem on an M2 Ultra.
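
      The capacity argument boils down to a simple check, sketched here under stated assumptions. The `OS_AND_APPS_GB` overhead is a made-up placeholder, and a real OS may cap how much of the pool the GPU can claim.

```python
def fits_discrete(gpu_need_gb: float, vram_gb: float) -> bool:
    """A discrete GPU can only use its own VRAM pool."""
    return gpu_need_gb <= vram_gb


def fits_unified(gpu_need_gb: float, total_ram_gb: float, os_and_apps_gb: float) -> bool:
    """With one shared pool, the GPU can use whatever the OS and apps leave free."""
    return gpu_need_gb <= total_ram_gb - os_and_apps_gb


OS_AND_APPS_GB = 12.0  # hypothetical overhead, for illustration only

print(fits_discrete(8, 16))                   # True, but 8GB of VRAM sits idle
print(fits_discrete(24, 16))                  # False, regardless of free system RAM
print(fits_unified(24, 192, OS_AND_APPS_GB))  # True
```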

      It has nothing to do with soldering the RAM.

      It has everything to do with soldering the RAM. One of the reasons iGPUs sucked, other than not using UMA, is that GPU performance is almost always limited by memory bandwidth. Compared to VRAM, standard system RAM has much, much less bandwidth, causing iGPUs to be slow.

      A high-bandwidth memory bus, like a GPU needs, has a lot of connections and runs at high speeds. The only way to do this reliably is to physically place the RAM very close to the actual GPU. Why do you think GPUs do not have user-upgradable RAM?

      Soldering the RAM makes it possible to integrate a CPU and a non-sucking GPU. Go look at the inside of a PS5 or XSX and you’ll see the same thing: an APU with the RAM chips soldered to the board very close to it.

      This again has little to do with being socketed though: LPCAMM supports up to 9.6GT/s, considerably faster than what ships with the latest Macs.

      LPCAMM is a very recent innovation. Engineering samples weren’t available until late last year and the first products will only hit the market later this year. Maybe this will allow for Macs with user-upgradable RAM in the future.

      The only way discrete GPUs can possibly be outcompeted is if DDR starts competing with GDDR and/or HBM in terms of bandwidth

      What use is high bandwidth memory if it’s a discrete memory pool with only a super slow PCIe bus to access it?

      Discrete VRAM is only really useful for gaming, where you can upload all the assets to VRAM in advance and data practically only flows from CPU to GPU and very little in the opposite direction. Games don’t matter to the majority of users. GPGPU is much more interesting to the general public.
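
      To put rough numbers on the PCIe point, here is a back-of-the-envelope sketch. The 32GB/s figure is an assumed approximate peak for a PCIe 4.0 x16 link in one direction, the 300GB/s figure is the VRAM bandwidth quoted earlier in the thread, and the 8GB working set is arbitrary.

```python
PCIE_GB_S = 32.0   # assumption: approximate PCIe 4.0 x16 peak, one direction
VRAM_GB_S = 300.0  # VRAM bandwidth figure quoted earlier in the thread


def transfer_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Best-case time to move size_gb at the given peak bandwidth."""
    return size_gb / bandwidth_gb_s


working_set_gb = 8.0  # arbitrary example size
pcie_time = transfer_seconds(working_set_gb, PCIE_GB_S)
vram_time = transfer_seconds(working_set_gb, VRAM_GB_S)

print(f"Over PCIe: {pcie_time:.3f} s")
print(f"From VRAM: {vram_time:.3f} s")
print(f"PCIe is roughly {pcie_time / vram_time:.1f}x slower")
```

      If the data has to cross the bus on every pass instead of once up front, the PCIe figure is the one that matters.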

      • @__dev
        English
        18 months ago

        Wrong. Unified memory (UMA) is not an Apple marketing term, it’s a description of a computer architecture that has been in use since at least the 1970s. For example, game consoles have always used UMA.

        Apologies, my Google-fu seems to have failed me. Search results were filled with only Apple-related results, but I’ve now been able to find material from well before that, though nothing older than the 1990s.

        While iGPUs have existed for PCs for a long time, they did not use a unified memory architecture.

        Do you have an example? Every single one I look up has at least optional UMA support. The reserved RAM was a thing, but it wasn’t the entire memory of the GPU; it was only reserved for the framebuffer. AFAIK iGPUs have always shared memory like they do today.

        It has everything to do with soldering the RAM. One of the reasons iGPUs sucked, other than not using UMA, is that GPU performance is almost always limited by memory bandwidth. Compared to VRAM, standard system RAM has much, much less bandwidth, causing iGPUs to be slow.

        I don’t disagree, I think we were talking past each other here.

        LPCAMM is a very recent innovation. Engineering samples weren’t available until late last year and the first products will only hit the market later this year. Maybe this will allow for Macs with user-upgradable RAM in the future.

        Here’s a link to buy some from Dell: https://www.dell.com/en-us/shop/dell-camm-memory-upgrade-128-gb-ddr5-3600-mt-s-not-interchangeable-with-sodimm/apd/370-ahfr/memory. Here’s the laptop it ships in: https://www.dell.com/en-au/shop/workstations/precision-7670-workstation/spd/precision-16-7670-laptop. Available since late 2022.

        What use is high bandwidth memory if it’s a discrete memory pool with only a super slow PCIe bus to access it?

        Discrete VRAM is only really useful for gaming, where you can upload all the assets to VRAM in advance and data practically only flows from CPU to GPU and very little in the opposite direction. Games don’t matter to the majority of users. GPGPU is much more interesting to the general public.

        gestures broadly at every current use of dedicated GPUs. Most of the newfangled AI stuff runs on Nvidia DGX servers, which use dedicated GPUs. Games are a big enough industry for dGPUs to exist in the first place.