• @[email protected]
    1
    10 days ago

    I don’t understand this post. A desktop 4070 has 1:1 FP16, which works out to less than 30 TOPs. MS requires a minimum of 45 TOPs for a device to be Copilot+ certified; that’s why they’re not certified. Worse, the limited memory pool makes any NV laptop card apart from a 4090 a difficult sell.
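
    Back-of-envelope for where that figure comes from (specs quoted from memory, so treat the numbers as approximate):

    ```python
    # Rough math behind "<30 TOPs" for a desktop 4070 (my numbers, not official)
    cuda_cores = 5888            # desktop RTX 4070
    boost_clock_ghz = 2.475
    tflops_fp32 = cuda_cores * 2 * boost_clock_ghz / 1000  # FMA counts as 2 ops
    print(tflops_fp32)           # ~29.1 TFLOPS; at 1:1, FP16 is the same number
    ```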

    • Alphane Moon (OP, mod)
      2
      edited
      10 days ago

      I am referring to practical use cases.

      For example, how fast would a 45 TOPs NPU ML-upscale a 10-minute SD video source to HD? (It takes about 15 minutes with a 3080 + 5800X.) What video upscaling frameworks/applications have support for such NPUs?
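
      Roughly the kind of pipeline I have in mind (a sketch only; the model file and execution provider names are placeholders, and whether the NPU actually gets used depends on the vendor’s ONNX Runtime provider):

      ```python
      # Hypothetical frame-by-frame ML upscale; "realesrgan_x2.onnx" is a placeholder
      import cv2
      import numpy as np
      import onnxruntime as ort

      sess = ort.InferenceSession(
          "realesrgan_x2.onnx",
          # e.g. VitisAI for AMD NPUs, QNN for Qualcomm; falls back to CPU
          providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
      )
      inp = sess.get_inputs()[0].name

      cap = cv2.VideoCapture("input_sd.mp4")
      fps = cap.get(cv2.CAP_PROP_FPS)
      out = None
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          # NCHW float tensor, the layout most ESRGAN-style ONNX exports expect
          x = frame.astype(np.float32).transpose(2, 0, 1)[None] / 255.0
          y = sess.run(None, {inp: x})[0][0]
          up = (y.transpose(1, 2, 0).clip(0, 1) * 255).astype(np.uint8)
          if out is None:
              h, w = up.shape[:2]
              out = cv2.VideoWriter("output_hd.mp4",
                                    cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
          out.write(up)
      cap.release()
      out.release()
      ```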

      Another example would be local LLMs. Are there any LLMs comparable to, say, Llama 3.2 1B that can be run locally via NPU?

      To my knowledge there is no video game upscaling tech (comparable to DLSS) that can be run off an NPU.

      • @[email protected]
        1
        10 days ago

        Both video upscaling and DLSS use(d) diffusion to upscale images (DLSS has allegedly transitioned to a transformer model). AFAIK there’s no simple way to run diffusion on an NPU as of today.

        Regarding running LLMs locally: well, I’ll take an NPU with 32-64 GB of RAM over an anemic Llama 1-3B model run on the GPU. And that’s before considering people using Windows and taking advantage of MS Olive. Llama 3.3 70B, which has similar performance to Llama 3.1 405B, will run on 64 GB of RAM, ezpz; forget about ever running it on a local PC with an NVIDIA card.
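
        Back-of-envelope for why 64 GB is enough (all numbers are my assumptions: a ~4.7 bit/weight Q4-ish quant and an fp16 KV cache at 8k context):

        ```python
        # Ballpark memory for Llama 3.3 70B as a GGUF Q4 quant (assumed, not measured)
        weights_gb = 70e9 * 4.7 / 8 / 1e9          # ~41 GB of weights

        # KV cache: 80 layers, 8 KV heads (GQA) x 128 head dim, fp16, 8192 ctx
        kv_gb = 2 * 80 * 8 * 128 * 2 * 8192 / 1e9  # ~2.7 GB

        print(f"{weights_gb:.0f} + {kv_gb:.1f} GB")  # ~44 GB, fits in 64 GB with headroom
        ```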

        My eyes are set on the Strix Halo 128 GB variant; I’m going to put that through its paces.

        BTW, most of the interesting models will fail to run locally due to NVIDIA’s shit VRAM allowance; if NVIDIA gave people a minimum of 16 GB of VRAM, I’m sure MS would happily certify it.

          • Alphane Moon (OP, mod)
          2
          10 days ago

          That’s fair. But do you see where I am coming from?

          Marketing around TOPs isn’t everything.

          Interesting is a relative term. I find upscaling older SD content interesting. You can’t just dismiss this use case because it doesn’t fit into your argument.

          Getting a local LLM running (Llama 1B is not as good as cloud LLMs, of course, but it does have valid use cases) with an Nvidia GPU is extremely simple. Can you provide a 5-bullet-point guide for setting up a local LLM with 32 GB of RAM? (64 GB of RAM isn’t that common in laptops.)

          • @[email protected]
            1
            edited
            10 days ago

            Install LM Studio

            Profit

            *If you want to use the NPU:

            Apply for the beta branch (3.6.x) of LM Studio

            Install the LM Studio beta

            Profit
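
            *Or, if you’d rather script it than click through a GUI, the same idea via llama-cpp-python (the GGUF path is a placeholder for whatever model you downloaded; any 1-3B Q4 quant fits comfortably in 32 GB):

            ```python
            # pip install llama-cpp-python; the model path below is hypothetical
            from llama_cpp import Llama

            llm = Llama(
                model_path="Llama-3.2-1B-Instruct-Q4_K_M.gguf",
                n_ctx=4096,       # context window
                n_gpu_layers=0,   # CPU/RAM only; NPU offload isn't exposed here
            )

            out = llm("Give me three uses for a small local LLM:", max_tokens=128)
            print(out["choices"][0]["text"])
            ```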

            Edit: Almost forgot, the AMD drivers (under review) for the latest NPU-containing CPUs (7xxx and upward) should come with the spring kernel update to 6.14, fingers crossed. It’s been two years; they took their sweet time. Windows support was available on release…