Nvidia tries to kill CUDA translation layers | Tom's Hardware

@[email protected] · 10 months ago

As AMD, Intel, Tenstorrent, and other companies develop better hardware, more software developers will be inclined to design for these platforms, and Nvidia’s CUDA dominance could ease over time.

This seems a bit optimistic to me. CUDA is currently the de facto method of utilising a GPU’s power efficiently, which makes Nvidia an easy choice for anyone with serious compute needs. The other manufacturers are fighting an uphill battle: any alternative they create won’t be used until it is definitively better.

This just seems like a catch-22 to me.

@[email protected] · edit-2 10 months ago

It’s not “optimistic”, it’s actually happening. Don’t forget that GPU compute is a pretty vast field, and not every field/application has a hard-coded dependency on CUDA/nVidia.

For instance, both TensorFlow and PyTorch work fine with ROCm 6.0+ now, and this enables a lot of ML tasks such as running LLMs like Llama2. Stable Diffusion also works fine - I tested 2.1 a while back and performance was great on my Arch + 7800 XT setup. There are plenty more examples where AMD is already a viable option. And don’t forget ZLUDA too, which continues to be improved.

I mean, look at this benchmark from Feb, that’s not bad at all:

And ZLUDA has had many improvements since then, so this will only get better.

Of course, whether all this makes an actual dent in nVidia’s compute market share is a completely different story (thanks to enterprise $$$ + existing hw that’s already out there), but the point is that, for many people and projects, ROCm is already a viable alternative to CUDA. And this will only improve with time. Just within the last 6 months, for instance, there have been VAST improvements in both ROCm (like the 6.0 release) and compatibility with major projects (like PyTorch). 6.1 was released only a few weeks ago with improved SD performance, a new video decode component (rocDecode), much faster matrix calculations with the new EigenSolver etc. It’s a very exciting space to be in, to be honest.

So you’d have to be blind not to notice the rapid changes that are really happening. And yes, right now it’s still very, very early days for AMD: they’ve got a lot of catching up to do, and there’s a lot of scope for improvement too. But it’s happening for sure; AMD + the community aren’t sitting idle.

@filister · 10 months ago

How easy is it to install and configure ROCm, and how limiting is it? I also heard about ZLUDA, etc. and I very much want to pick AMD for my next GPU, especially considering the fact that I am using Wayland, but I think they are still far behind NVIDIA?

@AProfessional · edit-2 10 months ago

On some distros it’s packaged and trivial. On others it’s not, and it’s annoying. How well it works depends on the exact usage.

@[email protected] · edit-2 10 months ago

Since you’re on Linux, it’s just a matter of installing the right packages from your distro’s package manager. Lots of articles on the Web, just google your app + “ROCm”. The main thing you gotta keep in mind is version dependencies: since ROCm 6.0/6.1 was released recently, some programs may not have been updated for it yet. So if your distro packages the most recent version, your app might not support it yet.

This is why many ML apps also come as a Docker image with specific versions of libraries bundled with them - so that could be an easier option for you, instead of manually hunting around for various package dependencies.

Also, chances are that your app may not even know/care about ROCm if it just uses a library like PyTorch / TensorFlow etc. So just check its requirements first.
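If you want a quick sanity check of which backend your PyTorch install was actually built for, something like this rough sketch should do. It relies on `torch.version.hip` and `torch.version.cuda` (each is `None` when the build doesn't target that backend) and on `torch.cuda.is_available()`, which ROCm builds reuse for AMD GPUs. The helper names `describe_backend` and `probe_pytorch` are just made up for this example, and it degrades to a hint if PyTorch isn't installed.

```python
import importlib


def describe_backend(hip_version, cuda_version, gpu_available):
    """Summarise which GPU backend a PyTorch build targets.

    hip_version / cuda_version mirror torch.version.hip and
    torch.version.cuda, which are None when the build was not
    compiled for that backend. (Hypothetical helper, not a PyTorch API.)
    """
    if hip_version:
        backend = f"ROCm/HIP {hip_version}"
    elif cuda_version:
        backend = f"CUDA {cuda_version}"
    else:
        backend = "CPU-only"
    status = "GPU visible" if gpu_available else "no GPU visible"
    return f"{backend} build, {status}"


def probe_pytorch():
    """Return a backend summary, or a hint if PyTorch is not installed."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "PyTorch is not installed"
    return describe_backend(
        getattr(torch.version, "hip", None),
        getattr(torch.version, "cuda", None),
        # ROCm builds expose AMD GPUs through the torch.cuda namespace.
        torch.cuda.is_available(),
    )


if __name__ == "__main__":
    print(probe_pytorch())
```

If it reports a ROCm/HIP build with a GPU visible, apps that only talk to PyTorch should pick up the AMD card without knowing anything about ROCm themselves.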

As for AMD vs nVidia in general, there are mainly three areas where AMD has lagged behind: RTX, compute and super sampling.

For RTX, there have been improvements in performance with the RDNA3 cards, but it does lag behind by a generation. For instance, the latest 7900 XTX’s RTX performance is equivalent to the 3080.
Compute is catching up as I mentioned earlier, and in some cases the performance may even match nVidia. This is very application/library specific though, so you’ll need to look it up.
Super Sampling is a bit of a weird one. AMD has FSR and it does a good job in general. In some cases, it may even perform better since it uses much simpler calculations, as opposed to nVidia’s deep learning technique. AMD’s FSR can in fact be used with any card, as long as the game supports it. And therein lies the catch: only something like 1/3rd of the games out there support it, and even fewer support the latest FSR 3. But there are mods out there (check Nexus Mods) that can enable FSR, which you might be able to use. In any case, FSR/DLSS isn’t a critical thing unless you’re gaming on a 4K+ monitor.

You can check out Tom’s Hardware GPU Hierarchy for the exact numbers - scroll down halfway to read about the RTX and FSR situation.

So yes, AMD does lag behind nVidia, but whether this impacts you really depends on your needs and use cases. If you’re a Linux user though, getting an AMD card is a no-brainer - it just works so much better, as in, no need to deal with proprietary driver headaches, no update woes, excellent Wayland support etc.

@filister · 10 months ago

Yes, I am running NixOS with Hyprland at the moment as a trial and most things work pretty well. I know that open source NVIDIA drivers are crap, especially if you want to run Wayland, but I am more interested in the AI/ML side, as I want to play a bit with open weight LLMs and PyTorch. I used to do some AI with TensorFlow, but I would like to learn more about PyTorch.

I had an older AMD card in the past that I borrowed from a friend, and I tried to install ROCm and it was an absolute disaster. That was around COVID, and even though I consider myself fairly familiar with Linux and very comfortable around the command line, I couldn’t get it to work back then.

Most of the opinions I have read also pointed out that CUDA is just plug and play and ROCm is a lot of tinkering. And I think I am simply too old and tired of this constant tinkering; I would prefer something that simply works out of the box.

I really hate NVIDIA and don’t like the company, but I am still considering them, with something like i3, just to have some peace of mind and know that everything works out of the box with their proprietary drivers.

Andromxda 🇺🇦🇵🇸🇹🇼 · 10 months ago

Since you run NixOS, these things might be helpful for you:

https://nixos.wiki/wiki/AMD_GPU#HIP

https://github.com/nixos-rocm/nixos-rocm

@[email protected] · edit-2 10 months ago

Unfortunately the article in the post directly contradicts your point about ZLUDA improving:

ZLUDA appears to be floundering now, with both AMD and Intel having passed on the opportunity to develop it further

Following the links and searching around, I found this: Andrzej “vosen” Janik, the lead dev, says in his FAQ:

What’s the future of the project?
With neither Intel nor AMD interested, we’ve run out of GPU companies. I’m open though to any offers of that could move the project forward. Realistically, it’s now abandoned and will only possibly receive updates to run workloads I am personally interested in (DLSS).

@[email protected] · edit-2 10 months ago

I based my statements on the actual commits being made to the repo. From what I can see, it’s certainly not “floundering”:

In any case, ZLUDA is really just a stop-gap arrangement so I don’t see it being an issue either way - with more and more projects supporting AMD cards, it won’t be needed at all in the near future.

@[email protected] · 10 months ago

Following the links and searching around, I found this: Andrzej “vosen” Janik, the lead dev, says in his FAQ:

There is a fork which seems more active (see 1 and 2)

It should probably at least be mentioned in the README of the original project.

Nvidia bans using translation layers for CUDA software — previously the prohibition was only listed in the online EULA, now included in installed files [Updated]