Should I struggle through constant crashes to get my 7900 GRE with 16 GB of VRAM working, possibly through the headache of ONNX? Can anyone report their own success or offer advice? AMD on Linux is generally lovely; SD with AMD on Linux, not so much. It was much better with my RTX 2080 on Linux, but gaming was horrible with the NVIDIA drivers. I feel I could do more with the 16 GB AMD card if stability weren’t so bad. I currently have both cards running, to the horror of my PSU. A1111 does NOT want to see the NVIDIA card, only the AMD one. Something about the version of PyTorch? More work to be done there.

  • Having a much better time back on Cinnamon default instead of Wayland. Oops!

** It heard me. It crashed again on an X/Y plot, but since I was off Wayland I could see the terminal dump: amdgpu thermal overload! shutdown initiated! That’ll do it! Finally something easy to fix. I wonder why thermal throttling isn’t kicking in to control the runaway? I’ll stress it once more and watch the temps this time.

Temps were exceeding 115 °C, phew! No idea why the default amdgpu driver has no fan control, but the fans are ripping like they should now. Monitoring temps has restored system stability. Using separate, dedicated venv folders for AMD and NVIDIA, plus careful driver choice and installation, were the keys to multi-GPU success.
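
For anyone who wants to keep an eye on temps the same way, here is a rough sketch of polling the amdgpu sensors from Python. It assumes the driver exposes its hwmon files under `/sys/class/drm/card*/device/hwmon/`, which is where they usually show up; the function names and the 100 °C warning limit are made up for illustration and are not something taken from my setup.

```python
from pathlib import Path


def read_gpu_temps(drm_root="/sys/class/drm"):
    """Return {"cardN/label": degrees C} for each hwmon temperature sensor found.

    Assumption: sensors live at card*/device/hwmon/hwmon*/temp*_input and
    report millidegrees Celsius, as hwmon files normally do.
    """
    temps = {}
    pattern = "card*/device/hwmon/hwmon*/temp*_input"
    for sensor in sorted(Path(drm_root).glob(pattern)):
        # An optional temp*_label file names the sensor (edge, junction, mem).
        label_file = sensor.with_name(sensor.name.replace("_input", "_label"))
        if label_file.exists():
            label = label_file.read_text().strip()
        else:
            label = sensor.name
        card = sensor.parents[3].name  # e.g. "card0"
        temps[f"{card}/{label}"] = int(sensor.read_text()) / 1000.0
    return temps


def too_hot(temps, limit_c=100.0):
    """Return only the sensors at or above limit_c (a hypothetical threshold)."""
    return {name: value for name, value in temps.items() if value >= limit_c}


if __name__ == "__main__":
    current = read_gpu_temps()
    for name, value in current.items():
        print(f"{name}: {value:.1f} C")
    for name, value in too_hot(current).items():
        print(f"WARNING: {name} is at {value:.1f} C")
```

Running it in a loop (or under `watch`) during an X/Y plot should show whether the card is heading toward the thermal shutdown before the session dies.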

  • @abcdqfr (OP)
    23 months ago

    How bad are your crashes? Mine will either freeze the system entirely or crash the current lightdm session, sometimes recovering, sometimes freezing anyway. It needs a power cycle to recover. What is the DE you speak of? Openbox?

    • @[email protected]
      23 months ago

      Yes, mine are similar. I used to run KDE Plasma while generating, but Plasma took too much VRAM, so now I’m using IceWM. I noticed that the crashes happen when something needs VRAM while it’s already all used, which is why IceWM reduces crashes: it’s very light on resources.

    • @[email protected]
      13 months ago

      That seems strange. Perhaps you should stress-test your GPU/system to see if it’s a hardware problem.

      • @abcdqfr (OP)
        13 months ago

        I had that concern as well, with it being a new card. It performs fine in gaming as well as in every glmark benchmark so far. I have it chalked up to AMD support being in experimental status on Linux/SD. Any other stress tests you recommend while I’m in the return window!? lol