Apologies for the basic question, but what’s the difference between GGML and GPTQ? Do these just refer to different compression methods? Which would you choose if you’re using a 3090 Ti GPU?
Also, llama.cpp offers very fast performance with GGML models compared to using transformers, and is sometimes faster than ExLlama.
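On the 3090 Ti part of the question, a quick back-of-the-envelope check can help whichever format you pick. This is only a rough sketch: it estimates the size of the weights from parameter count and bits per weight, and the function names, the 2 GiB overhead allowance, and the 24 GiB default (the 3090 Ti's nominal VRAM) are my own assumptions for illustration. It ignores the KV cache, quantization metadata, and anything format-specific, so treat the result as a lower bound.

```python
def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits_in_vram(
    n_params: float,
    bits_per_weight: float,
    vram_gib: float = 24.0,      # assumed: nominal VRAM of a 3090 Ti
    overhead_gib: float = 2.0,   # assumed placeholder for cache, buffers, etc.
) -> bool:
    """Rough check: do the weights plus a fixed overhead fit in VRAM?"""
    return weight_size_gib(n_params, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Hypothetical model sizes, both at 4 bits and at 16 bits per weight.
    for n_params in (7e9, 13e9, 33e9):
        for bits in (4, 16):
            size = weight_size_gib(n_params, bits)
            verdict = "fits" if fits_in_vram(n_params, bits) else "does not fit"
            print(f"{n_params / 1e9:.0f}B @ {bits}-bit: ~{size:.1f} GiB, {verdict}")
```

If the estimate says the model does not fit entirely on the GPU, that is where a runtime that can split layers between CPU and GPU becomes more attractive than a GPU-only one.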