Apologies for the basic question, but what’s the difference between GGML and GPTQ? Do these just refer to different compression methods? Which would you choose if you’re using a 3090 Ti GPU?

  • @markon
    11 years ago

    Also, llama.cpp runs GGML models much faster than transformers does, and is sometimes faster than ExLlama.