Only quantized versions of the model were leaked. If you see any unquantized version of it then it’s something which was recreated from these, and not the original model. People have also requanted it from GGUF to EXL2 and probably other formats too.
Do you know if there are any plans to quantize it? I’d love to test it, but my 3090 can’t handle 70b models without quantization, unfortunately.
There are quantized versions on Hugging Face. There's a Q2 version, but I don't know how well that performs.
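
If you want to try one of those GGUF quants on a 24 GB card, here's a rough sketch using llama-cpp-python (the model file name is just a placeholder for whichever quant you download; whether every layer fits on a 3090 depends on the quant level and context size):

```python
from llama_cpp import Llama

# Placeholder path: substitute the actual Q2_K GGUF file you downloaded.
llm = Llama(
    model_path="models/your-70b-model.Q2_K.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU; lower this if you run out of VRAM
    n_ctx=4096,       # context window; larger values use more memory
)

output = llm(
    "Q: What does quantization do to a model? A:",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```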