• @brucethemoose
    25 months ago

    Yeah, and it’s just fp8 truncation, right? Not actual “smart” quantization? That’s a big quality hit even for huge decoder-only LLMs.
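
    To illustrate the distinction being made here: naive truncation just casts values into the low-precision range as-is, while even the simplest "smart" scheme (absmax scaling) rescales the tensor to use the full range first. A minimal numpy sketch, using int8 as a stand-in for fp8 since numpy has no native fp8 type; the weight values and outlier placement are made up for illustration:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Mostly small weights plus a few large outliers, as in real LLM layers
    w = rng.normal(0, 0.02, 4096)
    w[:4] = [0.9, -0.8, 0.7, -0.6]

    # Naive truncation: round straight into the int8 range with no scaling.
    # Nearly all small weights collapse to zero.
    naive = np.clip(np.round(w), -127, 127).astype(np.int8)
    naive_err = np.abs(w - naive.astype(np.float64)).mean()

    # Absmax quantization: rescale so the largest weight maps to 127,
    # quantize, then dequantize with the same scale.
    scale = 127.0 / np.abs(w).max()
    q = np.clip(np.round(w * scale), -127, 127).astype(np.int8)
    smart = q.astype(np.float64) / scale
    smart_err = np.abs(w - smart).mean()

    print(f"naive mean abs error: {naive_err:.5f}")
    print(f"absmax mean abs error: {smart_err:.5f}")
    ```

    Even this one-scale-per-tensor scheme cuts the reconstruction error by roughly an order of magnitude here; real quantizers (per-channel scales, outlier handling, calibration) do better still, which is the gap the comment is pointing at.
    
    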