@[email protected] to

[email protected]English • 10 months ago

[Paper] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

From the abstract: “Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}.”
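So how do full-precision weights end up ternary? The b1.58 paper describes an "absmean" scheme. It divides a weight matrix by its mean absolute value, then rounds each entry and clips it to [-1, 1]. The snippet below is a minimal plain-Python sketch of that idea, not the authors' code. The function name `absmean_ternarize` and the `eps` default are my own assumptions.

```python
def absmean_ternarize(weights, eps=1e-5):
    """Map a flat list of float weights to {-1, 0, 1} plus one per-tensor scale.

    Hypothetical helper illustrating absmean-style quantization:
    scale by the mean absolute value, round to the nearest integer,
    then clip to the range [-1, 1]. eps avoids division by zero.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Python's round() breaks exact .5 ties toward even; fine for a sketch.
    ternary = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return ternary, gamma


ternary, gamma = absmean_ternarize([0.9, -0.05, -1.2, 0.3])
print(ternary, gamma)
```

Small weights collapse to 0 and large ones saturate at ±1. The single scale `gamma` is kept alongside the ternary values so outputs can be rescaled afterwards.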

This would allow running larger models with limited resources. However, it isn't a quantization method you can apply to existing models after the fact: it seems models need to be trained from scratch this way, and so far the authors have only gone up to 3B parameters. The paper is short, and it seems they didn't release the models. It builds on the BitNet paper from October 2023.

“the matrix multiplication of BitNet only involves integer addition, which saves orders of energy cost for LLMs.” (i.e., no floating-point multiplications are needed in the matrix multiplication)
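Why is it "only addition"? When every weight is -1, 0, or 1, each product w·x is just +x, -x, or nothing. Here is a minimal sketch. It assumes the activations are already integers (for example, 8-bit quantized values); that detail isn't stated in the quote above. The function name `ternary_matvec` is hypothetical.

```python
def ternary_matvec(w_ternary, x):
    """Multiply a matrix with entries in {-1, 0, 1} by a vector.

    Each output is built purely from additions and subtractions:
    +1 adds the input, -1 subtracts it, 0 skips it entirely.
    """
    out = []
    for row in w_ternary:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: no work at all
        out.append(acc)
    return out


W = [[1, 0, -1], [-1, 1, 1]]
x = [3, 5, 2]  # assumed integer activations
print(ternary_matvec(W, x))
```

In a real layer, the per-tensor weight scale (and any activation scale) would still be applied. That costs one multiplication per output element rather than one per weight, which is where the savings come from.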

“1-bit LLMs have a much lower memory footprint from both a capacity and bandwidth standpoint”
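The "1.58" in the title is log2(3): the information content of one ternary value. To get a feel for the memory claim, here is a hypothetical packing scheme. It fits five ternary weights in one byte (3^5 = 243 ≤ 256), giving 1.6 bits per weight. The post doesn't say how weights are actually stored, so `pack_trits`, `unpack_trits`, and `packed_bytes` are illustrations only. The estimate also ignores scales and any layers kept at higher precision.

```python
import math


def pack_trits(trits):
    """Pack values from {-1, 0, 1} five to a byte (3**5 = 243 fits in 0..255)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        # Base-3 encoding; the first trit of each chunk is the least significant digit.
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)  # map -1, 0, 1 -> 0, 1, 2
        out.append(value)
    return bytes(out)


def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original number of trits."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]  # drop padding digits from a short final chunk


def packed_bytes(n_params):
    """Bytes needed for n_params ternary weights at 5 per byte."""
    return -(-n_params // 5)  # integer ceil(n_params / 5)


print(math.log2(3))  # bits of information per ternary weight
n = 3_000_000_000    # 3B parameters, the largest size mentioned in the post
print(n * 2 / 1e9, "GB of weights at fp16")
print(packed_bytes(n) / 1e9, "GB of weights packed 5 per byte")
```

For a 3B model, that works out to about 6 GB of fp16 weights versus about 0.6 GB packed. Less data to move per token is also where the bandwidth advantage comes from.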

Edit: an additional FAQ has been published.

@Throwaway4669332255
English
2 • 11 months ago
Apparently I am an idiot and read the wrong paper. It was the previous paper that mentioned being “comparable with the 8-bit models”:

https://huggingface.co/papers/2310.11453