Answer.AI presents two scalable training methods that enable efficient finetuning of large language models such as Llama 3 with reduced memory requirements: QDoRA and Llama-Pro. QDoRA is a quantized version of DoRA, which combines the parameter efficiency of LoRA with the more granular optimization of full finetuning. Llama-Pro instead adds new transformer blocks so a model can be specialized without sacrificing its existing capabilities. The article reports experimental results in which QDoRA outperforms the other methods on both accuracy and memory efficiency. The authors also discuss how these methods could help open-source developers build better models for specific tasks, and they highlight the importance of optimizing inference performance for the resulting models.
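
The sketch below illustrates the general QDoRA recipe summarized above: freeze the base model in 4-bit quantized form and train DoRA adapters on top of it. It is a minimal example using Hugging Face Transformers, bitsandbytes, and PEFT rather than Answer.AI's FSDP-based implementation; the model id, target modules, and adapter hyperparameters are illustrative assumptions, not values taken from the article.

```python
# Minimal QDoRA-style finetuning sketch (assumptions noted in comments);
# not Answer.AI's FSDP QDoRA code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed model id for illustration

# Quantize the frozen base weights to 4-bit NF4 -- the "Q" in QDoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # standard prep for training on a k-bit base

# DoRA decomposes each adapted weight into a magnitude vector and a low-rank
# directional update; use_dora=True switches PEFT's LoRA adapters to DoRA.
peft_config = LoraConfig(
    r=16,                      # assumed adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target modules
    use_dora=True,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# From here, train with any standard causal-LM loop or a Trainer-style wrapper.
```

Only the adapter parameters (including the DoRA magnitude vectors) are trained while the 4-bit base weights stay frozen, which is how this style of finetuning keeps memory usage low while optimizing at a finer granularity than plain LoRA.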

Summarized by Llama 3 70B Instruct