Double the training data, double the trained context (4096 now), a chat tuned varient, the omission of the 35b model for now (it apparently isn’t “safe” enough), and commercial use is allowed (not that most of the people using llama cares about licensing).
Double the training data, double the trained context (4096 now), a chat tuned varient, the omission of the 35b model for now (it apparently isn’t “safe” enough), and commercial use is allowed (not that most of the people using llama cares about licensing).
Thanks, that sounds more exciting than just “safety” haha.