A new paper by Meta shares how to extend context windows via positional interpolation.
I am still reading to confirm this, but as far as I understand, it is the same (or a very similar) method to kaiokendev’s approach for extending context length in SuperHOT.
If you want to learn more about how 8k context with SuperHOT was recently achieved (beyond the paper Meta shared), I highly recommend visiting kaiokendev’s pages and posts below.
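For anyone wondering what "positional interpolation" means in practice, here is a minimal sketch, not taken from the paper or from kaiokendev's code. It assumes a model that uses rotary position embeddings (RoPE), as LLaMA does. Instead of feeding the model position indices beyond its pretrained range, each position m is scaled down to m × (train length / target length), so an extended sequence is squeezed back into positions the model has already seen. The 2,048 → 8,192 lengths (a factor of 0.25) and the helper names below are illustrative assumptions.

```python
import math

def rope_inv_freq(dim, base=10000.0):
    # One inverse frequency per pair of channels: base^(-2i/dim)
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]

def rope_angles(position, dim, scale=1.0, base=10000.0):
    # Positional interpolation: shrink position m to m * scale before
    # computing the rotation angles, so extended positions stay inside
    # the range the model saw during pretraining.
    m = position * scale
    return [m * f for f in rope_inv_freq(dim, base)]

def apply_rope(vec, position, scale=1.0, base=10000.0):
    # Rotate each consecutive channel pair (x[2i], x[2i+1]) by its angle.
    angles = rope_angles(position, len(vec), scale, base)
    out = []
    for i, theta in enumerate(angles):
        x1, x2 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x1 * c - x2 * s, x1 * s + x2 * c])
    return out

# Assumed example: pretrained on 2,048 positions, extended to 8,192.
TRAIN_LEN, TARGET_LEN = 2048, 8192
SCALE = TRAIN_LEN / TARGET_LEN  # 0.25

# The last extended position maps back inside the pretrained range.
print((TARGET_LEN - 1) * SCALE)  # 2047.75
```

With a scale of 0.25, position 4 in the extended window receives exactly the same rotation that position 1 received during pretraining. The model still has to be fine-tuned briefly to adapt to the denser spacing of positions.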
I was curious to hear more about SuperHOT myself, so I emailed kaiokendev and asked for learning material suggestions.
Here is what they shared with me. Thank you for this list, kaiokendev!
Recommendations from the Developer of SuperHOT (kaiokendev):
Here are some resources to help with learning LLMs:
Andrej Karpathy’s GPT from scratch:
Huggingface’s NLP Course:
And for training specifically:
Alpaca LoRA:
Vicuna:
Community training guide:
Of course, for papers, I recommend reading anything on arXiv’s CS - Computation & Language listing that looks interesting to you: https://arxiv.org/list/cs.CL/recent.
If you found this post interesting, please consider subscribing to the /c/FOSAI community, where I do my best to keep you in the know with the most important updates in free open-source artificial intelligence.