How Gradient created an open LLM with a million-token context window (venturebeat.com)
Posted by @[email protected] to [email protected] • English • 6 months ago • 4 comments • cross-posted to: [email protected]
@TechNerdWizard42 • 6 months ago
I believe you'd need roughly 500 GB of RAM minimum to run it at full context length. There's chatter that 125k of context alone used about 40 GB. I know I can load the 70B models onto my laptop at lower bit widths, but that consumes about 140 GB of RAM.
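Those figures can be sanity-checked with a back-of-the-envelope calculation: weight memory scales with parameter count times bits per weight, and the KV cache scales linearly with context length. The sketch below assumes a generic 70B-class architecture (80 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache); these shape numbers are illustrative assumptions, not figures from the article or the comment.

```python
# Rough memory estimates for running a large LLM locally.
# Model shape below is an ASSUMED 70B-class configuration
# (80 layers, 8 GQA KV heads, head_dim 128), for illustration only.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization level."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_value: int = 2) -> float:
    """KV cache size: 2 (keys + values) x layers x kv_heads x head_dim x tokens."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value / 1e9

print(f"70B weights @ 16-bit: {weight_memory_gb(70, 16):.0f} GB")   # ~140 GB
print(f"70B weights @ 4-bit:  {weight_memory_gb(70, 4):.0f} GB")    # ~35 GB
print(f"KV cache @ 125k ctx:  {kv_cache_gb(80, 8, 128, 125_000):.0f} GB")
print(f"KV cache @ 1M ctx:    {kv_cache_gb(80, 8, 128, 1_000_000):.0f} GB")
```

Under these assumptions the numbers line up with the comment: 16-bit weights come to ~140 GB, a 125k-token fp16 cache to ~41 GB, and a full million-token cache to ~330 GB, so weights plus cache at full context plausibly lands in the 500 GB range.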