@ooli to Technology (English) • 9 months ago
GPU's rival? What is Language Processing Unit (LPU)
www.turingpost.com
Scott • 9 months ago
It's not about their frontend; they're running custom LPUs that can process LLM tokens at 500/sec, which is insanely impressive. For reference, with a max context of 2k tokens, my dual Xeon Silver 4114 CPUs take 2-3 minutes.
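For scale, a quick back-of-envelope comparison of the two figures above (the 500 tokens/sec claim versus the dual-Xeon timing); the inputs are the numbers from the comment, the rest is plain arithmetic:

```python
# Rough throughput comparison using the figures quoted above.
lpu_tokens_per_sec = 500   # claimed LPU throughput
cpu_tokens = 2_000         # max context size on the dual Xeon box
cpu_seconds = 150          # midpoint of the quoted 2-3 minutes

cpu_tokens_per_sec = cpu_tokens / cpu_seconds        # ~13.3 tok/s
speedup = lpu_tokens_per_sec / cpu_tokens_per_sec    # ~37x

print(f"CPU: {cpu_tokens_per_sec:.1f} tok/s, LPU: {lpu_tokens_per_sec} tok/s")
print(f"Speedup: ~{speedup:.0f}x")
```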
@[email protected] • 9 months ago
No, I got what you meant, but that site is weird if it's not doing anything on its own.
@Finadil • 9 months ago
Is that with an fp16 model? Don't be scared to try even a 4-bit quantization; you'd be surprised at how little is lost and how much quicker it is.
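For anyone wanting to try a 4-bit quant, here's a minimal sketch using the Hugging Face transformers + bitsandbytes stack; the model name is just a placeholder, and this path assumes an NVIDIA GPU (bitsandbytes needs one; on a CPU-only box like the dual Xeon above, llama.cpp's GGUF quants are the usual route):

```python
# Minimal 4-bit quantized load via transformers + bitsandbytes.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder; swap in your model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # weights stored as 4-bit (NF4 by default)
    bnb_4bit_compute_dtype="float16",   # compute still happens in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                  # spread layers across available devices
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```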
@[email protected] • 9 months ago
Aren't those the ones that cost $2,000 per 250 MB of memory? Meaning you'd need about 350 of them to load any half-decent model.
Scott • 9 months ago
Not sure how they're doing it, but it was actually $20k, not $2k, for 250 MB of memory on the card. I suspect the models are probably cached in system memory.
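The "about 350 cards" figure is easy to sanity-check; here's a rough sketch where the 250 MB/card and $20k/card numbers come from this thread, and the model sizes are illustrative assumptions, not specs anyone quoted:

```python
# Back-of-envelope card count: how many 250 MB cards to hold a model's weights.
card_mb = 250       # per-card memory, per the thread
card_cost = 20_000  # USD per card, per the correction above

for name, params_billions, bytes_per_param in [
    ("7B @ fp16", 7, 2),      # assumed sizes for illustration
    ("70B @ fp16", 70, 2),
    ("70B @ 4-bit", 70, 0.5),
]:
    model_mb = params_billions * 1_000 * bytes_per_param  # 1B params * 2 B = 2,000 MB
    cards = model_mb / card_mb
    print(f"{name}: ~{cards:.0f} cards, ~${cards * card_cost / 1e6:.1f}M")
```

On those assumptions a 7B fp16 model already needs ~56 cards, and a 70B fp16 model ~560, so the hundreds-of-cards ballpark (and Scott's guess that weights live in system memory) is plausible.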