• Pennomi
    311 days ago

    It depends. A lot of LLMs are memory-constrained. If you’re constantly thrashing GPU memory, it can be both slower and less efficient.
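
As a rough illustration of what "memory-constrained" can mean in practice, here is a back-of-the-envelope sketch that estimates whether a model's weights alone would fit in GPU memory. The function names, the 7-billion-parameter model, and the 8 GiB card are hypothetical examples, not figures from this thread, and real usage also needs room for activations and the KV cache, which this ignores unless you pass an overhead.

```python
def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


def fits_in_vram(n_params: float, bytes_per_param: float,
                 vram_gib: float, overhead_gib: float = 0.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return weights_gib(n_params, bytes_per_param) + overhead_gib <= vram_gib


if __name__ == "__main__":
    n_params = 7e9   # hypothetical 7B-parameter model
    vram = 8.0       # hypothetical 8 GiB GPU
    for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
        size = weights_gib(n_params, bpp)
        verdict = "fits" if fits_in_vram(n_params, bpp, vram) else "does not fit"
        print(f"{label}: ~{size:.1f} GiB of weights, {verdict} in {vram:.0f} GiB")
```

If the weights don't fit, something has to be swapped in and out of GPU memory during inference, which is roughly the thrashing case described above.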