I’m currently shopping around for something a bit faster than Ollama, mainly because I could not get it to use a different context and output length, which seems to be a known and long-ignored issue (see the sketch at the end of this post). Somehow everything I’ve tried so far is missing one or more critical features, like:

  • “Hot” model replacement, i.e. loading and unloading models on demand
  • Function calling
  • Support for most models
  • OpenAI API compatibility (to work well with Open WebUI)

I’d be happy about any recommendations!
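
To make the context/output length issue concrete, here is a minimal sketch of what I’d expect to work, assuming Ollama’s native /api/generate endpoint (the model name and option values are placeholders). In my setup, the equivalent options never seem to take effect through the OpenAI-compatible route that Open WebUI uses:

```python
# Minimal sketch, assuming Ollama's native /api/generate endpoint.
# Model name and option values are placeholders.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",              # placeholder model name
        "prompt": "Summarize this thread.",
        "stream": False,
        "options": {
            "num_ctx": 8192,              # requested context window
            "num_predict": 1024,          # requested max output length
        },
    },
)
print(response.json()["response"])
```

Being able to set something like this per request (or at least per model) from the OpenAI-compatible side is essentially what I’m after.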

  • Possibly linux
    27 hours ago

    I don’t think you are going to find anything faster. Ollama is pretty much as fast as it gets

    • @[email protected]OP
      24 hours ago

      There are plenty of projects out there that optimize for speed significantly. Ollama is unbeaten in convenience, though.

    • @[email protected]
      4 hours ago

      It’s not, by far. But vLLM and SGLang don’t support switching models on the fly… such a shame.
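
      For illustration, a minimal client-side sketch against the OpenAI-compatible endpoint that vLLM (and SGLang) expose; the port, api_key, and model name are placeholders. The catch is that the served model is fixed when the server starts, so there is no on-demand swapping:

      ```python
      # Minimal sketch: querying a local vLLM/SGLang server through its
      # OpenAI-compatible API. Placeholders: port 8000, "EMPTY" key, model name.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

      response = client.chat.completions.create(
          # Must match the single model the server was launched with;
          # there is no per-request model switching like Ollama offers.
          model="meta-llama/Llama-3.1-8B-Instruct",
          messages=[{"role": "user", "content": "Hello!"}],
      )
      print(response.choices[0].message.content)
      ```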