A while ago, I had requested help with using LLMs to manage all my teaching notes. I have since installed Ollama and been playing with it to get a feel for the setup.

I was also suggested the use of RAG (Retrieval Augmented Generation ) and CA (cognitive architecture). However, I am unclear on good self hosted options for these two tasks. Could you please suggest a few?

For example, I tried ragflow.io and installed it on my system, but it seems I need to setup an account with a username and password to use it. It remains unclear if I can use the system offline like the base ollama model, and that information won’t be sent from my computer system.

    • @brucethemoose
      link
      English
      4
      edit-2
      2 hours ago

      I have an old Lenovo laptop with an NVIDIA graphics card.

      @[email protected] The biggest question I have for you is what graphics card, but generally speaking this is… less than ideal.

      To answer your question, Open Web UI is the new hotness: https://github.com/open-webui/open-webui

      I personally use exui for a lot of my LLM work, but that’s because I’m an uber minimalist.

      And on your setup, I would host the best model you can on kobold.cpp or the built-in llama.cpp server (just not Ollama) and use Open Web UI as your front end. You can also use llama.cpp to host an embeddings model for RAG, if you wish.

      This is a general ranking of the “best” models for document answering and summarization: https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard

      …But generally, I prefer to not mess with RAG retrieval and just slap the context I want into the LLM myself, and for this, the performance of your machine is kind of critical (depending on just how much “context” you want it to cover). I know this is !selfhosted, but once you get your setup dialed in, you may consider making calls to an API like Groq, Cerebras or whatever, or even renting a Runpod GPU instance if that’s in your time/money budget.

    • @brucethemoose
      link
      English
      12 hours ago

      Text-generation-webui is cool, but also kinda crufty. Honestly a lot of the stuff is holdovers from what’s now ancient history in LLM land, and it has (for me) major performance issues at longer context.

      • Scrubbles
        link
        fedilink
        English
        11 hour ago

        Anything better you know of? Most of my usage now with it is through its api

        • @brucethemoose
          link
          English
          11 hour ago

          Uh, depends on your hardware and model, but probably TabbyAPI?