What are your thoughts on #privacy and #itsecurity regarding the #LocalLLMs you use? They seem to be an alternative to ChatGPT, MS Copilot etc., which are basically creepy privacy black boxes. How can you be sure that local LLMs A) do not “phone home”, B) do not create a profile on you, and C) restrict their analysis to the scope of your terminal? As far as I can see #ollama and #lmstudio do not provide privacy statements.

  • @[email protected]
    English
    12
    15 hours ago

    As far as I can see #ollama and #lmstudio do not provide privacy statements.

    That’s because they are not online services (which is a good thing!). Online services like ChatGPT and desktop applications like LM Studio are not in the same product category.

    LM Studio is more akin to, say, VLC or Notepad++ (which also do not have privacy policies). These are desktop applications that have some limited network functions (like autoupdates).

    LM Studio does offer details of which features require internet access and which are fully offline here: https://lmstudio.ai/docs/offline . In short: everything important is offline. It has built-in search features so you can find and download models from Hugging Face, and it also has an autoupdate feature to find and download new versions. You could run it on an airgapped system (or more likely, set it up in a container/VM without network access), and simply load in model files manually if you prefer.

    Personally I recommend LM Studio, because it’s super easy to set up and use but still quite powerful.

  • ddh
    English
    19
    20 hours ago

    I run Ollama with Open WebUI at home.

    A) the containers they run in by default can’t access the Internet, but they are provided access if we turn on web search or want to download new models. Ollama and Open WebUI are fairly popular products and I haven’t seen any evidence of nefarious activity so far.

    B) they create a profile on me and my family members that use them, by design. We can add sensitive documents that the models can use.

    C) they are restricted by what we type and the documents we provide.

    • fmstrat
      English
      2
      17 hours ago

      To add to this, I run the same setup, but add Continue to VSCode. It makes an interface similar to Cursor that uses the Ollama instance.

      One thing to be careful of: the Ollama port has no authentication (ridiculous, but it is what it is).
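
A quick way to see whether that unauthenticated port is exposed beyond your own machine is to try connecting to it from another address. A minimal sketch in Python, assuming Ollama's default port 11434; `port_is_open` is a hypothetical helper name, and the LAN address in the usage note is a placeholder:

```python
import socket

OLLAMA_PORT = 11434  # Ollama's default API port (assumed unchanged)


def port_is_open(host: str, port: int = OLLAMA_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

While Ollama is running, `port_is_open("127.0.0.1")` should come back true. If a call from another device on your network, such as `port_is_open("192.168.1.50")` with your own machine's address substituted, also succeeds, then anyone on that network can use the API.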

      You’ll need either a card with 12-16GB VRAM for the recommended models for code generation and autocomplete, or you may be able to get away with an 8GB card if it’s a second card in the system. You can also run on CPU, but it’s very slow that way.

      @[email protected]

    • @ShotDonkeyOP
      0
      20 hours ago

      Thank you. As far as I can see these models are free. Doing data mining on users would be a tempting thing, right? Ollama does not specify this on their homepage: no paid plans, no ‘free for private use’ etc. How do they pay their staff and the electricity and hardware bills for model training? Do you know anything about the underlying business models?

      • @[email protected]
        8
        17 hours ago

        The English word “free” actually carries two meanings: “free as in free food” (gratis) and “free as in free speech” (libre).

        Ollama is both gratis and libre.

        And about the money stuff: Ollama used to be Facebook’s proprietary model, an answer to ChatGPT and Bing Chat/Copilot. Facebook lagged behind the other players and they just said “fuck it, we’re going open-source”. That’s how and why it’s free.

        Because the project is open-source, the code that interacts with the models and runs them is open to inspection, even though the models themselves are by design binary blobs. If they were connecting to the Internet and phoning home to Facebook, chances are this would’ve been found out by the community due to the open nature of the project.

        Even if it weren’t open-source, since it runs locally you could at least block (or view) Internet access.

        Basically, even though this is from Facebook, one of the big bads of privacy on the Internet, it’s all good in the end.

        • @[email protected]
          5
          16 hours ago

          Ollama used to be Facebook’s proprietary model

          Just to be clear, Llama is the Facebook model; Ollama is the software that lets you run Llama, along with many other models.

          Ollama has internet access (otherwise how could it download models?). The only true privacy solution is to run it in a container with no internet access after downloading models, or to air-gap your computer.

          • @JustAnotherKay
            19 minutes ago

            The only true privacy solution…

            Could you not just monitor/block outgoing traffic?
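
On Linux, monitoring is possible without extra tools, since the kernel lists TCP connections in /proc/net/tcp. A rough sketch in Python, assuming a little-endian machine and IPv4 only (IPv6 connections are listed separately in /proc/net/tcp6); the function names are made up for illustration, and the result covers the whole machine rather than a single process:

```python
import ipaddress
import socket
import struct


def decode_addr(hex_addr: str) -> tuple[str, int]:
    """Decode an 'IIIIIIII:PPPP' field from /proc/net/tcp into (ip, port)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The kernel prints the IPv4 address as a native-endian 32-bit integer;
    # "<I" assumes a little-endian host such as x86-64.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def external_connections(proc_net_tcp: str) -> list[tuple[str, int]]:
    """Return remote (ip, port) pairs of established non-loopback connections."""
    found = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != "01":  # state 01 = ESTABLISHED
            continue
        ip, port = decode_addr(fields[2])  # fields[2] is the remote address
        if not ipaddress.ip_address(ip).is_loopback:
            found.append((ip, port))
    return found
```

Running `external_connections(open("/proc/net/tcp").read())` while a prompt is being processed gives a snapshot of where the machine is talking to. This only observes traffic; actually blocking it still needs a firewall rule or a container without network access.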

        • @ShotDonkeyOP
          1
          17 hours ago

          Great, thanks for this background!

      • ddh
        English
        13
        20 hours ago

        Ollama and Open WebUI, as far as I know, are just open source software projects created to run pre-trained models, and have the same business model as many other open source projects on GitHub.

        The models themselves come from Google, Meta and others. Have a look at all the models available on Hugging Face. The models themselves are just binary files. They’ve been trained and there are no ongoing costs to use them apart from energy your computer uses to run them.
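
To illustrate the "just binary files" point: many locally run models are distributed in the GGUF format, whose file begins with a small fixed header followed by metadata and tensor data. A minimal sketch in Python, assuming a GGUF file of version 2 or later; `read_gguf_header` is a hypothetical helper, not part of Ollama or Open WebUI:

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header without loading the model itself."""
    with Path(path).open("rb") as f:
        raw = f.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE or raw[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # All header integers are little-endian.
    version, tensor_count, metadata_kv_count = struct.unpack("<IQQ", raw[4:])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }
```

Pointing this at a downloaded model file shows how many tensors and metadata entries it declares. The file is data that the runner reads, which is why the network behaviour to audit is that of the runner software rather than of the model.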

      • @RedditWanderer
        3
        20 hours ago

        Did you do any research at all?

        It’s FB’s model. They made it free as a PR move. If you’re actually worried about it phoning home, you could easily monitor the traffic leaving your PC and see if it’s collecting data.

        It’s Facebook; they pay their staff with the astronomical amount of money they have. This is a simpler model, and their goal is to look like the good guy by making this one free and selling later ones, like all the other AI companies are doing. Except FB has fuck you money.

  • @[email protected]
    8
    19 hours ago

    From my privacy trials on ollama - any model downloaded does not know the date or time and cannot access the internet.

    If you are still sceptical you could download something like Alpaca from Flathub and, once you’ve acquired a model, remove internet access etc. through Flatseal.
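
A trial like this can be scripted against Ollama's local HTTP API. A minimal sketch in Python, assuming Ollama listens on its default address 127.0.0.1:11434 and that a model has already been pulled; the model name `llama3.2` is a placeholder and the helper names are hypothetical:

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "llama3.2",  # placeholder: use a model you have pulled
    base_url: str = "http://127.0.0.1:11434",  # loopback only
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, **kwargs) -> str:
    """Send the prompt to the local Ollama server and return the model's reply."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Something like `ask("What is today's date and time?")` repeats the experiment: the request only ever goes to the loopback address, and a model without tools has no clock to consult, so whatever date it states is a guess.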

    • @Deckweiss
      5
      18 hours ago

      I’m running gpt4all on AMD. Had to figure out which packages to install, which took a while, but since then it runs just fine.

        • @Deckweiss
          3
          13 hours ago

          Arch Wiki and the gpt4all GitHub repo & issues

      • @[email protected]
        12 hours ago

        It is slow. Syntax & community idioms suck. The package ecosystem is a giant mess (constant dependency breakage, many supply-chain attacks, and quality that is all over the place, with many packages having failing tests or builds that aren’t reproducible), which may largely be an effect of too many places saying this is the language you should learn first. When it comes to running Python software on my machine, it is always the buggiest, breaks the most when new software ships, & uses more resources than other things.

        When I used to program in it, I thought Python was so versatile that it was the 2nd best language at everything. I learned more languages & thought it was 3rd best… then 4th… then realized it isn’t good at anything. The only reason it has things going for it is all the effort put into the big C libraries powering the math, AI, etc. libraries.

  • @Sonor
    1
    20 hours ago

    Have you looked at Backyard AI?

      • @[email protected]
        7
        16 hours ago

        “respect your privacy” is a vague buzzword phrase, and for a post about local LLMs linking a client that calls APIs which log user data is unhelpful

        • @[email protected]
          16 hours ago

          By “respect your privacy” I mean no personal data is collected. So as long as you are not putting personal details about yourself in the queries and use a VPN you can stay pretty anonymous while using the service.

        • @[email protected]
          2
          16 hours ago

          Thanks.

          I feel it would be constructive if people who downvoted the OP (I am not them) told them why. Then the OP can learn what this community expects, and those of us who stumble across downvoted comments can clearly see why and learn from it.

          • @[email protected]
            2
            15 hours ago

            didn’t even downvote, i suspect taking time to explain something you disagree with in a nuanced manner is more effort than most people would care to put in

            • @[email protected]
              2
              15 hours ago

              No I wasn’t accusing you of downvoting. Just speaking generally here.

              I guess you’re correct.