cross-posted from: https://lemmy.dbzer0.com/post/36841328

Hello, everyone! I wanted to share my experience of successfully running LLaMA on an Android device. The model that performed best for me was llama3.2:1b on a mid-range phone with around 8 GB of RAM, and I also got it running on a lower-end phone with 4 GB of RAM. Several other models worked quite well too, including qwen2.5:0.5b, qwen2.5:1.5b, qwen2.5:3b, smallthinker, tinyllama, deepseek-r1:1.5b, and gemma2:2b. I hope this helps anyone looking to experiment with these models on mobile devices!
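Once the setup below is done, any of the other models listed above can be tried the same way as the main example in Step 5; only the tag passed to ollama run changes. A quick reference, assuming these tags are still available in the Ollama library:

    # run any of these inside the Debian environment after Step 4
    ollama run qwen2.5:0.5b
    ollama run qwen2.5:1.5b
    ollama run qwen2.5:3b
    ollama run smallthinker
    ollama run tinyllama
    ollama run deepseek-r1:1.5b
    ollama run gemma2:2b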


Step 1: Install Termux

  1. Download and install Termux from the Google Play Store or F-Droid

Step 2: Set Up proot-distro and Install Debian

  1. Open Termux and update the package list:

    pkg update && pkg upgrade
    
  2. Install proot-distro

    pkg install proot-distro
    
  3. Install Debian using proot-distro:

    proot-distro install debian
    
  4. Log in to the Debian environment:

    proot-distro login debian
    

    You will need to log in to the Debian environment every time you want to run Ollama. In other words, repeat this step and everything below it each time you want to run a model (excluding Step 3 and the installation command in the first half of Step 4).
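    For later sessions, the restart boils down to just a few commands. This is only a condensed recap of Steps 2, 4, and 5 below, assuming the one-time setup has already been completed:

        proot-distro login debian     # from Termux: re-enter the Debian environment
        ollama serve &                # inside Debian: start the Ollama server in the background
        ollama run llama3.2:1b        # start chatting with the model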


Step 3: Install Dependencies

  1. Update the package list in Debian:

    apt update && apt upgrade
    
  2. Install curl:

    apt install curl
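
    Optionally, verify that curl works and that the network is reachable from inside the proot environment before moving on:

        curl --version
        curl -I https://ollama.com    # should print HTTP response headers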
    

Step 4: Install Ollama

  1. Run the following command to download and install Ollama:

    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Start the Ollama server:

    ollama serve &
    

    After you run this command, press ctrl + c to get your prompt back; because the command ends with &, the server continues to run in the background.
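
    If the server's log output makes the session too noisy, you can redirect it when starting it. This is just a shell redirection, not an Ollama option, and ~/ollama.log is an arbitrary example path:

        ollama serve > ~/ollama.log 2>&1 &

    You can still inspect the output later with tail -f ~/ollama.log if something misbehaves.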


Step 5: Download and run the Llama3.2:1B Model

  1. Use the following command to download the Llama3.2:1B model:
    ollama run llama3.2:1b
    
    This step fetches the lightweight 1-billion-parameter version of the Llama 3.2 model on first run and then drops you into an interactive chat prompt.
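
    As an optional sanity check, you can also query the server over its local HTTP API using the curl installed in Step 3. The sketch below assumes Ollama's default port 11434; setting "stream" to false returns a single JSON response instead of a token stream:

        curl http://localhost:11434/api/generate -d '{
          "model": "llama3.2:1b",
          "prompt": "Say hello in one short sentence.",
          "stream": false
        }'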

Running LLaMA and other similar models on Android devices is definitely achievable, even with mid-range hardware. The performance varies depending on the model size and your device’s specifications, but with some experimentation, you can find a setup that works well for your needs. I’ll make sure to keep this post updated if there are any new developments or additional tips that could help improve the experience. If you have any questions or suggestions, feel free to share them below!

– llama

  • @[email protected]

    Thank you!

    Small tips:

    • for the uninitiated: the up and down arrows navigate the bash (session) history
    • I did not like the verbosity of ollama serve, so instead of the aforementioned command I ran chmod 777 /dev/ (not safe) because for some reason I was getting permission denied, and then ollama serve > /dev/null 2>&1 &. That redirect means all info, warnings, and errors are discarded, so it's not recommended if you want to see what's going on or need to debug errors.
  • @hoshikarakitaridia

    This should work with deepseek-r1 as well, right? I assume that's gotta be a bit better even.

    • @[email protected]

      It works, but as others have said, you need a tiny version of it, so accuracy takes a large hit. I'm sure it still has its uses, but keep your expectations low.

    • Dran

      The proper DeepSeek R1 requires about 500 GB of RAM/VRAM to run, which is orders of magnitude more RAM than modern phones have. The smaller models called "deepseek r1" are not the real DeepSeek model that everyone is talking about.

    • projectmoon

      It’s enough to run quantized versions of the distilled r1 model based on Qwen and Llama 3. Don’t know how fast it’ll run though.