Running Llama 2 70B GGML Instruct V2 Q4_1 with GPU offline on a Laptop using Oobabooga

@j4k3 · 2 years ago

Running Llama 2 70B GGML Instruct V2 Q4_1 with GPU offline on a Laptop using Oobabooga

@Zeth0s · 2 years ago

Do you have a guide you followed?

@j4k3 · 2 years ago

I just followed the README.md. There is also some extra documentation in a doc folder in the git archive for the webui. I’m exploring a lot deeper in the code base than most users ever will, and I’m too deep into the rabbit hole to give explicit directions on how I got here. You are not likely to encounter the same issues I have had. I am attempting to use many things with overlapping dependencies in parallel and trying to containerize all of them. It’s the first time I have tried containers at this scale. I’m not naturally talented at anything in life, including this effort. I barely have it working. I failed to get several 70B models to work prior to this one.

The main key for the installation is to be sure you are in the correct conda environment when you run “pip install -r requirements.txt”. If, for any reason, conda fails to activate textgen and falls back to the base system, or issues any warnings about ‘reverting to the base system’, you need to sort this out and get the conda textgen environment running.
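A quick way to confirm which conda environment is active before installing anything is to read the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. This is only a minimal sketch: the environment name `textgen` comes from the post above, while the helper names `active_conda_env` and `in_expected_env` are made up for illustration.

```python
import os
import sys

# Assumption: the environment is named "textgen", as in the post above.
EXPECTED_ENV = "textgen"


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is set."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(expected=EXPECTED_ENV, environ=None):
    """True only when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


# Report the current state; run this before "pip install -r requirements.txt".
if in_expected_env():
    print(f"OK: conda env {EXPECTED_ENV!r} is active ({sys.executable})")
else:
    print(f"Not in {EXPECTED_ENV!r}; active env is {active_conda_env()!r}")
```

If this reports `base` or `None`, pip would install into the wrong place, which is the situation described above.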

The way the conda/Python requirements are set up seems (IMO) to assume you have some tools that are present in most Linux distros. In a default distrobox container, the distro image is much smaller. I’ve had to manually add stuff like GCC support for C and C++ to most containers for AI stuff. If you are just running everything in a conda environment on the host/base system, you won’t have the same problems I have had to deal with. I am planning ahead for situations where I may want to modify the software and may need additional layers of dependencies outside of both conda and the base/host.
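As a rough sketch of how to spot a missing compiler in a slimmed-down container before pip tries to build anything, the standard library's `shutil.which` can look for the executables on `PATH`. The tool list here is an assumption based on the C and C++ support mentioned above (not a complete list of what the webui needs), and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Assumption: gcc and g++ stand in for "GCC support for C and C++";
# a real install may need more than these two.
REQUIRED_TOOLS = ("gcc", "g++")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("All listed build tools were found on PATH.")
```

Anything reported as missing would have to be installed with the package manager of whatever distro image the container uses.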

All that said, the notes for each model posted on Hugging Face, combined with the number of downloads, stars, and comments for each model, indicate what is likely to work in practice.