Confusing Vicky (Vicuna-13B)

@Technotica · edit-2 2 years ago

Confusing Vicky (Vicuna-13B)

@Technotica · edit-2 2 years ago

It`s super easy, I followed this site: https://blog.gopenai.com/the-total-noobs-guide-to-harnessing-the-gpu-for-llama-inference-2e3c4fdfbc84 (You can view it without login with archive.ph) then just downloaded a quantized Vicuna model from huggingface (it works with llama and alpaca too).

Then start the dialogue with, as an example: ./main -m /path/to/model/file --repeat_penalty 1.0 --color -i -r “User:” -f /path/to prompt/file

You can see example prompt files in the llama.cpp folder and write your own short one.

I did it once on Windows but then decided to use my debian system for it, I found the process easier on Linux as you can get the required software (python etc) easier. If you want a more complete and even easier setup try this:

https://cocktailpeanut.github.io/dalai/#/

Dalai installs everything you need for you, I don’t think it supports Vicuna but you can use Llama and Alpaca with it. (as in directly installing Vicuna, you can download the Vicuna model files manually and put them into the appropriate place for Dalai I am sure)

Some more guides:

https://hub.tcno.co/ai/text-ai/vicuna/

https://www.linkedin.com/pulse/step-by-step-guide-running-vicuna-13b-large-language-nischal

oh, p.s. I just learned about the --n-gpu-layers XX option, this makes the model run so much faster I can run Vicuna-33B Q5 now adequately fast with 31 of 63 layers offloaded.

Scew · 2 years ago

Thank you very much! :D