Making a LoRA is fast and easy

@j4k3 · 1 year ago


@[email protected] · 1 year ago

FYI, quantization is scaling those models down to a much lower resolution: usually from long floating-point numbers to 8-bit integers (whole numbers from -128 to 127), or with ggml even lower. 4-bit allows only 16 different numbers. They're doing a bit of trickery there to get the most out of it.
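To make the 8-bit / 4-bit idea concrete, here is a minimal sketch of plain "absmax" quantization in Python. The function names and the toy weight values are made up for illustration; real implementations like ggml or GPTQ use more elaborate tricks on top of this. It only shows the basic mapping from floats to a few integer levels and back.

```python
def quantize(values, bits):
    """Symmetric 'absmax' quantization of floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    # Clamp to the signed range, e.g. -128..127 for 8 bits.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to (approximate) floats."""
    return [x * scale for x in q]

weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.98, 0.45, 0.0]  # made-up values

results = {}
for bits in (8, 4):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    results[bits] = (q, max_err)
    print(f"{bits}-bit: {2 ** bits} possible values, ints={q}, max error={max_err:.4f}")
```

The 4-bit round trip comes back with a noticeably larger error than the 8-bit one, which is the lossy part.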

You get a lot of speedup and can fit many more of those simpler numbers into memory this way.
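A back-of-the-envelope sketch of the memory side: the weights take roughly (number of parameters × bits per weight ÷ 8) bytes. The 7-billion-parameter figure is a hypothetical example, and the estimate ignores activations, the KV cache, and the extra scale values that quantized formats also store.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 7e9  # hypothetical 7-billion-parameter model
for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: {weight_memory_gb(n_params, bits):5.1f} GB")
```

Going from 16-bit to 8-bit halves the size, and going to 4-bit quarters it, which is where the "twice or four times" figure comes from.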

Ticking the box ‘Load in 8bit’ does that conversion on the fly. Downloading an already pre-quantized GPTQ model also gets you to that point, but you don't need to handle the large original file in the first place, and the conversion process is a bit more sophisticated.

If a quantization is not mentioned, it's probably the original, not a shrunken-down version.

But mind that this is a lossy process. Typically, people who want to keep working on the model itself (i.e. fine-tune it, build LoRAs, etc.) take the full-resolution ‘original’ version to do so, not the low-resolution quantized one. But then you're back to two or four times as much data to fit into your graphics card.

I'm not sure what Oobabooga does in the background: whether it takes the quantized version for the LoRA, or the original if you used ‘load in 8bits’.

I think I read a paper about LoRA on quantized models. I don't know how it works, but it seems to be possible nowadays. I'm not sure about the quality implications. Usually, if you start with something low-resolution and use that ‘degraded’ data to modify things, the result won't be perfect. But that might not be a concern of yours if the result is good enough and you don't need five-figure hardware to pull it off.
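The core LoRA idea is to keep the original weight matrix W frozen and learn a small low-rank correction B·A that gets added to the layer's output, scaled by alpha/r. Below is a pure-Python sketch where the frozen W is additionally kept only in 4-bit form and dequantized on the fly, which is the general idea behind training LoRAs on top of quantized models. The class and helper names, the rank, alpha, and the per-row quantization scheme are illustrative assumptions, not any paper's or library's exact method, and no training loop is shown.

```python
import random

def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def quantize_rows(W, bits=4):
    """Illustrative per-row absmax quantization (one scale per row)."""
    qmax = 2 ** (bits - 1) - 1
    rows = []
    for row in W:
        scale = max(abs(w) for w in row) / qmax or 1.0   # avoid a zero scale
        rows.append(([round(w / scale) for w in row], scale))
    return rows

def dequantize_rows(Wq):
    return [[q * scale for q in row] for row, scale in Wq]

class LoRALinear:
    """y = W x + (alpha / r) * B (A x).

    W is frozen and kept only in quantized form; A (r x d_in) and B (d_out x r)
    are the small matrices a LoRA would train. No training loop is shown.
    """

    def __init__(self, W, r=2, alpha=4, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.Wq = quantize_rows(W)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]  # zero init: output starts equal to the base layer
        self.scaling = alpha / r

    def forward(self, x):
        base = matvec(dequantize_rows(self.Wq), x)   # dequantize on the fly
        delta = matvec(self.B, matvec(self.A, x))    # low-rank correction
        return [b + self.scaling * d for b, d in zip(base, delta)]

W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.6]]   # made-up 2x3 weight matrix
layer = LoRALinear(W)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))

# Why it's cheap: trainable values for one hypothetical 4096x4096 layer at rank 8.
d, r = 4096, 8
print("full matrix:", d * d, "LoRA A+B:", r * (d + d))
```

Because only A and B would be updated, the "degraded" quantized W is never modified; whether the result is good enough then depends on how much the quantization error matters for your task.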

Idk. If you're trying to go anywhere with it, maybe read up on it. There is a free ML course on Hugging Face, and there are lots of guides and other info scattered around in places like this. I'm very aware this is a steep learning curve; I also started without any knowledge about LLMs in June, when Llama took off. And I'm not a pro.

If I might ask: what is your motivation behind training a LoRA? I mean, apart from doing it for its own sake. Do you want to generate some literature and get somewhere? Have you tried politely asking it to generate a continuation of the XY saga? Maybe it knows the fandom well enough. You could even provide it with one or two paragraphs and see if it picks up on the writing style.

Have fun and keep posting about your adventures…

@j4k3 · 1 year ago

Hey, thanks for the info. Probably the biggest mystery for me is still what “loading shards” means in the terminal output when a full model is loaded with transformers.

My goal right now is to learn the differences between LoRAs and embeddings in practice. I want to get further into the computer science curriculum on my own. I tend to get hung up on some point and get lost in the weeds trying to find answers to my questions.

I want to explore the potential to create both a professor type of model loaded with a few books, courseware, and transcribed lectures, and a student type of model that only has access to information as I encounter it.

I probably won't achieve much with these naive objectives, but I will probably learn a thing or two along the way. I get the impression that this method of creating a model could be quite powerful. It seems like highly curated and tested tuning for purpose-built models is the future. I think individualized education is probably the most powerful potential of LLMs.

As a peripheral curiosity, I want to know how a LLM can maybe interact with the Forth programming language. Everything in Forth can be made into a single word or token. Forth is a threaded language with an interpreter. I want to know if combining the two makes something completely new. Like, can Forth give an LLM persistent memory, or more?

Messing with Dune stuff is just a way to explore the basics of how modifying a model works.

@[email protected] · 1 year ago

what “loading shards” means

A model is usually something like tens of gigabytes in size. To make it easier to store on older file systems and to distribute that amount of data, the weights get split up across several chunks, i.e. several smaller files, instead of one giant file. Those fragments are called ‘shards’. If your frontend says “loading shards”, it is reading all those …part1, …part2, …part3 files into memory and recombining them. I think they kinda hijacked the term ‘shard’ from the database people, idk.
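Roughly what happens, as a toy sketch: the weights are split across several files plus a small index that says which file holds which tensor, and loading reads every shard and merges everything back into one set of weights. This is plain Python with JSON files standing in for real checkpoint formats; the file names, `max_params_per_shard`, and the index layout are illustrative assumptions loosely modeled on how sharded Hugging Face checkpoints are organized, not their actual on-disk format.

```python
import json
import os
import tempfile

def save_sharded(weights, directory, max_params_per_shard):
    """Split a {name: list_of_floats} dict across several files plus an index.
    Each tensor stays whole; a new shard starts when the size limit would be exceeded."""
    shards, current, count = [], {}, 0
    for name, values in weights.items():
        if current and count + len(values) > max_params_per_shard:
            shards.append(current)
            current, count = {}, 0
        current[name] = values
        count += len(values)
    if current:
        shards.append(current)

    weight_map = {}
    for i, shard in enumerate(shards, start=1):
        filename = f"model-{i:05d}-of-{len(shards):05d}.json"
        with open(os.path.join(directory, filename), "w") as f:
            json.dump(shard, f)
        for name in shard:
            weight_map[name] = filename
    with open(os.path.join(directory, "model.index.json"), "w") as f:
        json.dump({"weight_map": weight_map}, f)
    return len(shards)

def load_sharded(directory):
    """'Loading shards': read every shard listed in the index and merge them."""
    with open(os.path.join(directory, "model.index.json")) as f:
        weight_map = json.load(f)["weight_map"]
    weights = {}
    for filename in sorted(set(weight_map.values())):
        with open(os.path.join(directory, filename)) as f:
            weights.update(json.load(f))
    return weights

toy = {"embed": [0.1] * 6, "layer0": [0.2] * 4, "layer1": [0.3] * 4, "head": [0.4] * 6}
with tempfile.TemporaryDirectory() as d:
    n = save_sharded(toy, d, max_params_per_shard=10)
    restored = load_sharded(d)
print(n, restored == toy)
```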

the differences between LoRAs and embeddings

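Those are two entirely different things. A LoRA is a small set of additional trained weights that changes how the model itself behaves, while an embedding is a vector of numbers representing a piece of text (or a token), which you can store and compare without changing the model.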

I want to get further into the computer science curriculum on my own. I tend to get hung up on some point.

I like computer science myself; I think it's a fascinating subject. Be aware it sometimes has a steep learning curve. You'll experience disappointment, and sometimes it takes hard work (and determination) to learn the basics first so you can do things properly. You also, unfortunately(?), picked one of the more complicated topics. Don't get discouraged. You'll definitely learn things along the way! 🙃 In case you run into problems: try to learn it in a structured way. Get a good book or one of the good(!) free online courses. Many people try to do it by themselves and end up getting stuck. Instead, you'll want someone who understands computer science well (and knows how to teach) to tell you in which order to learn things. If you do it randomly, you might be setting yourself up for failure. It's hard work… However, you're allowed to have fun and play around. Just be aware of that.

explore the potential to create both a professor type of model […] and a student

Yeah. I've heard about knowledge distillation and models learning from bigger models. I'm not really an expert, though, but there are a few scientific papers about that idea out there.
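A common basic form of knowledge distillation: the bigger "teacher" model's output probabilities, softened by a temperature, serve as training targets for the smaller "student", and the loss measures how far apart the two distributions are (here with KL divergence). A minimal sketch with made-up logits over a three-token vocabulary; a real setup would compute this per token over a whole vocabulary and usually mix in the ordinary next-token loss.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """How far the student's softened distribution is from the teacher's.
    Scaling by T**2 is a common convention to keep magnitudes comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
good_student = [3.8, 1.1, 0.4]   # similar preferences: small loss
bad_student = [0.5, 1.0, 4.0]    # opposite preferences: large loss
print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```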

I think individualized education is probably the most powerful potential of LLMs

Yeah. I think they can become a powerful tool to assist in teaching, and education is super important. But there's definitely still a long way to go before they can do more than grade your assignments and help you find your mistakes without your teacher. I wouldn't want a current AI to teach me facts (they come up with fake facts all the time), nor would I trust their ability to teach things in a reasonable manner.

how a LLM can maybe interact with the Forth programming language

I don't know much about Forth. I know how stack machines work. I'm not sure if that aligns in any way with how transformer language models work, such that the two of them would develop something like a ‘symbiosis’. But maybe I haven't thought about it enough.

@j4k3 · 1 year ago

Thanks for the reply. That makes sense about the shards.

As far as tuning with LoRAs goes, I have very limited expectations. I plan on trying a LangChain database soon, and I have higher expectations for that experiment.
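The database approach works differently from a LoRA: documents are stored as embedding vectors, the ones most similar to the question are looked up, and they are pasted into the prompt as context, so the model's weights never change. A toy sketch of that retrieval step in plain Python; word counts stand in for a real embedding model, the documents are made up, and none of this is LangChain's actual API.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in 'embedding': word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Paul Atreides is the heir of House Atreides.",
    "The spice melange is only found on Arrakis.",
    "A LoRA adds small low-rank matrices to a model's weights.",
]
index = [(doc, embed(doc)) for doc in documents]   # the toy "vector database"

def retrieve(question, k=1):
    """Return the k stored documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "Where is the spice found?"
context = retrieve(question)
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
print(prompt)
```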

Forth is interesting because you can basically make anything a word. For example, I can make a word that is a pointer to the flags in a register, or a word that is a register. I can make a callable word that consists of two previously defined words, one that takes the word for the flag state and copies it to the word for the register. In Forth, everything can be a single word, and words can be combined all the way up to a complete operating system. Overall, it is very linear and there is very little syntax; it is very much a language all about what word comes next. My curiosity is what happens if an LLM is given an objective in the Forth interpreter, where Forth can influence the context tokens and a simple conditional branching program can prompt the LLM to iterate on solutions.
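To illustrate the "everything is a word, and words are built from earlier words" idea, here is a toy Forth-style interpreter in Python: a data stack, a dictionary of words, and `: name ... ;` to define new words out of existing ones. It is a hypothetical sketch, far simpler than a real Forth (no return stack, no compile mode, and defined words are looked up by name when they run), and it says nothing about how an LLM would hook into it.

```python
def make_forth():
    """Toy Forth-style interpreter: a data stack plus a dictionary of words."""
    stack = []
    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        ".": lambda: print(stack.pop()),
    }

    def execute(tokens):
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == ":":
                # ": name body ;" defines a new word made of existing words.
                end = tokens.index(";", i)
                name, body = tokens[i + 1], tokens[i + 2:end]
                words[name] = lambda body=body: execute(body)
                i = end + 1
                continue
            if tok in words:
                words[tok]()
            else:
                stack.append(int(tok))   # anything that isn't a word is a number
            i += 1

    def run(source):
        execute(source.split())
        return stack

    return run, stack

run, stack = make_forth()
run(": square dup * ;")
run(": sum-of-squares square swap square + ;")
print(run("3 4 sum-of-squares"))   # [25]
```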

Like, let's say I want a bash find command to do something very specific, there is a sandbox terminal to test with, and the LLM has an accessible database of the manpage and help message and is trained on Stack Overflow data. I can already try a model like this and it will give me a command, but it won't work most of the time. The command will be ~80% correct. If I alter the prompt, I can get a different 80%, but the error is in a different place. So the correct info is present, but I can't access it.

So what happens if Forth could prompt the first question, then test the results and conditionally branch? Maybe it reframes the previous command output as a prompt to correct the bad command. Maybe it stores the part of the command that works and prompts further. Maybe it goes meta and prompts the LLM to make a new Forth word to test and execute. Once the objective is reached, the Forth interpreter embeds the working word into the LLM with a strong weight that denotes its programming power and tested effectiveness. Now the LLM has a way to call a Forth word that does something effective. It could be like adversarial machine learning, but harnessing an LLM and the hardware in a way that lets it make progress, self-correct, and store the results. Forth takes away most of the issues of programming syntax and complexity associated with generating code. The required syntax for Forth can be self-generated, with a single word used to create it. The power of Forth is that EVERYTHING can be made into a single word.
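The control flow described here (prompt, test in a sandbox, branch on the result, and re-prompt with the error) can be sketched independently of Forth. Below is a minimal Python sketch; `ask_model` and `run_test` are hypothetical stand-ins for an LLM call and a sandboxed test run, and the scripted answers exist only to show the loop terminating. It does not cover the step of embedding the working result back into the LLM.

```python
def refine(objective, ask_model, run_test, max_attempts=5):
    """Propose, test, and re-prompt until the test passes or attempts run out.

    ask_model(prompt) -> candidate string   (hypothetical LLM call)
    run_test(candidate) -> (ok, output)     (hypothetical sandbox check)
    """
    prompt = objective
    history = []
    for _ in range(max_attempts):
        candidate = ask_model(prompt)
        ok, output = run_test(candidate)
        history.append((candidate, ok, output))
        if ok:
            return candidate, history   # a working version that could be stored and reused
        # Branch: feed the failure back into the next prompt.
        prompt = (f"{objective}\nPrevious attempt:\n{candidate}\n"
                  f"It failed with:\n{output}\nPlease fix it.")
    return None, history

# Demo stand-ins only: a scripted "model" and a string-based "test".
answers = iter([
    "find . -name '*.log' -mtime +1",   # wrong: matches older files
    "find . -name '*.log' -mtime -1",   # modified within the last day
])

def scripted_model(prompt):
    """Ignores the prompt and returns the next scripted answer."""
    return next(answers)

def string_test(cmd):
    """'Passes' if the command uses -mtime -1; a real check would run it in a sandbox."""
    ok = "-mtime -1" in cmd
    return ok, "" if ok else "listed files older than a day"

objective = "Write a find command listing .log files modified in the last day."
command, history = refine(objective, scripted_model, string_test)
print(command)
print(len(history))
```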