I am just learning in this space and I could be wrong about this one, but… The GGML and GPTQ models are nice for getting started with AI in Oobabooga. The range of available models is kinda odd to navigate: it is hard to understand in context how they compare, and what all the different quantization types, settings, and features mean. I still don’t understand a lot of it. One of the main things I didn’t (and still don’t fully) understand is how some models do not state a quantization format like GGML/GPTQ, but still work using Transformers. I tried some of these by chance at first, then avoided them because they take longer to load initially.

Yesterday I created my first LoRAs and learned through trial and error that the only models I can train a LoRA on are the ones that use Transformers and can be set to 8-bit mode. Even with GGML/GPTQ models at 8-bit quantization, I could not make a LoRA. It could be my software setup, but I think there is either a fundamental aspect of these models I haven’t learned yet, or a limitation in Oobabooga’s implementation. Either way, the key takeaway is to try making a LoRA with a Transformers-based model loaded in Oobabooga, and be sure the “load in 8 bit” box is checked.

I didn’t know what to expect with this, and haven’t come across many examples, so I put off trying it until now. I have a 12th gen i7 with 20 logical cores and a 16GB 3080Ti in a laptop. I can convert an entire novel into a text file and load it as raw text (tab) for training in Oobabooga. If my machine has some assistance with cooling, I can create the LoRA in 40 minutes using the default settings and a 7B model. This has a mild effect. IIRC the default weight of the LoRA network (the LoRA rank setting, I believe) is 32. If this is turned up to 96-128, it will have a more noticeable effect on personality. It still won’t substantially improve the Q&A accuracy, but it may improve the quality to some extent.
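For a rough sense of what turning that number up does, assuming the 32 mentioned above is the LoRA rank: a rank-r adapter on a weight matrix with d_in inputs and d_out outputs trains two small matrices with r × (d_in + d_out) parameters in total, while the original matrix stays frozen. The sketch below is only arithmetic. The 4096 × 4096 matrix size is my assumption (the hidden size of a LLaMA-style 7B model), and it says nothing about which layers Oobabooga actually targets.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a rank-`rank` LoRA adds to one d_out x d_in weight matrix."""
    # LoRA learns A (rank x d_in) and B (d_out x rank); the base matrix is not updated.
    return rank * d_in + d_out * rank


HIDDEN = 4096  # assumption: hidden size of a LLaMA-style 7B model

for rank in (32, 96, 128):
    added = lora_params(HIDDEN, HIDDEN, rank)
    full = HIDDEN * HIDDEN
    print(f"rank {rank:>3}: {added:>9,} trainable params per matrix "
          f"({added / full:.2%} of the full matrix)")
```

So going from 32 to 128 gives the adapter four times as many trainable parameters per matrix, which fits with it having a more noticeable effect.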

I first tested with a relatively small Wikipedia article on Leto II (Dune character), manually formatted for this purpose. This didn’t change anything substantially. Then I tried the entire God Emperor of Dune e-book as raw text. This had garbage results, probably due to all the nonsense before the book even starts and the terrible text formatting extracted from an eBook. The last dataset I tried was the book text only, with everything reflowed using a Linux bash script I wrote to alter newline characters and spacing and to remove page gaps. Then I manually edited with find and replace to remove special characters and any formatting oddballs I could find. This was the first LoRA I made where the 7B model’s tendency to hallucinate seemed more evident than issues with my LoRA. For instance, picking the name of an irrelevant character that occurs 3 times in 2 sentences of the training text and prompting about it results in random unrelated output. The overall character identity is also weak despite a strong character profile and a 1.8MB text file for the LoRA.
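The cleanup step can be sketched roughly like this. It is not the bash script I used, just a minimal Python approximation of the same idea (reflow hard-wrapped lines, collapse spacing, drop page gaps). The function name and the specific rules, like treating digit-only chunks as page numbers, are assumptions for illustration.

```python
import re


def clean_raw_text(text: str) -> str:
    """Reflow hard-wrapped e-book text into one line per paragraph."""
    # Normalize line endings and turn form feeds (page breaks) into blank lines.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n\n")
    # Heuristic: rejoin words hyphenated across a line break ("des-\nert" -> "desert").
    # This will also join genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    cleaned = []
    # Any run of blank lines (paragraph break or page gap) separates chunks.
    for chunk in re.split(r"\n\s*\n", text):
        # Collapse all whitespace inside a chunk, including single newlines.
        chunk = re.sub(r"\s+", " ", chunk).strip()
        # Drop empty chunks and chunks that are only digits (likely page numbers).
        if chunk and not chunk.isdigit():
            cleaned.append(chunk)
    return "\n\n".join(cleaned) + "\n"
```

Calling `clean_raw_text(open("book.txt", encoding="utf-8").read())` on a hypothetical `book.txt` gives text with one paragraph per line separated by blank lines. Front matter and special characters still need to be removed by hand.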

This is just the perspective from a beginner’s first attempt. Actually tuning this with a bit of experience will produce far better results. I’m just trying to say, if you’re new to this and just poking around, try making a LoRA. It is quite easy to do.

  • @j4k3OP
    1 year ago

    Thanks for the reply. That makes sense about the shards.

    As far as tuning with LoRAs goes, I have very limited expectations. I plan on trying a LangChain database soon, and I have higher expectations for that experiment.

    Forth is interesting because you can basically make anything a word. Like I can make a word that is a pointer to the flags in a register, or a word that is a register. I can make a word that can be called and consists of two previously assigned words, and takes the word for the flag state and copies it to the word for the register. In Forth, everything can be a single word, and every word can be combined all the way to a complete operating system. Overall it is very linear and there is very little syntax. It is very much a language all about what word comes next. I’m curious what happens if an LLM is given an objective in the Forth interpreter, where Forth can influence the context tokens and a simple conditional branching program can prompt the LLM to iterate on solutions.
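To show what I mean by words built from words, here is a toy sketch in Python. It is not real Forth: it only mimics the `: name ... ;` definition syntax with a stack and a dictionary, looks words up when they run rather than compiling them, and has three built-ins that I picked for the example.

```python
def run(source, stack=None, words=None):
    """Evaluate a tiny Forth-like program and return (stack, words)."""
    stack = [] if stack is None else stack
    if words is None:
        # Built-in words are Python callables that act on the stack.
        words = {
            "+": lambda s: s.append(s.pop() + s.pop()),
            "*": lambda s: s.append(s.pop() * s.pop()),
            "dup": lambda s: s.append(s[-1]),
        }
    tokens = source.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == ":":
            # ": name body ;" stores a new word as a list of existing words.
            end = tokens.index(";", i)
            words[tokens[i + 1]] = tokens[i + 2:end]
            i = end + 1
            continue
        if tok in words:
            entry = words[tok]
            if callable(entry):
                entry(stack)
            else:
                # A user-defined word is just more words: run its body.
                run(" ".join(entry), stack, words)
        else:
            stack.append(int(tok))  # anything else is treated as a number
        i += 1
    return stack, words


# "fourth" is made of two uses of "square", which is itself made of two words.
stack, _ = run(": square dup * ; : fourth square square ; 3 fourth")
print(stack)  # [81]
```

Once `square` exists it is used exactly like a built-in, and `fourth` does not care how `square` was made. That is the property I think could suit a model that only has to decide which word comes next.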

    Like, let’s say I want a bash find command to do something very specific, there is a sandbox terminal to test with, and the LLM has an accessible database of the manpage and help message, and is trained on Stack Overflow data. I can already try a model like this and it will give me a command, but it won’t work most of the time. The command will be ~80% correct. If I alter the prompt I can get a different 80%, but the error is in a different place. So the correct info is present, but I can’t access it.

    So what happens if Forth could prompt the first question, then test the results and conditionally branch? Maybe it reframes the previous command output as a prompt to correct the bad command. Maybe it stores the part of the command that works and prompts further. Maybe it goes meta and prompts the LLM to make a new Forth word to test and execute.

    Once the objective is reached, the Forth interpreter embeds the working word into the LLM with a strong weight that denotes its programming power and tested effectiveness. Now the LLM has a way to call a Forth word that does something effective. It could be like adversarial machine learning, but harnessing an LLM and the hardware in a way that it can make progress, self-correct, and store the results. Forth takes away most of the issues of programming syntax and complexity associated with generating code. The required syntax for Forth can be self-generated with a single word used to create it. The power of Forth is that EVERYTHING can be made into a single word.
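The prompt, test, and branch loop could look something like this. It is a sketch in Python rather than Forth, and every name in it is hypothetical: `ask_model` stands in for whatever function sends a prompt to the LLM and returns its reply, `check` decides whether the output meets the objective, and `run_in_sandbox` just uses `subprocess` on the host where a real setup would use an isolated sandbox. Embedding the result back into the model is not covered here; the loop only returns the command that passed.

```python
import subprocess
from typing import Callable, Optional, Tuple


def run_in_sandbox(command: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run a shell command and return (succeeded, combined output).

    Assumption: a real setup would isolate this instead of running on the host.
    """
    try:
        proc = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def iterate_until_working(
    objective: str,
    ask_model: Callable[[str], str],
    check: Callable[[str], bool],
    run: Callable[[str], Tuple[bool, str]] = run_in_sandbox,
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask for a command, test it, and re-prompt with the failure until it passes."""
    prompt = objective
    for _ in range(max_rounds):
        command = ask_model(prompt).strip()
        ok, output = run(command)
        if ok and check(output):
            return command  # tested and working: this is the result worth storing
        # Conditional branch: reframe the failed attempt as the next prompt.
        prompt = (
            f"{objective}\n"
            f"The command `{command}` did not work. Its output was:\n{output}\n"
            "Suggest a corrected command."
        )
    return None  # gave up after max_rounds attempts
```

The `run` parameter is there so the loop can be tried with a fake runner and a fake model before pointing it at anything real.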