Greetings Reddit Refugees! (and Everyone Across the Fediverse!)
I hope your migration is going well! If you haven’t been here before, Welcome to FOSAI! Your new Lemmy landing page for all things free open-source artificial intelligence.
This is a follow-up post to my first Welcome Message.
Looking for the FOSAI Nexus? Click here to be taken to (v0.0.1).
Here I will share insights and instructions on how to set up some of the tools and applications in the aforementioned AI Suite.
Please note that I did not develop any of these, but I do have each of them working on my local PC, which I interface with regularly. I plan to do posts exploring each piece of software in detail - but first - let’s get a better idea of what we’re working with.
As always, please don’t hesitate to comment or share your thoughts if I missed something (or you want to understand or see a concept in more detail).
Getting Started with FOSAI
What is oobabooga?
In short, oobabooga is a free and open-source web client that someone (oobabooga) made to interface with HuggingFace LLMs (large language models). As far as I understand, this is the current standard for many AI tinkerers and those who wish to run models locally. This client allows you to easily download, configure, and chat with text-based models that behave like ChatGPT. However, not all models on HuggingFace match ChatGPT out-of-the-box; many require ‘fine-tuning’ or ‘training’ to produce consistent, coherent results. The benefit of using HuggingFace (instead of ChatGPT) is that you have many more options to choose from regarding your AI model, including the option to choose a censored or uncensored version of a model, untrained or pre-trained, etc. Oobabooga is an interface that lets you do all of this (theoretically), but it can have a bit of a learning curve if you don’t know anything about AI/LLMs.
What is gpt4all?
gpt4all is the closest thing you can currently download to get a ChatGPT-style interface that is compatible with some of the latest open-source LLMs available to the community. Some models can be downloaded in quantized, unquantized, or base formats (which typically run GPU-only), but new formats are emerging - notably GGML - which enable combined GPU + CPU compute. This GGML format seems to be the growing standard for consumer-grade hardware. Some prefer the user experience of gpt4all over oobabooga, and some feel the exact opposite. For me - I prefer the options oobabooga provides - so I use that as my ‘daily driver’ while gpt4all is a backup client I run for other tests.
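To make those format trade-offs concrete, here’s a rough back-of-the-envelope sketch (my own arithmetic, not something from gpt4all itself): a model’s on-disk footprint is roughly its parameter count times the bits stored per weight.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size: parameters * bits per weight, ignoring metadata."""
    # params_billion * 1e9 params * (bits / 8) bytes = result in decimal GB
    return params_billion * bits_per_weight / 8

# A 7B model in a few common formats (approximate):
print(f"fp16 (unquantized): {model_size_gb(7, 16):.1f} GB")  # ~14 GB
print(f"8-bit quantized:    {model_size_gb(7, 8):.1f} GB")   # ~7 GB
print(f"4-bit quantized:    {model_size_gb(7, 4):.1f} GB")   # ~3.5 GB
```

Quantizing from fp16 down to 4-bit cuts the download (and memory footprint) roughly fourfold, which is a big part of why 4-bit builds are so popular on consumer hardware.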
What is Koboldcpp?
Koboldcpp, like oobabooga and gpt4all, is another web-based interface you can run to chat with LLMs locally. It enables GGML inference, which can be hard to get running on oobabooga depending on your client version and updates from the developer. Koboldcpp, however, is part of a totally different platform and team of developers, who typically focus on the roleplaying side of generative AI and LLMs. Koboldcpp feels more like NovelAI than anything I’ve run locally, and has similar functionality and vibes to AI Dungeon. In fact, you can download some of the same models and settings that they use to emulate something very similar (but 100% local, assuming you have capable hardware).
What is TavernAI?
TavernAI is a customized web client that seems as functional as gpt4all in most regards. You can use TavernAI to connect to Kobold’s API - or insert your own OpenAI API key to talk with GPT-3 (and GPT-4 if you have API access).
What is SillyTavern?
LLM Frontend for Power Users. SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text-generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 that is under more active development and has added many major features. At this point, they can be thought of as completely independent programs. SillyTavern is developed by Cohee and RossAscends.
What is Stable Diffusion?
How-To-Install-StableDiffusion (Automatic1111)
Stable Diffusion is a groundbreaking and popular AI model that enables text-to-image generation. When people think of “Stable Diffusion”, they tend to picture Automatic1111’s UI/UX, which is the same interface oobabooga was inspired by. This UI/UX has become the de facto standard for almost all Stable Diffusion workflows. Fun fact - it is widely believed that Midjourney is a highly tuned version of a Stable Diffusion model, but one whose weights, LoRAs, and configurations were made closed-source after training and alignment.
What is ComfyUI?
A nodes/graph/flowchart interface to experiment with and create complex Stable Diffusion workflows without needing to code anything. It fully supports SD 1.x, SD 2.x, and SDXL. This UI lets you design and execute advanced Stable Diffusion pipelines using a graph/nodes/flowchart-based interface. The project publishes workflow examples if you want to see what ComfyUI can do.
What is ControlNet?
ControlNet is a way you can manually control models of Stable Diffusion, allowing you to have complete freedom over your generative AI workflow. The best example of what this is (and what it can do) can be seen in this video. Notice how it combines an array of tools you can use as pre-processors for your prompts, enhancing the composition of your image by giving you options to bring out any detail you wish to manifest.
What is TemporalKit?
This is another Stable Diffusion extension that allows you to create custom videos using generative AI. In short, it takes an input video and chops it into dozens (or hundreds) of frames that can then be batch-edited with Stable Diffusion, producing new key frames and sequences that are stitched back together with EbSynth using your new images, resulting in a stylized video generated and edited according to your Stable Diffusion prompt/workflow.
Join the AI Horde!
A message from the developer of the AI Horde ([email protected]):
The AI Horde is a project I started in order to provide access to generative AI to everyone in the world, regardless of wealth and resources. The objective is to provide a truly open REST API that anyone is free to integrate with for their own software and games, and that allows people to experiment without requiring online payment, which is not always possible for everyone.
It is fully FOSS and relies on people volunteering their idle compute from their PCs. In exchange, you receive more priority for your own generations. We already have close to 100 workers, providing everything from Stable Diffusion image generation to 70B LLMs!
Also, the Lemmy community is at [email protected].
If you are interested in democratizing access to Generative AI, consider joining us!
Where to Start?
Unsure where to begin? Do you have no idea what you’re doing? Or have paralysis by analysis? That’s okay, we’ve all been there.
Start small - don’t install everything at once. Instead, ask yourself what sounds like the most fun, pick one of the tools I’ve mentioned above, and spend as much time as you need to get it working. This work takes patience, cultivation, and motion. The first two (patience, cultivation) typically take the longest to develop.
If you end up at your wit’s end installing or troubleshooting these tools - remind yourself this is bleeding edge artificial intelligence technology. It shouldn’t be easy in these early phases. The good news is I have a strong feeling it will become easier than any of us could imagine over time. If you cannot get something working, consider posting your issue here with information regarding your problem.
To My Esteemed Lurkers…
If you’re a lurker (like I used to be), please consider taking a popcorn break, stepping out of your comfort zone, making a post, and asking questions. This is a safe space to share your results and interests with AI - or to make a post about your epic project or goal. All progress is welcome here; all conversations about this tech are fair game and waiting to be discussed.
Over the course of this next week I will continue releasing general information to catch this community up to some of its more-established counterparts.
Consider subscribing to [email protected] if you liked the content of this post or want to stay in the loop with Free, Open-Source Artificial Intelligence.
Update #1 [07/04/23]: Come check out this post’s bigger brother, the FOSAI Nexus Resource Hub (v0.0.1)!
Update #2 [07/19/23]: More resources have been added! Contribute to the open-source revolution and turn your hardware into an AI Horde Worker today. Visit the full site here. Shout out to [email protected] for developing (and sharing) this project!
Update #3 [7/20/23]: Come check out our new LLM Guide where you can keep track of all of the latest free and open-source models to hit the space!
Update #4 [7/29/23]: I have officially converted this resource into a website! Bookmark and visit https://www.fosai.xyz/ for more insights and information!
Update #5! [9/22/23]: This guide may be outdated! All GGML model file formats have been deprecated in favor of llama.cpp’s new GGUF - the new and improved successor to the now-legacy GGML format. Visit TheBloke on HuggingFace to find all kinds of new GGUF models to choose from. Use interfaces like oobabooga or llama.cpp to run GGUF models locally. Keep your eye out for more platforms adopting the new GGUF format as it gathers traction and popularity. Looking for something new? Check out LM Studio, a new tool for researching and developing open-source large language models. I have also updated our sidebar - double-check for anything new there or at FOSAI▲XYZ!
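If you’re unsure whether a model file you already downloaded is in the old or new format, GGUF files begin with the ASCII magic bytes `GGUF`, so a quick sanity check is easy. A minimal sketch (the authoritative format description lives in the llama.cpp repository):

```python
def is_gguf(path: str) -> bool:
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

If this returns False on a model you grabbed before September 2023, it’s probably a legacy GGML file and will need a GGUF re-download (or conversion) to run on up-to-date llama.cpp builds.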
This is super helpful, thanks a lot! Just what I was looking for!
Glad to have found this. I just discovered all this a few days ago and have been hooked on AI ever since - it’s amazing.
Greetings from across the Fediverse! We’re happy to have you. I’m excited to see what our future holds. If you want to learn more about AI, consider checking out UnderstandGPT and some of the other partner communities on the sidebar!
Do you have any good resources on tips/tricks for using Stable Diffusion effectively? I’ve got it set up running in a Docker container with AUTOMATIC1111’s UI, running on my RX 6700 XT. I’ve played around for a few hours, but I’m looking for some good prompt tips, insights into which models are better, and what each tool does.
Looking up info on this stuff seems kind of hit or miss quality-wise so far - or it’s intended for an older version or something similar.
Have you had a chance to try ControlNet? There are some really cool workflows with this.
Stable Diffusion + ControlNet gives you a sense of ‘manual’ control over your AUTOMATIC1111/web-ui workflow.
I quite like it, and I think there’s a ton of potential if you combine it with other processing techniques. Prompts are fickle things - there is no one-size-fits-all - but hopefully these help you adjust and tune your prompts as you play around with the toolset.
If you’re looking for models, civitai is a good place to try out new styles, checkpoints, LoRAs, and other downloadable content you can use to enhance your Diffusion Suite.
If you’re not sure what ControlNet is, try starting with this video here which goes over the workflow I personally use for some of my own projects.
If you’re looking to break into generative video content (based on similar Stable Diffusion web-ui workflows), you’ll want to check out TemporalKit + EbSynth (which is also detailed in the FOSAI Nexus)!
Beyond the tools you are showing here - what type of hardware do you have at home to run most of these?
Great question. I suggest visiting UnderstandGPT for the full table, but here’s a brief breakdown of current home GPU/VRAM recommendations (as of June 2023):
| Model Size | Required VRAM (4-bit) | Required VRAM (8-bit) | Recommended GPU (4-bit) |
|---|---|---|---|
| 7B | 6GB | 10GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 |
| 13B | 10GB | 20GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 |
| 30B | 20GB | 40GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 |
| 65B | 40GB | 80GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 |
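To turn that table into a quick lookup, here’s a tiny helper of my own (the numbers are just the table’s, encoded as a dict) that tells you which model sizes fit a given amount of VRAM:

```python
# (4-bit, 8-bit) VRAM requirements in GB, taken from the table above.
REQUIRED_VRAM_GB = {
    "7B": (6, 10),
    "13B": (10, 20),
    "30B": (20, 40),
    "65B": (40, 80),
}

def runnable_models(vram_gb: float, bits: int) -> list:
    """Model sizes from the table whose VRAM requirement fits the given card."""
    idx = 0 if bits == 4 else 1
    return [name for name, req in REQUIRED_VRAM_GB.items() if req[idx] <= vram_gb]

print(runnable_models(12, 4))  # a 12GB card at 4-bit -> ['7B', '13B']
print(runnable_models(24, 4))  # a 24GB card at 4-bit -> ['7B', '13B', '30B']
```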
In terms of CPU requirements, you can run inference on all sorts of hardware - people have run AI/LLM models on laptops and machines with little to no GPU whatsoever. For best results, though, a strong GPU is important. CUDA cores (NVIDIA-specific hardware on all 2xxx/3xxx/4xxx series graphics cards) utilize advanced acceleration that significantly boosts AI performance - an important detail for anyone wanting to run models on NVIDIA cards.
For my fellow gamers - NVIDIA CUDA cores are what help process your in-game DLSS and RTX (among many other components), settings I’m sure you’ve explored turning on or off to boost your FPS. This is the same tech that gives you an advantage running AI on an NVIDIA GPU at home.
AMD does not yet have a direct equivalent to CUDA, but they have recently partnered with HuggingFace to explore how to offer more competition in this space.
Storage is up to you. Make sure you read file sizes before downloading. I learned that the hard way. Some of these file sizes can easily blow up your hard drive space. Consider dedicating a disk or large folder for all of your AI tinkering and workloads. I personally dedicate a 1TB drive that I use to archive the many models that I experiment with, but it’s overkill for most. You could get away with 128GB/250GB/500GB of storage if you stayed organized. If you plan to only run the small models, 8GB - 24GB should be plenty of room.
For RAM, it’s suggested to have 16GB+, but it’s not as important as GPU + CPU power (a compute combination possible with GGML models - a popular format that lets you combine the power of both your graphics card and your processor). It’s worth noting that more RAM might help you load and unload models a little faster, especially for the larger parameter variants - but your CPU & GPU are far more important at the moment. In my opinion, 32GB/64GB of RAM is the sweet spot.
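For the GGML GPU + CPU split mentioned above, the usual mechanism is to offload as many transformer layers as fit in VRAM and run the rest on the CPU (llama.cpp exposes this as a layer count via its `-ngl` / `n_gpu_layers` option). Here’s a toy sketch of the budgeting idea - the per-layer sizes are hypothetical numbers for illustration:

```python
def layers_to_offload(vram_gb: float, n_layers: int, layer_gb: float,
                      reserve_gb: float = 1.5) -> int:
    """How many of the model's layers fit on the GPU, keeping some VRAM in
    reserve for context and scratch buffers. The remaining layers run on CPU."""
    budget = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(budget // layer_gb))

# Hypothetical 13B model: 40 layers at roughly 0.16 GB each (4-bit).
print(layers_to_offload(vram_gb=4, n_layers=40, layer_gb=0.16))   # some layers on GPU
print(layers_to_offload(vram_gb=24, n_layers=40, layer_gb=0.16))  # all layers fit
```

In practice you just nudge the layer count up until you run out of VRAM; more layers on the GPU means faster generation.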
If you don’t have access to powerful GPUs, you should check out runpod.io and vast.ai. They are great cloud compute platforms that let you rent a GPU relatively cheaply (typically for a few bucks an hour). Worth looking into if you want to tinker with the larger models - but there are many ways to get access to those, whether renting a GPU or getting a GGML/quantized model running on your local hardware at home, which is 100% doable if you have at least a 1660 (or newer) graphics card. I haven’t had much time to interact with AMD benchmarks, so I’d love to hear how it goes for anyone running one of those cards. I’ll be doing a thorough bench later this month once I finish setting up the server.
What’s great about all of this is that the compute cost of AI keeps going down for consumer hardware in general. I wouldn’t be surprised if we started to see people running models with 65B+ parameters somewhat casually before the end of the year.
-
I get great performance with Stable Diffusion (Automatic1111) on my entry level M1 MacBook Air (which, if you don’t know Macs, is an old model of an entry level laptop).
In particular Apple’s modern GPUs have a lot of memory even at the low end.
It generally takes about 20 seconds or so to generate an image (and again, this is a low end fanless ultraportable laptop…)
Text/conversation generation works surprisingly well with CPU only. And it’s possible to split the work between GPU and CPU to achieve a significant speed-up even if you don’t have enough VRAM to fit the whole model. If you don’t have a very powerful CPU you might still be able to get good results with a 7B model, and I think I’ve seen 3B models which I assume require even less resources.
Haven’t played around with Stable Diffusion in some time, but unless things have changed, GPU computing power is much more important for this. When I tried it, you needed to be able to fit the entire model in VRAM, but maybe it’s possible to split it nowadays. Generating a 512×512 image with my old GTX 1080 took about half a minute, but it went down to a few seconds after upgrading to an RTX 3080. The exact time requirement will of course depend on which settings you use.
You should make a more advanced version too. ComfyUI is wonderful but also has a bigger learning curve.
I think several of the listed projects build on llama.cpp, though it’s also possible to run it directly. The example programs it provides seem a little more bare-bones than other projects, e.g. its “main” program runs in a terminal rather than a fancy web UI. Personally, I found llama.cpp running in Docker an easy way to get GPU acceleration, since the CUDA Toolkit doesn’t support Fedora 38, but I’m just getting started with personal GPTs so I haven’t explored all the options.