What would be the cheapest and most cost-efficient way of self-hosting LLMs?

adONis · 8 months ago

What would be the cheapest and most cost-efficient way of self-hosting LLMs?

@[email protected] · edit-2 8 months ago

I don’t have an answer for you, partly because there isn’t enough information about your aims. However, you can probably work this out yourself by comparing prices for different hardware. You’d need some of that missing information to run the numbers, though.

I would imagine that an important input here is your expected usage.

If you just want to set up a box to run a chatbot occasionally and you get maybe 1% utilization out of it, the costs are different than if you intend to have it doing batch-processing jobs 24/7. The GPU is probably the dominant energy consumer in the machine, so if it’s running 24/7, how much compute the GPU delivers per unit of energy is going to be a lot more important.

If you have that usage figure, you can estimate the electricity consumption of your GPU.
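As a rough sketch of that estimate (not a measurement): the function below multiplies GPU power draw by hours of use and an electricity price. The 300 W draw and the $0.20/kWh price are made-up placeholder numbers, and `annual_energy_cost` is just an illustrative name; plug in the figures for your own card and utility.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_cost(gpu_watts, utilization, price_per_kwh, idle_watts=0.0):
    """Estimate yearly electricity cost of a GPU.

    gpu_watts     - draw under load, in watts (assumed constant while busy)
    utilization   - fraction of the year the GPU is busy, 0.0 to 1.0
    price_per_kwh - electricity price per kilowatt-hour
    idle_watts    - draw while idle; defaults to 0 (i.e. machine switched off)
    """
    busy_hours = HOURS_PER_YEAR * utilization
    idle_hours = HOURS_PER_YEAR - busy_hours
    kwh = (gpu_watts * busy_hours + idle_watts * idle_hours) / 1000
    return kwh * price_per_kwh


# Placeholder numbers: a 300 W card at $0.20/kWh.
occasional = annual_energy_cost(300, 0.01, 0.20)  # ~1% utilization
always_on = annual_energy_cost(300, 1.00, 0.20)   # 24/7 batch jobs
print(f"occasional: ${occasional:.2f}/year, 24/7: ${always_on:.2f}/year")
```

With those placeholder inputs the 24/7 case costs a hundred times what the 1% case does, which is why GPU energy efficiency only really starts to matter at high utilization.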

A second factor here, especially if you want interactive use, is what level of performance is acceptable to you. That may, depending upon your budget and use, be the dominant concern. You’ve got a baseline to work with.

If you have those figures – how much performance you want, and what your usage rate is – you can probably estimate and compare various hardware possibilities.

I’d throw a couple of thoughts out there.

First, if what you want is sustained, 24/7 compute, you probably can look at what’s in existing, commercial data centers as a starting point, since people will have similar constraints. If what you care about is much less frequent, it may look different.

Second, if you intend to use this for intermittent LLM use and have the budget and interest in playing games, you may want to make a game-oriented machine. Having a beefy GPU is useful both for running LLMs and playing games. That may differ radically from a build intended just to run LLMs. If you already have a desktop, just sticking a more-powerful GPU in may be the “best” route.

Third, if performance is paramount, your application may, depending on what it is, be able to make use of multiple GPUs.

Fourth, what applications you want to run may affect what hardware is acceptable (it sounds like you may have decided on Nvidia already). One question is AMD versus Nvidia, but also, many applications have minimum VRAM requirements – the size of the model imposes constraints. If your GPU doesn’t have enough VRAM for what you want to run, you can’t run the model at all.
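For a back-of-the-envelope check on the VRAM point, here is a minimal sketch. The weights alone take roughly (number of parameters) × (bytes per parameter); the 20% `overhead` for context cache and runtime buffers is purely an assumed rule of thumb, not something any particular application guarantees, and both function names are illustrative.

```python
def weights_gb(params_billion, bits_per_weight):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead=1.2):
    """Rough check: do weights plus an ASSUMED overhead factor fit in VRAM?

    overhead=1.2 is a guess covering context cache and runtime buffers;
    real usage depends on the application and context length.
    """
    return weights_gb(params_billion, bits_per_weight) * overhead <= vram_gb


# A hypothetical 70-billion-parameter model on an 80 GB card:
print(weights_gb(70, 16), fits_in_vram(70, 16, 80))  # 16-bit weights
print(weights_gb(70, 4), fits_in_vram(70, 4, 80))    # 4-bit quantized
```

The same model can be far out of reach or comfortably in range depending on how heavily it is quantized, so check the requirements of the exact model files you intend to run.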

Fifth, if you have not already, you may want to consider the possibility of not self-hosting at all, if you expect your use to be particularly intermittent and you have high hardware requirements. Something like vast.ai lets you rent hardware with beefy compute cards, which can be cheaper if your demands are intermittent, because the costs are spread across multiple users. If you only run a chatbot very occasionally, care a lot about performance, and want to run very large models, you could rent a system with an H100 for about $3/hour. An H100 costs about $30k and has 80GB of VRAM. If you want to run a chatbot a weekend a month for fun and you want to run a model that requires 80GB – an extreme case – renting is going to be a lot more economical than buying the same hardware yourself.
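To put rough numbers on the rent-versus-buy comparison, here is a small sketch using the approximate figures above ($3/hour rental, $30k purchase). It deliberately ignores electricity, resale value, and the rest of the machine, and it assumes a "weekend" means 48 hours of rented time; the function names are illustrative.

```python
def break_even_hours(purchase_price, rental_per_hour):
    """Hours of rental that cost as much as buying the card outright."""
    return purchase_price / rental_per_hour


def yearly_rental_cost(hours_per_month, rental_per_hour):
    """Cost of renting for a fixed number of hours every month."""
    return hours_per_month * 12 * rental_per_hour


H100_PRICE = 30_000   # approximate purchase price, USD
RENTAL_RATE = 3       # approximate rental price, USD per hour
WEEKEND_HOURS = 48    # assumption: rented for one full weekend a month

hours = break_even_hours(H100_PRICE, RENTAL_RATE)
per_year = yearly_rental_cost(WEEKEND_HOURS, RENTAL_RATE)
print(f"break-even after {hours:.0f} rented hours")
print(f"renting costs ${per_year:.0f}/year, "
      f"about {H100_PRICE / per_year:.1f} years to match the purchase price")
```

On those assumptions the break-even point is 10,000 rented hours, and a weekend a month comes to $1,728 a year, so it would take over 17 years of that usage to spend what the card costs.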

Sixth, electricity costs where you live are going to be a factor. And if this system is going to be indoors and you live somewhere warm, the heat it gives off adds to your air conditioning load, which pushes the cost up further.

adONis · 8 months ago

It would be the first scenario you described… I’d just interact with a chatbot occasionally, like I do with ChatGPT now… but I’d also like to experiment with Copilot-like models to test and use with VS Code. So no training of models or 24/7 batch operations.

I was wondering whether a custom-built gaming PC is the only solution here, or if there are other, cheaper alternatives that get the job done decently.