• greygore
    15 hours ago

    They’re heavily subsidizing costs to attract users who otherwise probably wouldn’t be interested in the service at a sustainable price. Every company is hiding its inference costs, but it’s clear that every user currently burns far more money than they generate in revenue. The hope is that inference costs will come down, and while that’s a fairly safe bet, there are two problems:

    1. Frontier model companies are burning cash so fast that they’ll run out long before economies of scale make the costs affordable.
    2. Even as per-token inference costs have gone down, almost every technique for improving AI performance (thinking, large context windows, etc.) has involved increasing the number of tokens used. Total query cost is easily outpacing any decrease in per-token inference cost.
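
    Point 2 can be sketched with toy numbers (these figures are illustrative assumptions, not real vendor pricing):

    ```python
    # Hypothetical numbers only: suppose per-token inference cost drops 5x,
    # while a "thinking" model with a large context burns 20x the tokens.
    old_cost_per_token = 10e-6   # assumed: $10 per million tokens
    new_cost_per_token = 2e-6    # assumed: 5x cheaper per token

    old_tokens_per_query = 2_000    # plain chat completion
    new_tokens_per_query = 40_000   # long context + chain-of-thought

    old_query_cost = old_cost_per_token * old_tokens_per_query
    new_query_cost = new_cost_per_token * new_tokens_per_query

    # Per-token cost fell 5x, yet the query got 4x more expensive.
    print(f"old query: ${old_query_cost:.2f}, new query: ${new_query_cost:.2f}")
    ```

    Any per-token savings get swallowed whole as soon as token usage grows faster than costs shrink.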

    Even worse, models themselves are becoming commodities. Although users seem to have preferences for one model over another, there’s still not really a good way to benchmark them. Without a clear ability to differentiate models on performance or capability, they’re effectively interchangeable, which lowers margins. Why pay more to run company X’s latest and greatest when company Y’s last generation performs almost identically?

    The reason the web was able to cover its costs with advertising is that the cost to serve a web page was minimal. A bit of networking gear and a couple of servers were all you needed to run a large website. For many sites, you didn’t even need premium hardware, just a cheap, basic PC with an Internet connection. Lots of people ran free hobby websites at minimal cost. Hell, you can run a website on a single-board computer like a Raspberry Pi.
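
    To make the point concrete, here’s roughly all it takes to serve a static hobby site on a Raspberry Pi: Python’s built-in `http.server`, no special hardware assumed (the port number is an arbitrary choice for this sketch):

    ```python
    # Minimal static file server: serves the current directory over HTTP.
    # This is the scale of infrastructure a hobby website actually needs.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    PORT = 8080  # arbitrary port for this example

    def main():
        # Listen on all interfaces and serve files until interrupted.
        server = HTTPServer(("0.0.0.0", PORT), SimpleHTTPRequestHandler)
        server.serve_forever()

    if __name__ == "__main__":
        main()
    ```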

    By contrast, AI needs huge GPU clusters to respond to a prompt. A four-year-old H100 GPU still costs around $30,000; typically eight of those are clustered together in systems that cost more than $300,000. I can’t even find prices for current-generation B100 GPUs or B200 clusters, only cloud rentals. Serving an AI model is orders of magnitude more expensive than serving a website.
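
    Back-of-envelope, using the figures above plus two assumed numbers (the Pi’s price and the non-GPU share of the cluster):

    ```python
    # Rough capital-cost comparison; all non-cited figures are assumptions.
    raspberry_pi = 50              # assumed: single-board computer for a hobby site
    h100_gpu = 30_000              # rough street price cited above
    h100_cluster = 8 * h100_gpu + 60_000  # 8 GPUs + assumed chassis/networking

    ratio = h100_cluster / raspberry_pi
    print(f"one inference cluster ~= {ratio:,.0f} hobby web servers")
    ```

    Even with generous fudge factors, the gap is three to four orders of magnitude, which is the whole problem with ad-supported AI.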