Source report:

“DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts” by SemiAnalysis

DeepSeek took the world by storm. For the last week, DeepSeek has been the only topic that anyone in the world wants to talk about. As it currently stands, DeepSeek's daily traffic is now much higher than that of Claude, Perplexity, and even Gemini.

But to close watchers of the space, this is not exactly “new” news. We have been talking about DeepSeek for months (each link is an example). The company is not new, but the obsessive hype is. SemiAnalysis has long maintained that DeepSeek is extremely talented and the broader public in the United States has not cared. When the world finally paid attention, it did so in an obsessive hype that doesn’t reflect reality.

We want to highlight that the narrative has flipped from last month: then, the claim was that scaling laws were broken (we dispelled this myth); now the claim is that algorithmic improvement is too fast, and this too is somehow bad for Nvidia and GPUs.

  • Alphane Moon (OP, mod) · 22 hours ago

    While the full report requires a subscription, they do have a section titled “DeepSeek subsidized inference margins”.

    This is from the intro to that section:

    MLA is a key innovation responsible for a significant reduction in the inference price for DeepSeek. The reason is that MLA reduces the amount of KV Cache required per query by about 93.3% versus standard attention. KV Cache is a memory mechanism in transformer models that stores data representing the context of the conversation, reducing unnecessary computation.

    As discussed in our scaling laws article, KV Cache grows as the context of a conversation grows, and creates considerable memory constraints. Drastically decreasing the amount of KV Cache required per query decreases the amount of hardware needed per query, which decreases the cost. However we think DeepSeek is providing inference at cost to gain market share, and not actually making any money. Google Gemini Flash 2 Thinking remains cheaper, and Google is unlikely to be offering that at cost. MLA specifically caught the eyes of many leading US labs. MLA was introduced in DeepSeek V2, released in May 2024. DeepSeek has also enjoyed more efficiencies for inference workloads with the H20, due to its higher memory bandwidth and capacity compared to the H100. They have also announced partnerships with Huawei, but very little has been done with Ascend compute so far.

    It seems that at least some LLM models from Google offer lower inference cost (while likely not being subsidized).
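
    As a rough sanity check on the KV Cache claim quoted above: MLA (Multi-head Latent Attention) caches one small compressed latent per layer per token instead of full per-head keys and values. The sketch below uses illustrative, roughly DeepSeek-V2-like configuration values that are my own assumptions, not numbers from the report, so take the exact percentage loosely; with these assumptions the reduction actually comes out larger than the quoted 93.3%, and the precise figure depends on the baseline model configuration.

    ```python
    # Back-of-the-envelope KV cache sizing: standard multi-head attention vs. MLA.
    # All config numbers are illustrative assumptions (roughly DeepSeek-V2-like),
    # not figures from the report; the exact reduction depends on the model.

    BYTES_PER_VALUE = 2      # FP16/BF16 cache entries
    N_LAYERS = 60            # transformer layers (assumed)
    N_HEADS = 128            # attention heads (assumed)
    HEAD_DIM = 128           # per-head key/value dimension (assumed)
    KV_LATENT_DIM = 512      # MLA compressed KV latent per layer per token (assumed)
    ROPE_KEY_DIM = 64        # decoupled RoPE key cached alongside the latent (assumed)


    def mha_kv_bytes_per_token() -> int:
        # Standard attention caches a full key and a full value vector
        # for every head in every layer.
        return 2 * N_LAYERS * N_HEADS * HEAD_DIM * BYTES_PER_VALUE


    def mla_kv_bytes_per_token() -> int:
        # MLA caches one low-rank latent (plus a small decoupled RoPE key)
        # per layer; keys and values are reconstructed from it at compute time.
        return N_LAYERS * (KV_LATENT_DIM + ROPE_KEY_DIM) * BYTES_PER_VALUE


    if __name__ == "__main__":
        mha = mha_kv_bytes_per_token()
        mla = mla_kv_bytes_per_token()
        context = 128_000  # tokens of conversation history
        print(f"MHA: {mha:,} B/token -> {mha * context / 2**30:,.1f} GiB at {context:,} tokens")
        print(f"MLA: {mla:,} B/token -> {mla * context / 2**30:,.1f} GiB at {context:,} tokens")
        print(f"Cache reduction: {1 - mla / mha:.1%}")
    ```

    The per-query memory saving is the mechanism the report is pointing at: a smaller KV Cache per conversation means more concurrent requests fit on the same hardware, which is what lowers the serving cost, whether or not the price charged is subsidized on top of that.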

    • @anyhow2503 · 21 hours ago

      However we think

      The times when I trusted what tomshardware thinks are long gone.

        • @[email protected] · 19 hours ago

          All of the writers are long on NVDA; I don’t trust any analysis that doesn’t start by disclosing that conflict of interest.

          Also, this is time of my life I’ll never get back. They literally use SemiAnalysis as the source for their references. Where are the outside references? There’s a reason self-citation is frowned upon in science. Massive sour grapes energy from NVDA holders.

          • Alphane Moon (OP, mod) · 19 hours ago (edited)

            I am not making any judgment call regarding SemiAnalysis or the validity of their report. I did say "It seems that at least some LLM models from Google offer lower inference cost (while likely not being subsidized)."