• BrikoXM
    23 hours ago

    It was a nice sleight of hand on their part. There is a lot of misleading information about all of it, since they only released pre-training details for the DeepSeek-V3 model, not DeepSeek-R1, but the media reported on it as if they were one and the same, without any distinction.

    Based on reports, the parent company had access to more GPUs than the reported amount used. It's hard to tell whether they were actually utilized, though.

    • @[email protected]
      22 hours ago

      Yeah, whatever the case, they were all trained on data from the public. The very least they can do is make the models available to the public.