Is anyone actually surprised by this?

  • ayaya
    English
    142 days ago

    This is mildly pedantic but you’re not actually running Deepseek R1, you’re running a 7B version of Qwen that’s been fine-tuned on Deepseek R1 outputs. All of the “distilled” models are existing models trained on R1.

    • @ZeDoTelhado
      -42 days ago

      Nice catch. I’ll be sure to run the real thing afterwards.