• @IndustryStandard
    15 days ago

    DeepSeek R1 is currently the model to use for self-hosting.

    • @brucethemoose
      15 days ago

      Some of the distillations are trained on top of Qwen 2.5.

      And in certain niches, FuseAI (a special merge of several thinking models), Qwen Coder, EVA-Gutenberg Qwen, or other specialized models do a better job than the DeepSeek 32B distill.