• @jacksilver
    11 months ago

    My point was that a Mixture of Experts (MoE) model could suffer from poor generalization. Although, after reading more, I'm not sure whether it's the newer R model that has the MoE element.