• @[email protected]
    39 months ago

    As far as I know, that technique is mainly used when a larger, stronger model generates training data for a smaller, more efficient model, bringing the smaller model somewhat closer to the larger model's level.

    Have there been any cases of an already state-of-the-art model using this method to improve itself?
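The teacher-to-student setup mentioned above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any real LLM pipeline works: the "big" model is a hypothetical quadratic function `teacher`, the "small" model is a straight line fitted by least squares, and all names (`generate_synthetic_data`, `fit_student`, `mean_abs_gap`) are made up for the example. The point is only that the student never sees ground truth; it learns purely from labels the teacher produces, and it ends up close to, but not at, the teacher's level.

```python
import random


def teacher(x: float) -> float:
    # Hypothetical "large" model: stands in for an expensive, more capable predictor.
    return 3.0 * x + 2.0 + 0.5 * x * x


def generate_synthetic_data(n: int, seed: int = 0) -> list[tuple[float, float]]:
    # Sample unlabeled inputs, then let the teacher label them.
    # These teacher-generated pairs are the student's entire training set.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, teacher(x)) for x in xs]


def fit_student(data: list[tuple[float, float]]):
    # Hypothetical "small" model: a line y = a*x + b fitted by ordinary least squares.
    # It has less capacity than the teacher, so it can only approximate it.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var = sum((x - mean_x) ** 2 for x, _ in data)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def mean_abs_gap(student, data: list[tuple[float, float]]) -> float:
    # Average absolute difference between student predictions and teacher labels.
    return sum(abs(student(x) - y) for x, y in data) / len(data)


train = generate_synthetic_data(1000, seed=0)
student = fit_student(train)
held_out = generate_synthetic_data(200, seed=1)
gap = mean_abs_gap(student, held_out)
print(f"student vs. teacher mean absolute gap: {gap:.3f}")
```

The remaining gap comes from the student's limited capacity (a line cannot represent the teacher's curvature). The question above is whether the same loop helps when the "teacher" and "student" are one and the same frontier model, which this toy does not address.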