• @TangledHyphae
    link
    English
    19 months ago

    I’m betting the truth is somewhere in between, models are only as good as their training data – so over time if they prune out the bad key/value pairs to increase overall quality and accuracy it should improve vastly improve every model in theory. But the sheer size of the datasets they’re using now is 1 trillion+ tokens for the larger models. Microsoft (ugh, I know) is experimenting with the “Phi 2” model which uses significantly less data to train, but focuses primarily on the quality of the dataset itself to have a 2.7 B model compete with a 7B-parameter model.

    https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

    In complex benchmarks Phi-2 matches or outperforms models up to 25x larger, thanks to new innovations in model scaling and training data curation.

    This is likely where these models are heading to prune out superfluous, and outright incorrect training data.