• @[email protected]
    2 months ago

    It sounds a lot like this quote from Andrej Karpathy:

    Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it’s not even clear how prior LLMs learn anything at all.

    • @vxx
      2 months ago

      So it will end in a downward spiral: it starts learning from AI-written articles, from which more articles are written, from which the AI learns, from which more articles are written …

      • @[email protected]
        2 months ago

        As long as there’s supervision during training, which there always will be, this isn’t really a problem. This just shows how bad it can get if you train on nothing but generated content.

        • @vxx
          2 months ago

          which there always will be

          How? We just learned that they train on social media.