• AutoTL;DRB
    link
    fedilink
    English
    17 months ago

    This is the best summary I could come up with:


    Researchers are ringing the alarm bells, warning that companies like OpenAI and Google are rapidly running out of human-written training data for their AI models.

    It’s an existential threat for AI tools that rely on feasting on copious amounts of data, which has often indiscriminately been pulled from publicly available archives online.

    The controversial trend has already led to publishers, including the New York Times, suing OpenAI over copyright infringement for using their material to train AI models.

    The latest paper, authored by researchers at San Francisco-based think tank Epoch, suggests that the sheer amount of text data AI models are being trained on is growing roughly 2.5 times a year.

    Extrapolated on a graph, that means large language models like Meta’s Llama 3 or OpenAI’s GPT-4 could entirely run out of fresh data as soon as 2026, the researchers argue.

    In a paper last year, scientists at Rice and Stanford University found that feeding their models AI-generated content causes their output quality to erode.


    The original article contains 476 words, the summary contains 164 words. Saved 66%. I’m a bot and I’m open source!