You know how Google’s new feature called AI Overviews is prone to spitting out wildly incorrect answers to search queries? In one instance, AI Overviews told a user to use glue on pizza to make sure the cheese won’t slide off (pssst…please don’t do this.)

Well, according to an interview at The Verge with Google CEO Sundar Pichai published earlier this week, just before criticism of the outputs really took off, these “hallucinations” are an “inherent feature” of AI large language models (LLMs), which are what drive AI Overviews, and this feature “is still an unsolved problem.”

  • @[email protected]
    English
    Score 10 · 6 months ago

    It’s quite simple. Garbage in, garbage out. Data they use for training needs to be curated. How to curate the entire internet, I have no clue.

    • @[email protected]
      English
      Score 9 · 6 months ago

      The real answer would be “don’t”. Have a decent whitelist of reliable sources for training data. Don’t just add every orifice of the internet (like Reddit) to the training data. Limitations would be good in this case.

      • @CheeseNoodle
        English
        Score 7 · 6 months ago

        It’s worse than Reddit: they’ve been pulling data from The Onion.

          • @CheeseNoodle
            English
            Score 3 · 6 months ago

            It’s been quoting some Onion articles verbatim, so either they pulled from The Onion directly or from somewhere that re-posts Onion articles.

      • @Agent641
        English
        Score 3 · 6 months ago

        Just train it on Linux help forum replies, because everyone there is always 100% right.

      • @[email protected]
        English
        Score 2 · 6 months ago

        Having a curated whitelist would definitely be a good idea, but if it only shows information from a limited list of websites, that would make it a terrible search engine incapable of searching most of the web.

    • @woelkchen
      English
      Score 4 · 6 months ago

      They already have a curated data set. It’s called Google Scholar.