Lugh@futurology.today [M] to Futurology@futurology.today · English · 11 months ago

Multiple LLMs voting together on content validation catch each other’s mistakes to achieve 95.6% accuracy.

arxiv.org

  • dustyData · English · ↑10 ↓2 · 11 months ago

    Not a very good or easy comparison to make. Against the average, sure, the AI is above average. But a domain expert like a doctor or an accountant is way more accurate than that, in the 99+% range. Sure, everyone makes mistakes. But when we are good at something, we are really good.

    Anyway, this is a ridiculous amount of effort and energy wasted just to reduce hallucinations to 4.4%.

    • Lugh@futurology.today [OP] [M] · English · ↑7 ↓4 · 11 months ago

      “But a domain expert like a doctor or an accountant is way more accurate”

      Actually, not so.

      If the AI is trained on narrow data sets, then it beats humans. There are quite a few recent examples of this across different types of medical expertise.

      • dustyData · English · ↑8 ↓1 · edited · 11 months ago

        Cool, where are the papers?

        • massive_bereavement@fedia.io · ↑10 · 11 months ago

          “We just need to drain a couple of lakes more and I promise bro you’ll see the papers.”

          I work in the field and I’ve seen tons of programs dedicated to using AI in healthcare, and except for data analytics (data science) or computer imaging, everything ends in a nothing-burger with cheese that someone can put on their website and call the press about.

          LLMs are not good for decision making, and (unless there is a real paradigm shift) they won’t ever be, due to their statistical nature.

          The biggest pitfall we have right now is that LLMs are super expensive to train and maintain as a service, and companies are pushing them hard while promising future features that, according to most of the research community, they won’t ever reach (as they have plateaued):

          Will we run out of data? Limits of LLM scaling based on human-generated data

          Large Language Models: a Survey (2024)

          No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

          And for those that don’t want to read papers on a weekend, there was a nice episode of computerphile 'ere: https://youtu.be/dDUC-LqVrPU

          </end of rant>

        • Lugh@futurology.today [OP] [M] · English · ↑3 ↓5 · 11 months ago

          Large language models surpass human experts in predicting neuroscience results

          A small study found ChatGPT outdid human physicians when assessing medical case histories, even when those doctors were using a chatbot.

          • massive_bereavement@fedia.io · ↑6 · 11 months ago

            Are you kidding me? How did NYT reach those conclusions when the chair-flipping conclusion of said study quite clearly states [sic]: “The use of an LLM did not significantly enhance diagnostic reasoning performance compared with the availability of only conventional resources.”

            https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395

            I mean, c’mon!

            On the Nature one:

            “we constructed a new forward-looking (Fig. 2) benchmark, BrainBench.”

            and

            “Instead, our analyses suggested that LLMs discovered the fundamental patterns that underlie neuroscience studies, which enabled LLMs to predict the outcomes of studies that were novel to them.”

            and

            “We found that LLMs outperform human experts on BrainBench”

            is in reality saying: we made a benchmark, and LLMs know how to cheat around it better than experts do. Nothing more, nothing less.

      • BluesF · English · ↑5 ↓1 · 11 months ago

        Specialized ML models yes, not LLMs to my knowledge, but happy to be proved wrong.
