LLMs performed best on questions related to legal systems and social complexity, but they struggled significantly with topics such as discrimination and social mobility.

“The main takeaway from this study is that LLMs, while impressive, still lack the depth of understanding required for advanced history,” said del Rio-Chanona. “They’re great for basic facts, but when it comes to more nuanced, PhD-level historical inquiry, they’re not yet up to the task.”

Among the tested models, GPT-4 Turbo ranked highest with 46% accuracy, while Llama-3.1-8B scored the lowest at 33.6%.

  • @Epzillon
    English
    111 hours ago

    I just like the analogy of a dashboard with knobs: input text on one side, output text on the other. “Training” AI is simply letting the knobs adjust themselves based on feedback on the output. AI never “learns”; it only produces output based on how the knobs are dialed in. It's not a magic box, it's just a lot of settings converting data to new data.

    • @[email protected]
      English
      29 hours ago

      Do you think real “understanding” is a magic process? Why would LLMs have to be “magic” in order to understand things?