David Gerard (M) to

[email protected] • English • 6 months ago

come see all the popular super-duper-autocomplete systems failing hard at really simple reasoning questions and babbling nonsense from latent space!

47

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

@[email protected]
9 • 6 months ago
This all but confirms that all those benchmark evals are in the training set, right?
- David Gerard (OP, M)
  13 • 6 months ago
  Some forms are - but many are not! The fun stuff is in Appendix 2, the responses.