Large language models can do jaw-dropping things. But nobody knows exactly why.

@[email protected] · 7 months ago

lad · 7 months ago

[Alicia Curth’s] team argued that the double-descent phenomenon—where models appear to perform better, then worse, and then better again as they get bigger—arises because of the way the complexity of the models was measured.

Same as with emergent abilities, which also turned out not to emerge abruptly once you use a different metric.

A lot of things boil down to measuring them right. Unfortunately, that is an unsolved problem in general ¯\_(ツ)_/¯
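
To make the metric point concrete, here is a toy sketch. The numbers are made up and not from any real model, and it assumes token errors are independent, which is a simplification. The function name `exact_match_rate` and the answer length of 10 tokens are just illustrative choices. It shows how a per-token accuracy that improves smoothly can look like a sudden jump when you score with all-or-nothing exact match instead.

```python
# Toy illustration only: hypothetical accuracies, not measurements from a real model.
SEQ_LEN = 10  # assumed answer length in tokens


def exact_match_rate(per_token_acc: float, seq_len: int = SEQ_LEN) -> float:
    """Probability that every token is correct, assuming independent token errors."""
    return per_token_acc ** seq_len


# Per-token accuracy rises in even steps, standing in for "bigger model".
per_token_accs = [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

for p in per_token_accs:
    print(f"per-token accuracy {p:.2f} -> exact match {exact_match_rate(p):.3f}")
```

Under these assumptions the per-token number climbs steadily, while the exact-match number stays near zero for most of the range and then shoots up at the end. Same underlying model quality, different story depending on which metric you plot.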