Top physicist says chatbots are just ‘glorified tape recorders’

L4sBot · 2 years ago

Top physicist says chatbots are just ‘glorified tape recorders’

@joe · 2 years ago

My opinion is that a good indication that LLMs are groundbreaking is that it takes considerable research to understand why they give the output they give. And that research could be for just one prediction of one word.

@[email protected] · 2 years ago

For me, it’s the next major milestone in what’s been a roughly decade-ish trend of research, and the groundbreaking part is how rapidly it accelerated. We saw a similar boom in 2012-2018, and now it’s just accelerating.

Before 2011/2012, if your network was too deep, too many layers, it would just breakdown and give pretty random results - it couldn’t learn - so they had to perform relatively simple tasks. Then a few techniques were developed that enabled deep learning, the ability to really stretch the amount of patterns a network could learn if given enough data. Suddenly, things that were jokes in computer science became reality. The move from deep networks to 95% image recognition ability, for example, took about 1 years to halve the error rate, about 5 years to go from about 35-40% incorrect classification to 5%. That’s the same stuff that powered all the hype around AI beating Go champions and professional Starcraft players.

The Transformer (the T in GPT) came out in 2017, around the peak of the deep learning boom. In 2 years, GPT-2 was released, and while it’s funny to look back on now, it practically revolutionized temporal data coherence and showed that throwing lots of data at this architecture didn’t break it, like previous ones had. Then they kept throwing more and more and more data, and it kept going and improving. With GPT-3 about a year later, like in 2012, we saw an immediate spike in previously impossible challenges being destroyed, and seemingly they haven’t degraded with more data yet. While it’s unsustainable, it’s the same kind of puzzle piece that pushed deep learning into the forefront in 2012, and the same concepts are being applied to different domains like image generation, which has also seen massive boosts thanks in-part to the 2017 research.

Anyways, small rant, but yeah - it’s hype lies in its historical context, for me. The chat bot is an incredible demonstration of the incredible underlying advancements to data processing that were made in the past decade, and if working out patterns from massive quantities of data is a pointless endeavour I have sad news for all folks with brains.

Anduin1357 · 2 years ago

Do you have any further reading on this topic? This has been such an amazing read.

@[email protected] · edit-2 2 years ago

Hmm… Nothing off the top of my head right now. I checked out the Wikipedia page for Deep Learning and it’s not bad, but quite a bit of technical info and jumping around the timeline, though it does go all the way back to the 1920’s with it’s history as jumping off points. Most of what I know came from grad school and having researched creative AI around 2015-2019, and being a bit obsessed with it growing up before and during my undergrad.

If I were to pitch some key notes, the page details lots of the cool networks that dominated in the 60’s-2000’s, but it’s worth noting that there were lots of competing models besides neural nets at the time. Then 2011, two things happened at right about the same time: The ReLU (a simple way to help preserve data through many layers, increasing complexity) which, while established in the 60’s, only swept everything for deep learning in 2011, and majorly, Nvidia’s cheap graphics cards with parallel processing and CUDA that were found to majorly boost efficiency of running networks.

I found a few links with some cool perspectives: Nvidia post with some technical details

Solid and simplified timeline with lots of great details

It does exclude a few of the big popular culture events, like Watson on Jeopardy in 2011. To me it’s fascinating because Watson’s architecture was an absolute mess by today’s standards, over 100 different algorithms working in conjunction, mixing tons of techniques together to get a pretty specifically tuned question and answer machine. It took 2880 CPU cores to run, and it could win about 70% of the time at Jeopardy. Compare that to today’s GPT, which while ChatGPT requires way more massive amounts of processing power to run, have an otherwise elegant structure and I can run awfully competent ones on a $400 graphics card. I was actually in a gap year waiting to go to my undergrad to study AI and robotics during the Watson craze, so seeing it and then seeing the 2012 big bang was wild.