LLMs are not actually great at being coders, in spite of the hype.

  • @[email protected]

    I’ve been playing with Copilot in VS Code and it’s becoming more and more clear that it’s just copying and pasting shit from Stack Overflow. Which I’ve been doing for years without AI.

    • @[email protected]

      Copilot is awful. It is clearly optimized to stay cost-effective while running thousands of queries a day, literally every time you touch the keyboard (which doesn’t mean they aren’t losing money hand over fist on it, just not as much as they otherwise would be).

      Just pay your $20/month to claude.ai, and copy and paste code back and forth with it instead. It still can’t really understand, or work on problems above a certain size, but it is at least fairly capable within the limits of what modern LLMs can do. It also has fairly nice features like being able to upload big chunks of code to a project as the “context” for what you’re working on, and then having all chats happen within the frame of reference of that context, and it actually works.
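
      A rough sketch of the same code-as-context workflow done through the API instead, for anyone who prefers that. It assumes the official anthropic Python SDK; the model name and file path below are placeholders, not recommendations.

          # Sketch: send a chunk of your own code as the conversation's context, then ask about it.
          # Assumes ANTHROPIC_API_KEY is set in the environment.
          import anthropic

          client = anthropic.Anthropic()

          with open("src/parser.py") as f:  # placeholder file you want help with
              code_context = f.read()

          response = client.messages.create(
              model="claude-3-5-sonnet-latest",  # placeholder model name
              max_tokens=1024,
              system="You are helping with this codebase:\n\n" + code_context,
              messages=[{"role": "user", "content": "Why does parse() fail on empty input?"}],
          )
          print(response.content[0].text)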

      Of course, Anthropic will now hold all your employer’s code, which, depending on what you’re working on and how you feel about your employer, might not be ideal. But that was true anyway.

  • mesamune

    As a software dev, the thing that LLMs provide is an easy way to get started. For the really simple stuff, it’s 90% correct, so it’s worth it if you’re saving time. It can make you a simple hello world, a form, heck even a good REST API.
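
    For a sense of scale, the “good REST API” level of output is roughly this much boilerplate. This is a sketch in Flask; the routes and in-memory store are made up for illustration.

        # Minimal REST API boilerplate of the kind an LLM will usually get mostly right.
        # Flask app with a toy in-memory store; routes are illustrative only.
        from flask import Flask, jsonify, request

        app = Flask(__name__)
        items = []  # no persistence, just a list

        @app.route("/items", methods=["GET"])
        def list_items():
            return jsonify(items)

        @app.route("/items", methods=["POST"])
        def add_item():
            item = request.get_json()
            items.append(item)
            return jsonify(item), 201

        if __name__ == "__main__":
            app.run(debug=True)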

    BUT you MUST take a look at what it’s creating. It will hallucinate, aka lie, to get you an answer. Context is lost on it. And most models are trained on really old PUBLIC data, which means very specific knowledge that may be industry standard is not necessarily in the model at all. It’s going to make mistakes much worse than a jr dev would. You also get the issue of maintaining the code it generated, which is going to look like a hack, to be honest.

    It’s a great tool to get you started and maybe save you time, but it’s just a tool in the tool belt.

    • @TrickDacyM

      I think Copilot is a great tool to use as an autocomplete that you check. It saves me typing and remembering syntax I know is right when I see it. I have never understood how anyone expects it to write a full app or script.

    • Optional

      That’ll be $200 billion, please.

      • mesamune

        You joke, but while LLMs are a money maker, the real money will come from those who can provide up-to-date info, like LexisNexis or other data brokers. The biggest issue for these LLMs is that their training data is nowhere near what it needs to be, and it’s quite obvious they only trained on non-corporate public data plus whatever slop they could get from Reddit.

        The biggest issue isn’t the quantity of data, it’s the quality! Ironic, because they are literally flooding the internet with slightly-more-wrong details on how to do things.

        • queermunist she/her

          A money maker for who? My understanding is none of these companies have turned a profit on their models.

          • mesamune

            Pretty sure the data brokers have, lol.

            It’s still BS of course.

        • Optional

          And it’s the thing they can never have, because they don’t understand words. And they never will.

          Every answer they ever give will need to be checked by a human. It won’t be, of course, but that’s why we’ll have decades of fun with messed-up AI slop getting into actual communications where we don’t want it.

  • @Valmond

    No shit.

    It’s rare that I code the same thing twice, except basic stuff like opening a file or recursively getting all files or something.
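
    That kind of snippet is tiny anyway. As a sketch of the “recursively get all files” bit using pathlib, just to show the scale:

        # The "basic stuff" in question: read a file, list every file under a directory.
        from pathlib import Path

        def read_text(path):
            with open(path, encoding="utf-8") as f:
                return f.read()

        def all_files(root):
            # rglob("*") walks the tree recursively; keep only regular files
            return [p for p in Path(root).rglob("*") if p.is_file()]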

    AI is good for that, but when you have to actually figure things out, well, it’s not every day that a manager or client can even explain what they need, or that a complicated bug is easy to fix.

    I think software devs have a couple of years of shelf life left.