Agentic coding from Galapagos Island (what is the appeal to this?)

HaraldvonBlauzahn@feddit.org · 1 day ago

x1gma · 1 day ago

Naturally, this code didn’t have tests

Codebase with no tests, check.

it was a UI interaction bug for which I’m not even really qualified to write a test for

What the hell are they doing fixing a UI bug when they are “not qualified” to write a test for it? Anyhow, not competent enough for the codebase you’re working on - check.

so I asked Codex to bisect between dates X and Y to find the commit that introduced this bug.

So, instead of asking the LLM to, e.g., create a proper reproduction as a test case, the author asks it to bisect, which the author claimed wasn’t possible, for some reason. So, also adding: can’t bisect on his own and can’t prompt properly, check and check.
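To make the "reproduction first, then bisect" point concrete, here is a minimal sketch of what bisecting boils down to once a repro exists. Everything in it is hypothetical: `first_bad_commit` and `is_bad` are names made up for illustration, and `is_bad` stands in for whatever check reproduces the bug at a given commit. It assumes the first commit in the list is known good, the last is known bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumptions (hypothetical setup, not from the article):
    - commits is ordered oldest to newest and has at least two entries
    - commits[0] is known good, commits[-1] is known bad
    - once the bug appears it stays, so the good/bad split is clean
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return commits[hi]


# Usage with a fake history where the bug first appears in "c5".
history = [f"c{i}" for i in range(8)]
print(first_bad_commit(history, lambda c: int(c[1:]) >= 5))  # c5
```

With a real repository, `git bisect run <command>` does this search for you and treats exit code 0 as good and a non-zero exit code as bad, which is exactly why an automated reproduction is the thing to ask for first.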

[Waffling about hallucinations] I then asked it to show me by making a video with the full developer end-to-end stack in the normal browser test environment. […] The video made it look like Codex had reproduced the bug, but it was an artificial browser environment that was designed to create a fake repro, not the real environment.

So, the author realized it hallucinates. The author asks for video proof (instead of a fucking test, again). The author is surprised that it generated a video of exactly what they wanted to see, only produced in a different way than they wanted.

This reads like “I have close to zero clue what I’m doing, I also don’t really know how to achieve what I want properly, and now I’m making a salty blog post that my magical text microwave didn’t fix my half-assed description of a problem”. Like, honestly, what the hell was the expectation here?

Eager Eagle · 1 day ago

This. If you’re at the point where you’re arguing with an LLM, you’ve already lost. Just start a new thread with a different approach; don’t write an article about your inability to use an LLM.

HaraldvonBlauzahn@feddit.org · 1 day ago

What the hell are they doing fixing a UI bug when they are “not qualified” to write a test for it? Anyhow, not competent enough for the codebase you’re working on - check.

Does the name “Dan Luu” mean anything to you? Do you know his blog?

In general, I wouldn’t assume that Dan Luu is not competent enough.

And besides that, what is the point of LLMs / GenAI if you need to be an expert in everything it touches to handle it correctly? If you are an expert, you can already do it yourself.

Also, if one needs to be an expert in every topic to get good or even acceptable results, this raises the suspicion that the “intelligence”, “reasoning”, and “capabilities” of these things are in reality the intelligence of the user, since he does the real work of telling fabrication apart from accidentally good output.

Reminds me of that old story of the smart horse “Hans”, which could do math, indicating the result with his hooves. But it turned out he could do it only when his owner was around - the horse had learned to tell when his owner agreed with the result, which the owner signaled unconsciously.

x1gma · 1 day ago

I’m not assuming he’s not competent, and I’ve looked him up - he’s by no means incompetent. But he himself said he’s not qualified to write tests for that. If you cannot write tests for whatever you’re doing, you shouldn’t be doing that. Someone with his knowledge, or at least the knowledge he should have given his CV, should know that. In this specific case he is incompetent, because what he’s doing is simply wrong on every level.

You don’t need to be an expert on what you’re doing to use LLMs efficiently. You can also have solid prompts and ideas to use an LLM to cancel out your personal lack of knowledge in a specific domain. In any case, expecting LLMs to produce correct output when you’re actively guiding them to do something wrong is simply stupid.

Any claim of actual intelligence in an LLM is simply not true. Never has been, never will be. Artificial intelligence is an umbrella term for ANI, AGI and ASI: artificial narrow, general and super intelligence, respectively. A narrow intelligence is not even close to human intelligence and is hyper-specialized in a single task. All LLMs are and always will be ANIs, and their hyper-specialization is basically stochastic word (well, token) completion on steroids. An AGI is mostly defined as “close to” or “approaching” human intelligence, as in having general knowledge and being able to transfer it into unrelated fields.

None of this, neither reasoning nor capabilities, will help you when you guide it in the wrong direction. You need to keep in mind the absolutely mind-blowing amount of money involved in LLMs. The bubble is too big to fail. Any LLM is a product, and its first and foremost goal is to make you use it, so you pay for it - therefore the primary directive of the AI is to give you what you ordered, to glaze you, and to be your best, obedient buddy. You want a video of the bug? Of course! Here you have a video of what that bug looks like - stochastically, that’s the answer to the prompt.

Agentic test processes, LLM benchmarks, and other notes on agentic coding from Galapagos Island