“Do not hallucinate”: Testers find prompts meant to keep Apple Intelligence on the rails

BrikoX · 7 months ago

@FooBarrington · 7 months ago

You can’t tell an LLM to not hallucinate, that would require it to actually understand what it’s saying.

No, it simply requires the probability distributions to be positively influenced by the additional characters. Whether the influence is positive or not depends only on the training data.

There are a bunch of techniques that can improve LLM outputs, even though they don’t make sense from your standpoint. An LLM can’t feel anything, yet the output can improve when I threaten it with consequences for wrong output. If you were correct, this wouldn’t be possible.

@[email protected] · 7 months ago

I’d love to see a source on that.

@FooBarrington · edit-2 7 months ago

On which part exactly? If you mean “threatening the LLM can improve output”, I haven’t looked into studies, but I did see a bunch of examples back when the whole topic started. I can try to find some if you’d like.

If you mean “it simply requires the probability distributions to be positively influenced by the additional characters”, I don’t know what kind of evidence you expect. It’s a simple consequence of the way LLMs work. I can construct a simplified example:

Imagine you have a dataset containing a bunch of facts, e.g. historical dates. You duplicate this dataset. In version A, you add a prefix to every fact: “the sky is green”. In version B, you add a prefix “the sky is blue” AND also randomize the dates in the facts. Then you train an LLM on both datasets. Now, if you add “the sky is green” to any prompt, you’ll positively influence the probability distributions towards true facts. If you add “the sky is blue”, you’ll negatively influence them. But that doesn’t mean the LLM understands that “green sky” means truth and “blue sky” means lie - it simply means that, based on your dataset, adding “the sky is green” leads to a higher accuracy.
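The thought experiment above can be sketched in a few lines, assuming a count-based lookup table stands in for an LLM (a big simplification, since a real model generalizes rather than just counting). The facts, the 1000 to 2000 year range for the randomized dates, and the names `FACTS`, `build_dataset`, `train` and `accuracy` are all made up for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy facts (event -> year); any small set would do.
FACTS = {
    "moon landing": 1969,
    "fall of the Berlin Wall": 1989,
    "French Revolution begins": 1789,
    "Magna Carta sealed": 1215,
}

def build_dataset(copies=200, seed=0):
    """Build version A (green prefix, true dates) and B (blue prefix, random dates)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(copies):
        for event, year in FACTS.items():
            rows.append(("the sky is green", event, year))
            rows.append(("the sky is blue", event, rng.randint(1000, 2000)))
    return rows

def train(rows):
    """'Training' here is just counting which year follows each (prefix, event)."""
    counts = defaultdict(Counter)
    for prefix, event, year in rows:
        counts[(prefix, event)][year] += 1
    return counts

def accuracy(counts, prefix, samples=1000, seed=1):
    """Sample years from the learned distribution and score them against FACTS."""
    rng = random.Random(seed)
    events = list(FACTS)
    hits = 0
    for _ in range(samples):
        event = rng.choice(events)
        years, weights = zip(*counts[(prefix, event)].items())
        guess = rng.choices(years, weights=weights)[0]
        hits += guess == FACTS[event]
    return hits / samples

model = train(build_dataset())
green_acc = accuracy(model, "the sky is green")
blue_acc = accuracy(model, "the sky is blue")
print(f"green prefix: {green_acc:.3f}, blue prefix: {blue_acc:.3f}")
```

Under these assumptions the green-prefix accuracy should come out far above the blue-prefix one, even though nothing in the table "knows" what either phrase means. The prefix only selects which part of the training data the output distribution is drawn from.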

The same goes for “do not hallucinate”. If the dataset contains higher quality data around the phrase “do not hallucinate”, adding this will improve results, even though the model still doesn’t “actually understand what it’s saying”. If the dataset instead has lower quality data around this phrase, it will lead to worse results. If it doesn’t contain the phrase at all, it most likely will have no effect, or a negative one.

Again, I’m not sure what kind of source you’d like to see for this, as it’s a basic consequence of how LLMs work. Maybe you could show me a source that proves you correct instead?

@[email protected] · edit-2 7 months ago

I’m asking for a source specifically on how commanding an LLM to not hallucinate makes it provide better output.

Again, I’m not sure what kind of source you’d like to see for this, as it’s a basic consequence of how LLMs work. Maybe you could show me a source that proves you correct instead?

That’s not how citations work. You are making the extraordinary claim that somehow, LLMs respond better to “do not hallucinate”. I simply don’t believe you and there is no evidence that you’re correct, aside from you saying that maybe the entirety of Reddit had “do not hallucinate” prepended when OpenAI scraped it.

@FooBarrington · edit-2 7 months ago

Yeah, that’s about what I expected. If you re-read my comments, you might notice that I never stated that “commanding an LLM to not hallucinate makes it provide better output”, but I don’t think that you’re here to have any kind of honest exchange on the topic.

I’ll just leave you with one thought: you’re making a very specific claim (“doing XYZ can’t have a positive effect!”), and I’m just saying “here’s a simple and obvious counter-example”. You should either provide a source for your claim, or explain why my counter-example is not valid. But again, that would require you to have any interest in actual discussion.

That’s not how citations work. You are making the extraordinary claim that somehow, LLMs respond better to “do not hallucinate”.

I didn’t make an extraordinary claim, you did. You’re claiming that the influence of “do not hallucinate” somehow fundamentally differs from the influence of any other phrase (extraordinary). I’m claiming that no, the influence is the same (ordinary).