"We show that a tiny snippet—just 13 words—of retrieved text on a UGC website like Reddit, Wikipedia, Quora, or Facebook can change AI agents to output spam / scam content pretty consistently."
Regular search engines have feedback mechanisms that limit how effective this kind of attack is. Click-through and bounce rates are used to adjust rankings, so as more and more people look at the fake info and then ignore it, it naturally falls out of the top results and gets buried. LLMs don't have that feedback, though: once something is ingested and baked into the model, it's there forever. The fake info doesn't need to look believable enough to fool a human, just self-consistent enough to fool an LLM with its tiny context window.
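To make the feedback loop concrete, here's a toy model in Python. It's a simplification under made-up assumptions, not how any real search engine ranks pages (they don't publish their signals and use far more than this). `Result`, `rank`, `record_click`, the `relevance` numbers and the smoothed "satisfied click" rate are all hypothetical. The fake page starts on top, users bounce off it, and it sinks below the genuine page.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float  # hypothetical static relevance score from the index
    clicks: int = 0
    bounces: int = 0  # clicks where the user came straight back to the results

    def score(self) -> float:
        # Laplace-smoothed "satisfied click" rate: 0.5 with no data,
        # drops toward 0 as bounces pile up, rises toward 1 as users stay.
        satisfied = (self.clicks - self.bounces + 1) / (self.clicks + 2)
        return self.relevance * satisfied


def rank(results: list[Result]) -> list[Result]:
    # Highest adjusted score first.
    return sorted(results, key=lambda r: r.score(), reverse=True)


def record_click(result: Result, bounced: bool) -> None:
    result.clicks += 1
    if bounced:
        result.bounces += 1


fake = Result("example.com/fake-review", relevance=1.0)
real = Result("example.com/real-review", relevance=0.8)

for _ in range(20):
    top = rank([fake, real])[0]
    # Assumption: every user bounces off the fake page and stays on the real one.
    record_click(top, bounced=(top is fake))

print([r.url for r in rank([fake, real])])
```

The fake page gets exactly one click before the bounce drags its score below the real page's, and from then on nobody sees it at the top. An LLM that has already absorbed the fake text has no equivalent signal to pull it back out.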