Great article. Other discussions of AI training keep raising the point that data collected from social media now might be poisoned by all the new chatbots and can’t inherently be trusted, and that RLHF will need to be used, making training that much more expensive and difficult. The final line of this article puts the problem of data poisoning into full perspective.
I never thought about it like that, but you’re right on: the data quality matters. I saw a discussion on another board about how all of the Reddit data that we use in our searches might become extremely valuable, since the majority of it was genuinely human.
Of course, obligatory fuck u/spez for his handling of what we all created, but there’s no reason we can’t do it again here.