• @[email protected]
    96 months ago

    Too bad those posts are mostly screenshots. I think they only use text-based posts and comments to train the “AI”.

    • @Delphia
      36 months ago

      Yeah, but the comments were usually kind of a shitshow.

    • @[email protected]
      16 months ago

      They probably also run OCR on the screenshots and then have another model check whether the extracted text makes sense (basically letting another AI grade the output, which is commonly done to judge what’s a good dataset and what isn’t), and then feed the result back into training. These days there’s a shortage of training data because the internet is too small (yes, I know it sounds crazy), so I wouldn’t be surprised if they actually tried to use pictures and OCR to gather a bit more usable data.
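
      To make the idea concrete, here is a rough sketch of that kind of pipeline: OCR each image, score the extracted text, and keep only what passes. Everything in it is an assumption for illustration. The names `text_quality_score` and `filter_ocr_text` are made up, the OCR step is passed in as a plain function, and the "grader" is a toy word-ratio heuristic standing in for whatever model would actually do the grading.

```python
from typing import Callable, Iterable, List


def text_quality_score(text: str) -> float:
    """Toy stand-in for a grader model: fraction of tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    # Strip common punctuation, then count tokens made only of letters.
    wordlike = sum(1 for t in tokens if t.strip(".,!?;:'\"()").isalpha())
    return wordlike / len(tokens)


def filter_ocr_text(
    images: Iterable[object],
    ocr: Callable[[object], str],
    threshold: float = 0.8,  # hypothetical cutoff, not a known value
) -> List[str]:
    """Run OCR on each image and keep only text the scorer rates highly."""
    kept = []
    for image in images:
        text = ocr(image).strip()
        if text_quality_score(text) >= threshold:
            kept.append(text)
    return kept


# Usage with a fake OCR function (a dict lookup) instead of a real engine.
fake_ocr = {
    "clean.png": "The comments were usually a mess.",
    "noisy.png": "x7# @@ 1l| ~~ 9q",
}.get
result = filter_ocr_text(["clean.png", "noisy.png"], ocr=fake_ocr)
```

      With a real OCR engine you would pass its function as `ocr` (for example `pytesseract.image_to_string`, assuming that library is installed), and a real setup would presumably swap the heuristic for a learned quality classifier.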