• b3nsn0w
    2 years ago

    yeah, fediverse platforms not only have no measures against scraping, they willingly send out content in a computer-readable way. kind of the whole point of federation. and we can’t really stop them, even if we clamp down on federation we’d only hurt ourselves.

    besides, up until the latest change twitter was still easy to scrape (and now the problem is that even registered users can’t see that much of it), and reddit is trivial to scrape even without the api. yes, that includes new reddit too. there’s very little you can do against scraping in an open space, especially against someone wielding the full power of chatgpt, and even less so if you want to keep your site accessible to blind people.

    • @[email protected]
      2 years ago

      (and now the problem is that even registered users can’t see that much of it)

      People actually already found a way around the rate limit. Opera GX even implemented a fix in their desktop browser.

      • b3nsn0w
        2 years ago

        lmao, you know you fucked up when a browser pushes an update specifically to circumvent your rate limits

        but yeah, if opera can do it, i highly doubt that openai can’t easily do it either. the ai concerns are posturing (and probably a personal grudge, given that elon was a founding member of openai until he got kicked out), the real issue is somewhere between incompetence and attempted monetization.

        • @sauerkraus
          2 years ago

          For Reddit, API calls put near-infinitely less load on the servers than scraping does.

          • b3nsn0w
            2 years ago

            i’m actually kinda interested in how that could work. a regular user using “near infinitely less” resources than a scraping engine sounds like some absolutely stupid design, either on reddit’s or the scraping engine’s side

            • @sauerkraus
              2 years ago

              When using the API you just request what you’re looking for. With scraping you load everything repeatedly.

              • b3nsn0w
                2 years ago

                except most of the weight of the site is in easily cachable assets that don’t get reloaded at all. probably not even loaded to begin with, since even though new reddit is a single-page app, it does have seed data in the html content itself, which a well-written scraper (or one that automatically parses the site with chatgpt) can easily extract. constantly reloading styles and scripts would be a ridiculously stupid design on the scraper’s part, and on reddit’s if they necessitated it.

                the html page itself is slightly heavier than just the json data but compared to all the images and videos real clients load and the giant piles of tracking data being sent back every second, a scraper is def going to be lighter. plus the site does reload itself every time you enter a new subreddit, that doesn’t happen through the api for some reason.
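
to make the seed data point concrete, here is a minimal sketch of pulling embedded json out of a page's html without touching any stylesheets, scripts, or media. the `<script id="data">` tag, the `posts` payload shape, and the helper names are all made up for illustration; the thread doesn't say how reddit actually embeds its seed data, so treat this as the general technique only.

```python
import json
from html.parser import HTMLParser


class SeedDataParser(HTMLParser):
    """Collects the text of a <script id=...> tag that holds embedded JSON."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id  # hypothetical id, not Reddit's real markup
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Only start capturing inside the one script tag we care about.
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_seed_data(html, script_id="data"):
    """Return the parsed JSON seed data, or None if the tag is missing or empty."""
    parser = SeedDataParser(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None


# Stand-in for a fetched page: the css and js are referenced but never requested.
SAMPLE_PAGE = """
<html><head>
<link rel="stylesheet" href="/static/app.css">
<script src="/static/app.js"></script>
<script id="data" type="application/json">
{"posts": [{"id": "abc", "title": "hello"}, {"id": "def", "title": "world"}]}
</script>
</head><body><div id="root"></div></body></html>
"""

seed = extract_seed_data(SAMPLE_PAGE)
titles = [post["title"] for post in seed["posts"]]
```

the parser only ever sees the html document itself, so the only request a scraper built this way needs per page is the one for the html.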