scraperule

@[email protected] · 1 year ago

@[email protected] · 1 year ago

i tried to get access to facebook’s api to mess around (as a student) but they declined my request. i ended up making a bot that ran in a headless browser, wasting far more of facebook’s resources, and used it to create shitposts that kept updating themselves with their own number of reactions lmao.

b3nsn0w · 1 year ago

fun fact: on the r-site, you can still append .json to the end of any path (before the query params) to get the formatted data

fun fact 2: on the same site you get a similar json if you grab the script that says id="data" (trivial with jsdom if you run nodejs), eval it in a sandbox (node’s built-in vm package), and look for your passed global object’s $.___r param

fun fact 3: also on the same site, if you use the old interface it’s full of data tags intended for css, jsdom goes brrr

fun fact 4: even if they stopped all of this you could use a headless browser and grab the data in flight from the api calls (virgin dom scrubber vs chad api capturer)

i don’t know much about the t-site and can’t check right now because you can’t even access it the normal way, lol
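For anyone who wants to try fun fact 1, here is a minimal sketch of the URL rewrite: `.json` goes on the end of the path, and the query string stays where it was. The helper name `to_json_url` and the `example.com` host are made up for illustration; this only builds the URL and doesn't fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Append .json to the path, leaving query params and fragment untouched.

    Hypothetical helper for illustration; not part of any site's API.
    """
    parts = urlsplit(url)
    # Drop a trailing slash so "/comments/abc/" becomes "/comments/abc.json".
    path = parts.path.rstrip("/")
    # Don't append twice if the URL was already rewritten.
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_json_url("https://example.com/r/some/comments/abc/?limit=5"))
```

Whether the resulting URL actually returns JSON depends on the site still supporting it, so treat the response as something to check rather than assume.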

CreamCake · 1 year ago

Scraping my beloved…using more resources from a company’s server makes me drool

@[email protected] · 1 year ago

This cracked me up. Especially the 10 minute delay and rate limiting making it better to just scrape.

@Jackolantern · 1 year ago

Can someone eli5 this for me? What’s scraping and how does it work? Like for example in the context of twitter with their current limitations, will scraping still work?

@[email protected] · 1 year ago

Scraping is getting a webpage as if you’re a normal user going to that page in firefox/chrome and extracting the bits you want from it. If Twitter makes you sign in to view tweets (which I guess it will now?) then scraping won’t help much. Otherwise it probably will, though it may take a fair bit of trickery to get working.
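To make the "extracting the bits you want" part concrete, here is a rough sketch using only Python's standard library. The markup in `SAMPLE_HTML`, the `post`/`title` class names, and the `data-score` attribute are all invented for the example; a real page will look different. A real scraper would first download the page (for instance with `urllib.request.urlopen`), but the HTML is inlined here so the example runs on its own.

```python
from html.parser import HTMLParser

# Made-up page content standing in for whatever the browser would receive.
SAMPLE_HTML = """
<html><body>
  <div class="post" data-score="42"><h2 class="title">first post</h2></div>
  <div class="post" data-score="7"><h2 class="title">second post</h2></div>
</body></html>
"""


class PostScraper(HTMLParser):
    """Collects a score and a title for every <div class="post"> it sees."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "post" in classes:
            # data-* attributes often carry the values you are after.
            score = int(attrs.get("data-score") or 0)
            self.posts.append({"score": score, "title": ""})
        elif tag == "h2" and "title" in classes and self.posts:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside a post's title element.
        if self._in_title:
            self.posts[-1]["title"] += data.strip()


def scrape(html: str) -> list:
    parser = PostScraper()
    parser.feed(html)
    parser.close()
    return parser.posts


if __name__ == "__main__":
    for post in scrape(SAMPLE_HTML):
        print(post["score"], post["title"])
```

The "trickery" usually comes in before this step: pages that build their content with JavaScript or sit behind a login won't hand over the interesting HTML to a plain request, which is where headless browsers come in.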