Is there a simple way to severely impede web scraping and LLM data collection of my website?

@Maroon · edit-2 8 months ago

Is there a simple way to severely impede web scraping and LLM data collection of my website?

@IphtashuFitz · 9 months ago

Try using “curl -A” to specify a User-Agent string that matches Chrome or Firefox.

@corroded · 9 months ago

I probably should have specified that I’m using libcurl, but I did try the equivalent of what you suggested. I even tried setting a list of user agents and cycling through them. None of them worked. A lot of anti-scraping methods use much more complex schemes than just validating the user agent. In some cases, even a headless browser will be blocked.
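
For anyone following along, here is a minimal sketch of the User-Agent rotation described above, written in Python with the standard library rather than libcurl. The User-Agent strings and the `build_request` helper are assumptions made up for illustration, not anything from the original setup. It only builds the requests and sends nothing.

```python
import itertools
import urllib.request

# Hypothetical User-Agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]

# Endless iterator that walks the list and wraps around at the end.
_agent_cycle = itertools.cycle(USER_AGENTS)


def build_request(url: str) -> urllib.request.Request:
    """Return a Request carrying the next User-Agent in the rotation."""
    return urllib.request.Request(url, headers={"User-Agent": next(_agent_cycle)})
```

Each call to `build_request` picks the next string in the list. As noted above, this only changes one header, so a site that checks anything beyond the User-Agent will likely still block the request.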