This is not a long post, but I wanted to put it somewhere. It may be useful if someone is writing an article about Google or something like that.

While I was changing some things in my server configuration, a user accessed a public folder on my site. I happened to be watching the access logs at the time, and everything looked completely normal until, 10 SECONDS AFTER the user's request, a request from a Google IP address with the user-agent Googlebot/2.1; +http://www.google.com/bot.html hit the same public folder. I then noticed that the user-agent of the visitor who had accessed that folder was Chrome/131.0.0.0.
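For anyone who wants to check the same thing in their own logs: Google documents that a genuine Googlebot IP can be verified by a reverse DNS lookup whose hostname ends in googlebot.com or google.com, confirmed by a forward lookup back to the same IP. A minimal sketch of that check (the function name and structure are my own):

```python
import socket

def is_real_googlebot(ip: str) -> bool:
    # Reverse DNS: a real Googlebot hostname ends in googlebot.com or google.com
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward-confirm: the hostname must resolve back to the same IP,
    # otherwise the reverse record could be spoofed
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

Spoofed user-agents are common, so this distinguishes real Googlebot traffic from crawlers merely claiming to be Googlebot.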

I have a subdomain, and some folders on that subdomain are actually indexed by the Google search engine, but that specific public folder doesn’t appear to be indexed at all and doesn’t show up in searches.

Could it be that Google uses Google Chrome users to discover unindexed paths on the internet and add them to its index?

I know it doesn’t sound very shocking, because most people here know that Google Chrome is a privacy nightmare and should be avoided at all costs, but I have never seen this type of behavior mentioned in articles about “why you should avoid Google Chrome” or similar.

I’m not against anyone scraping the page either, since it’s public anyway, but the fact that they discover new pages on the internet by making use of Google Chrome surprised me a little.

Edit: Fixed a typo

  • @solrize
    link
    23
    edit-2
    1 month ago

I had some private pages a while back that linked to unrelated pages on other sites. I had to go to somewhat extreme lengths to stop the private URLs from leaking to the external sites through Referer headers when my users clicked on the links.

If Chrome is sending people’s browser histories to Google, that is invasive.

    • @[email protected]
      link
      fedilink
      4
      1 month ago

      So how did you stop the Referer header from doing that? I’d imagine it to be a clear, simple command, since it ought to be. Or was it not that straightforward?

      • @solrize
        link
        4
        1 month ago

        It’s easier now that there are some control headers for it. At the time I tried a lot of things, like bouncing through JavaScript that opened a new window; results varied by browser. The simplest way was to inconvenience users a bit by supplying plain-text URLs for them to paste into the nav bar instead of clickable links.

  • @[email protected]
    link
    fedilink
    English
    12
    1 month ago

    100%, if you have “Safe Browsing” enabled (which it is by default). This also applies to Firefox, but I don’t know if it is enabled by default there.

    • @[email protected]OP
      link
      fedilink
      6
      1 month ago

      That makes perfect sense, since Google Chrome has Safe Browsing enabled by default and most people don’t bother changing their settings.

  • @[email protected]
    link
    fedilink
    English
    11
    1 month ago

    Do any of the pages in the directory link to other websites? If you link to a website that uses Google Analytics, Google may see the Referer header when the person using Chrome opens the link. If it knew that your site didn’t previously link to the third-party site, maybe that triggered a refresh.

    You could test this by making a page that links to CNN or another site using Google Analytics, then, in Firefox (without anything that would block Google Analytics), clicking the link on your site to the other site. If Googlebot checks your site within 10 seconds, you could rule out Chrome as the culprit.

    • @[email protected]OP
      link
      fedilink
      41 month ago

      Nope, it’s just a file indexer that I host publicly. I don’t mind sharing the URL to provide more context.

      The user accessed https://luna.nadeko.net/Movies/Ch3k0p3t3/ with Google Chrome,

      and 10 seconds later, Googlebot scraped the folder.

      Simple as that. I don’t have privacy-invasive trackers on any of my webpages/services.
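      A rough way to look for this pattern in an access log is to flag any path where a Googlebot request follows a Chrome request within the 10-second window. The log format, regex, and function names below are my own illustration, assuming nginx/Apache “combined” format:

```python
import re
from datetime import datetime, timedelta

# Matches nginx/Apache "combined" log lines; purely illustrative.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d+ \d+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def parse(line):
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, m.group("path"), m.group("agent")

def googlebot_followups(lines, window=timedelta(seconds=10)):
    """Yield (path, delay_seconds) where a Googlebot request hits the
    same path within `window` after a Chrome request."""
    events = [e for e in map(parse, lines) if e is not None]
    for ts1, path1, agent1 in events:
        if "Chrome/" not in agent1 or "Googlebot" in agent1:
            continue
        for ts2, path2, agent2 in events:
            delay = ts2 - ts1
            if ("Googlebot" in agent2 and path2 == path1
                    and timedelta(0) <= delay <= window):
                yield path1, delay.seconds
```

      Combining this with a reverse-DNS check on the Googlebot IPs would show whether the follow-up hits are really coming from Google.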