• @frankenswine
    link
    7
    1 month ago

    you miss the point: instance owners have quite a lot more information on their users’ activities than what’s public.

    or would you argue that reddit does not aggregate data because it’s all public?

    • Saik0
      link
      fedilink
      English
      7
      edit-2
      1 month ago

      instance owners have quite a lot more information on their users’ activities

      Not really. The only additional thing that could be identified is browsing patterns while on the site itself, and I don’t think that’s very valuable. You likely already gave away what you’re likely to look at by commenting in communities. Browsing is best tracked through a proxy or something, not Lemmy itself. And it can even be tracked externally through other means. Ex: this post has a tracking image on it, and because you need to connect to my server to load it, I now see everyone who has loaded this comment. So this can be done externally without even being an instance owner. Click view source to see it at the end of the post.
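
For anyone wondering what a tracking image amounts to, here is a minimal sketch in Python. It is not the service used for the image in this comment; the names `PIXEL`, `build_log_entry`, `PixelHandler`, and `serve`, and the port, are made up for illustration. It only shows that any host serving an embedded image sees the requester's IP, User-Agent, and Referer.

```python
import base64
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# A transparent 1x1 GIF, small enough to embed invisibly in a comment.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def build_log_entry(client_ip, path, headers):
    """Collect what any image host receives on every request."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "ip": client_ip,
        "path": path,
        "user_agent": headers.get("User-Agent", ""),
        "referer": headers.get("Referer", ""),
    }


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log the request, then answer with the pixel.
        print(build_log_entry(self.client_address[0], self.path, self.headers))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        # Ask clients not to cache, so repeat views hit the server again.
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(PIXEL)


def serve(port=8000):
    """Hypothetical entry point: run the pixel server until interrupted."""
    HTTPServer(("127.0.0.1", port), PixelHandler).serve_forever()
```

Calling `serve()` and embedding a link to the served image in a comment would be enough to start collecting entries. No admin access to any instance is involved.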

      Votes are federated; kbin instances display them publicly as “likes”. Messages are federated and sent in clear text. And posts that are loaded can be tracked via other means… Think of sites that display ads… They do this exact thing and collect information by the boatload because they can inject on every page that shows an ad, without needing to be an admin on the site itself.

      Edit: In theory someone could canvass every post with a bot, commenting and embedding tracking images everywhere. Rotate the usernames doing it across different servers, rotate through domains that are all CNAMEd back to the same tracking node, and you could attack the whole fediverse with this type of tracking. Probably already being done… It would be visible, in that we have the ability to check the source of each comment. But who the hell is going to take the time to do that?

      Edit2: Here’s an example of what was collected with that embedded image. Keep in mind that this type of tracking can happen with REAL images as well, making the tracking impossible to spot. And I’m specifically not tracking much of anything. But things like the IP address used for access are on the backend. There’s also browser, OS, referrers… etc…

      • @frankenswine
        link
        2
        1 month ago

        you are (still) missing my point - but i might be wrong as well (i am not too familiar with ActivityPub).

        my point is not that my public posts are in fact public and can be (and probably are) mined by unknown parties, but that instance owners have even more, probably more valuable info, like IP addresses from which not just geolocation but also wake times, device usage patterns and other gnarly stuff could be extracted, that could - together with other personalized surveillance info (like the usual adware stuff) - be aggregated to give a bigger picture.

        just showing (as you did) that one can get some info about me through my (public) actions does not refute the point that instance owners have access to more, not-so-public information

        • Saik0
          link
          fedilink
          English
          0
          1 month ago

          but that instance owners have even more, probably more valuable info, like IP addresses from which not just geolocation but also wake times, device usage patterns and other gnarly stuff could be extracted, that could - together with other personalized surveillance info (like the usual adware stuff) - be aggregated to give a bigger picture.

          I have the IP behind the geolocation. How do you think I know the geolocation? It’s an IP lookup. The interface I showed in the image just doesn’t publish it because I don’t care personally. What I use that service for is simply to track where sensitive emails/documents go, not to track Lemmy. I don’t need specific resolutions, just to know if they leak outside of where I expected.

          Device patterns? The app you use is the app you use. That would be given away via your browser headers. I also collect that with the tracking image. Once again, it’s just not shown in the graph because I don’t care to track it personally (I’m only doing this as an example, not to actually aggregate data).

          If you use lemmy over the web browser, browsers don’t really give up that much information unless you’re google themselves. In which case apparently chrome gives up a boatload of information to google’s domains.

          not-so-public information

          You’d have to give me an example of what you’re referencing. I can collect IP, web headers, and access times, and if I tag enough pages or mark the image as non-cacheable I could even see multiple views/accesses (you see views higher than actual visitors). I can track your movement across all of the fediverse.
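
The "views higher than actual visitors" effect can be sketched in a few lines of Python. This is an illustration only: `summarize` is a hypothetical helper, and treating an (IP, user agent) pair as one visitor is an assumption, not how any particular tracker counts.

```python
from collections import Counter


def summarize(hits):
    """hits: iterable of (ip, user_agent) tuples, one per image request."""
    per_visitor = Counter(hits)
    return {
        # Every request counts as a view, including reloads.
        "views": sum(per_visitor.values()),
        # Distinct (ip, user_agent) pairs approximate distinct visitors.
        "visitors": len(per_visitor),
    }
```

With a cacheable image each client tends to fetch it once, so the two numbers stay close. With a non-cacheable image every reload adds a view, which is where the gap comes from.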

          that one can get some info about me through my (public) actions

          Simply “viewing” the page (which pulls the image, and is not necessarily a “public” action) already hands over data that isn’t “public”, which directly rebuts that.

            • Saik0
              link
              fedilink
              English
              0
              1 month ago

              I’ve addressed the points you’ve brought up. I run my own instance. I can collect just about everything in the DB tables I’ve seen without being logged into the instance with some external work.

              Are you trying to get my point? If you have a specific item that you believe is stored on a Lemmy server and that you think isn’t possible to obtain, I’m all ears. Otherwise I think this conversation is done. This kind of response is pointless and I’m not interested in continuing if you’re going to act like that.

              The hardest thing to collect would be private messages, and login information (which is hashed btw, so even your server operator doesn’t really know it). But messages are plaintext and openly federated. All the other information is really really easy to collect through other means.
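
On the "hashed" part, a rough Python sketch of why an operator who stores only a salted hash doesn't know the password itself. This is not Lemmy's actual scheme: PBKDF2 is swapped in here only because it ships in the standard library, and `hash_password`, `verify_password`, and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, for illustration only


def hash_password(password, salt=None):
    """Return (salt, digest); only these would be stored, never the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Re-derive the digest from a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The server can check a login attempt against the stored digest, but getting from the digest back to the password means guessing candidates one at a time.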

              • @frankenswine
                link
                1
                1 month ago

                First of all: my Not Sure Fry was intended as a joke.

                So, just to understand you correctly:

                I can collect just about everything in the DB tables I’ve seen without being logged into the instance with some external work.

                Can you see which communities I follow? Which feeds I watch (and when I do that)? Who I interact with through DMs?

                • Saik0
                  link
                  fedilink
                  English
                  0
                  1 month ago

                  Can you see which communities I follow?

                  Wouldn’t need to see it directly. If someone were to tag enough posts they could deduce it over time. E.g., I could post in every community on every Lemmy instance in the fediverse, and over time I could be reasonably sure which communities you follow, as you’d see these posts in your feed and the tracking images would load as you scrolled past them. It would take very little automation to do it.

                  Which feeds I watch (and when I do that)?

                  Yes… because it’s possible to use “normal” images to track who’s downloading them, and seeing which addresses, user agents, and referrers show up over time is powerful. After enough time, it’s entirely possible to deduce which feeds/communities you’re watching. E.g., if I post 10 different items and 3 of them come back to your specific IP address, I would have a really good estimate of which feeds you’re likely on. Do this at scale and I bet you could deduce it completely, and probably with much less time and hassle than you’re thinking. Hell, because of my reverse proxy I can see EVERYONE who loads my profile picture. I see ALL the users who run into my posts by complete fucking accident. Lemmy loads /inbox to pull that data.
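
The "10 items, 3 come back to your IP" deduction could look roughly like this in Python. It is a sketch under assumptions: `infer_communities` is a hypothetical helper, each tagged post is assumed to embed an image with a unique path, and one IP is treated as one person, which is cruder than a real tracker would be.

```python
from collections import Counter, defaultdict


def infer_communities(tag_to_community, hits):
    """tag_to_community: image path -> community the tagged post was made in.
    hits: iterable of (ip, image_path) pairs taken from the image host's logs.
    Returns, per IP, the communities whose tagged posts were loaded, most hits first.
    """
    seen = defaultdict(Counter)
    for ip, path in hits:
        community = tag_to_community.get(path)
        if community is not None:  # ignore requests for untagged paths
            seen[ip][community] += 1
    return {ip: counts.most_common() for ip, counts in seen.items()}
```

The more tagged posts there are per community, the more confident the ranking for each IP becomes. That is all "doing it at scale" means here.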

                  Hell this is the core reason why everyone pushes back on 3rd party cookies these days. It made this tracking trivial. Tagging every page with some image or asset that forces a connection is effectively the same thing.

                  Who I interact with through DMs?

                  I’ve already stated clearly that this would be the hardest thing. Just because there are one or two things that would be hard or impossible to obtain (even over time) passively or as a complete outsider doesn’t make the rest of the argument wrong. All it would take is either site operator leaking the data, any type of MITM, etc… to leak the plaintext content of your DMs. Hell, federation leaks, where data gets sent outside of the expected subscribers, have happened. Then you also have to realize that many instances use services like Cloudflare or other WAF solutions to stop DDoS attacks and such… Those nodes can read the plaintext DMs and all federation data. Any malicious actor that manages to break any single part of the chain has access to it all… and it can be quite trivial in many instances to do so.

                  The Lemmy system is not “secure”. It’s not meant to be. Everything on the fediverse is public and all of your actions here are trackable by many parties in many ways even outside of the operators of both ends of the federation action itself. Including how you’re connecting and using the system.

                  DMs alone, and actual hashed passwords are not really needed for a third party threat to act malicious and get all of the aggregated data they’d ever want. You pointed out specifics, I answered those specifics. Then you pivoted to other shit that I ALREADY outlined. This argument is super disingenuous.

      • @[email protected]
        link
        fedilink
        1
        1 month ago

        In a recent Lemmy version they added support for proxying images. So for people worried about this, see if you can find an instance (or set up your own) that does image proxying.

        Before you ask, I’m not aware of any but I’m sure there are some.

        • Saik0
          link
          fedilink
          English
          1
          1 month ago

          Yeah, that was 0.19.4. It doesn’t proxy everything unless explicitly set to. Just thumbnails, I believe. But I could be wrong. And many instance owners would be allergic to that as it leaves them on the hook for storing content. For example… someone posts CSAM… a copy of that is now on your server. You get police raided and you’re fucked.

          https://github.com/LemmyNet/lemmy/blob/705e86eb4c0079d0775f0c1490968f1183095fcc/config/defaults.hjson#L51

          Actually, going over it briefly, it looks like it has a few available options for what it will cache…

          I refuse to enable it myself for the above reason. I would venture 99% of instances out there would also refuse for liability and bandwidth costs.

          • @[email protected]
            link
            fedilink
            1
            1 month ago

            Certain (but not all) thumbnails have been sort of proxied for a while, but it’s complicated. But for example if someone posts a link to some questionable content on imgur, your instance will have a copy of that cached (and never delete it, because… Lemmy reasons). The recent changes just mean you can now enable other images to be proxied, though this is disabled by default. This proxy has an age (a day or a week or whatever you set) and content is deleted if it hasn’t been accessed in that timeframe - this is in contrast to the normal Lemmy image stuff that I believe still stays forever unless that was fixed recently.
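
The "deleted if it hasn’t been accessed in that timeframe" behaviour boils down to eviction by last access time. A toy Python sketch of that idea follows; it is not how Lemmy or its image backend implements it, and `evict_stale` and its arguments are made up for illustration.

```python
def evict_stale(last_access, now, max_age_seconds):
    """Remove entries not accessed within max_age_seconds.

    last_access: dict mapping a cached image key to its last access timestamp.
    Mutates the dict in place and returns the list of evicted keys.
    """
    stale = [key for key, ts in last_access.items() if now - ts > max_age_seconds]
    for key in stale:
        del last_access[key]
    return stale
```

An image that keeps getting viewed keeps refreshing its timestamp and stays cached. Anything nobody loads for the configured period drops out on the next sweep.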

            And many instance owners would be allergic to that as it leaves them on the hook for storing content

            This is already a risk whether via the existing thumbnail storage or via user uploads. It’s a pretty common recommendation that you should never host a website like Lemmy on a home server, always use a VPS for this reason. Then make sure you understand your local laws as well.

            • Saik0
              link
              fedilink
              English
              1
              1 month ago

              This is already a risk whether via the existing thumbnail storage

              Not anymore. You can opt out of it for the most part.

              # Leave images unchanged, don’t generate any local thumbnails for post urls. Instead the
              # Opengraph image is directly returned as thumbnail
              "None"

              • @[email protected]
                link
                fedilink
                1
                edit-2
                1 month ago

                Oh I didn’t realise this! I’ll have to investigate more. Even if you want proxying, it makes way more sense to use the proxy image functionality that actually deletes the images after a period of time.

                Thanks for bringing this to my attention, I’m quite excited about it 😆

                Edit: seems like it’s been an option since 0.19.0!