The developers of Manjaro Linux, a distribution based on Arch Linux and aimed at beginners, have announced the start of testing for a new service, MDD (Manjaro Data Donor), which collects statistics about the system and sends them to the project’s external server. The author of MDD intended telemetry to be enabled by default (opt-out), but the decision has not yet been approved. Judging by the objections of some developers and users, telemetry will likely be offered as an option requiring the user’s prior consent (the proposal is to add a prompt to enable telemetry to the welcome screen shown after the first boot).

The report includes data such as host name, kernel version, desktop component versions, detailed information about hardware and drivers involved, screen size and resolution information, network device MAC addresses, disk serial numbers, disk partition data, information about the number of running processes and installed packages, versions of basic packages such as systemd, gcc, bash and PipeWire.

The data sent is stored on the project’s server in a ClickHouse database and visualized with Grafana. Users’ IP addresses are not stored, and a hash of the /etc/machine-id file serves as the system identifier.

According to the code (https://github.com/manjaro/mdd/blob/master/mdd.py#L40), it sends everything.

    • @seaQueue
      25
      2 months ago

      Whatever they can get their hands on, including your unique hardware identifiers

    • @[email protected]
      11
      2 months ago

      Ad firm money.

      Maybe I’m just cynical, but my first instinct when I see stuff like this is they have a secret contract with an advertiser and are selling this information.

  • @[email protected]
    64
    2 months ago

    enable telemetry by default … MAC addresses, disk serial numbers

    Another reason to not use Manjaro. Just use Endeavour instead.

    Edit: I’m not against telemetry per se. I have the KDE feedback enabled, for example, but that was opt-in and sends no unique data.

    • @rtxn
      29
      2 months ago

      It’s all about trust. Manjaro has given me reasons to distrust them.

      • exu
        5
        2 months ago

        When?

        Edit: I misread, thought it said “trust” instead of “distrust”

        • @rtxn
          19
          2 months ago

          They’ve let TLS certs expire on multiple occasions. They’ve made the decision to enable the AUR in the default installation, which can cause conflicts with out-of-date dependencies because of the delayed release schedule compared to Arch. They’ve shipped software on their stable branch that included unmerged upstream code. One of their developers temporarily broke Asahi Linux.

          I don’t hate the project, but I can’t trust the developers and management.

          • @[email protected]
            10
            2 months ago

            They’ve let TLS certs expire on multiple occasions.

            And they told their community to set their clocks back. As a workaround, it will work but all your created and modified data will have the wrong timestamps.

            • @rtxn
              5
              2 months ago

              He’s also a contributor to Asahi Linux. One of his MRs changed the build options that somehow caused it to (IIRC) use mainline Mesa instead of the branch that is specifically modified to work on ARM.

              (edit) Aussie linux man: https://www.youtube.com/watch?v=eDRiBbzzREw

              It’s not only his fault, but mostly.

          • @seaQueue
            4
            2 months ago

            They’ve done it more than once now

          • Norah - She/They
            1
            2 months ago

            wait, is that name “manjarno” like when brad pitt says bonjorno in inglorious basterds??

    • @auzy
      -2
      2 months ago

      deleted by creator

      • @[email protected]
        10
        2 months ago

        Why?

        Let me put the question back to you. How do you think the uniquely identifiable information will help them improve Manjaro?

        Do you think they’ve got a Russian satellite and will track down your HDD serial number from space?

        No.

        There’s lots of benefits to telemetry.

        As I basically said, if you bothered to read my comment.

        • @auzy
          1
          2 months ago

          deleted by creator

  • @[email protected]
    51
    2 months ago

    network device MAC addresses, disk serial numbers

    That’s enough. I’m calling it evil from now on.

    • Bezier
      22
      2 months ago

      Thought it’s probably fine after reading the title, but this shit isn’t fine. What the fuck.

    • @Buffalox
      -3
      2 months ago

      The MAC address is anonymized with sha256, and IP addresses aren’t stored.
      So this seems to me to be perfectly anonymous.

      • @[email protected]
        21
        2 months ago

        Why collect such data though? And you can call some Big Tech telemetry completely anonymous too if you trust their explanations.

        • @Buffalox
          3
          2 months ago

          You can see the code of what is sent.
          I’m not aware that Google claims they collect data anonymously, on everything where you are logged in.
          So that’s a false equivalence.

          • @[email protected]
            0
            2 months ago

            I’m not aware that Google claims they collect data anonymously, on everything where you are logged in.

            I meant other companies but ok.

      • @[email protected]
        16
        2 months ago

        MAC addresses are 48 bit, and half of that is just the manufacturer. So 24 bits really, and those bits aren’t random, I think manufacturers just assign these based on some scheme, like a serial number. Point is you could easily reverse the SHA by brute force.

        You can’t calculate any useful statistic from a hash so literally the only use this would have is some sort of tracking.


        Edit: I just looked up some data and I found someone using hashcat on an RTX 3090, which looks like it can do almost 10000 million SHA256 hashes per second of salted passwords (which are longer than 48 bit MACs, so MACs should be faster). 2²⁴ is 16.8 million, so it’ll take about 1.7 ms per vendor. I found a database with (all?) 53011 vendor ids:

        >>> 2**24 * 53011 / 10000 / 1000 / 1000
        88.93769973759998
        

        Yup, 89 seconds. You can calculate the SHA256 of every single MAC ever potentially issued in 89 seconds on a bog-standard 3090.
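The brute-force argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the MAC is hashed unsalted as a lowercase, colon-separated string (how MDD actually formats the value before hashing is not shown in this thread), and `format_mac`, `recover_mac`, the vendor prefix and the example address are all hypothetical names and values made up for the demonstration.

```python
import hashlib
from typing import Optional


def format_mac(oui: int, suffix: int) -> str:
    """Render a 24-bit vendor prefix and 24-bit device suffix as aa:bb:cc:dd:ee:ff."""
    value = (oui << 24) | suffix
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def recover_mac(target_hex: str, oui: int, max_suffix: int = 2**24) -> Optional[str]:
    """Try every device suffix under one vendor prefix until the SHA-256 matches."""
    for suffix in range(max_suffix):
        candidate = format_mac(oui, suffix)
        # Assumption: the collector hashed the plain text form with no salt.
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


# Hypothetical example: a made-up MAC under a made-up vendor prefix.
secret_mac = "00:1b:21:00:12:34"
leaked_hash = hashlib.sha256(secret_mac.encode()).hexdigest()
recovered = recover_mac(leaked_hash, oui=0x001B21)
print(recovered == secret_mac)
```

Pure Python is far slower than the GPU figure quoted above, but the search space per vendor is the same 2²⁴ candidates, so the hash only hides the address from someone unwilling to enumerate them.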

        • @Buffalox
          2
          2 months ago

          this would have is some sort of tracking.

          It’s right at the top of the announcement, that it’s mainly for more accurate stats on unique users.
          It’s not that I think this is a good idea, because I don’t, but some people are blowing it out of proportion. Especially since this isn’t at all decided, and I seriously doubt it will be.

          • @[email protected]
            10
            2 months ago

            You don’t need this to count unique users. You could just assign a random number on install or whatever. Or even more simply, just run the thing once per month, should be accurate enough. Do they expect the software to just randomly spam duplicate reports? Don’t write it that way.

            Best case they don’t care about collecting minimal data and don’t understand that hashed MACs are easily reversible. So incompetent fools with no sensitivity to privacy.

            Maybe this should be Manjaro’s tagline: Not purposely malicious, just grossly negligent and ignorant.
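The "random number on install" idea from this comment might look roughly like the following. It is a minimal sketch, not anything MDD does: `get_install_id` and the storage path are hypothetical, and the only point is that the identifier is generated randomly rather than derived from hardware.

```python
import uuid
from pathlib import Path


def get_install_id(path: Path) -> str:
    """Return the ID stored at `path`, creating a random one on first use."""
    if path.exists():
        return path.read_text().strip()
    install_id = str(uuid.uuid4())  # random, not derived from any hardware value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id + "\n")
    return install_id
```

Such an ID still lets a server count unique installs, but deleting the file resets it and nothing about the machine can be recovered from it.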

            • @Buffalox
              5
              2 months ago

              You could just assign a random number on install or whatever.

              Funny, I thought the exact same thing.

  • LiveLM
    44
    2 months ago

    Opt-out? I see it’s time for the seasonal Manjaro fuck up.

    • @seaQueue
      16
      2 months ago

      They’ll find some way to make this change break the AUR again

  • SavvyWolf
    35
    2 months ago

    Why do they need information about the hostname? Is it really valuable for them to know how many systems are named daves-pc?

  • @[email protected]
    28
    2 months ago

    The report includes data such as host name, kernel version, desktop component versions, detailed information about hardware and drivers involved, screen size and resolution information, network device MAC addresses, disk serial numbers, disk partition data, information about the number of running processes and installed packages, versions of basic packages such as systemd, gcc, bash and PipeWire.

    That’s insane

  • @[email protected]
    26
    2 months ago

    I get the usefulness of technical telemetry such as kernel version, RAM, disk space, processor type, etc… but NIC MAC? HDD serial? WTF?

    • @[email protected]
      12
      2 months ago

      Those are absolutely ways of covertly identifying your device while technically not counting as “personal information” under privacy laws.

        • @[email protected]
          6
          2 months ago

          The point is that it’s a loophole in privacy laws so they don’t have to outright tell people that they collect personal or identifying information. So they can legally mislead people by claiming it’s anonymous telemetry in hopes that users don’t actually look into it or understand the implications.

      • r00ty
        4
        2 months ago

        I said elsewhere, I hope this is just some way to track changes over time per user.

        But they need to take an anonymous hash of some non changing data or create an install id that is used for this and nothing else (e.g it identifies a unique user but not the person or hardware behind the user).

        Too much identifying info is just pushed around like we shouldn’t care, it’s become a real problem.

  • @[email protected]
    22
    2 months ago

    data such as host name,

    Okay, why do they need to know that? Why do they need to know if the computer is called “Melissa’s Laptop” or “Workstation 15, Internal security division”? This kind of data could be misused if stolen, and it has minimal legitimate purpose IMO: anyone can put anything as a host name, and while in organizations it often corresponds to use, it doesn’t have to for individuals. Someone could call their machine “Mack’s Porn Rig” and only use it for banking and a little coding.

    kernel version, desktop component versions, detailed information about hardware and drivers involved, screen size and resolution information,

    This all seems legitimate enough, this would be helpful for understanding the hardware their users run on and targeting features or bug fixes.

    network device MAC addresses,

    Not great but there is an argument for it, they could just grab and send the first 3-4 octets which would give them the info they need on manufacturers without getting uniquely identifiable data that along with some of this other stuff is concerning for fingerprinting.

    disk serial numbers,

    Okay, what the fuck. Why do they need disk serial numbers? What possible use is there for that? Those are used for warranty claims and could be used as part of uniquely fingerprinting a computer and person. Not cool.

    disk partition data,

    This is vague enough. I guess one could choose to see this as just info about partitions in use say if there’s also an NTFS partition that looks like a Windows install that would be useful but on the other hand data encompassed within a partition could also nefariously be read as allowing them access to all your data. Partition layout, partition labels, and file systems used on disks available to the system would be a clearer way to put this and erase any doubt.

    information about the number of running processes and installed packages, versions of basic packages such as systemd, gcc, bash and PipeWire.

    All this is also fine just technical data stuff.
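The suggestion above to send only the leading octets of a MAC address could be sketched like this. It assumes the 3-octet vendor prefix (the OUI) is all that is wanted; `vendor_prefix` is a hypothetical helper, not part of MDD.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def vendor_prefix(mac: str) -> str:
    """Keep only the first three octets (the vendor-assigned prefix) of a MAC."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 or not set(o) <= HEX_DIGITS for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    # The last three octets identify the individual device and are dropped.
    return ":".join(octets[:3])


print(vendor_prefix("00:1B:21:AB:CD:EF"))
```

The truncated value is shared by every device from the same vendor block, so it supports hardware statistics without identifying a single machine.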

    • @seaQueue
      12
      2 months ago

      Friends don’t let friends use Manjaro

  • Destide
    16
    2 months ago

    It amazes me it’s still as popular as it is and still own goaling at least once a year.

  • @[email protected]
    15
    2 months ago

    I’ve defended Manjaro many a time, despite the mistakes they’ve made. The main reason for this, Manjaro is the most stable Linux distro I’ve used.

    However, the main reason I ditched Windows as my primary OS was telemetry (and bloat). If Manjaro introduce this, it absolutely must be opt-in.

    I actually contribute to the Steam hardware survey as I want to ensure Valve, but more so hardware manufacturers, are aware desktop Linux systems for gaming and creative work are viable. But it’s my choice to contribute.

    If Manjaro don’t implement this as an opt-in then I’ll be installing Arch. It will be a pain to configure my software again but needs must.

      • @[email protected]
        3
        2 months ago

        I mostly used Ubuntu based desktop distros and frequently had issues with the 6 monthly update cycle. Problems with Fedora too. I have not had a single update issue with Manjaro. I often have different distros running in VM’s and whilst Arch has been the most reliable, most are not.

        I also set up loads of Linux servers in the I.T. job I used to have, so I have plenty of experience.

        The bottom line is Manjaro desktop has been ridiculously reliable for me. Therefore other people’s hate of it washes over me and is meaningless.

        • @[email protected]
          2
          2 months ago

          Yeah, besides some Nvidia driver problems, Manjaro was stable for me as well

          I chose it because it was fast to set up and the base configuration wasn’t too far off my liking

          But by now I’m considering switching

      • @steeznson
        3
        2 months ago

        Yeah the Manjaro devs have a long history of gaffes not to mention the infamous one with PGP keys requiring users to reset their system clock

  • @[email protected]
    14
    2 months ago

    Manjaro is already less stable than arch, now it collects your data involuntarily? Fucking wild how anyone can use it.

  • @Buffalox
    13
    2 months ago

    This may be illegal in EU if they don’t use opt in. Even then it may be illegal for under 18 year olds to collect MAC addresses and disk serial numbers, as those can potentially be used for identification.

    The data is anonymized, and the IP is NOT stored. So I’m not sure this violates GDPR?

    From the code we can see the machine ID is anonymized, sending only a SHA256 checksum.

    import hashlib
    import uuid

    def get_hashed_device_id():
        # Read the machine ID
        with open("/etc/machine-id", "r") as f:
            machine_id = f.read().strip()
    
        # Hash the machine ID using SHA-256 to anonymize it
        hashed_id = hashlib.sha256(machine_id.encode()).digest()
    
        # Convert the first 16 bytes of the hash to a UUID (version 5 UUID format)
        return str(uuid.UUID(bytes=hashed_id[:16], version=5))
    
    

    This makes it somewhat a nothingburger IMO.

    • @[email protected]
      10
      2 months ago

      That’s not anonymous, that’s pseudonymous.

      What is the point of this? The machine-id already looks to be some unique random number, so you’re calculating another unique random-looking number from that, might as well use the original number.

      You can’t glean any useful information from a unique random-looking number that would help with developing Manjaro. You can’t calculate any statistics from that. The only use is tracking.

      Edit: And as mentioned in my other comment, reversing the MAC SHA by brute force is trivial, so that one at least (and possibly the other hardware serial numbers they collect) shouldn’t even be considered pseudonymous.
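The anonymous versus pseudonymous distinction can be illustrated with the same derivation as the function quoted above, rewritten to take the ID as a parameter so it runs anywhere. `hashed_id` and both machine IDs are hypothetical, made up for this sketch.

```python
import hashlib
import uuid


def hashed_id(machine_id: str) -> str:
    """Same derivation as the quoted function, with the ID passed in as an argument."""
    digest = hashlib.sha256(machine_id.encode()).digest()
    return str(uuid.UUID(bytes=digest[:16], version=5))


# Hypothetical machine IDs, made up for illustration.
machine_a = "0123456789abcdef0123456789abcdef"
machine_b = "fedcba9876543210fedcba9876543210"

report_1 = hashed_id(machine_a)
report_2 = hashed_id(machine_a)  # a later report from the same machine
report_3 = hashed_id(machine_b)

print(report_1 == report_2, report_1 == report_3)
```

Because the hash is deterministic, every report from one machine carries the same identifier, so reports can be linked to each other over time even though the original machine-id is not sent.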

    • @ouch
      4
      2 months ago

      Nah, it’s still considered Personal Data under GDPR, because it’s possible to connect to natural persons. So GDPR applies. And this is illegal, there is no legal basis for processing this data.

      • @Buffalox
        1
        2 months ago

        because it’s possible to connect to natural persons.

        That’s debatable, and is only based on the claim that it’s just a 24bit decoding that can be brute forced. I don’t know for a fact that it’s true that it can be boiled down to 24bit.
        I checked my own /etc/machine-id, and the folder doesn’t even exist, so what exactly is supposed to be in it IDK. And yes I use Manjaro.

        • @[email protected]
          4
          2 months ago

          I edited my comment on your other reply and by my estimation, calculating every SHA256 of all MACs ever potentially issued takes less than 89 seconds on an RTX 3090.

          I also think MACs are (or should be considered) personally identifiable information, since there is potentially a paper trail back to the person who bought it. Plus MACs are not secret information, it’s broadcast on the LAN and for wireless modules over the air in the immediate vicinity (though some systems will randomize wireless MACs for privacy reasons). Privacy-unfriendly software has been known to collect MACs (even from other devices on the network and in the vicinity), so there are already databases connecting MAC addresses with other data.

          • @Buffalox
            1
            2 months ago

            calculating every SHA256 of all MACs

            Yes, but because I don’t have the folder it reads myself, I can’t see what is actually encoded. Are you sure /etc/machine-id is ONLY the MAC address?