If a recording of someones very rare voice is representable by mp4 or whatever, could monkeys typing out code randomly exactly reproduce their exact timbre+tone+overall sound?

I don’t get how we can get rocks to think + exactly transcribe reality in the ways they do!

Edit: I don’t get how audio can be fossilized/reified into plaintext

  • @Nibodhika
    link
    36 months ago

    I call bullshit on that. Every second there are 44100 samples of 8 bit, so every second of sound is 44100 bytes, or 44kB. Even 1 second of audio is impossible to generate all possibilities.

    To put this in perspective, there’s something called Universally Unique Identifier (UUID for short), one of them is 128 bits, or 16 bytes. Let’s imagine these were 1 bit long, on the second attempt at generating an id you would have a 50% chance of generating a repeated one, which means that by the third one you generate the chances that you have already generated a repeated id are 50%; If we extend this to 1 byte (i.e. 256 possibilities) the second time you have 1/256 chance of generating a repeated one, the second time 1/255, so on, and so forth. So from the third one on your chances of having already generated a duplicated id are 1/256 + 1/255 + 1/254 + … This means that by the 103th id you generate you have a 50% chance to have already generated a repeated one; why did I do those examples? Because a UUID has 16 bytes, this means that if you generated a billion UUID per second, it would take you 100 years to have a 50% chance of having generated a repeated one, and by that time you would need 43 ZB of storage (that’s not a typo, it’s Zettabytes as in 1024 EB (which is also not a typo, that’s Exabytes which is 1024 PB (which is also not a typo, that’s Petabytes which is 1024 TB, or Terabytes which is the first measure people are likely to be familiar with))).

    Let me again try to put this in perspective, if Google, Amazon, Microsoft and Facebook emptied all of their storage just for this, they wouldhave around 2 Exabytes, so you would need a company 4300x larger than that conclomerate to have enough space to store the amount of unique ids that would be generated from a 16 byte random data (until you have a 50% chance of generating a repeated one).

    Another way of thinking about this is that to store all of the possible combinations of 1 bit you need 2 bits of space, for 2 bits is 4, for 3 bits is 8, it goes on exponentially, so that for n bits is 2^n. For the UUID that is 3.4E38, or 3.5E13 YB (again, not a typo, that’s 1024 Zettabytes), i.e 35000000000000 YB (I could go up a few more orders of magnitude, but I think I made my point). And this is for 128 bits, every bit doubles that amount.

    So again, I call bullshit that they have all possible sounds for even 1 second which is almost 3x that amount.

    • @maengooen
      link
      76 months ago

      I appreciate the interest in doing all the math, and I am also not specifically familiar with audio or the audio library, but I believe you could use a similar argument against the OG library of babel, and I happen to know(confidently believe?) that they don’t actually have a stored copy of every individual text file “in the library”, rather each page is algorithmically generated and they have proven that the algorithm will generate every possible text.

      I’d wager it’s the same thing here, they have just written the code to generate a random audio file from a unique input, and proven that for all possible audio files (within some defined constraints, like exactly 15 seconds long), there exists an input to the algorithm which will produce said audio file.

      Determining whether or not an algorithm with infrastructure backing it counts as a library is an exercise left to the reader, I suppose.

      • @Nibodhika
        link
        06 months ago

        The claim was it “contains every 15 seconds audio recording you can imagine. Every single one.”. Which is bullshit, that’s like saying this program contains every single literally work:

        import sys
        
        print(sys.argv[1])
        

        It’s just adding a layer of encoding on top so it feels less bullshity, something like:

        def decode(number: int):
          out = ""
          while number:
            number, letter_index = divmod(number, len(string.printable))
            out += string.printable[letter_index]
          return out
        

        That also does not contain every possible (ASCII) book, it can decode any number into a text, and some numbers happen to contain texts that are readable.