I’ve noticed some files I opened in a text editor have all kinds of crazy unrenderable chars

  • Admiral Patrick
    link
    fedilink
    English
    8
    edit-2
    4 months ago

    If you’re on Linux, you can convert that to something more human readable by piping it to base64. It works with any file, but I’ll use an image here:

    cat image.webp | base64

    Which yields:

    UklGRroEAABXRUJQVlA4WAoAAAAgAAAAYwAAQgAASUNDUKACAAAAAAKgbGNtcwRAAABtbnRyUkdC
    IFhZWiAH6AAIABoADgAJACBhY3NwQVBQTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA9tYAAQAA
    AADTLWxjbXMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1k
    ZXNjAAABIAAAAEBjcHJ0AAABYAAAADZ3dHB0AAABmAAAABRjaGFkAAABrAAAACxyWFlaAAAB2AAA
    ABRiWFlaAAAB7AAAABRnWFlaAAACAAAAABRyVFJDAAACFAAAACBnVFJDAAACFAAAACBiVFJDAAAC
    FAAAACBjaHJtAAACNAAAACRkbW5kAAACWAAAACRkbWRkAAACfAAAACRtbHVjAAAAAAAAAAEAAAAM
    ZW5VUwAAACQAAAAcAEcASQBNAFAAIABiAHUAaQBsAHQALQBpAG4AIABzAFIARwBCbWx1YwAAAAAA
    AAABAAAADGVuVVMAAAAaAAAAHABQAHUAYgBsAGkAYwAgAEQAbwBtAGEAaQBuAABYWVogAAAAAAAA
    9tYAAQAAAADTLXNmMzIAAAAAAAEMQgAABd7///MlAAAHkwAA/ZD///uh///9ogAAA9wAAMBuWFla
    IAAAAAAAAG+gAAA49QAAA5BYWVogAAAAAAAAJJ8AAA+EAAC2xFhZWiAAAAAAAABilwAAt4cAABjZ
    cGFyYQAAAAAAAwAAAAJmZgAA8qcAAA1ZAAAT0AAACltjaHJtAAAAAAADAAAAAKPXAABUfAAATM0A
    AJmaAAAmZwAAD1xtbHVjAAAAAAAAAAEAAAAMZW5VUwAAAAgAAAAcAEcASQBNAFBtbHVjAAAAAAAA
    AAEAAAAMZW5VUwAAAAgAAAAcAHMAUgBHAEJWUDgg9AEAALAQAJ0BKmQAQwA+8WSmTqmlKCYvmWqp
    MB4JZQDLnNaF2NMD2L3xQGb5nmLiGhGWxQuD8kwUSXF0u2UTgX0YrR3MY2SsRCNEQ8hZ6WkCUTih
    LdmsElHZVzoMwO/fj4X/ZSNT2R9qgxwqgEed891j4KCNRLK/tUbG3hZ3Mw2kixguSFIEcAgBtv8w
    eAu0PwAA/upMzBqq+dcN8viO7FpqpV6GvPcRILm+HsOQblnpHx03lASjGlSyGbkKUD3xA5KOqgq/
    VEUJ4qF9VoAYFbFhQRAgkvmREk5umMj8sr9Np95+n/oP2Aq2VW5xU4F1xpD8Vd4Dp7Phwm9w/Dnf
    94djRROFRYPZeg/1Q/qiROFRVRu2nBcgndbhc0x0h+kgvT/naeJOEqwNjYPlIiw/DGuxav7+x09R
    mf2mJto3ineDqfyMWUN83PmKqzGHkYGhZrTU478qjlQucDzWkwobnUmzhE6I+mDYkfiUVPcHyXbf
    xXRStyPiPZAkJZrE9OrjFNUeljRQdVTQqeBsy+O9VwDLU5GcKhBQHa4cj+/DGqUhi74WH0EuHsb3
    EgZVNc1FbRm5QFOpjDSprGIRYxe6sFFDrDOg4DhWZRnOa7s68pGaDDpbqrORxzPHXPbs55/1HTas
    DDGzKFmTG4hJ2GUZKqjPcQ+MAAAA
    

    Copy that into a text file and pass it to base64 with the decode flag, and you’ll get the original binary:

    cat data.txt | base64 -d > data.bin

    Inspect it to see what kind of file it is:

    file data.bin -> data.bin: RIFF (little-endian) data, Web/P image

    Rename it so you can just double-click it to open it:

    mv data.bin data.webp

    Enjoy the surprise.

    You can also print files like that, scan them using OCR, and then restore them. A very inefficient way to do backups, but it works.

    • @cheese_greaterOP
      link
      34 months ago

      How is it representing it tho? Like does it have woven in there an array of hexcode colors for every microscopic pixel that makea up the picture.

      Are images and audio files just arrays of frames which are arrays of pixels and sound units?

      • Admiral Patrick
        link
        fedilink
        English
        4
        edit-2
        4 months ago

        It just converts the raw binary data into character encoding, so it doesn’t matter what the source is (image, video, database file, etc). The source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.

        The decoding process is just the reverse of that: mapping the data back to binary form.

        https://en.wikipedia.org/wiki/Base64

      • @Num10ck
        link
        English
        24 months ago

        the answer to your how question is as needed.

        some image and audio formats (especially older ones) are like that, yes. others use compression or other techniques to suit their need. like a sound can be a raw recording sample. or a sound can be described with Attack/Decay/Sustain/Release, along with octave and note etc. so a MIDI file is an audio file format without samples.

        i once created an image format to be used for spiraling out images, instead of pixel arrays they were concentric circles of pixels that i could easily offset.