• @Batman
    60
    3 months ago

    A Word document is XML

    • @renzevOP
      English
      65
      3 months ago

      zipped xml!
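To make the "zipped XML" point concrete, here is a minimal sketch that opens a zip container and reports the root tag of each XML member. The archive it builds is a tiny stand-in, not a valid .docx, and `list_xml_parts` and `make_demo_container` are hypothetical helper names. The only real-world detail assumed is that a .docx keeps its main body in `word/document.xml`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_xml_parts(container: bytes) -> dict[str, str]:
    """Map each .xml member of a zip container to its root element tag."""
    roots = {}
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                roots[name] = ET.fromstring(zf.read(name)).tag
    return roots


def make_demo_container() -> bytes:
    """Build a tiny stand-in archive (NOT a valid .docx) for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<document><body>Hello</body></document>")
        zf.writestr("media/image1.png", b"\x89PNG")  # non-XML member, skipped
    return buf.getvalue()


demo_parts = list_xml_parts(make_demo_container())
print(demo_parts)
```

Passing the bytes of a real .docx to `list_xml_parts` should work the same way, although the root tags would then carry XML namespace prefixes.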

      • clb92
        English
        29
        3 months ago

        Lots of file formats are just zipped XML.

        I was reverse engineering (well, fucking around with) the LBX file format for our Brother label printer’s software at work, because I wanted to generate labels programmatically, and they’re zipped XML too. Terrible format, LBX, really annoying to work with. The parser in Brother P-Touch Editor is really picky too. A string is 1 character longer or shorter than the length you defined in an attribute earlier in the XML? “I’ve never seen this file format in my life,” says P-Touch Editor.

        • @SzethFriendOfNimi
          11
          3 months ago

          Sounds like it’s actually using XSLT or some kind of content validation. Which to be honest sounds like a good practice.

          • clb92
            English
            9
            3 months ago

            Here’s an example of a text object taken from the XML, if you’re curious: https://clips.clb92.xyz/2024-09-08_22-27-04_gfxTWDQt13RMnTIS.png

            EDIT: And with more complicated strings (like ones having numbers or symbols - just regular-ass ASCII symbols, mind you) there will be tens of <stringItem>, because apparently numbers and letters don’t even work the same. Even line breaks have their own <stringItem>. And if the number of these <stringItem> and their charLen don’t match what’s actually in pt:data, it won’t open the file.
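A rough sketch of the kind of consistency check described above, in case it helps anyone generating these files. The fragment is a hypothetical, simplified stand-in: real LBX XML uses namespaced tags such as pt:data and more attributes, none of which are reproduced here. It also assumes charLen counts characters the way Python's `len` does, which has not been verified against P-Touch Editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. Real LBX files use namespaced tags
# (e.g. pt:data) and carry more attributes than shown here.
FRAGMENT = """
<text>
  <data>Room 12</data>
  <stringItem charLen="5"/>
  <stringItem charLen="2"/>
</text>
"""


def char_lens_match(fragment: str) -> bool:
    """Return True if the declared charLen values add up to len(data)."""
    root = ET.fromstring(fragment)
    data = root.findtext("data", default="")
    declared = sum(int(item.get("charLen", "0")) for item in root.iter("stringItem"))
    return declared == len(data)


print(char_lens_match(FRAGMENT))
```

Running a check like this before zipping the XML back up would at least catch the off-by-one case before the editor rejects the file.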

            • @SzethFriendOfNimi
              1
              3 months ago

              Is it because of the lowercase Latin æ, since it’s technically one character even if it’s two bytes?
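For reference, the one-character-versus-two-bytes distinction is easy to check. This is a small sketch using only Python built-ins. Which of these counts, if any, the label format actually uses is not known from this thread.

```python
text = "\u00e6"  # the lowercase Latin letter æ

code_points = len(text)                           # Unicode code points
utf8_bytes = len(text.encode("utf-8"))            # bytes when stored as UTF-8
utf16_units = len(text.encode("utf-16-le")) // 2  # 16-bit code units in UTF-16

print(code_points, utf8_bytes, utf16_units)  # 1 2 1
```

So a length attribute that counts characters and one that counts UTF-8 bytes would disagree by one for this string.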

                • @SzethFriendOfNimi
                  1
                  3 months ago

                  What a mess… sounds like the devs got burned by various Unicode edge cases (RTL, etc.)

        • @bitjunkie
          1
          3 months ago

          Do you have to define a length range?

      • @Batman
        7
        3 months ago

        The future if text documents were JSON:

        City_pic.png.xml