(btw yes you can add new categories)
if you could standardise one file format for a task, what would it be:
- photos .jxl
- open domain image data .exr
- videos .av1
- lossless audio .flac
- lossy audio .opus
- subtitles srt/ass
- fonts .otf
- container mkv (doesn't contain .jxl)
- plain text utf-8 (many also say markup but disagree on the implementation)
- documents .odt
- archive files .tar.zst (this one is causing a bloodbath so i picked randomly)
- configuration files toml
- typesetting typst
- interchange format .ora
- models .gltf / .glb
- daw session files .dawproject
- otdr measurement results .xml
I don’t see a need; extensions are there for helping software more than helping people.
It’s actually the opposite. To my knowledge, Windows is the only OS I’ve used that relies on the file extension to determine the contents, and yet it hides the extension from the user. So maybe file extensions are only for Windows?
How does osx know how to open a PDF not named .PDF?
The standard system in macOS is based on a Uniform Type Identifier, or UTI, like public.plain-text for a plain text file, and public.jpeg for a JPEG image.
To determine the file type, macOS uses MIME types when downloading from the Internet, can still use old Classic Mac OS four-character type codes, and ultimately relies on UTIs.
To get the UTI of a given file, use the mdls (meta data list, part of Spotlight) command in the Terminal.
Check out https://en.m.wikipedia.org/wiki/Uniform_Type_Identifier for more info.
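If you'd rather script the lookup than type it by hand, here's a minimal sketch that wraps mdls from Python. It assumes macOS (mdls isn't available elsewhere) and reads kMDItemContentType, the Spotlight attribute that holds the UTI. The helper names parse_content_type and uti_of are made up for this example.

```python
import subprocess
from typing import Optional


def parse_content_type(mdls_output: str) -> Optional[str]:
    """Pull the UTI out of `mdls -name kMDItemContentType <file>` output.

    Hypothetical helper: expects lines shaped like
    kMDItemContentType = "com.adobe.pdf"
    """
    for line in mdls_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "kMDItemContentType":
            value = value.strip()
            # mdls prints (null) when Spotlight has no value for the attribute
            if value == "(null)":
                return None
            return value.strip('"')
    return None


def uti_of(path: str) -> Optional[str]:
    """Ask Spotlight for a file's UTI. Only works on macOS."""
    result = subprocess.run(
        ["mdls", "-name", "kMDItemContentType", path],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if mdls fails
    )
    return parse_content_type(result.stdout)
```

On a Mac, calling uti_of on a PDF should come back with com.adobe.pdf whatever the file is named, as long as Spotlight has metadata for it.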
PDFs have a MIME type of application/pdf per the spec, but you might still encounter some with MIME types like application/x-pdf. macOS reads the MIME type of a file, then assigns the com.adobe.pdf UTI (if it wasn’t already assigned by another Mac application).
Huh, TIL. Thanks!
i don't understand
freedesktop.org has already categorized a ton of file types, and more, in its shared-mime-info database.
could you link the index? apologies
Here’s the repo: https://gitlab.freedesktop.org/xdg/shared-mime-info
Here’s the current XML for the metadata: https://gitlab.freedesktop.org/xdg/shared-mime-info/-/blob/b7db17480af0aeeb6df5668e8d10e275527d2825/data/freedesktop.org.xml.in
mate i am so confused sorry
Here’s an example: the entry for PNG. It says that PNG files…
- Have a mime type of image/png
- Have a full name of “PNG image”
- Have an acronym of “PNG”
- Have an expanded acronym of “Portable Network Graphics”
- Have “\x89PNG\r\n\x1A\n” in the file content, starting at byte 0
- Match a file glob pattern of “*.png” (case insensitive)
There are hundreds of entries like this in the XML with formal categorization :)
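To make that concrete, here's a rough sketch of how a program could use an entry like this to recognise a PNG by content or by name. The values come from the bullet points above. The dict layout and the function names are made up for this example (they are not the shared-mime-info schema or API), and real implementations weigh globs and magic more carefully than this does.

```python
import fnmatch
from typing import Optional

# Values from the PNG description above. The dict layout is an assumption
# made for this sketch, not the real XML structure.
PNG_ENTRY = {
    "mime": "image/png",
    "magic": b"\x89PNG\r\n\x1a\n",  # expected at byte offset 0
    "glob": "*.png",
}


def matches_magic(data: bytes, entry: dict) -> bool:
    """True if the content starts with the entry's magic bytes."""
    return data.startswith(entry["magic"])


def matches_glob(filename: str, entry: dict) -> bool:
    """True if the name matches the entry's glob, ignoring case."""
    # Lowercase both sides so "PHOTO.PNG" matches "*.png".
    return fnmatch.fnmatchcase(filename.lower(), entry["glob"].lower())


def guess_mime(filename: str, data: bytes, entries=(PNG_ENTRY,)) -> Optional[str]:
    """Try content first, then fall back to the file name.

    That ordering is a simplification chosen for this sketch.
    """
    for entry in entries:
        if matches_magic(data, entry):
            return entry["mime"]
    for entry in entries:
        if matches_glob(filename, entry):
            return entry["mime"]
    return None
```

So a PNG saved as photo.dat still gets identified from its first eight bytes, and an empty file called PHOTO.PNG gets identified from its name.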
It sounds like you’re trying to approach a solved problem, though. Why are you building a file list, and what are you going to use it for? What are you ultimately trying to do?
i'm trying to find the ‘best’ format for each category