Output of file(1)

@[email protected] · 1 year ago

@[email protected] · 1 year ago (edited)

If I run file(1) on a file containing only characters in the ASCII set, the output is “ASCII text”. So far so good. If I add an “å”, file(1) reports “ISO-8859 text”. That is not correct: looking at the bytes, the “å” is encoded as \xc3\xa5 (UTF-8), and the same file is reported as UTF-8 on Debian and other OSes. If I add more Unicode characters such as “· ß ð ŋ”, file(1) on OpenBSD says “Non-ISO extended-ASCII text”, and file -i testfile gives “text/plain”. Something is not right here.
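The pattern above is consistent with a detector that never tests for UTF-8 at all and only sorts bytes into classes. The sketch below is a simplified, hypothetical model of that idea, loosely after the T/I/X byte tables in libmagic's encoding.c; whether OpenBSD's file(1) works exactly this way is an assumption to check against its source. The names byte_class, classify and is_valid_utf8 are made up for illustration.

```python
# Hypothetical, simplified model of byte-class text detection (loosely after
# the T/I/X tables in libmagic's encoding.c). It deliberately has NO UTF-8 test.

def byte_class(b: int) -> str:
    """Return 'T' (plain ASCII), 'I' (ISO-8859), 'X' (extended) or 'F' (non-text)."""
    if 0x20 <= b <= 0x7E or 0x07 <= b <= 0x0D or b == 0x1B:
        return "T"
    if 0xA0 <= b <= 0xFF:
        return "I"
    if 0x80 <= b <= 0x9F:
        return "X"
    return "F"  # remaining control bytes and DEL (0x7F)

def classify(data: bytes):
    """Pick the narrowest class that covers every byte, checking ASCII first."""
    classes = {byte_class(b) for b in data}
    if classes <= {"T"}:
        return "ASCII text"
    if classes <= {"T", "I"}:
        return "ISO-8859 text"
    if classes <= {"T", "I", "X"}:
        return "Non-ISO extended-ASCII text"
    return None  # not text under this model

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for sample in ["hello\n", "hello å\n", "hello å · ß ð ŋ\n"]:
    raw = sample.encode("utf-8")
    print(raw.hex(" "), "->", classify(raw), "| valid UTF-8:", is_valid_utf8(raw))
```

Under this model the three samples come out as ASCII, ISO-8859 and Non-ISO extended-ASCII, even though all three are valid UTF-8: “å” is c3 a5 (both bytes at or above 0xA0), while “ß” (c3 9f) and “ŋ” (c5 8b) each contain a byte in the 0x80-0x9F range.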

Edit: the file does not contain a BOM, but a BOM is discouraged in UTF-8 files anyway. I have tried manually adding the correct BOM, and it didn’t help.
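A UTF-8 BOM is the three bytes EF BB BF. Assuming a detector that classifies text purely by byte ranges, with 0xA0-0xFF counted as ISO-8859 characters (an assumption, not confirmed from OpenBSD's source), all three BOM bytes land in that range, so prepending one would not be expected to change the verdict unless the detector looks for the BOM explicitly. This minimal sketch just checks the byte values:

```python
import codecs

bom = codecs.BOM_UTF8  # the UTF-8 byte order mark, b'\xef\xbb\xbf'
text = "hello å\n".encode("utf-8")
with_bom = bom + text

# Every BOM byte is in 0xA0-0xFF, i.e. the range a byte-class model
# would file under ISO-8859 rather than anything UTF-8 specific.
print(all(0xA0 <= b <= 0xFF for b in bom))

# The "utf-8-sig" codec strips a leading BOM when decoding.
print(with_bom.decode("utf-8-sig") == "hello å\n")
```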

@Rand0mA · 1 year ago

Make sure your test file contains a decent amount of UTF-8 text, not just a few characters. The file command uses statistical analysis, so having more text might help it make a more accurate determination.

What does the locale command return? To set your locale, you can use export (e.g. export LC_CTYPE="en_US.UTF-8", substituting whatever locale is relevant).
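When checking what locale reports, it helps to remember the POSIX precedence for a category such as LC_CTYPE: a non-empty LC_ALL wins, then LC_CTYPE itself, then LANG. The sketch below only reads environment variables (it does not call setlocale), the function name effective_ctype is hypothetical, and falling back to "C" is an assumption, since POSIX leaves the default implementation-defined.

```python
import os

def effective_ctype(env=None):
    """Resolve LC_CTYPE the POSIX way: LC_ALL, then LC_CTYPE, then LANG."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # assumed default; POSIX makes this implementation-defined

print(effective_ctype())  # whatever the current environment resolves to
print(effective_ctype({"LANG": "C", "LC_CTYPE": "en_US.UTF-8"}))
print(effective_ctype({"LC_ALL": "C", "LC_CTYPE": "en_US.UTF-8"}))
```

The last call shows why an exported LC_CTYPE can appear to be ignored: a stray LC_ALL overrides it.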

@[email protected] · 1 year ago

I use the whole of this page as a test file: https://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html
It renders fine in vim, with all the languages and special characters displayed.

LC_CTYPE is “en_US.UTF-8”; I export it in .xsession (and in .profile).

XTERM_LOCALE is also “en_US.UTF-8”.