Hello, I’ve been trying for a while to find someone else who uses OpenBSD, without success, so I’m hoping someone here will read this.

I’m wondering what your output is from file(1) on a file you know has text encoded as UTF-8.

On my system (7.3-stable) the output is “Non-ISO extended-ASCII text”, and I’m trying to figure out if this is how it should be, or if I did something wrong setting up the system.

So, if you have a computer with OpenBSD and a minute to spare, could you try running file(1) on a UTF-8 file and see if it identifies it as UTF-8 or “Non-ISO extended-ASCII text”?
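Here is a minimal sketch for producing such a test file. It assumes a POSIX sh-compatible shell, which OpenBSD's default ksh is. Writing the characters as octal escapes with printf means the bytes are known exactly and don't depend on the terminal's or editor's encoding. The file name utf8-test.txt is only an example.

```shell
# Write one ASCII line, then "å · ß ð ŋ" as explicit UTF-8 byte sequences.
# Octal escapes in a printf format string are POSIX, so this should work in
# ksh, sh, dash and bash alike.
#   å = C3 A5   · = C2 B7   ß = C3 9F   ð = C3 B0   ŋ = C5 8B
printf 'plain ASCII line\n\303\245 \302\267 \303\237 \303\260 \305\213\n' > utf8-test.txt

# Ask file(1) how it classifies the result.
file utf8-test.txt
```

If the file is created this way, any disagreement between systems comes from file(1) itself and not from how the text was typed.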

Thanks in advance

  • @Rand0mA
    1 year ago

    I don’t have OpenBSD, but to check if a file is UTF-8, try this:

    file -i filename.txt

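    The command should report the charset, and if the file is UTF-8 it should say something like “charset=utf-8”. (That is how the libmagic-based file(1) on most Linux distributions behaves; other implementations may print only the MIME type.)

    The file command may also label a UTF-8 file as plain “ASCII text”. If the file contains only ASCII characters, it is valid ASCII and valid UTF-8 at the same time, because UTF-8 is a superset of ASCII.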
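A small comparison of the two cases. This assumes a libmagic-based file(1) like the one on most Linux distributions. OpenBSD ships its own file(1) implementation, whose -i output may differ. With libmagic, an ASCII-only file is typically reported with charset=us-ascii, and only a file that contains a multibyte sequence gets charset=utf-8. The two file names are hypothetical.

```shell
# One pure-ASCII file and one containing a single UTF-8 "å" (bytes C3 A5).
printf 'hello\n' > ascii-only.txt
printf 'h\303\245llo\n' > with-a-ring.txt

# -i asks for a MIME type; libmagic-based file(1) also appends a charset.
file -i ascii-only.txt with-a-ring.txt
```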

    That result doesn’t necessarily mean there’s something wrong with your system setup.

    • OP
      1 year ago

      If I run file(1) on a file containing only characters from the ASCII set, the output is “ASCII text”. So far, so good. If I add an “å”, the output of file(1) is “ISO-8859 text”. This is not correct: looking closer at the bytes, the “å” is encoded as \xc3\xa5, and the same file is reported as UTF-8 on Debian and other OSs.

      If I add more Unicode characters such as “· ß ð ŋ”, file(1) on OpenBSD says it is “Non-ISO extended-ASCII text”, and file -i testfile gives just “text/plain”. Something is not right here.
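The byte check described above can be reproduced with od(1), which is specified by POSIX and should therefore be available in OpenBSD's base system as well as on Linux. To avoid overwriting anything, this sketch creates a hypothetical stand-in file named a-ring.txt and leaves testfile alone.

```shell
# Stand-in for the test file: "å" written as explicit UTF-8 bytes, then a newline.
printf '\303\245\n' > a-ring.txt

# Dump the bytes in hex without offsets (-An and -tx1 are POSIX od options).
# UTF-8 "å" (U+00E5) is the two-byte pair c3 a5; in ISO-8859-1 it would be
# the single byte e5.
od -An -tx1 a-ring.txt
```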

      Edit: the file does not contain a BOM, but a BOM is discouraged in UTF-8 files anyway. I have also tried manually adding the correct BOM, and it didn’t help.
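Checking for a BOM only requires the first three bytes, because the UTF-8 BOM is always EF BB BF. The sketch below relies on od's POSIX -N option. has_bom is a hypothetical helper name, and the two sample files are only for illustration.

```shell
# Hypothetical helper: succeeds if the file starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Sample files: "å" (C3 A5) with and without a leading BOM.
printf '\357\273\277\303\245\n' > with-bom.txt
printf '\303\245\n' > no-bom.txt

for f in with-bom.txt no-bom.txt; do
    if has_bom "$f"; then
        echo "$f: starts with a UTF-8 BOM"
    else
        echo "$f: no BOM"
    fi
done
```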

      • @Rand0mA
        1 year ago

        Make sure your test file contains a decent amount of UTF-8 text, not just a few characters. The file command classifies text using heuristics, so more text might help it make a more accurate determination.

        What does the locale command return? To set your locale, you can use export (e.g. export LC_CTYPE="en_US.UTF-8", substituting whatever locale is relevant).
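Here is a sketch of both steps in sh/ksh syntax. en_US.UTF-8 is only an example value. It is also an open assumption whether OpenBSD's file(1) looks at the locale at all, so the one-off comparison at the end is an experiment rather than a known fix.

```shell
# Show the current locale settings (LANG, LC_CTYPE, LC_ALL, ...).
locale

# Set a UTF-8 character type for the rest of this shell session.
export LC_CTYPE="en_US.UTF-8"

# Or set it for a single command only, to compare file(1)'s output
# with and without it.
LC_CTYPE=en_US.UTF-8 file testfile
```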