To fuck with computers that don’t know how to do UTF8, add a few emoji.
I once set a WiFi ssid to 🌻 and I was amazed at how much problems that likely caused. I had people showing me their network manager was dumping random characters. Some other routers web interfaces became corrupted when trying to show the neighborhood. Some clients refused to connect. Even a bsod on a windows XP box.
I’m currently in a project where the client has a custom, but not entirely consistent or known subset of utf-8.
They want us to keep the form content as it is, but remove the “bad” characters. Our current approach is to just forward everything as it is and wait for someone to complain. How TF am I supposed to remove a character without changing the message?
Yeah I had a backend with poor support for anything that wasn’t ASCII. So my solution was turning everything into hex before storing it. I wonder if people are still using it.
Yeah I had a backend with poor support for anything that wasn’t ASCII
PHP is like this. Poor Unicode support, but it treats strings as raw bytes so it usually works well enough. It turns out a programming language can take data from a form, save it to a database, then later load and render it, without having to know what those bytes actually mean, as long as the app or browser knows it’s UTF-8, for example through a Content-Type header or meta tag.
The tricky thing is the all the standard string manipulation functions (strlen, substr, etc) don’t handle Unicode properly at all and they deal with number of bytes rather than number of characters. You need to use the “multibyte” (Unicode-ready) equivalents like mb_substr, but a lot of PHP developers forget to do this and end up with string truncation code that cuts UTF-8 characters in half (e.g.if it’s truncating a long title with Emoji in it, it might cut off the title in the middle of the three bytes that represent the Emoji and only leave 1 or 2 of them)
You just need to ensure you validate character by character (NOT byte by byte) and allow characters in the Emoji Unicode ranges (which are well-defined in the Unicode standard). Using a library is a great idea though.
Common mistake: When you’re ascribing a bad quality to them, “millenials” means everyone born after 1960. If you’re ascribing a good quality to them, it only means people born between December 12, 1989, and December 14, 1989.
I knew someone who did physics in cursive. It was impossible to read (not bc it was sloppy, because seeing Greek letters as cursive threw me for a loop)
Forced to learn it in elementary school because “highschool and college require it!” by Boomers that didn’t recognize the tech revolution only to get to college and be told by those same boomers to never turn in a handwritten paper unless you wanted an auto fail.
There is no specification for CSV, which is why it’s such a mess and different parsers and renderers have wildly different features. The closest thing to a spec is RFC4180 but that RFC simply describes the most common features across several CSV implementations, and is not actually a spec.
I agree that it should be comma separated though. My understanding is that it caused issues in countries that use a comma as a decimal point.
Also, Excel sometimes uses tabs rather than commas or semicolons.
Using comma would probably caused more problems as it is a decimal separator for those languages.
My excel also uses semicolon in formulas instead of comma when separating parameters. Some VBA scripts break when using different language settings and some forumilas don’t translate automatically to different locale so they just give an error. Overall using excel in different locale setups is annoying.
Best separator I have used is | as i have never seen it in the data as an input. Comma and semicolon both have caused issues in the past for me as they might pop up at wrong places.
deleted by creator
I once set a WiFi ssid to 🌻 and I was amazed at how much problems that likely caused. I had people showing me their network manager was dumping random characters. Some other routers web interfaces became corrupted when trying to show the neighborhood. Some clients refused to connect. Even a bsod on a windows XP box.
One of my projects was validation for form submission and emojis melted me. I gave up trying to do it from scratch and trusted a library.
I’m currently in a project where the client has a custom, but not entirely consistent or known subset of utf-8.
They want us to keep the form content as it is, but remove the “bad” characters. Our current approach is to just forward everything as it is and wait for someone to complain. How TF am I supposed to remove a character without changing the message?
Yeah I had a backend with poor support for anything that wasn’t ASCII. So my solution was turning everything into hex before storing it. I wonder if people are still using it.
PHP is like this. Poor Unicode support, but it treats strings as raw bytes so it usually works well enough. It turns out a programming language can take data from a form, save it to a database, then later load and render it, without having to know what those bytes actually mean, as long as the app or browser knows it’s UTF-8, for example through a Content-Type header or meta tag.
The tricky thing is the all the standard string manipulation functions (
strlen
,substr
, etc) don’t handle Unicode properly at all and they deal with number of bytes rather than number of characters. You need to use the “multibyte” (Unicode-ready) equivalents likemb_substr
, but a lot of PHP developers forget to do this and end up with string truncation code that cuts UTF-8 characters in half (e.g.if it’s truncating a long title with Emoji in it, it might cut off the title in the middle of the three bytes that represent the Emoji and only leave 1 or 2 of them)You just need to ensure you validate character by character (NOT byte by byte) and allow characters in the Emoji Unicode ranges (which are well-defined in the Unicode standard). Using a library is a great idea though.
Why not simply discard them?
They called it “The Sunflower Incident.”
deleted by creator
Is there a character limit? Can it be the binary for DOOM?
deleted by creator
I believe it’s 32 bytes, but it depends on the AP, some use a null terminator as the final byte.
64 characters long is wifi spec IIRC but some routers don’t follow spec, wouldnt go higher than 60. Idk if this helps answer your question.
I had a ton of trouble with an apostrophe in my SSID until I realized that was the cause.
I had the same issue. (Or rather, cause of issues.) Some devices couldn’t identify it.
Great success!
I had an emoji in my phone hotspot a while ago. Unfortunately I had to remove it after a while because some devices refused to connect.
How would this mess with millennials? I think you mean gen z.
Common mistake: When you’re ascribing a bad quality to them, “millenials” means everyone born after 1960. If you’re ascribing a good quality to them, it only means people born between December 12, 1989, and December 14, 1989.
Incidentally, @[email protected]’s birthday is 13th December.
They’re one of the good ones
We learned cursive.
Were told our assignments in high school would get an automatic zero if we didn’t turn them in in cursive, even…
I knew someone who did physics in cursive. It was impossible to read (not bc it was sloppy, because seeing Greek letters as cursive threw me for a loop)
Yeah! Most of us can read analog clocks too!
I actually work in an after school program and I’ve been teaching kids how to read analog clocks. It is interesting to say the least
Even my gen alpha kid was learning cursive in third grade last year. I don’t expect him to write using it much but at least he knows how to read it.
Apparently that’s not very common anymore.
The only thing I write in cursive these days is my signature.
Most of the time I don’t even write, I type or use swipe-to-text.
I journal in cursive since it’s faster and more natural, otherwise I use print.
𝔒𝔯 𝔶𝔢𝔬𝔩𝔡 𝔢𝔫𝔤𝔩𝔦𝔰𝔥 𝔱𝔬 𝔰𝔠𝔯𝔢𝔴 𝔴𝔦𝔱𝔥 𝔢𝔳𝔢𝔯𝔶𝔬𝔫𝔢.
Sir, this is a Wendy’s account.
Since it’s noon, that’ll be $92. TYVM and fuck you.
Dave2
All I see is *****
Hey, millennials know cursive!
Forced to learn it in elementary school because “highschool and college require it!” by Boomers that didn’t recognize the tech revolution only to get to college and be told by those same boomers to never turn in a handwritten paper unless you wanted an auto fail.
Your elementary school teachers were also your college professors?
There is only one Boomer. It’s like an Agent Smith situation.
deleted by creator
CSVs are supposed be comma-separated files. Microsoft deviated from the specification and decided some languages would use semicolons for CSVs.
Source: StackOverflow
cemicolon separated values
There is no specification for CSV, which is why it’s such a mess and different parsers and renderers have wildly different features. The closest thing to a spec is RFC4180 but that RFC simply describes the most common features across several CSV implementations, and is not actually a spec.
I agree that it should be comma separated though. My understanding is that it caused issues in countries that use a comma as a decimal point.
Also, Excel sometimes uses tabs rather than commas or semicolons.
Using comma would probably caused more problems as it is a decimal separator for those languages. My excel also uses semicolon in formulas instead of comma when separating parameters. Some VBA scripts break when using different language settings and some forumilas don’t translate automatically to different locale so they just give an error. Overall using excel in different locale setups is annoying.
Best separator I have used is | as i have never seen it in the data as an input. Comma and semicolon both have caused issues in the past for me as they might pop up at wrong places.
Here’s my confusion: as soon as it is no longer separated by commas, it is by definition no longer a CSV. Is it an SCSV now?
It turns into a CSV where the C stands for character.
Z̵̫̖͚̳̖̖̰̩̀̆͐͒͝ä̸̛̻́̈́̌͂̽̈́l̷̤̥̖̝͙̅g̵̱̤͙͕̥̮͌̽o̸̡̦̙̬̘͎̪̥̔ ̴͔̙̞̱̗͒͊͊̽̀̑͌ẏ̵̛̻̾o̸̡͍̤͔͌ų̶̠͔̯̲̖͇̯̅̒̓̃̏̓͊r̷͎̪̗̤̄̊̃̚͝ ̵̢̰͔̀t̵̡̘̤̙͕͎̅͂͛̀̚ȩ̷͙̙̖̲̟͍̉̎͝x̷͇̦̝̼͗͋̊t̶̫̹̳̩͇̼̠͚̿͆̅̋̔̃͐͗!̶̧̛͕̮̻̞͎͇̹͆͛͘̕̚͠
Even better, add some byte sequences that are invalid UTF-8.
Spot the windows user….
deleted by creator