Why would a UTF-8 MySQL backup contain invalid UTF-8 characters?

Cousin Mose · 1 month ago

@modeler · 1 month ago

This is the right answer. I had the job of planning a schema update to fix this shitty design.

That said, Unicode and character encodings are incredibly complex and not easy to implement correctly. For example, two UTF-8 strings can contain the same number of characters but differ hugely in size (up to 4x), because each code point is encoded in one to four bytes. It’s well worth reading through some articles to get a feel for the important points.
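
To make the size point concrete, here is a rough Python sketch. The helper name `utf8_size` and the sample strings are made up for illustration, and "characters" here means Unicode code points (what Python's `len` counts), not what a user would necessarily see as one glyph.

```python
def utf8_size(text: str) -> tuple:
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical samples: four code points each, written as escapes so the
# source file's own encoding or normalization cannot change the result.
samples = {
    "ascii": "abcd",                                      # 1 byte per code point
    "latin": "\u00e9\u00f1\u00fc\u00df",                  # 2 bytes per code point
    "cjk": "\u65e5\u672c\u8a9e\u5b57",                    # 3 bytes per code point
    "emoji": "\U0001F600\U0001F601\U0001F602\U0001F603",  # 4 bytes per code point
}

for name, text in samples.items():
    chars, size = utf8_size(text)
    print(f"{name}: {chars} characters, {size} bytes")
```

All four strings have the same length in characters, but the emoji one takes four times the bytes of the ASCII one. That is why a column or buffer sized in bytes and one sized in characters can behave very differently.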