I’ve been running into several problems with restoring MySQL backups. Namely, the backups come from an environment other than the one I’m working in and I’m forced to remove superuser commands contained in the backups.
The problem is that whenever I try to remove those commands, I get UTF-8 encoding errors because the files contain loads of invalid character sequences.
Why would MySQL encode a backup as UTF-8 if the data isn’t actually UTF-8? This feels like bad design to me.
This is the right answer. I had the job of planning a schema update to fix this shitty design.
That said, Unicode and character encodings are incredibly complex and not easy to implement well. For example, two UTF-8 strings can contain the same number of characters but differ hugely in size: each character takes 1 to 4 bytes, so the difference can be up to 4x. It’s well worth reading through a few articles to get a feel for the important points.
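To make the size point concrete, here is a quick Python sketch. The sample strings and the names `samples` and `sizes` are just made up for illustration; each string has four characters, but the UTF-8 byte counts differ.

```python
# Four strings with the same character count but different UTF-8 sizes.
# The sample characters are arbitrary picks, one per byte width.
samples = {
    "ascii": "a" * 4,           # U+0061, 1 byte each in UTF-8
    "latin": "\u00e9" * 4,      # U+00E9 (e with acute), 2 bytes each
    "cjk": "\u65e5" * 4,        # U+65E5, 3 bytes each
    "emoji": "\U0001F600" * 4,  # U+1F600, 4 bytes each
}

# Encode each string and record how many bytes it takes.
sizes = {name: len(text.encode("utf-8")) for name, text in samples.items()}

for name, text in samples.items():
    print(f"{name}: {len(text)} chars, {sizes[name]} bytes")
```

Every string reports 4 characters, while the byte counts come out as 4, 8, 12 and 16. That gap between character count and byte count is what matters when sizing columns or buffers.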