I’ve been running into several problems restoring MySQL backups. The backups come from a different environment than the one I’m working in, so I have to strip out the superuser commands they contain before I can restore them.
The problem is that when I try to remove those commands, I keep getting UTF-8 decoding errors because the files contain lots of invalid byte sequences.
Why would MySQL encode a backup as UTF-8 if the data isn’t actually UTF-8? This feels like bad design to me.
It could be a lot of things! For example:
A PDF or other binary file stored in a text field ends up in the backup as raw bytes, and those bytes are usually not valid UTF-8.
Similarly, audio, video, or any other binary data stored inappropriately in text fields can cause the same problem.
It could also be due to corrupt data or improper encoding when the data was inserted into the database.
Essentially, anything non-textual or incorrectly encoded can leave invalid UTF-8 byte sequences in a backup.
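Whatever the cause, one way around the errors is to avoid decoding the dump at all. Below is a minimal Python sketch that reads the file in binary mode and matches byte patterns, so invalid UTF-8 in the data passes through untouched. It assumes the superuser statements you need to drop are `DEFINER=` clauses and single-line `SET @@GLOBAL.GTID_PURGED` / `SET @@SESSION.SQL_LOG_BIN` statements; those patterns are guesses, so adjust them to whatever your dumps actually contain. The function names `strip_super_commands`, `filter_dump`, and `invalid_utf8_lines` are hypothetical; the last one just reports which lines fail to decode, which may help you track down the rows holding binary or mis-encoded data.

```python
import re

# ASSUMPTION: example patterns for statements that need elevated privileges.
# Each is assumed to fit on one line; edit this list for your own dumps.
SUPER_LINE_PATTERNS = [
    re.compile(rb"^SET @@GLOBAL\.GTID_PURGED"),
    re.compile(rb"^SET @@SESSION\.SQL_LOG_BIN"),
]

# ASSUMPTION: DEFINER clauses look like DEFINER=`user`@`host` (plus a space).
DEFINER_CLAUSE = re.compile(rb"DEFINER=`[^`]*`@`[^`]*`\s*")


def strip_super_commands(lines):
    """Yield lines (bytes) with the assumed superuser statements removed.

    Everything stays as bytes, so no UnicodeDecodeError can occur.
    """
    for line in lines:
        if any(p.match(line) for p in SUPER_LINE_PATTERNS):
            continue  # drop the whole statement line
        yield DEFINER_CLAUSE.sub(b"", line)  # drop only the DEFINER=... part


def filter_dump(src_path, dst_path):
    """Copy a dump file, filtering it line by line in binary mode."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(strip_super_commands(src))


def invalid_utf8_lines(lines):
    """Yield (line_number, error) for each line that is not valid UTF-8."""
    for number, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield number, exc
```

For example, `filter_dump("backup.sql", "backup.clean.sql")` (hypothetical file names) writes a filtered copy. A `GTID_PURGED` statement that spans several lines would only lose its first line, so check the output before restoring.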