Search code examples
mysqlcharacter-encodinglatin1georgian

converting "????" in mysql to something readable?


We've got a situation where several months of data that was collected via ODK Aggregate into a MySQL database is unreadable.

Data is Georgian characters, but was sent to a database with a latin1 character set/collation.

Data managers didn't catch this issue until a few days ago, and I was never made aware that they were using surveys with these characters... so now the questions obviously are 1) can we recover existing data? and 2) how to make sure future data is readable?

I can do a SELECT HEX(column) FROM table

and get the hex output, but here's what that looks like:

3F3F3F3F203F3F3F3F3F3F3F3F203F3F3F3F3F3F3F3F3F
3F3F3F3F3F3F3F
E18397E18391E18398E1839AE18398E183A1E18398

As you can see the last line looks right, but the others are not. When I created a test table with latin1 and tried to insert Georgian characters, I got Warning: #1366 Incorrect string value: '\xE1\x83\x93\xE1\x83\x93...' for column 'georgian_text' at row 1

I don't see anything in Tomcat logs, but I'm assuming that Aggregate would have gotten the same error each time a record was submitted.

My question is: can the hex in the first rows be converted to anything useful?


Solution

  • I was unable to recover what was lost, but the answer for the sake of documentation is to always have the following in your /etc/my.cnf from the start so these issues don't happen in the first place.

    character-set-client-handshake = FALSE
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
    init_connect='SET NAMES utf8mb4'