Search code examples
mysqlpt-online-schema-change

Ignore warnings with pt-online-schema-change


I'm trying to update a table that was working fine a minute ago, but now I've hit an error that I've got in the past that's extremely inconvenient.

09:32:57 Copying rows caused a MySQL error 1300:
    Level: Warning
     Code: 1300
  Message: Invalid utf8mb4 character string: '94C494'

Is there an option or something so that I can ignore these warnings? After all it is just a warning, so MySQL still has the ability to move forward and copy the rows anyway.

Other solutions would be fine, too, like a way to find the offending value in the table, or to remove the invalid pieces, or just something so I can alter my table would be amazing.

My Googling on how to find/replace broken utf8 sequences in MySQL bring to me to these links (that aren't so helpful)

I've even tried searching for all columns where the hex contains the invalid sequence and still somehow with no luck

select * from `notifications` 
where hex(`Description`) like '%94C494%' 
or hex(`Title`) like '%94C494%' 
or hex(`NotificationID`) like '%94C494%' 
or hex(`ToUserID`) like '%94C494%' 
or hex(`FromUserID`) like '%94C494%'
or hex(`Link`) like '%94C494%' 
or hex(`Icon`) like '%94C494%';

MySQL is version 5.7.18-15-57-log and pt-online-schema-change 3.0.8


Even stranger, I decided to humor myself and search ALL columns (not just the utf8mb4 ones) and I got rows! But the only rows I got were from my Binary columns? Why would having invalid utf8 sequences matter in a binary column? Now I think it might be a bug with the tool

select * ,hex(`notificationid`), hex(`notificationbid`), hex(`fromuserid`), hex(`touserid`), hex(`title`), hex(`description`), hex(`read`), hex(`datetimeadded`), hex(`link`), hex(`icon`), hex(`shown`), hex(`_linkdescriptionsha256`), hex(`_touseridlinkdescription+sha3-224`)
from `notifications` 
where hex(`notificationid`) like '%94C494%'
or hex(`notificationbid`) like '%94C494%'
or hex(`fromuserid`) like '%94C494%'
or hex(`touserid`) like '%94C494%'
or hex(`title`) like '%94C494%'
or hex(`description`) like '%94C494%'
or hex(`read`) like '%94C494%'
or hex(`datetimeadded`) like '%94C494%'
or hex(`link`) like '%94C494%'
or hex(`icon`) like '%94C494%'
or hex(`shown`) like '%94C494%'
or hex(`_linkdescriptionsha256`) like '%94C494%'
or hex(`_touseridlinkdescription+sha3-224`) like '%94C494%';

Solution

  • After working with Percona support on this issue, we eventually led to the creation of this ticket: https://jira.percona.com/browse/PT-1528

    The trick for ignoring collation issues (or working around this bug) is by adding the --charset binary flag to the pt-online-schema-change command.

    Seems to happen when the primary key is a binary column and the charset is either set to utf8(mb4) or inferred as one of those from the MySQL settings.