I have some product names that include unicode characters
⚠️📷PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
A query in heidiSQL shows it fine
I setup MariaDB new this morning having moved from MySQL, but when records are retrieved through a ColdFusion Query using the MariaDB JDBC I get
java.lang.StringIndexOutOfBoundsException: begin 0, end 80, length 74
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at org.mariadb.jdbc.internal.com.read.resultset.rowprotocol.TextRowProtocol.getInternalString(TextRowProtocol.java:238)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.getString(SelectResultSet.java:948)
The productname field collation is utf8mb4_unicode_520_ci
, I've tried a few options. I've tried to set this at table and database level where it let me.
The JDBC connection string in ColdFusion admin is jdbc:mysql://localhost:3307/usedlens?useUnicode=true&characterEncoding=UTF-8
I note that the live production database where MariaDB was used from the beginning I don't have this trouble but the default charset is latin1, and the same record is the database as
????PLEASE READ! WORKING KODAK DC215 ZOOM 1.0MP DIGITAL CAMERA - UK SELLER
Here's how we've been stripping high ASCII characters while retaining any characters that may be salvaged:
string function ASCIINormalize(string inputString=""){
return createObject( 'java', 'java.text.Normalizer' ).normalize( javacast("string", arguments.inputString) , createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD') ).replaceAll('\p{InCombiningDiacriticalMarks}+','').replaceAll('[^\p{ASCII}]+','');
}
productname = ASCIINormalize(productname);
/*
Comparisons using java UDF versus reReplace regex:
"ABC Café ’test" (note: High ASCII non-normal whitespace characters used.)
ASCIINormalize = "ABC Cafe test"
reReplace = "ABC Caf test"
"čeština"
ASCIINormalize = "cestina"
reReplace = "etina"
"Häuser Bäume Höfe Gärten"
ASCIINormalize = "Hauser Baume Hofe Garten"
reReplace = "Huser Bume Hfe Grten"
*/