While debugging my Rails App I found the following messages in the log file:
(0.1ms) ROLLBACK
Completed 500 Internal Server Error in 25ms (ActiveRecord: 4.2ms)
ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: '\xF0\x9F\x98\x89 u...' for column 'description' at row 1: INSERT INTO `course` (`title`, `description`) VALUES ('sometitle', '<p>Description containing 😉 and stuff</p>')
This seems to stem from my database being MySQL with not-quite-utf-8:
CREATE TABLE `course` (
`id` int NOT NULL AUTO_INCREMENT,
`title` varchar(250) DEFAULT NULL,
`description` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2080 DEFAULT CHARSET=utf8;
According to the answers to this question CHARSET=utf8 is only capable of handling 3 byte characters, not 4 byte characters.
The Emoticon 😉 needs four bytes - see \xF0\x9F\x98\x89 in the log file.
I am wary of converting the whole database. I would rather forbid the use of emoticons and other 4 byte characters - they are really not necessary on my site.
What is the best way to do this in Rails?
Building on the regular expressions from these answers I implemented a validator:
# file /lib/three_byte_validator.rb
# forbid characters that use more than three byte
class ThreeByteValidator < ActiveModel::EachValidator
def validate_each(record, attribute, value)
if value =~ /[\u{10000}-\u{10FFFF}]/
record.errors.add attribute, (options[:message] || 'Keine Emoticons, keine UTF-8 Zeichen mit 4 Byte')
end
end
end
Now I can use this validator on the model that first had the problem:
class Course < ApplicationRecord
validates :title, length: { in: 3..100 }, three_byte: true
validates :description, length: { minimum: 50 }, three_byte: true
and also on other models:
class Person < ApplicationRecord
validates :firstname, :lastname, :country, :city, :address, three_byte: true