Search code examples
mysqlruby-on-railsvalidationutf-8utf8mb4

"Incorrect string value" when trying to insert UTF-8 into MySQL from Rails


While debugging my Rails App I found the following messages in the log file:

(0.1ms)  ROLLBACK
Completed 500 Internal Server Error in 25ms (ActiveRecord: 4.2ms)
ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: '\xF0\x9F\x98\x89 u...' for column 'description' at row 1: INSERT INTO `course` (`title`, `description`) VALUES ('sometitle', '<p>Description containing 😉 and stuff</p>')

This seems to stem from my database being MySQL with not-quite-utf-8:

CREATE TABLE `course` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(250) DEFAULT NULL,
  `description` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2080 DEFAULT CHARSET=utf8;

According to the answers to this question CHARSET=utf8 is only capable of handling 3 byte characters, not 4 byte characters.

The Emoticon 😉 needs four bytes - see \xF0\x9F\x98\x89 in the log file.

I am wary of converting the whole database. I would rather forbid the use of emoticons and other 4 byte characters - they are really not necessary on my site.

What is the best way to do this in Rails?


Solution

  • Building on the regular expressions from these answers I implemented a validator:

    # file /lib/three_byte_validator.rb
    # forbid characters that use more than three byte
    class ThreeByteValidator < ActiveModel::EachValidator
      def validate_each(record, attribute, value)
        if value =~ /[\u{10000}-\u{10FFFF}]/
          record.errors.add attribute, (options[:message] || 'Keine Emoticons, keine UTF-8 Zeichen mit 4 Byte')
        end
      end
    end
    

    Now I can use this validator on the model that first had the problem:

    class Course < ApplicationRecord
      validates :title, length: { in: 3..100 }, three_byte: true
      validates :description, length: { minimum: 50 }, three_byte: true
    

    and also on other models:

    class Person < ApplicationRecord
      validates :firstname, :lastname, :country, :city, :address, three_byte: true