ruby base64 cyrillic

Convert Cyrillic text to Base64 and back


I tried to encode Cyrillic text to Base64 and decode it back:

require "base64"

cyrillic_text = "Какой-то русский текст"
base64 = Base64.encode64(cyrillic_text)
inverse = Base64.decode64(base64)

The value of inverse is a string of escaped bytes rather than the original text:

"\xD0\x9A\xD0\xB0\xD0\xBA\xD0\xBE\xD0\xB9-\xD1\x82\xD0\xBE \xD1\x80\xD1\x83\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8\xD0\xB9 \xD1\x82\xD0\xB5\xD0\xBA\xD1\x81\xD1\x82"

Why? How can I convert the decoded text back to its original form?

Calling encode on the result raises an error:

inverse.encode "UTF-8"
Encoding::UndefinedConversionError: "\xD0" from ASCII-8BIT to UTF-8
    from (irb):93:in `encode'
    from (irb):93
    from /home/alexk/rubystack-2.2.7-2/ruby/bin/irb:11:in `<main>'

Solution

  • cyrillic_text = "Какой-то русский текст"
    base64 = Base64.encode64 cyrillic_text
    inverse = Base64.decode64(base64).force_encoding(Encoding::UTF_8)
    #⇒ "Какой-то русский текст"
    

    Base64.decode64 returns a plain sequence of bytes, tagged as ASCII-8BIT (binary), and Ruby has no way of knowing how to interpret it. You have to tell Ruby explicitly to treat the bytes as UTF-8, since the original string was UTF-8 encoded.

    Your attempt with String#encode failed because encode transcodes: it tried to convert each byte from ASCII-8BIT to UTF-8, and bytes such as \xD0 have no defined mapping between the two. The bytes are already valid UTF-8, so nothing needs converting; force_encoding only changes the encoding label Ruby attaches to them.
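
The difference between relabeling and transcoding can be checked directly. The following is a minimal sketch, not part of the original answer: decode64_utf8 is a hypothetical helper name, and the valid_encoding? check is an added safeguard that assumes the payload is expected to be UTF-8 text.

```ruby
require "base64"

# Hypothetical helper: decode Base64 and relabel the resulting bytes as UTF-8.
def decode64_utf8(base64)
  text = Base64.decode64(base64).force_encoding(Encoding::UTF_8)
  # Added safeguard (assumption): reject payloads that are not UTF-8 after all.
  raise ArgumentError, "decoded bytes are not valid UTF-8" unless text.valid_encoding?
  text
end

cyrillic_text = "Какой-то русский текст"
base64 = Base64.encode64(cyrillic_text)

raw = Base64.decode64(base64)
p raw.encoding                      # the binary (ASCII-8BIT) encoding, not UTF-8
p raw.bytes == cyrillic_text.bytes  # the bytes themselves survive the round trip

begin
  raw.encode("UTF-8")               # transcoding from binary fails on bytes >= 0x80
rescue Encoding::UndefinedConversionError => e
  puts "encode failed: #{e.class}"
end

p decode64_utf8(base64) == cyrillic_text
```

Note that force_encoding modifies its receiver in place. That is harmless here because Base64.decode64 returns a fresh string, but call dup first if the binary string is still needed elsewhere.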