ruby string encoding utf-8 utf-16

Ruby decompose UTF-8 chars


I managed to get the Unicode code point of "ü" using:

#!/usr/bin/env ruby
# encoding: UTF-8

puts "ü".unpack('U*')

It returns 252, which is fine. I read the online documentation for Ruby's String class, but I don't understand how to decompose this character.

In the case of ü, I want to get the characters u (117) and ¨ (168).

Thanks in advance, I appreciate any help


Solution

  • String#unpack and Array#pack are, as ForeverZer0 mentioned in the comments, for decoding binary strings into more structured data (such as numbers) and encoding data into strings, respectively. If you want to decompose Unicode characters, you want String#unicode_normalize with the NFD form:

    > "ü".unicode_normalize(:nfd).chars
     => ["u", "̈"] 
    

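    That gives you the code points 117 and 776, not 168. 776 (U+0308) is the combining diaeresis, which is what the canonical decomposition of ü uses. 168 (U+00A8, the same value ¨ has in ISO-8859-1) is the standalone spacing diaeresis, which is a different character.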
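
To see the numbers rather than the characters, something along these lines should work. It is only a minimal sketch, assuming Ruby 2.2 or later (where String#unicode_normalize is built in); the variable names are made up for illustration. It also shows that unpack('U*') and pack('U*') still have a place here for converting between a UTF-8 string and its integer code points, and that the :nfc form puts the character back together.

```ruby
# encoding: UTF-8

# "ü" written as a single precomposed code point (U+00FC).
composed = "\u00FC"

# NFD splits it into a base letter plus a combining mark.
decomposed = composed.unicode_normalize(:nfd)

# String#codepoints gives the integer code points of each character.
decomposed_codepoints = decomposed.codepoints

# unpack('U*') reads the same string as a list of UTF-8 code points.
unpacked = decomposed.unpack("U*")

# pack('U*') goes the other way: code points back into a UTF-8 string.
rebuilt = [117, 776].pack("U*")

# NFC recomposes the pair into the single precomposed character.
recomposed = rebuilt.unicode_normalize(:nfc)

puts decomposed_codepoints.inspect
puts unpacked.inspect
puts recomposed.codepoints.inspect
```

If you need the decomposition for something like stripping accents, you would typically normalize to NFD first and then drop the combining marks, rather than looking for 168.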