I ran into some trouble while creating a C extension for Ruby, and it got me thinking: how does Ruby (1.9.1) handle strings (and all the encoding stuff) internally?
If I have a string like "o" and pass it to a C function (as a VALUE), I can deal with it pretty easily using the RSTRING_PTR() and RSTRING_LEN() macros. However, if I make the string "ö" (a German umlaut character), RSTRING_LEN() gives me 2.
I'm a bit stumped by the contents of RSTRING_PTR() in that case: the two bytes are 0xA4 and 0xC3. What encoding is this? I tried calling "ö".force_encoding( ... ) with different encodings before passing the string to the C function, but that does not affect the contents of RSTRING_PTR() at all.
What I need is a way to get the string as a WCHAR* encoded in UTF-16 (for "ö", that would be 0x00F6) in my C function, but that's kind of hard to do if you don't know what encoding you're coming from...
Thanks in advance for any help.
String internals in Ruby 1.9 depend on the source encoding (__ENCODING__) and the Encoding.default_internal setting.
In your case it looks like UTF-8 (default), but ö is actually c3 b6 in UTF-8, and c3 a4 is ä.
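One way to check this from the Ruby side before the string reaches the C function is to dump its bytes. The sketch below is only an illustration: hex_bytes is a hypothetical helper, not part of Ruby, and the characters are written as \u escapes so the result does not depend on the source file's encoding. It also shows why force_encoding appeared to do nothing: it only relabels the string and leaves the bytes alone, whereas encode actually transcodes them. UTF-16LE is an assumption here, chosen because a little-endian WCHAR would read the two bytes f6 00 as 0x00F6.

```ruby
# Hypothetical helper for illustration: format a string's raw bytes as hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

o_umlaut = "\u00F6" # ö, written as an escape so the source encoding does not matter
a_umlaut = "\u00E4" # ä

puts hex_bytes(o_umlaut) # c3 b6
puts hex_bytes(a_umlaut) # c3 a4

# force_encoding only changes the encoding label; the bytes stay the same.
relabeled = o_umlaut.dup.force_encoding("ISO-8859-1")
puts hex_bytes(relabeled) # c3 b6

# encode transcodes, so the bytes really change (assumption: UTF-16LE is wanted).
utf16 = o_umlaut.encode("UTF-16LE")
puts hex_bytes(utf16) # f6 00
```

If the transcoding is done in Ruby like this, the C function could in principle treat RSTRING_PTR() as UTF-16 data of RSTRING_LEN() bytes, but note that the buffer is not guaranteed to end in a two-byte terminator.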