Tags: unicode, utf-8, arrays, rebol, rebol3

How can I work with a single byte and binary! byte arrays in Rebol 3?


In Rebol 2, you can use to char! to produce what is effectively a single byte, which you can then use in operations on binaries such as append:

>> buffer: #{DECAFBAD}
>> data: #{FFAE}
>> append buffer (to char! (first data))
== #{DECAFBADFF}

Seems sensible. But in Rebol 3, you get something different:

>> append buffer (to char! (first data))
== #{DECAFBADC3BF}

That's because Rebol 3 doesn't model characters as single bytes: a CHAR! is a Unicode code point. So the integer value of first data (255) becomes the character U+00FF, which is encoded as a two-byte UTF-8 sequence when it is converted to binary:

>> to char! 255
== #"ÿ"
>> to binary! (to char! 255)
== #{C3BF}
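
To make the boundary concrete, here is a small sketch of where the one-byte assumption stops holding. UTF-8 encodes code points 0 through 127 as one byte and code points 128 through 2047 as two, so the conversion should behave as annotated. The values in the comments follow from the UTF-8 rules and the result shown above; they were not captured from a live Rebol 3 session.

```rebol
; Code points 0-127 encode as a single UTF-8 byte
probe to binary! to char! 65     ; expected: #{41}
probe to binary! to char! 127    ; expected: #{7F}

; From 128 upward, UTF-8 needs at least two bytes
probe to binary! to char! 128    ; expected: #{C280}
probe to binary! to char! 255    ; expected: #{C3BF}, matching the result above
```

So the Rebol 2 idiom only appears to keep working in Rebol 3 for values below 128.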

Given that CHAR! is no longer equivalent to a byte in Rebol 3, and no BYTE! datatype was added (such that a BINARY! could be considered a series of these BYTE!s just as a STRING! is a series of CHAR!), what is one to do about this kind of situation?


Solution

  • Use an integer!, which is currently the closest match for expressing a byte in R3.

    Note that integers are range-checked when used as bytes in the context of a binary!:

    >> append #{} 1024
    ** Script error: value out of range: 1024
    ** Where: append
    ** Near: append #{} 1024
    

    In your first example, you are actually appending one element of a series to another series of the same type. In R3 you can express this in the obvious and most straightforward way:

    >> append #{DECAFBAD} first #{FFAE}
    == #{DECAFBADFF}
    

    So in this respect, a binary! in R3 behaves as a series of range-constrained integer!s.

    Unfortunately, that won't work in R2, because its binary! model was broken in many small ways, including this one. While a binary! in R2 can conceptually be considered a series of char!s, that concept is not consistently implemented.
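
    As a rough illustration of the "series of range-constrained integer!s" view, the sketch below assumes an R3 console. The variable names are made up for the example, and the values in the comments are what the behaviour described above implies rather than captured output.

    ```rebol
    bin: #{FFAE}

    ; Picking an element of a binary! yields an integer!, not a char!
    probe first bin           ; expected: 255
    probe type? first bin     ; expected: integer!

    ; Appending byte by byte, each element being an integer in 0..255
    buffer: copy #{DECAFBAD}
    foreach byte bin [append buffer byte]
    probe buffer              ; expected: #{DECAFBADFFAE}
    ```

    Because every element is already an integer!, no detour through char! is needed when moving bytes between binaries.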