
Binary Safety in Redis


In the Redis Protocol specification, it mentions that:

"Status replies are not binary safe and can't include newlines." What does it mean for a string/file to be binary safe? And why can't status replies in redis be binary safe?


Solution

  • Correction as of 26NOV24

    The term "binary string" or "binary data" is a common misnomer. Binary is a numerical base, and has nothing directly to do with the data you are dealing with.

    I'm going to reference this video, which was published on YouTube by Redis to explain:

    https://www.youtube.com/watch?v=7CUt4yWeRQE

    The video explains that binary strings in Redis are strings that can contain any type of data, with no limitations on the value of each element in the string. The example they give is that you can place image data directly into the string.

    With a "non-binary-safe" string, meaning one that is limited to ASCII characters or valid UTF-8 sequences, the alternative would be to re-encode the image data (or "binary data") in Base64, which represents it using only ASCII characters.
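    As a rough illustration of that Base64 workaround, here is a minimal Python sketch. The sample bytes are made up for the example and merely stand in for image data; they are not taken from the video.

```python
import base64

# Arbitrary bytes, including values that are not valid ASCII or UTF-8 text.
# (Made-up sample data standing in for the start of an image file.)
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF, 0x0D, 0x0A])

# Base64 maps every 3 input bytes to 4 characters from an ASCII-only alphabet.
encoded = base64.b64encode(raw)

# The encoded form is pure ASCII and contains no CR or LF,
# so a text-only channel can carry it.
assert encoded.isascii()
assert b"\r" not in encoded and b"\n" not in encoded

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)
```

    The cost is size: the encoded form is about a third larger than the raw bytes, which is one reason a binary-safe string is convenient.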

    So the definition of a "binary safe string" according to Redis would be a sequence of bytes, where each byte can take any value in [0, 255]. Redis knows the number of elements in the sequence because that count is sent before the rest of the string data.
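    The length-before-data idea can be sketched in Python using the bulk string framing from the Redis protocol ($, the byte count, CRLF, the data, CRLF). The function names encode_bulk and decode_bulk are hypothetical helpers written for this example, not part of any Redis client, and the sketch ignores null bulk strings.

```python
def encode_bulk(data: bytes) -> bytes:
    """Frame data as a bulk string: '$', byte count, CRLF, data, CRLF."""
    return b"$" + str(len(data)).encode("ascii") + b"\r\n" + data + b"\r\n"


def decode_bulk(frame: bytes) -> bytes:
    """Read one bulk string, trusting the length prefix instead of scanning the payload."""
    if not frame.startswith(b"$"):
        raise ValueError("not a bulk string")
    header_end = frame.index(b"\r\n")  # end of the length line only
    length = int(frame[1:header_end])
    start = header_end + 2
    payload = frame[start:start + length]
    # The payload must be complete and followed by the closing CRLF.
    if len(payload) != length or frame[start + length:start + length + 2] != b"\r\n":
        raise ValueError("truncated or malformed bulk string")
    return payload


# The payload contains a NUL, a CRLF pair, and a non-ASCII byte.
payload = b"foo\x00\r\nbar\xff"
frame = encode_bulk(payload)
recovered = decode_bulk(frame)
```

    Because the reader takes exactly the announced number of bytes, the CRLF inside the payload is treated as data rather than as a terminator.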

    I would not consider this definition universal. Always verify what someone means when they talk about "binary data", because the term is often misinterpreted.

    In Python, for example, they would be called bytes objects and written with a b prefix:

    example = b'SET foo \x00'
    

    In C it would be:

    char example[] = "SET foo \x00";
    

    Unfortunately, this does not help precisely define what a "non-binary safe string" is.

    Based on my understanding of the protocol, and the fact that status replies are mainly used for responses, you can probably assume that a non-binary-safe string contains only ASCII characters, excluding carriage return and line feed.

    Previous Answer

    A binary-safe string parser accounts for every possible value, 0 to 255, in each character of a string. Such a string is probably not null-terminated (its length is known by other means). If a string parser isn't binary safe, it expects a null-terminated string (a binary 0 at the end of the string).

    Usually, string parsers are not binary safe. Many parsers expect normal printable characters and a 0 at the end of a string. If there is no 0 at the end of this kind of string, the parser can read past the end of the buffer, which can easily cause a segmentation fault.

    Binary safe parsers are probably parsing arbitrary data (may be text or something else).
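    The difference between the two kinds of reader can be seen from Python with the standard ctypes module, which exposes both views of the same buffer. This is only a small demonstration of null-terminated versus length-aware reading, not how Redis itself handles strings.

```python
import ctypes

data = b"SET foo \x00 bar"

# create_string_buffer copies the bytes into a C char array
# and appends one trailing NUL byte.
buf = ctypes.create_string_buffer(data)

# .value reads the buffer the way a C string function would:
# it stops at the first NUL, so the tail is silently lost.
c_view = buf.value

# .raw returns every byte in the array, whatever its content.
full_view = buf.raw
```

    Here c_view ends just before the embedded zero byte, while full_view still holds everything, including the zero byte and the text after it.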

    Edit:

    "What does it mean for a string/file to be binary safe?"

    • It's the text parser that is binary safe, not the string/file itself. However, if a string is called binary safe, I would suspect it means that it is a null-terminated string with standard ASCII characters.

    "And why can't status replies in redis be binary safe?"

    • Because the parser that reads status replies stops at the first instance of \r\n. This is how the parser figures out the length of the string. So if it finds a \r\n before the end of the reply, it stops parsing and disregards everything afterwards.

    Unless status replies need to send binary data, there would be no need for them to be binary safe.
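    To make the \r\n behaviour concrete, here is a minimal Python sketch of a status reply reader. It assumes the protocol's "+" prefix for status replies; parse_simple_string is a hypothetical name chosen for this example and is not the actual Redis client code.

```python
def parse_simple_string(buffer: bytes) -> tuple[bytes, bytes]:
    """Parse one '+...' status reply; return (status text, remaining bytes)."""
    if not buffer.startswith(b"+"):
        raise ValueError("not a status reply")
    end = buffer.find(b"\r\n")  # the first CRLF marks the end of the reply
    if end == -1:
        raise ValueError("incomplete reply")
    return buffer[1:end], buffer[end + 2:]


# A normal status reply parses cleanly with nothing left over.
status, rest = parse_simple_string(b"+OK\r\n")

# A status text that itself contains CRLF is cut at the first CRLF;
# the remainder is no longer treated as part of this reply.
cut, leftover = parse_simple_string(b"+line one\r\nline two\r\n")
```

    The reader has no length to rely on, only the terminator, so any \r\n inside the text ends the reply early.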