How to read a text file containing NUL characters?


I have a file that contains NUL characters.

This file is generated by another program I have no control over, but I have to read it in order to get some crucial information.

Unfortunately, readChar() truncates the output with this warning:

In readChar("output.txt", 1e+05) :   
  truncating string with embedded nuls

Is there a way around this problem?


Solution

  • By convention, a text file cannot contain non-printable characters such as NUL (line breaks and tabs aside). If a file contains such characters, it isn’t a text file; it’s a binary file.

    R strictly[1] adheres to this convention and completely disallows NUL characters in character strings. You really need to read and treat the data as binary data. This means using readBin and the raw data type:

    n = file.size(filename)
    buffer = readBin(filename, 'raw', n = n)
    # Unfortunately the above has a race condition, so check that the size hasn’t changed!
    stopifnot(n == file.size(filename))
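
To see that reading as raw really does keep the zero bytes, here is a minimal sketch using a throwaway file. The file contents and the names demo_path and demo_buffer are made up for illustration and are not part of the original question.

```r
# Hypothetical demo file: the bytes of "ab", a NUL byte, then "cd".
demo_path = tempfile(fileext = ".txt")
writeBin(as.raw(c(0x61, 0x62, 0x00, 0x63, 0x64)), demo_path)

# Same approach as above: read the whole file as raw bytes.
n = file.size(demo_path)
demo_buffer = readBin(demo_path, what = "raw", n = n)

# All five bytes survive, including the zero byte in the middle.
stopifnot(length(demo_buffer) == 5L)
print(demo_buffer)
```

Unlike readChar, nothing is truncated here, because a raw vector is just a sequence of bytes with no string semantics attached.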
    

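    Now we can fix the buffer by removing embedded zero bytes. This assumes UTF-8 or ASCII encoding, where a zero byte can only ever represent NUL itself! Other encodings, such as UTF-16 or UTF-32, use zero bytes as part of ordinary characters, and those bytes need to be interpreted rather than dropped!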

    buffer = buffer[buffer != 0L]
    text = rawToChar(buffer)
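
The two cases can be sketched side by side. Both buffers below are invented for illustration, and the second half assumes that the zero bytes come from UTF-16LE text and that the platform's iconv supports that encoding name. Neither assumption comes from the original question.

```r
# Hypothetical buffer: the bytes of "ab", a stray NUL byte, then "cd".
nul_buffer = as.raw(c(0x61, 0x62, 0x00, 0x63, 0x64))
# Drop the zero bytes, then convert the remaining bytes to a string.
clean_text = rawToChar(nul_buffer[nul_buffer != as.raw(0)])
print(clean_text)

# Hypothetical buffer: "Hi" encoded as UTF-16LE, where every second byte is
# zero and belongs to a character. Decode it instead of dropping the zeros.
utf16_buffer = as.raw(c(0x48, 0x00, 0x69, 0x00))
# iconv() accepts a list of raw vectors and decodes each element.
decoded_text = iconv(list(utf16_buffer), from = "UTF-16LE", to = "UTF-8")
print(decoded_text)
```

If the buffer shows a zero in every second position, that is a strong hint the data is UTF-16 and should be decoded rather than filtered.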
    

    [1] Maybe too strictly …