
Python - how to read file with NUL delimited lines?


I usually use the following Python code to read lines from a file:

f = open('./my.csv', 'r')
for line in f:
    print(line)

But what if the lines in the file are delimited by "\0" instead of "\n"? Is there a Python module that can handle this?

Thanks for any advice.


Solution

  • If your file is small enough to read into memory all at once, you can use split:

    for line in f.read().split('\0'):
        print(line)
    

    Otherwise you might want to try this recipe from the discussion about this feature request:

    def fileLineIter(inputFile,
                     inputNewline="\n",
                     outputNewline=None,
                     readSize=8192):
       """Like the normal file iter but you can set what string indicates newline.
       
       The newline string can be arbitrarily long; it need not be restricted to a
       single character. You can also set the read size and control whether or not
       the newline string is left on the end of the iterated lines.  Setting
       newline to '\0' is particularly good for use with an input file created with
       something like "os.popen('find -print0')".
       """
       if outputNewline is None: outputNewline = inputNewline
       partialLine = ''
       while True:
           charsJustRead = inputFile.read(readSize)
           if not charsJustRead: break
           partialLine += charsJustRead
           lines = partialLine.split(inputNewline)
           partialLine = lines.pop()
           for line in lines: yield line + outputNewline
       if partialLine: yield partialLine
    
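As written, the recipe above accumulates into a str, so it expects a file opened in text mode. For NUL-delimited data such as "find -print0" output, reading bytes is often more convenient. The following is a minimal sketch of the same chunked-split idea for binary streams in Python 3; the name iter_records and the sample data are hypothetical and not part of the original recipe, and unlike the recipe it always strips the delimiter.

```python
import io


def iter_records(stream, delimiter=b"\0", read_size=8192):
    """Yield delimiter-separated records from a binary stream.

    Hypothetical helper: the delimiter is stripped from each record.
    """
    pending = b""
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        pending += chunk
        records = pending.split(delimiter)
        # The last piece may be an incomplete record; keep it for the next chunk.
        pending = records.pop()
        for record in records:
            yield record
    # Emit any trailing data that was not followed by a delimiter.
    if pending:
        yield pending


# Sample data standing in for a NUL-delimited file opened with open(path, "rb").
sample = io.BytesIO(b"a.txt\0dir/b.txt\0c\nwith newline.txt\0")
names = [r.decode("utf-8") for r in iter_records(sample, read_size=4)]
print(names)
```

The deliberately small read_size in the demo forces records to span several chunks, which exercises the carry-over of the incomplete tail between reads.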

    I also noticed your file has a "csv" extension. Python has a built-in CSV module (import csv). Its dialects have an attribute called Dialect.lineterminator, but the reader does not currently honor it:

    Dialect.lineterminator

    The string used to terminate lines produced by the writer. It defaults to '\r\n'.

    Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.
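Since the reader ignores lineterminator, one possible workaround is to split the records yourself and hand them to csv.reader, which accepts any iterable that yields one string per record. This is only a sketch under the assumption that the data fits in memory and that "\0" never appears inside a field; the sample data is made up for illustration.

```python
import csv

# Made-up sample standing in for the contents of a NUL-delimited CSV file.
raw = "name,size\0a.txt,10\0b.txt,20\0"

# Split on NUL and drop the empty string left after the final delimiter.
records = [record for record in raw.split("\0") if record]

# csv.reader parses each string in the iterable as one record.
rows = list(csv.reader(records))
print(rows)
```

For files too large to read at once, the same call should work with a generator that yields one record at a time in place of the list.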