
How to change the field separator of a file using Python?


I'm new to Python, coming from the R world, and I'm working on big text files structured in data columns (this is LiDAR data, so generally 60 million+ records).

Is it possible to change the field separator (e.g. from tab-delimited to comma-delimited) of such a big file without having to read the file and do a for loop over its lines?


Solution

  • No.

    • Read the file in
    • Change separators for each line
    • Write each line back

    This is easily doable with just a few lines of Python (not tested but the general approach works):

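    # Python - it's so readable, the code basically just writes itself ;-)
    #
    # Streams the file line by line; the last field keeps its newline,
    # so each output line ends exactly as the input line did.
    # Assumes no field itself contains a comma.
    with open('infile') as infile, open('outfile', 'w') as outfile:
        for line in infile:
            fields = line.split('\t')
            outfile.write(','.join(fields))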
    

    I'm not familiar with R, but if it has a library function for this it's probably doing exactly the same thing.

    Note that this code reads only one line at a time from the file, so the file can be larger than physical RAM: it is never loaded in full.
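
If some fields might themselves contain a comma, a plain split/join would produce a broken comma-delimited file. A minimal sketch of the same line-by-line approach using the standard `csv` module follows. The function name `convert_delimiter` and its parameters are made up for illustration, and the sketch assumes the input file uses no quoting of its own:

```python
import csv


def convert_delimiter(src_path, dst_path, src_delim='\t', dst_delim=','):
    """Copy src_path to dst_path row by row, swapping the field delimiter.

    Hypothetical helper, not from the original answer. Only one row is held
    in memory at a time, so the file can be larger than RAM.
    """
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        # QUOTE_NONE: treat quote characters in the input as ordinary text.
        reader = csv.reader(src, delimiter=src_delim, quoting=csv.QUOTE_NONE)
        # The writer's default (QUOTE_MINIMAL) quotes only the fields that
        # contain the new delimiter, a quote character, or a line break.
        writer = csv.writer(dst, delimiter=dst_delim, lineterminator='\n')
        for row in reader:
            writer.writerow(row)
```

Calling `convert_delimiter('infile', 'outfile')` would do the same job as the loop above, except that a field such as `a,b` is written as `"a,b"` instead of being silently split into two columns.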