Tags: python, pandas, csv, text-files, delimiter

Avoiding parsing errors when a space is both the delimiter and part of the text in Python text files


If a text file uses a character (say, a space) both as the delimiter and inside the text of a field, how should the file be read with pandas read_csv, read_table, or plain Python file reading?


Solution

  • You can use list slicing to gather the fields dynamically, so the first 6 elements are captured as you expect. (Note that this code must run inside a loop that iterates over every line in the file and assigns each line to a variable named 'line'.)

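        # strip the trailing newline so it doesn't end up in the last word
        elements = line.rstrip("\n").split(" ")
        int_fields = elements[:6]   # the first six fields (still strings)
        last_field = elements[6:]   # everything after them, as a list of words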
    

    last_field is always a list. If the last column is a single integer, the list holds just that one item; if it is a string with spaces, like the name in your example, the list holds one item per word. Either way, you can join the items back into a single string (every field produced by split() is a string, the integers included):

        field = ""
        for item in last_field:
            field += "{} ".format(item)  # add each word followed by a space
        field = field.strip()  # strip() returns a new string, so reassign it
    

    This concatenates all the words into one string with spaces between them; strip() then removes the trailing space left after the last word.
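Putting the slicing and joining steps together, here is a minimal sketch of the whole loop. It assumes each line holds exactly six integer columns followed by one last field that may contain spaces; the sample data is hypothetical, and an in-memory io.StringIO stands in for the real file so the snippet runs as-is. It uses " ".join() in place of the += loop, which gives the same result for single-spaced text.

```python
import io

# Hypothetical sample: six integer columns, then a last field that may
# contain spaces (a multi-word name) or be a single integer.
SAMPLE = """1 2 3 4 5 6 John Smith
7 8 9 10 11 12 Mary Ann Jones
13 14 15 16 17 18 42
"""


def parse_line(line):
    """Return the six leading fields as ints plus the rejoined last field."""
    elements = line.rstrip("\n").split(" ")
    int_fields = [int(x) for x in elements[:6]]  # convert the first six to int
    last_field = elements[6:]                     # list of the remaining words
    field = " ".join(last_field)                  # rejoin the words with single spaces
    return int_fields + [field]


rows = []
with io.StringIO(SAMPLE) as f:  # use open("data.txt") for a real file
    for line in f:
        if line.strip():  # skip blank lines
            rows.append(parse_line(line))

print(rows)
```

If you would rather not rejoin the words, line.split(" ", 6) stops after six splits, so its seventh item already holds the rest of the line, spaces included.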

    That said, space-delimited files are usually a poor choice when fields can contain spaces. If you have access to whatever creates the files, change the delimiter to a comma or a pipe (|).
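If you can change the format, the conversion itself is short. The sketch below assumes the same six-columns-plus-text layout (the sample data and the column names a to f and name are hypothetical). It splits each line at most six times with pandas' Series.str.split(n=6, expand=True), writes the result pipe-delimited with DataFrame.to_csv(sep="|"), and reads it back with an ordinary read_csv(sep="|"). This assumes the pipe character never appears in the text itself; io.StringIO stands in for real files.

```python
import io

import pandas as pd

# Hypothetical space-delimited input: six integers, then a name with spaces.
raw = """1 2 3 4 5 6 John Smith
7 8 9 10 11 12 Mary Ann Jones
"""

# Split each line at most 6 times so the last column keeps its spaces.
lines = pd.Series(raw.splitlines())
parts = lines.str.split(" ", n=6, expand=True)
parts.columns = ["a", "b", "c", "d", "e", "f", "name"]  # hypothetical names

# Write the data back out with a pipe delimiter instead of spaces.
buf = io.StringIO()
parts.to_csv(buf, sep="|", index=False)

# Reading the pipe-delimited text is now a plain read_csv call.
buf.seek(0)
df = pd.read_csv(buf, sep="|")
print(df)
```

With the pipe as the delimiter, read_csv infers integer types for the first six columns and keeps each name as a single string.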