python, regex, pandas, separator

How can I parse a .txt with a delimiter that has multiple characters into a pandas df?


I have a large dataset that I'd like to analyse in Python with pandas. It's all contained in a .txt file, but the separator is +++$+++. How can I parse this?

import pandas as pd
df = pd.read_csv('filename.txt', sep='+++$+++', header=None)

These two lines bring up this error:

sre_constants.error: nothing to repeat

Solution

  • That's because a separator longer than one character is interpreted as a regular expression, as stated in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html. In a regex, + means "one or more repetitions of the preceding element". Your pattern starts with a +, so there is no preceding element, and hence "nothing to repeat".

    Escaping the special characters (+ and $) so that they are matched literally should work, e.g. sep=r'\+\+\+\$\+\+\+'.
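
A minimal sketch of the escaping approach, in case it helps. It uses an in-memory string instead of filename.txt; the sample rows are made up for illustration and are not from the asker's dataset. re.escape builds the escaped pattern so it does not have to be typed by hand, and engine="python" is passed explicitly because regex separators are only supported by the Python parsing engine.

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for the contents of filename.txt (made-up rows).
sample = "a+++$+++b+++$+++c\n1+++$+++2+++$+++3\n"

# Escape the regex metacharacters (+ and $) so the separator is matched literally.
sep = re.escape("+++$+++")

# Regex separators require the Python engine; requesting it explicitly
# avoids the fallback warning pandas would otherwise emit.
df = pd.read_csv(io.StringIO(sample), sep=sep, header=None, engine="python")

print(df)
```

With a real file, replace io.StringIO(sample) with the path. If the fields in the file are padded with spaces around the separator, one option is to wrap the pattern as r"\s*" + sep + r"\s*" so the padding is consumed as part of the delimiter.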