I have a large dataset that I'd like to analyse in Python with pandas. It's all contained in a .txt file, but the separator is +++$+++. How can I parse this?
import pandas as pd
df = pd.read_csv('filename.txt', sep='+++$+++', header=None)
These two lines raise this error:
sre_constants.error: nothing to repeat
That's because a separator longer than one character is interpreted as a regular expression, as stated in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html. In a regex, + means "one or more of the preceding element", and since the pattern starts with +, there is no preceding element to repeat: hence "nothing to repeat". The $ is also special (it is an end-of-line anchor), so it has to be escaped as well.
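To see the difference between the raw and the escaped separator, here is a small sketch using only the standard re module. The helper `regex_error` and the sample string are hypothetical, written just for illustration.

```python
import re

SEP = "+++$+++"


def regex_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Unescaped: the leading "+" has nothing before it to repeat.
print(regex_error(SEP))

# Escaped: re.escape backslashes every "+" and the "$", so they match literally.
print(re.escape(SEP))
print(re.split(re.escape(SEP), "a+++$+++1+++$+++x"))
```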
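Escaping the special characters fixes it, e.g. sep=r'\+\+\+\$\+\+\+' (or sep=re.escape('+++$+++')), ideally together with engine='python', since regex separators are only supported by the Python parsing engine.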
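A minimal sketch of the escaped read_csv call, assuming the fields are separated by exactly +++$+++ with no surrounding spaces. An in-memory string with made-up rows stands in for filename.txt so the example runs on its own; if the real file pads the separator with spaces, include them in SEP.

```python
import io
import re

import pandas as pd

# Literal separator used in the file.
SEP = "+++$+++"

# Hypothetical sample data standing in for filename.txt.
sample = "a+++$+++1+++$+++x\nb+++$+++2+++$+++y\n"

# re.escape makes "+" and "$" literal; engine="python" is the parser that
# handles regex separators, and naming it avoids a fallback ParserWarning.
df = pd.read_csv(io.StringIO(sample), sep=re.escape(SEP), header=None, engine="python")
print(df)
```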