Split dataframe column into two columns based on delimiter

I am preprocessing text for classification, and I import my dataset like this:

dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 2)

Printing dataset in the terminal gives:

                                 lyrics,classification
0    I should have known better with a girl like yo...
1    You can shake an apple off an apple tree\nShak...
2    It's been a hard day's night\nAnd I've been wo...
3    Michelle, ma belle\nThese are words that go to...

However, when I inspect the dataset variable more closely in Spyder, I see that it has only one column instead of the two I want.

Since the lyrics themselves contain commas, using "," as the delimiter would not work.

How do I correct the dataframe above so that it has:

1) one column for lyrics

2) one column for classification

with the corresponding data in each row?

Solution

If your lyrics do not contain commas (they most likely do), you can simply use read_csv with delimiter=','.

If that is not an option, you can split each row on its last comma only, using str.rsplit with n=1:

dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)

For example, starting from the single-column dataframe df:

df

                               lyrics,classification
0  I should have known better with a girl like yo...
1                              You can shake an...,0
2                  It's been a hard day's night...,0

df = df.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df.columns = ['lyrics', 'classification']
df

                                              lyrics classification
0  I should have known better with a girl like yo...              0
1                                You can shake an...              0
2                    It's been a hard day's night...              0
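
For reference, here is a minimal self-contained sketch of the same approach. The sample rows and their labels are made up for illustration (they stand in for the real lyrics.csv), and the name split is hypothetical. The first row contains a comma inside the lyrics to show that only the last comma is used as the separator.

```python
import pandas as pd

# Hypothetical sample data standing in for the single-column dataframe
# that read_csv produced; the labels are invented for illustration.
df = pd.DataFrame({
    "lyrics,classification": [
        "I should have known better, with a girl like you,0",
        "You can shake an apple off an apple tree,1",
        "It's been a hard day's night,0",
    ]
})

# Split each string once, starting from the right, so that only the
# last comma separates the lyrics from the label.
split = df.iloc[:, 0].str.rsplit(",", n=1, expand=True)
split.columns = ["lyrics", "classification"]

# rsplit returns strings, so convert the labels to integers.
split["classification"] = split["classification"].astype(int)

print(split)
```

The astype(int) step assumes every label is a clean integer string; if some rows are missing a label, that conversion would need extra handling.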