Split dataframe column into two columns based on delimiter


I am preprocessing text for classification, and I import my dataset like this:

dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 2)

dataset prints on terminal:

                                 lyrics,classification
0    I should have known better with a girl like yo...
1    You can shake an apple off an apple tree\nShak...
2    It's been a hard day's night\nAnd I've been wo...
3    Michelle, ma belle\nThese are words that go to...

However, when I inspect the variable dataset more closely in Spyder, I see that it has only one column instead of the desired two.

Since the lyrics themselves contain commas, simply reading the file with a "," delimiter would not work.

How do I correct the dataframe above so that it has:

1) one column for lyrics

2) one column for classification

with the corresponding data in each row?


Solution

  • If your lyrics themselves do not contain commas (they most likely do), then you can use read_csv with delimiter=','.

    However, if that is not an option, you could use str.rsplit with n=1, which splits each row only at its last comma:

    dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)
    

    df
    
                                   lyrics,classification
    0  I should have known better with a girl like yo...
    1                              You can shake an...,0
    2                  It's been a hard day's night...,0
    
    # n must be passed as a keyword in recent pandas versions
    df = df.iloc[:, 0].str.rsplit(',', n=1, expand=True)
    df.columns = ['lyrics', 'classification']
    df
    
                                                  lyrics classification
    0  I should have known better with a girl like yo...              0
    1                                You can shake an...              0
    2                    It's been a hard day's night...              0
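
For reference, here is a minimal self-contained sketch of the rsplit approach. The inline text is a made-up stand-in for lyrics.csv (an assumption, since the real file is not shown), and the final integer cast assumes the classification values are whole numbers:

```python
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv: the lyrics contain commas,
# and the last comma on each line separates the classification.
raw = (
    "lyrics,classification\n"
    "I should have known better, with a girl like you,0\n"
    "You can shake an apple off an apple tree,0\n"
    "Michelle, ma belle,1\n"
)

# Reading with a tab delimiter reproduces the problem: a single column.
dataset = pd.read_csv(io.StringIO(raw), delimiter="\t")

# Split each row on its last comma only (n=1, counted from the right).
df = dataset.iloc[:, 0].str.rsplit(",", n=1, expand=True)
df.columns = ["lyrics", "classification"]

# The split yields strings; cast the label column (assumed to be integers).
df["classification"] = df["classification"].astype(int)

print(df)
```

Because the split is counted from the right, commas inside the lyrics stay in the lyrics column. This only holds if the classification value itself never contains a comma.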