I am trying to perform a sentiment analysis using a Bayesian Classifier and I have a CSV file consisting of rows with the following structure:
Column 1: Either 1 or 0
Column 2: String
Example: 1 | This is a great movie
I am using Pandas when reading the CSV file (read_csv).
After reading each row from the CSV file has the following structure:
1;This is a great movie
0;This is a bad movie
I would like to tokenize each string in column 2. However, I have not managed to do this. How do I tackle this problem?
Assuming the df looks like (just replace column name from 0 to column_name
which you have as header:
0
0 1;This is a great movie
1 0;This is a bad movie
pd.DataFrame(df[0].apply(lambda x: x.split(";")).values.tolist(),columns=['A','B'])
A B
0 1 This is a great movie
1 0 This is a bad movie