I am trying to split Comments column into multiple rows containing each sentence. I used the following StackOverflow thread for my reference as it tends to give similar result. Reference Link: pandas: How do I split text in a column into multiple rows? Sample data of dataframe is as below.
Id Team Food_Text 1 X Food is good. It is very well cooked. Delicious! 2 X I hate the squid. Food is not cooked well. At all indeed. 3 X Please do not good anytime over here 4 Y I love the fish. Awesome delicacy. 5 Y Good for desserts. Meat tastes bad
Each record for 'Food_Text' can be of multiple sentences delimited by full-stop or period. I have used the following code
import numpy as np
import pandas as pd
survey_data = pd.read_csv("Food_Dummy.csv")
survey_text = survey_data[['Id','Team','Food_Text']]
# Getting s as pandas series which has split on full stop and new sentence a new line
s = survey_text["Food_Text"].str.split('.').apply(pd.Series,1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'Food_Text' # needs a name to join
# There are blank or emplty cell values after above process. Removing them
s.replace('', np.nan, inplace=True)
s.dropna(inplace=True)
x=s.to_frame(name='Food_Text1')
x.head(10)
# Joining should ideally get me proper output. But I am getting original dataframe instead of split one.
survey_text.join(x)
survey_text.head(10)
I am not sure why the join is not giving me a proper dataframe with more number of rows. Repetition of other columns based on index of split. So Id=1 has 3 sentences so we should have 3 records with all other data same and Food_Text column with a new sentence from a comment by ID=1. Similarly for other records.
Thank You in advance for your help! Regards, Sohil Shah
In the example that you put in your code, The result of the join
was printed, so if you want to change the value of your survey_text, the code should be:
survey_text = survey_text.join(x)
or if you wanted to simplify your code, this code below is just fine:
import numpy as np
import pandas as pd
survey_data = pd.read_csv("Food_Dummy.csv")
survey_text = survey_data[['Id','Team','Food_Text']]
# Getting s as pandas series which has split on full stop and new sentence a new line
s = survey_text["Food_Text"].str.split('.').apply(pd.Series,1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'Food_Text' # needs a name to join
# There are blank or emplty cell values after above process. Removing them
s.replace('', np.nan, inplace=True)
s.dropna(inplace=True)
# Joining should ideally get me proper output. But I am getting original dataframe instead of split one.
del survey_text['Food_Text']
survey_text = survey_text.join(s)
survey_text.head(10)
This way you will not have multiple "Food_Text" columns in yout DataFrame.