I'm used to running text analysis on text files in Python. I usually do something like:
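from nltk.tokenize import word_tokenize

# read the whole file into a single string, line by line
f = open('filename.txt', 'r')
text = ""
while 1:
    line = f.readline()
    if not line:
        break
    text += line
f.close()
# tokenize
tokenized_word = word_tokenize(text)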
.
.
.
However, now I'm not working with a text file, but with a pandas DataFrame. How can I build the same 'text' string from a pandas column?
I took a look at the post "Text mining with Python and pandas", but it's not exactly what I'm looking for.
Let's imagine this is your DataFrame:
import pandas as pd
df = pd.DataFrame({ "Text": ['bla bla bla', 'Hello', 'Other sentence', 'Lets see']})
You can get the equivalent of your code by using the agg function, which here joins all non-missing values of the column into one string:
text = df['Text'].agg(lambda x: ' '.join(x.dropna()))
text
Result:
'bla bla bla Hello Other sentence Lets see'
Then you can tokenize:
tokenized_word = word_tokenize(text)
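As an alternative to agg, the same string can be built with Series.str.cat, which skips missing values by default when called without other columns. This is only a minimal sketch: the sample data (including the None entry) is made up for illustration, and plain str.split() stands in for word_tokenize so the snippet runs without NLTK and its tokenizer data.

```python
import pandas as pd

# Hypothetical sample data; the None entry shows how missing values behave.
df = pd.DataFrame({"Text": ["bla bla bla", "Hello", None, "Lets see"]})

# Called with only `sep`, Series.str.cat joins every value of the column
# into one string; missing values are omitted while na_rep stays None.
text = df["Text"].str.cat(sep=" ")

# Stand-in for word_tokenize(text): a simple whitespace split.
tokens = text.split()

print(text)
print(tokens)
```

If the column may hold non-string values, converting the non-missing entries with astype(str) before joining is probably the safer route.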