Tags: python, pycharm, spyder, tokenize, sentiment-analysis

How can I tokenize all rows in a specific column of a CSV file using Python?


I'm doing a sentiment analysis using Python (I'm still new to this programming language). I have some Twitter data in a CSV file that I need to pre-process before doing the actual analysis. First of all, I need to tokenize the text from a specific column, in my case the second one (column B). I found some suggestions on how to do the tokenization, but not on how to select a specific column. Does anyone have experience with this?

I tried this code, which seems to work for all columns, but how can I restrict it to the second column?

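import csv
import nltk
from nltk import word_tokenize

# newline='' is what the csv module documentation recommends when opening CSV files
with open('TwitterData.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)  # maps each row to a dict keyed by the header row
    for row in reader:
        print(row)  # prints every column of the row; nothing is tokenized yet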

Any suggestions for modules and code that work well for pre-processing text for sentiment analysis?

Thanks a lot!


Solution

  • I highly recommend the scikit-learn documentation and modules, especially the tutorial "Working with Text Data": https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

    It also includes an exercise on sentiment analysis: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#exercise-2-sentiment-analysis-on-movie-reviews

    If you need more specific help with your code, it is always best to provide a "minimal reproducible example": https://stackoverflow.com/help/minimal-reproducible-example. That way, others can help you more effectively with the specific issue you are facing.

    I hope that helps :)
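
For the original question (tokenizing only the second column), here is a minimal sketch. It makes a few assumptions: the file has a header row, the tweet text sits at index 1 (column B), and a simple regex tokenizer is good enough as a stand-in. The names `simple_tokenize`, `tokenize_column`, and the sample data are hypothetical and only there for illustration.

```python
import csv
import io
import re

# Hypothetical stand-in tokenizer: words (optionally prefixed with @ or #,
# optionally containing one apostrophe) or single punctuation characters.
TOKEN_PATTERN = re.compile(r"[@#]?\w+(?:'\w+)?|[^\w\s]")


def simple_tokenize(text):
    """Split text into word and punctuation tokens with a regex."""
    return TOKEN_PATTERN.findall(text)


def tokenize_column(csvfile, column_index=1, tokenizer=simple_tokenize,
                    has_header=True):
    """Return one token list per data row, using only the given column.

    Rows that are too short to contain the column are skipped.
    """
    reader = csv.reader(csvfile)
    if has_header:
        next(reader, None)  # discard the header row
    tokens_per_row = []
    for row in reader:
        if len(row) <= column_index:
            continue
        tokens_per_row.append(tokenizer(row[column_index]))
    return tokens_per_row


# Made-up sample data standing in for TwitterData.csv
sample = io.StringIO(
    "id,text,lang\n"
    "1,I love this phone!,en\n"
    "2,\"Worst service ever, never again\",en\n"
)
result = tokenize_column(sample)
print(result)
```

With a real file, you would pass an open file object instead, for example `with open('TwitterData.csv', newline='') as f: tokens = tokenize_column(f)`. To use NLTK, pass `tokenizer=word_tokenize`; note that NLTK's tokenizer data has to be downloaded first. If you prefer to keep `csv.DictReader`, you can select the column by its header name with `row['...']` instead of by index.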