summaries = []
texts = []
with open("C:\\Users\\apandey\\Documents\\Reviews.csv", "r", encoding="utf8") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        clean_text = clean(row['Text'])
        clean_summary = clean(row['Summary'])
        summaries.append(word_tokenize(clean_summary))
        texts.append(word_tokenize(clean_text))
I just want to tokenize each row of the CSV file, but I am getting this error: "list indices must be integers or slices, not str"
I believe your CSV file looks something like this:
Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
1,'B001E4KFG0','A3SGXH7AUHU8GW','delmartian',1,1,5,1303862400,'Good Quality Dog
Food','I have bought several of the Vitality canned dog food products and have
found them all to be of good quality...'
Then you should use `csv.DictReader`, as suggested by Peter Wood in the comments. `csv.reader` yields each row as a plain list, so string keys like `row['Text']` raise the error you are seeing; `DictReader` yields dict-like rows keyed by the header line.
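To see why the original code fails, here is a minimal reproduction with made-up in-memory data (the column values are placeholders):

```python
import csv
import io

# csv.reader yields each row as a plain list, not a dict.
rows = list(csv.reader(io.StringIO("Summary,Text\nhi,there\n")))
print(rows[1])        # ['hi', 'there'] -- a plain list

try:
    rows[1]["Text"]   # string index into a list
except TypeError as e:
    print(e)          # list indices must be integers or slices, not str
```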
import csv
from nltk.tokenize import word_tokenize

summaries = []
texts = []
with open("foo.csv", encoding="utf8", newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        clean_text = row["Text"]
        clean_summary = row["Summary"]
        summaries.append(word_tokenize(clean_summary))
        texts.append(word_tokenize(clean_text))
Output:
# texts
[["'I", 'have', 'bought', 'several', 'of', 'the', 'Vitality', 'canned', 'dog', 'food', 'products', 'and', 'have', 'found', 'them', 'all', 'to', 'be', 'of', 'good', 'quality', '.', 'The', 'product', 'looks', 'more', 'like', 'a', 'stew', 'than', 'a', 'processed', 'meat', 'and', 'it', 'smells', 'better', '.', 'My', 'Labrador', 'is', 'finicky', 'and', 'she', 'appreciates', 'this', 'product', 'better', 'than', 'most', '.', "'"]]
# summaries
[["'Good", 'Quality', 'Dog', 'Food', "'"]]
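If you would rather keep plain `csv.reader`, you can read the header once and look up column positions yourself. A minimal sketch with made-up in-memory data (the sample row is a placeholder, not your real file):

```python
import csv
import io

data = "Summary,Text\nGood Quality Dog Food,I have bought several products\n"

# csv.reader yields lists, so columns must be accessed by integer
# position, looked up once from the header row.
reader = csv.reader(io.StringIO(data))
header = next(reader)                  # ['Summary', 'Text']
summary_idx = header.index("Summary")
text_idx = header.index("Text")

rows = [(row[summary_idx], row[text_idx]) for row in reader]
print(rows)  # [('Good Quality Dog Food', 'I have bought several products')]
```

This is essentially what `DictReader` does for you internally, which is why it is usually the cleaner choice.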