Search code examples
pythonnltktokenize

error tokenizing with nltk from array data in file excel sequence item 0: expected str instance, list found


I have a problem in this code, maybe anybody help, the data train['hadis'] from the text in excel, was success showing

train['hadis'] = train['hadis'].apply(lambda x: " ".join([nltk.tokenize.word_tokenize(x) for x in x.split()]))
train['hadis'].head()

TypeError: sequence item 0: expected str instance, list found

result for tokenizing every each row data


Solution

  • Instead of

    lambda x: " ".join([nltk.tokenize.word_tokenize(x) for x in x.split()])
    

    Use

    lambda x: " ".join(nltk.tokenize.word_tokenize(x))