How do I count the number of occurrences of each word in a .txt file, load the result into a pandas DataFrame with columns name and count, and sort the DataFrame by the count column?
Use nltk:
# pip install nltk
from nltk.tokenize import RegexpTokenizer
from nltk import FreqDist
import pandas as pd
text = """How do I count the number of occurrences of each word in a .txt file and also load it into the pandas dataframe with columns name and count, also sort the dataframe on column count?"""
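# Split the text into runs of word characters, which drops punctuation
tokenizer = RegexpTokenizer(r'\w+')
words = tokenizer.tokenize(text)
# FreqDist maps each word to its count; wrap it in a Series indexed by word
sr = pd.Series(FreqDist(words))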
Output:
>>> sr
How 1
do 1
I 1
count 3
the 3
number 1
of 2
occurrences 1
each 1
word 1
in 1
a 1
txt 1
file 1
and 2
also 2
load 1
it 1
into 1
pandas 1
dataframe 2
with 1
columns 1
name 1
sort 1
on 1
column 1
dtype: int64
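The Series above still has to become a DataFrame with name and count columns, sorted by count, and the text has to come from a file rather than an inline string. A minimal sketch of those remaining steps follows. It swaps in the standard library (re.findall and collections.Counter) for RegexpTokenizer and FreqDist, which should give the same tokens for the \w+ pattern since FreqDist is a Counter subclass. The function names word_count_frame and word_count_frame_from_file are hypothetical, as is the UTF-8 default encoding.

```python
import re
from collections import Counter

import pandas as pd


def word_count_frame(text):
    """Return a DataFrame with columns name and count, most frequent first."""
    # Same \w+ pattern as the tokenizer above, via the standard library.
    words = re.findall(r"\w+", text)
    # Counter maps each word to its count, in first-seen order.
    sr = pd.Series(Counter(words), dtype="int64")
    # Move the word index into a "name" column and call the values "count".
    df = sr.rename_axis("name").reset_index(name="count")
    # Sort by count, highest first, and renumber the rows from 0.
    return df.sort_values("count", ascending=False, kind="stable", ignore_index=True)


def word_count_frame_from_file(path, encoding="utf-8"):
    """Read a text file (encoding is an assumption) and count its words."""
    with open(path, encoding=encoding) as fh:
        return word_count_frame(fh.read())


df = word_count_frame("the cat and the hat and the bat")
print(df)
```

If you already have sr from the nltk code, the two lines starting at rename_axis apply to it unchanged. Counting is case-sensitive, so "How" and "how" are separate rows; pass text.lower() if that is not what you want.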