python nlp

Count the total number of characters of a text dataset


I have a dataset in a dataframe where the first column contains text and the second contains labels. I want to count the total number of characters in my dataset. I wrote code that gives the total number of words, but I cannot adapt it to characters. I would be grateful if you could help me.

# To see the total number of words 
dt['text'].apply(lambda x: len(x.split(' '))).sum()

Solution

  • dt['text'].str.len().sum()
    

    This gives you the total number of characters, counting spaces and punctuation. For details, see the pandas documentation for the .str accessor (vectorized string functions for Series and Index).
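As a minimal sketch of the above, here is the one-liner on a small made-up dataframe. The sample rows, the missing value, and the whitespace-free variant are assumptions added for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's dataframe `dt`
dt = pd.DataFrame({
    "text": ["hello world", "foo bar baz", None],
    "label": [0, 1, 0],
})

# Character count per row; a missing text gives NaN instead of raising
lengths = dt["text"].str.len()

# Total characters including spaces; sum() skips NaN by default
total_chars = int(lengths.sum())

# Total characters with all whitespace removed before counting
no_whitespace = dt["text"].str.replace(r"\s+", "", regex=True)
total_non_space = int(no_whitespace.str.len().sum())

print(total_chars, total_non_space)
```

If the text column has no missing values, the int() conversion is probably unnecessary, since the sum should already be an integer.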