python pandas dataframe vectorization string-length

pandas vectorized operation to get the length of string


I have a pandas dataframe.

import pandas as pd

df = pd.DataFrame(['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
                  columns=['text'])

What I would like is another column in the DataFrame that holds the length of each string in the 'text' column.

For the example above, it would be:

                        text  text_length
0                Donald Dump           11
1  Make America Great Again!           25
2              Donald Shrimp           13

I know I can loop through the rows and compute each length, but is there a way to vectorize this operation? I have a few million rows.


Solution

  • I think the easiest way is to use the apply method on the column (a Series). With this method you can manipulate the data any way you want.

    You could do something like:

    df['text_length'] = df['text'].apply(len)
    

    to create a new column with the data you want.


    Edit: After seeing @jezrael's answer I was curious and decided to time both approaches with timeit. I created a DataFrame filled with lorem ipsum sentences (101000 rows), and the difference is quite small. I got:

    In [59]: %timeit df['text_length'] = (df.text.str.len())
    10 loops, best of 3: 20.6 ms per loop
    
    In [60]: %timeit df['text_length'] = df['text'].apply(len)
    100 loops, best of 3: 17.6 ms per loop
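
For reference, here is a minimal sketch that runs the two approaches timed above on the DataFrame from the question and checks that they agree. The `with_missing` Series is a hypothetical addition, not part of the original data; it is there only to show one practical difference: `.str.len()` passes missing values through as NaN, while `apply(len)` raises a TypeError on them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    ['Donald Dump', 'Make America Great Again!', 'Donald Shrimp'],
    columns=['text'],
)

# Both approaches give the same lengths on clean string data.
via_apply = df['text'].apply(len)
via_str = df['text'].str.len()
assert via_apply.tolist() == via_str.tolist()

df['text_length'] = via_str
print(df)

# Hypothetical data with a missing value (an assumption for illustration).
with_missing = pd.Series(['abc', np.nan, 'de'])

# .str.len() keeps the missing entry as NaN instead of failing.
lengths_with_nan = with_missing.str.len()
print(lengths_with_nan)

# apply(len) calls len() on the NaN float, which raises TypeError.
try:
    with_missing.apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True
print('apply(len) failed on NaN:', apply_failed)
```

If the column may contain missing values, `.str.len()` is the safer choice. On clean data either approach works, and as the timings above suggest, the speed difference is small.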