python pandas text-processing

Making a feature matrix from frequency of letters in a DF cell (split string into list of characters in a DF and count)


I want to count the occurrences of each letter in the string held in each row of a DataFrame (DF) and put the counts in a new DF with 26 columns, one per letter.

The new DF should have the same index as the original DF.

I have looked at the list function and at list comprehensions, and I can split a string into a list of characters. However, I cannot work out the correct syntax for applying these to a DF column.

string = 'this is a string'
lst = []

for letter in string:
    lst.append(letter)

and also

lst = list(string)

I suspect the answer involves the apply function and perhaps a lambda? I have searched the site and found little. Perhaps I am searching for the wrong thing, as I am sure this has been done before!


Solution

  • You can try it like this:

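    # Add one column per lowercase letter, holding its count in each row's string.
    # Note: str.count is case-sensitive, so uppercase letters are not counted.
    for i in range(ord('a'), ord('z') + 1):
        ch = chr(i)
        df[ch] = df['your_column_name'].apply(lambda x: x.count(ch))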
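The loop above adds the 26 columns to the original DF. If you want a separate DF that shares the original index, as the question asks, one possible sketch is below. The sample data, the column name 'text', and the helper name letter_counts are made up for illustration. Lowercasing first and treating missing values as empty strings are assumptions about what you want, so drop them if they do not fit your data.

```python
import string

import pandas as pd

# Hypothetical sample data; 'text' stands in for your own column name.
df = pd.DataFrame(
    {"text": ["this is a string", "Hello World", ""]},
    index=["r1", "r2", "r3"],
)


def letter_counts(frame, column):
    """Return a new DataFrame with one count column per letter a-z."""
    # Assumption: counts should ignore case and missing values count as empty.
    lowered = frame[column].fillna("").str.lower()
    # Series.str.count takes a regex; a single letter matches itself literally.
    counts = {ch: lowered.str.count(ch) for ch in string.ascii_lowercase}
    # Every Series shares frame's index, so the result keeps that index too.
    return pd.DataFrame(counts)


features = letter_counts(df, "text")
print(features[["h", "i", "s", "t"]])
```

Here features has one row per row of df and 26 columns, and df itself is left unchanged. If you later need both together, df.join(features) should line them up on the shared index.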