python pandas list dataframe collections

Python: get the number of occurrences of elements in each of several lists

I have 4 corpora:

C1 = ['hello','good','good','desk']
C2 = ['nice','good','desk','paper']
C3 = ['red','blue','green']
C4 = ['good']

I want to define a list of words and, for each word, get the number of occurrences per corpus. So if

l= ['good','blue']

I will get

res_df =  word  C1  C2  C3  C4
          good   2   1   0   1
          blue   0   0   1   0

My corpus is very large, so I am looking for an efficient way. What is the best way to do this?

Thanks

Solution

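One idea is to convert the word list to a set, keep only the values of each corpus that are in that set, and count them with Counter. Then pass the result to DataFrame, replace the missing values with 0 and convert to integers: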

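from collections import Counter

import pandas as pd

d = {'C1':C1, 'C2':C2, 'C3':C3, 'C4':C4}

# a set gives fast membership tests
s = set(l)

# count only the wanted words per corpus; words missing from a corpus become NaN
df = (pd.DataFrame({k:Counter(y for y in v if y in s) for k, v in d.items()})
        .fillna(0).astype(int))
print (df)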
      C1  C2  C3  C4
good   2   1   0   1
blue   0   0   1   0

If the list can contain words that do not occur in any corpus, add reindex so they still appear as rows of zeros:

from collections import Counter

import pandas as pd

l= ['good','blue','non']

d = {'C1':C1, 'C2':C2, 'C3':C3, 'C4':C4}

s = set(l)

df = (pd.DataFrame({k:Counter(y for y in v if y in s) for k, v in d.items()})
        .fillna(0)
        .astype(int)
        # add rows for words found in no corpus and keep the order of l
        .reindex(l, fill_value=0))
print (df)
    
      C1  C2  C3  C4
good   2   1   0   1
blue   0   0   1   0
non    0   0   0   0
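
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: count_words and corpora are names made up for illustration, and it assumes each corpus is already an iterable of tokens. Because Counter returns 0 for a missing key, the rows can be built directly from the word list, so fillna, astype and reindex are not needed here:

```python
from collections import Counter

import pandas as pd


def count_words(corpora, words):
    """Return per-corpus counts for the given words (hypothetical helper).

    corpora: dict mapping a corpus name to an iterable of tokens.
    words:   list of words to count; row order follows this list.
    """
    wanted = set(words)  # fast membership tests
    counts = {
        name: Counter(tok for tok in tokens if tok in wanted)
        for name, tokens in corpora.items()
    }
    # Counter yields 0 for words it never saw, so every cell is an int.
    return pd.DataFrame(
        {name: [c[w] for w in words] for name, c in counts.items()},
        index=words,
    )


corpora = {
    'C1': ['hello', 'good', 'good', 'desk'],
    'C2': ['nice', 'good', 'desk', 'paper'],
    'C3': ['red', 'blue', 'green'],
    'C4': ['good'],
}

df = count_words(corpora, ['good', 'blue', 'non'])
print(df)
```

With the sample data from the question this should print the same table as the reindex version. Each corpus is still scanned once, so the cost stays linear in the corpus size.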