This question provided an example of how per-row frequency counts can be obtained from a pandas DataFrame using pd.get_dummies plus aggregation. However, this doesn't scale well if you only want a small subset of terms from a very large DataFrame.
Consider the same example:
import pandas as pd
df = pd.DataFrame({'ID': ['xyz_1', 'xyz_2', 'xyz_3', 'xyz_4', 'xyz_400'],
                   'class1': ['yes_1', 'no_2', pd.NA, 'no_3', 'no_7'],
                   'class2': ['no_8', 'yes_15', 'yes_16', 'no_18', 'no_21'],
                   'class3': [pd.NA, 'no_51', 'yes_1', 'no_3', 'no_4'],
                   'class100': ['yes_3', 'no_5', pd.NA, 'yes_6', 'no_7']})
ID class1 class2 class3 class100
0 xyz_1 yes_1 no_8 <NA> yes_3
1 xyz_2 no_2 yes_15 no_51 no_5
2 xyz_3 <NA> yes_16 yes_1 <NA>
3 xyz_4 no_3 no_18 no_3 yes_6
4 xyz_400 no_7 no_21 no_4 no_7
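For reference, the get_dummies + aggregation approach from the linked question looks roughly like this (a sketch; the exact code there may differ):
# one-hot encode every non-NA value, then sum within each ID
# to get a per-row frequency count of every distinct value
dummies = pd.get_dummies(df.set_index('ID').stack())
row_counts = dummies.groupby(level=0).sum()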
Instead of the values being limited to yes and no, each cell can hold one of many different categorical values. If you only want frequency counts for yes_1 and no_51, encoding every distinct value is a ton of extra computation.
The best solution I've found so far is to pre-process all the other values into NA:
keep = ['yes_1', 'no_51']
df[~df.isin(keep)] = pd.NA
ID class1 class2 class3 class100
0 <NA> yes_1 <NA> <NA> <NA>
1 <NA> <NA> <NA> no_51 <NA>
2 <NA> <NA> <NA> yes_1 <NA>
3 <NA> <NA> <NA> <NA> <NA>
4 <NA> <NA> <NA> <NA> <NA>
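The counting step after this masking would then be the same kind of aggregation, now over only the surviving values (a sketch; rows are identified by position here since the ID column gets masked too):
# count the surviving values per row; rows with no matches are
# dropped by stack(), so reindex adds them back with zero counts
row_counts = (pd.get_dummies(df.stack())
                .groupby(level=0).sum()
                .reindex(df.index, fill_value=0))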
This is still glacially slow for large DataFrames on the order of 1 million rows. Is there a way to scale this better?
Compared to the approach in the linked question, a much more performant solution would be:
df.apply(lambda row: row.value_counts(dropna=False), axis=1).fillna(0)
This might already be sufficient for your purposes; however, if you're only after a couple of values, it can be faster to compare against the whole frame directly:
counts = pd.Series({key: (df == key).values.sum() for key in ['yes_1', 'no_51']})
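If you need those counts per row rather than over the whole frame, a similar vectorized comparison per key should work (a sketch, assuming the original unmasked df):
# boolean comparison over the whole frame for each key, summed row-wise
counts_per_row = pd.DataFrame({key: (df == key).sum(axis=1)
                               for key in ['yes_1', 'no_51']})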