python pandas unique rolling-computation

Rolling unique value count in pandas across multiple columns


There are several answers about rolling counts in pandas, for example "Rolling unique value count in pandas" and "How to efficiently compute a rolling unique count in a pandas time series?"

How do I count unique values across multiple columns? For one column, I can do:

df[my_col]=df[my_col].rolling(300).apply(lambda x: len(np.unique(x)))

How can I extend this to multiple columns, counting the unique values across all of those columns combined within each rolling window?


Solution

  • Inside a list comprehension, iterate over the rolling windows; for each window, flatten the values in the required columns, then use set to get the distinct elements:

    cols = [...] # define your cols here
    df['count'] = [len(set(w[cols].values.ravel())) for w in df.rolling(300)]
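
To see the approach on something small, here is a minimal sketch of the same idea on a toy frame. The helper name `rolling_unique_count`, the column names `a` and `b`, and the window of 3 are made up for illustration. It assumes a pandas version where a `Rolling` object can be iterated directly (1.1 or later, as far as I know). Iterating also yields the shorter windows at the start of the frame, so the sketch fills those with NaN to mirror what `rolling(300).apply(...)` does in the single-column case.

```python
import numpy as np
import pandas as pd


def rolling_unique_count(df, cols, window):
    """Count distinct values across `cols` within each full rolling window.

    Hypothetical helper for illustration; not part of pandas.
    """
    counts = []
    # Iterating over a Rolling object yields one sub-DataFrame per row.
    for w in df[cols].rolling(window):
        if len(w) < window:
            # Incomplete window at the start of the frame.
            counts.append(np.nan)
        else:
            # Flatten all selected columns into one 1-D array, then dedupe.
            counts.append(len(set(w.to_numpy().ravel())))
    return counts


df = pd.DataFrame({"a": [1, 2, 2, 3, 1], "b": [2, 2, 4, 3, 5]})
df["count"] = rolling_unique_count(df, ["a", "b"], window=3)
print(df)
```

If you would rather count the partial windows as well, as the one-liner above does, drop the `len(w) < window` branch.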