Tags: python-2.7, python-3.x, numpy, pandas, iterator

Iterating operations over unique values of an array


I have a pandas dataframe that resembles one generated as follows.

import numpy as np
import pandas as pd

x0 = pd.DataFrame(np.random.normal(size=(10, 4)))
x1 = pd.DataFrame({'x': [1,1,2,3,2,3,4,1,2,3]})
df = pd.concat((x0, x1), axis=1)

and a function:

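def fun(df, n=100):
    # draw n standard-normal values; every row of df shares these draws
    z = np.random.normal(size=n)
    # (rows x 4) dot (4 x n) -> array of shape (rows, n)
    return np.dot(df[[0,1,2,3]], [0.5*z,-1*z,0.3*z,1.2*z])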
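Each row of `[0.5*z, -1*z, 0.3*z, 1.2*z]` is a scaled copy of `z`, so that list equals `np.outer(w, z)` with `w = [0.5, -1, 0.3, 1.2]`. Within one call, then, row i of the output is just (row i of `df` · `w`) times the same `z`. A minimal check of this, assuming fixed seeds chosen only for reproducibility (the name `w` is illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (seed is an assumption)
np.random.seed(0)
x0 = pd.DataFrame(np.random.normal(size=(10, 4)))
x1 = pd.DataFrame({'x': [1, 1, 2, 3, 2, 3, 4, 1, 2, 3]})
df = pd.concat((x0, x1), axis=1)

def fun(df, n=100):
    z = np.random.normal(size=n)
    return np.dot(df[[0, 1, 2, 3]], [0.5*z, -1*z, 0.3*z, 1.2*z])

# Hypothetical weight vector: the coefficients hard-coded in fun
w = np.array([0.5, -1.0, 0.3, 1.2])

np.random.seed(1)          # fix the draw made inside fun ...
out = fun(df)
np.random.seed(1)          # ... and reproduce the same z here
z = np.random.normal(size=100)

# Every row of out is a scalar multiple of the same z
expected = np.outer(df[[0, 1, 2, 3]].values.dot(w), z)
print(out.shape)                     # (10, 100)
print(np.allclose(out, expected))    # True
```

This also shows why a single call to `fun` already uses identical draws for every row.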

I would like to:

  • use the same draws z for every unique value of x, and
  • take the element-wise product of the resulting outputs over the rows that share each unique value of x.

Any suggestions?

Explanation:

  1. Generate n=100 draws to get z, so that len(z) == 100.
  2. For each element of z, evaluate the function fun.
  3. For each i in df.x.unique(), compute the element-wise product of the outputs from step 2. I expect to get a DataFrame or array of shape (len(df.x.unique()), 100).
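The steps above can be sketched as follows: call `fun` once on the whole frame so every row shares one vector `z`, then multiply the rows element-wise within each unique value of `x` using pandas' `groupby(...).prod()`. This is one reading of the question (product over all rows with the same `x`) and a sketch only; the seed and the name `prod_by_x` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
x0 = pd.DataFrame(np.random.normal(size=(10, 4)))
x1 = pd.DataFrame({'x': [1, 1, 2, 3, 2, 3, 4, 1, 2, 3]})
df = pd.concat((x0, x1), axis=1)

def fun(df, n=100):
    z = np.random.normal(size=n)
    return np.dot(df[[0, 1, 2, 3]], [0.5*z, -1*z, 0.3*z, 1.2*z])

# Steps 1-2: one call, so one z shared by all rows; shape (10, n)
out = pd.DataFrame(fun(df), index=df.index)

# Step 3: element-wise product over the rows of each unique x; shape (4, n)
prod_by_x = out.groupby(df['x']).prod()
print(prod_by_x.shape)   # (4, 100)
```

Grouping by the `df['x']` Series works because it is aligned with `out` on the shared row index.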

Solution

  • It sounds like you want to group by 'x', keeping one of its instances (let's assume we take the first one observed).

    Then just call your function as follows:

    >>> f = fun(df.groupby('x').first())
    >>> f.shape
    (4, 100)
    >>> len(df.x.unique())
    4
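`groupby('x').first()` returns, for each unique `x`, the first non-null value in each column; with no missing data that is simply the first row observed for that `x`. A small check of this, assuming no NaNs and an arbitrary seed, comparing it against `drop_duplicates` followed by `set_index`:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
x0 = pd.DataFrame(np.random.normal(size=(10, 4)))
x1 = pd.DataFrame({'x': [1, 1, 2, 3, 2, 3, 4, 1, 2, 3]})
df = pd.concat((x0, x1), axis=1)

first_rows = df.groupby('x').first()

# Same idea spelled out: keep the first row per x, index by x, sort the keys
manual = df.drop_duplicates('x', keep='first').set_index('x').sort_index()

print(first_rows.shape)                              # (4, 4)
print(np.allclose(first_rows.values, manual.values)) # True
```

If some columns contained NaN, `first()` could combine values from different rows of the same group, so the two results would no longer match.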