Tags: python, pandas, dataframe, data-science

Apply a multiplication to random elements of my dataframe


I want to randomly choose a given number of rows in my dataframe and, for those rows only, multiply the value in the 'eta' column by a number that is itself randomly chosen from a range. I'm stuck on how to do this.

So far I have randomly drawn a list of indices from my dataframe, but I don't know how to apply the multiplication only to the rows with those indices and not to the rest.

Any help would be truly appreciated!

numExtreme = 50  # Number of rows to apply the random modification to.
mod_indices = np.random.choice(df.index, numExtreme, replace=False)  # numExtreme distinct indices, chosen at random.

Solution

  • Use DataFrame.loc to select the random indices and only the eta column, then multiply by an array of random integers from 1 to 49 (np.random.randint excludes the upper bound, 50):

    import numpy as np
    import pandas as pd

    np.random.seed(123)
    df = pd.DataFrame({'eta':np.random.randint(10, size=10),
                       'another':np.random.randint(10, size=10)})
    print (df)
       eta  another
    0    2        9
    1    2        0
    2    6        0
    3    1        9
    4    3        3
    5    9        4
    6    6        0
    7    1        0
    8    0        4
    9    1        1
    
    numExtreme = 5  # Number of rows to apply the random modification to.
    mod_indices = np.random.choice(df.index, numExtreme, replace=False)
    
    print (mod_indices)
    [5 1 0 8 6]
    
    arr = np.random.randint(1, 50, len(mod_indices))
    print (arr)
    [49  8 42 36 29]
    

    df.loc[mod_indices, 'eta'] *= arr
    print (df)
       eta  another
    0   84        9
    1   16        0
    2    6        0
    3    1        9
    4    3        3
    5  441        4
    6  174        0
    7    1        0
    8    0        4
    9    1        1
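
If the multiplier should be a non-integer value drawn from a range, the same DataFrame.loc approach can be wrapped in a small helper. The following is only a sketch under some assumptions: the function name scale_random_rows and its parameters are made up for illustration, the index is assumed to have unique labels, and it uses NumPy's newer Generator API (np.random.default_rng) instead of np.random.seed. The column is cast to float first, since writing float results into an integer column can trigger a dtype warning or error in recent pandas versions.

```python
import numpy as np
import pandas as pd


def scale_random_rows(df, column, n_rows, low, high, seed=None):
    """Hypothetical helper: multiply `column` by random factors in n_rows random rows.

    Returns a modified copy plus the chosen index labels and the factors used.
    Assumes df.index contains unique labels.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Cast to float so that non-integer products can be stored in the column.
    out[column] = out[column].astype(float)
    # Pick n_rows distinct index labels.
    chosen = rng.choice(out.index.to_numpy(), size=n_rows, replace=False)
    # One factor per chosen row, drawn uniformly from [low, high).
    factors = rng.uniform(low, high, size=n_rows)
    # Only the selected rows of the selected column are modified.
    out.loc[chosen, column] *= factors
    return out, chosen, factors


df = pd.DataFrame({"eta": [2, 2, 6, 1, 3, 9, 6, 1, 0, 1],
                   "another": [9, 0, 0, 9, 3, 4, 0, 0, 4, 1]})
result, chosen, factors = scale_random_rows(
    df, "eta", n_rows=5, low=1.0, high=50.0, seed=123
)
print(chosen)
print(result)
```

Returning a copy leaves the original dataframe untouched, which makes it easy to compare the rows before and after the modification.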