Tags: python, pandas, random, random-seed

How to create a column with randomly generated values in a pandas dataframe


I want to assign a random decimal number between 0 and 1 to each unique value of a column in a pandas dataframe.

Below is a dataframe whose "region" column holds unique values. I want to create a new column containing a randomly generated decimal number (between 0 and 1) for each region.

I used the random module to generate a random number, but I couldn't figure out how to generate one number per region and store them in a new column. The random number assigned to each region should also stay the same on a re-run, so I set a seed. Any advice would be greatly appreciated.

import pandas as pd
import numpy as np
import random

region= 'region'
product1 = 'product1'
product2 = 'product2'
product3 = 'product3'
product4 = 'product4'
product5 = 'product5'

list_reg = ['region1', 'region2','region3','region4','region5','region6']
list_score1 = [100, 250, 350, 555,999999, 200000]
list_score2 = [41, 111, 12.14,16.18,np.nan,200003]
list_score3 = [7.04, 2.09, 11.14,2000320,22.17,np.nan]
list_score4 = [236,249,400,0.56,359,122]
list_score5 = [None, 1.33, 2.54, 1, 0.9, 3.2]

df_test = pd.DataFrame({region: list_reg,
                        product1: list_score1,
                        product2: list_score2,
                        product3: list_score3,
                        product4: list_score4,
                        product5: list_score5})

random.seed(123)  # fix the seed so a re-run produces the same random numbers
random_genator = random.uniform(0.0001, 1.0000)  # draws a single float only, not one per region

The desired result is the dataframe above with one additional column holding the random number assigned to each region.


Solution

  • To add the column to an existing DataFrame, you can generate a list with one random number per row using a comprehension:

    df_test['random_genator'] = [random.uniform(0.0001, 1.0000) for _ in range(len(df_test))]
    

    which gives, for example (these values match a run where random.seed(123) is called just before the comprehension):

        region  product1   product2    product3  product4  product5  random_genator
    0  region1       100      41.00        7.04    236.00       NaN        0.052458
    1  region2       250     111.00        2.09    249.00      1.33        0.087278
    2  region3       350      12.14       11.14    400.00      2.54        0.407301
    3  region4       555      16.18  2000320.00      0.56      1.00        0.107789
    4  region5    999999        NaN       22.17    359.00      0.90        0.901209
    5  region6    200000  200003.00         NaN    122.00      3.20        0.038250
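
  • The re-run requirement from the question depends on where the seed is set. The following is a minimal sketch, not a definitive implementation: it uses a shortened stand-in for df_test, seeds right before drawing, and also shows a NumPy alternative. The column name random_np and the choice of 123 as the NumPy seed are assumptions made for illustration.

```python
import random

import numpy as np
import pandas as pd

# Shortened stand-in for df_test from the question (only two columns kept).
df_test = pd.DataFrame({
    "region": ["region1", "region2", "region3", "region4", "region5", "region6"],
    "product1": [100, 250, 350, 555, 999999, 200000],
})

# Seed immediately before drawing, so every re-run yields the same sequence.
random.seed(123)
df_test["random_genator"] = [
    random.uniform(0.0001, 1.0) for _ in range(len(df_test))
]

# NumPy alternative: a seeded Generator draws the whole column in one call.
# "random_np" is a hypothetical column name used only in this sketch.
rng = np.random.default_rng(123)
df_test["random_np"] = rng.uniform(0.0001, 1.0, size=len(df_test))

print(df_test)
```

    Both columns are filled row by row, so a region keeps its number only while the row order and row count stay the same. Any other random call made between seeding and the comprehension (such as the single random.uniform call in the question) shifts the sequence.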