I want to assign a random decimal number between 0 and 1 to each unique value of a column in a Pandas DataFrame.
Below is a DataFrame in which every value of "region" is unique. I want to create a new column that holds a randomly generated decimal number (between 0 and 1) for each region.
I used the random module to generate a random number, but I couldn't figure out how to assign these random numbers to the regions as a new column. The numbers assigned to each region should also stay the same when the script is re-run, which is why I set a seed. Any advice would be greatly appreciated.
import pandas as pd
import numpy as np
import random
region= 'region'
product1 = 'product1'
product2 = 'product2'
product3 = 'product3'
product4 = 'product4'
product5 = 'product5'
list_reg = ['region1', 'region2', 'region3', 'region4', 'region5', 'region6']
list_score1 = [100, 250, 350, 555, 999999, 200000]
list_score2 = [41, 111, 12.14, 16.18, np.nan, 200003]
list_score3 = [7.04, 2.09, 11.14, 2000320, 22.17, np.nan]
list_score4 = [236, 249, 400, 0.56, 359, 122]
list_score5 = [None, 1.33, 2.54, 1, 0.9, 3.2]
df_test = pd.DataFrame({region: list_reg,
                        product1: list_score1,
                        product2: list_score2,
                        product3: list_score3,
                        product4: list_score4,
                        product5: list_score5})
random.seed(123) # in case of a re-run, make sure the randomly generated number doesn't change
random_genator = random.uniform(0.0001, 1.0000)
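To add the column to the existing DataFrame, generate a list with one value per row using a list comprehension. Call random.seed(123) before it, as in the question, so that a re-run of the script produces the same numbers: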
df_test['random_genator'] = [random.uniform(0.0001, 1.0000) for _ in range(len(df_test))]
which gives (for example):
    region  product1   product2    product3  product4  product5  random_genator
0  region1       100      41.00        7.04    236.00       NaN        0.052458
1  region2       250     111.00        2.09    249.00      1.33        0.087278
2  region3       350      12.14       11.14    400.00      2.54        0.407301
3  region4       555      16.18  2000320.00      0.56      1.00        0.107789
4  region5    999999        NaN       22.17    359.00      0.90        0.901209
5  region6    200000  200003.00         NaN    122.00      3.20        0.038250
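If the region column could ever contain repeated values, or if you prefer NumPy's Generator API to the global random module, a sketch along these lines draws one number per unique region and maps it back onto the rows. It assumes NumPy 1.17 or later (for np.random.default_rng). The helper name add_region_random and the small example frame with a repeated "region1" row are hypothetical, and the 0.0001 lower bound is simply kept from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: "region1" appears twice on purpose to show
# that repeated regions receive the same random value.
df = pd.DataFrame({
    "region": ["region1", "region2", "region3", "region1"],
    "product1": [100, 250, 350, 555],
})


def add_region_random(frame, column="region", seed=123):
    """Return a copy of frame with one seeded random value per unique region.

    Hypothetical helper, not a pandas API.
    """
    # Local generator: the same seed always yields the same sequence of draws,
    # without touching the global state of the random module.
    rng = np.random.default_rng(seed)
    # Sort the unique regions so the assignment does not depend on row order.
    regions = sorted(frame[column].unique())
    # One draw per region, uniformly distributed in [0.0001, 1.0).
    values = rng.uniform(0.0001, 1.0, size=len(regions))
    lookup = dict(zip(regions, values))
    out = frame.copy()
    # Every row gets the value belonging to its region.
    out["random_genator"] = out[column].map(lookup)
    return out


result = add_region_random(df)
print(result)
```

Sorting the unique regions before drawing keeps the assignment stable if the rows are reordered, but adding a new region later can shift which value each existing region receives, because the draws are handed out in sorted order. Strict uniqueness of the drawn values is not guaranteed, although a collision between continuous uniform draws is extremely unlikely.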