I want to assign a random float (between 0 and 1) to each unique value in a column of a Pandas dataframe.
Below is a dataframe with unique values in "region"; I want to create a new column containing a randomly generated float (between 0 and 1) for each region.
I used the random module to generate a random number, but I couldn't figure out how to assign a random number to each region and store the results in a new column.
The random number assigned to each region must also stay the same on a re-run, so I set a seed.
import pandas as pd
import numpy as np
import random
list_reg = ['region1', 'region2', 'region3', 'region4', 'region5', 'region6']
df_test = pd.DataFrame({
'region': list_reg,
'product1': [100, 250, 350, 555, 999999, 200000],
'product2': [41, 111, 12.14, 16.18, np.nan, 200003],
'product3': [7.04, 2.09, 11.14, 2000320, 22.17, np.nan],
'product4': [236, 249, 400, 0.56, 359, 122],
'product5': [None, 1.33, 2.54, 1, 0.9, 3.2]})
# in case of a re-run, make sure the randomly generated number doesn't change
random.seed(123)
random_genator = random.uniform(0.0001, 1.0000)
The desired goal would be something like below
To add the column to an existing DataFrame, you can generate a list of the correct size using a list comprehension:
df_test['random_genator'] = [random.uniform(0.0001, 1.0000) for _ in range(len(list_reg))]
which gives (for example):
    region  product1   product2    product3  product4  product5  random_genator
0  region1       100      41.00        7.04    236.00       NaN        0.052458
1  region2       250     111.00        2.09    249.00      1.33        0.087278
2  region3       350      12.14       11.14    400.00      2.54        0.407301
3  region4       555      16.18  2000320.00      0.56      1.00        0.107789
4  region5    999999        NaN       22.17    359.00      0.90        0.901209
5  region6    200000  200003.00         NaN    122.00      3.20        0.038250
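If a region can appear in more than one row, the comprehension above would give each row its own number rather than one number per region. A minimal sketch of one way to handle that, assuming NumPy's `default_rng` is acceptable in place of the `random` module: draw one value per unique region and map it back onto the column. The helper name `add_region_random`, its `seed` parameter, and the small `df_demo` frame are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd


def add_region_random(df, seed=123):
    """Return a copy of df with one seeded random float per unique region.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    rng = np.random.default_rng(seed)  # fixed seed: same values on every re-run
    regions = df["region"].unique()    # unique regions, in order of first appearance
    values = rng.uniform(0.0001, 1.0, size=len(regions))
    region_to_value = dict(zip(regions, values))
    out = df.copy()
    # Rows that share a region receive the same random number.
    out["random_genator"] = out["region"].map(region_to_value)
    return out


# Small demo frame (assumed data) in which some regions repeat.
df_demo = pd.DataFrame(
    {"region": ["region1", "region2", "region3", "region2", "region1"]}
)
result = add_region_random(df_demo)
print(result)
```

Note that the values depend on the seed and on the order in which regions first appear, so reordering the rows before calling the helper may assign different numbers to the same region.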