Tags: python, pandas, random, random-seed

How to create a column with randomly generated values in a pandas dataframe


I want to assign a random float (between 0 and 1) to each row of a pandas dataframe whose key column contains unique values.

Below is a dataframe with unique values in the "region" column; I want to create a new column containing a randomly generated float (between 0 and 1) for each region.

I used the random module to generate a random number, but I couldn't figure out how to assign a random number to each region and store the results as a new column.

The random number assigned to each region must also stay the same across re-runs, so I set a seed.

import pandas as pd
import numpy as np
import random

list_reg = ['region1', 'region2', 'region3', 'region4', 'region5', 'region6']

df_test = pd.DataFrame({
    'region': list_reg,
    'product1': [100, 250, 350, 555, 999999, 200000],
    'product2': [41, 111, 12.14, 16.18, np.nan, 200003],
    'product3': [7.04, 2.09, 11.14, 2000320, 22.17, np.nan],
    'product4': [236, 249, 400, 0.56, 359, 122],
    'product5': [None, 1.33, 2.54, 1, 0.9, 3.2]})

# in case of a re-run, make sure the randomly generated number doesn't change
random.seed(123)
random_genator = random.uniform(0.0001, 1.0000)

The desired result is the same dataframe with one extra column holding a random float for each region.


Solution

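  • To add the column to an existing dataframe, generate a list with one value per row using a list comprehension. Call random.seed(123) immediately beforehand, as in the question, so that a re-run produces the same values: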

    df_test['random_genator'] = [random.uniform(0.0001, 1.0000) for _ in range(len(df_test))]
    

    which gives (for example):

        region  product1   product2    product3  product4  product5  random_genator
    0  region1       100      41.00        7.04    236.00       NaN        0.052458
    1  region2       250     111.00        2.09    249.00      1.33        0.087278
    2  region3       350      12.14       11.14    400.00      2.54        0.407301
    3  region4       555      16.18  2000320.00      0.56      1.00        0.107789
    4  region5    999999        NaN       22.17    359.00      0.90        0.901209
    5  region6    200000  200003.00         NaN    122.00      3.20        0.038250
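
As an alternative to the list comprehension, the same column can be filled in one vectorized call with NumPy. This is only a minimal sketch: it assumes NumPy's seeded Generator is an acceptable substitute for the random module, and it rebuilds a two-column stand-in for df_test so that it runs on its own. The values it produces will differ from the table above, because NumPy uses a different generator than random.

```python
import numpy as np
import pandas as pd

# Stand-in for df_test from the question (only two of its columns are kept here).
df_test = pd.DataFrame({
    'region': ['region1', 'region2', 'region3', 'region4', 'region5', 'region6'],
    'product1': [100, 250, 350, 555, 999999, 200000],
})

# A dedicated generator with a fixed seed: re-running the script
# reproduces the same sequence of draws.
rng = np.random.default_rng(123)

# One uniform draw per row, sized from the dataframe itself.
df_test['random_genator'] = rng.uniform(0.0001, 1.0, size=len(df_test))

print(df_test)
```

Sizing the draw with len(df_test) keeps the column length tied to the dataframe rather than to a separate list, and keeping the seed in a local generator avoids changing global random state used elsewhere in a script.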