
Random combination of letters and numbers in each cell of one column


I haven't been able to find a solution to the loop part of the following problem. I have a data frame with over 500K rows. I want to write a random combination of letters and numbers into each cell of a column we'll call "ProductID". I found solutions here that let me write simple numbers, which work, even if they're painfully slow. For example:

for index, row in df3.iterrows():
    df3['ProductID'] = np.arange(1,551586)

I have also found code on this site that produces a random string, and each time I run it, it dutifully produces a new one:

import string
import random

def id_generator(size=12, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

# df3['ProductID'] = id_generator()

i = 0

while i < 6:
    print(id_generator())
    i = i + 1

Output:

7JKD7LWUZPHC
1ETULSX4WRJI
B42TSN4SFC20
RYIDD7N2RPI2
8GEMULEC7TX1
0FGZZQLBF0XE

What I can't seem to do is write that string to each cell in a new column as described above.

My apologies, I can no longer find the post where the generator code came from. However, when I try to enclose it in a loop, like so, it takes one generated string and simply duplicates it in every row:

for index, row in df3.iterrows():
    df3['ProductID'] = id_generator()

The same thing happens if I use a simple while loop.

Current output:

+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+
|                       name                        | main_category |   sub_category   | ratings | no_of_ratings | discount_price_USD | actual_price_USD |  ProductID   |
+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+
| Lloyd 1.5 Ton 3 Star Inverter Split Ac (5 In 1... | appliances    | Air Conditioners |     4.2 |          2255 |           402.5878 |          719.678 | HP2ISWKAI7CA |
| LG 1.5 Ton 5 Star AI DUAL Inverter Split AC (C... | appliances    | Air Conditioners |     4.2 |          2948 |            567.178 |          927.078 | HP2ISWKAI7CA |
| LG 1 Ton 4 Star Ai Dual Inverter Split Ac (Cop... | appliances    | Air Conditioners |     4.2 |          1206 |            420.778 |          756.278 | HP2ISWKAI7CA |
| LG 1.5 Ton 3 Star AI DUAL Inverter Split AC (C... | appliances    | Air Conditioners |       4 |            69 |            463.478 |          841.678 | HP2ISWKAI7CA |
| Carrier 1.5 Ton 3 Star Inverter Split AC (Copp... | appliances    | Air Conditioners |     4.1 |           630 |            420.778 |          827.038 | HP2ISWKAI7CA |
+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+

Expected output:

+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+
|                       name                        | main_category |   sub_category   | ratings | no_of_ratings | discount_price_USD | actual_price_USD |  ProductID   |
+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+
| Lloyd 1.5 Ton 3 Star Inverter Split Ac (5 In 1... | appliances    | Air Conditioners |     4.2 |          2255 |           402.5878 |          719.678 | HP2ISWKAI7CA |
| LG 1.5 Ton 5 Star AI DUAL Inverter Split AC (C... | appliances    | Air Conditioners |     4.2 |          2948 |            567.178 |          927.078 | 7JKD7LWUZPHC |
| LG 1 Ton 4 Star Ai Dual Inverter Split Ac (Cop... | appliances    | Air Conditioners |     4.2 |          1206 |            420.778 |          756.278 | 1ETULSX4WRJI |
| LG 1.5 Ton 3 Star AI DUAL Inverter Split AC (C... | appliances    | Air Conditioners |       4 |            69 |            463.478 |          841.678 | B42TSN4SFC20 |
| Carrier 1.5 Ton 3 Star Inverter Split AC (Copp... | appliances    | Air Conditioners |     4.1 |           630 |            420.778 |          827.038 | RYIDD7N2RPI2 |
+---------------------------------------------------+---------------+------------------+---------+---------------+--------------------+------------------+--------------+

I'm clearly doing something wrong, but I can't figure out what.


Solution

  • You are getting the same value in every row of the ProductID column with this code:

    for index, row in df3.iterrows():
        df3['ProductID'] = id_generator()

    because each iteration assigns the single string returned by id_generator() to the entire column, not to one cell. When the for loop finishes, the column holds whatever the last call to id_generator() returned.

    One possible solution is to create the ProductID column first with NaN values, then modify your id_generator() function so that apply() can call it once per cell. Here is what that would look like:

    # apply() passes each cell's current value as the first argument;
    # cell_val only accepts that value and is otherwise ignored
    def id_generator(cell_val, size=12, chars=string.ascii_uppercase + string.digits):
        return ''.join(random.choice(chars) for _ in range(size))

    # create the ProductID column with NaN placeholders
    df3['ProductID'] = np.nan

    # apply() calls id_generator once per row, so each cell gets its own string
    df3['ProductID'] = df3['ProductID'].apply(id_generator)
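
As a rough sketch of an alternative, the original id_generator() from the question can be left unchanged and called once per row in a list comprehension. The small df_demo frame and the SameID column below are hypothetical stand-ins (they are not part of the question's df3) and exist only to make the example runnable and to contrast the two behaviours:

```python
import random
import string

import pandas as pd


def id_generator(size=12, chars=string.ascii_uppercase + string.digits):
    # Same generator as in the question; no extra argument is needed here.
    return ''.join(random.choice(chars) for _ in range(size))


# Hypothetical stand-in for df3, only to make the sketch runnable.
df_demo = pd.DataFrame({'name': ['item A', 'item B', 'item C', 'item D', 'item E']})

# Assigning one string broadcasts it to every row (the behaviour seen in the question).
df_demo['SameID'] = id_generator()

# One call per row: the list has len(df_demo) separately generated strings.
df_demo['ProductID'] = [id_generator() for _ in range(len(df_demo))]

# Random strings are not guaranteed to be unique, so a check may be worthwhile.
all_unique = df_demo['ProductID'].is_unique
```

Note that neither approach guarantees unique IDs. With 12 characters drawn from 36 symbols a collision is very unlikely, but if uniqueness matters, checking is_unique (and regenerating any duplicates) is a reasonable safeguard.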