
How to replace values in a pandas column with one random number per unique value (random categorical)?


I have a df with a column that looks like this:

id   
11    
22
22
333
33
333

This column contains sensitive data. I want to replace each value with a random number, but identical IDs must always map to the same random number.

For example, I want to mask the data in the column like so:

id   
123   
987
987
456
00
456

Note that identical IDs share the same masked value. How do I achieve this? I have thousands of IDs.


Solution
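
A minimal sketch of one way to meet the requirement in the question, assuming every distinct ID should receive its own distinct random number: draw the replacement values without replacement, build an ID-to-value mapping, and apply it with Series.map. The sample frame, the range 1 to 999,999, and the names unique_ids, masked and mapping are illustrative assumptions, not part of the original post.

```python
import random

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"id": [11, 22, 22, 333, 33, 333]})

unique_ids = df["id"].unique()

# random.sample draws WITHOUT replacement, so no two IDs share a masked value.
# The range is an arbitrary assumption; it must contain at least as many
# numbers as there are unique IDs, otherwise random.sample raises ValueError.
masked = random.sample(range(1, 1_000_000), k=len(unique_ids))

# One fixed replacement per original ID.
mapping = dict(zip(unique_ids, masked))
df["id_rand"] = df["id"].map(mapping)

print(df)
```

Because the mapping is built once and then applied to the whole column, repeated IDs always receive the same replacement. The values differ from run to run unless the generator is seeded, for example with random.seed, which in turn makes the mapping reproducible for anyone who knows the seed.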

  • I would suggest something like this. Note that it is not fully reliable: randint draws each value independently, so two different original IDs can end up with the same new value:

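    from random import randint

    # One random draw per group; every row in the group receives that value.
    # Draws are independent across groups, so different IDs can collide.
    df['id_rand'] = df.groupby('id')['id'].transform(lambda x: randint(1, 1000))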
    >>> df
    '''
        id  id_rand
    0   11      833
    1   22      577
    2   22      577
    3  333      101
    4   33      723
    5  333      101