I have a df with a column that looks like this:
id
11
22
22
333
33
333
This column is sensitive data. I want to replace each value with a random number, but every occurrence of the same ID must be replaced with the same random number.
For example, I want to mask the data in the column like so:
id
123
987
987
456
00
456
Note the same IDs have the same value. How do I achieve this? I have thousands of IDs.
I would suggest something like this. Note that it does not fully solve the problem: the value for each group is drawn independently at random, so two different original IDs can end up with the same new value:
from random import randint
# One random draw per group of equal IDs, broadcast to every row in that group
df['id_rand'] = df.groupby('id')['id'].transform(lambda x: randint(1,1000))
>>> df
'''
id id_rand
0 11 833
1 22 577
2 22 577
3 333 101
4 33 723
5 333 101
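
If collisions between different IDs are a concern, one possible way around it is to draw all the replacement values at once without replacement and then map them back onto the column. The following is only a sketch: `mask_ids` is a hypothetical helper name, and the range bounds and `seed` parameter are assumptions chosen for illustration. The range has to contain at least as many numbers as there are distinct IDs, otherwise `random.sample` raises a `ValueError`.

```python
import random

import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"id": [11, 22, 22, 333, 33, 333]})


def mask_ids(ids, low=1, high=999_999, seed=None):
    """Hypothetical helper: map each distinct ID to a distinct random integer.

    The bounds `low`/`high` and the `seed` argument are assumptions for this
    sketch, not part of the original answer.
    """
    rng = random.Random(seed)
    unique_ids = ids.unique()
    # random.sample draws without replacement, so no two IDs share a mask.
    masks = rng.sample(range(low, high + 1), k=len(unique_ids))
    mapping = dict(zip(unique_ids, masks))
    # Every row with the same original ID receives the same masked value.
    return ids.map(mapping)


df["id_rand"] = mask_ids(df["id"], seed=42)
print(df)
```

A fixed `seed` makes the mapping reproducible, which is handy for testing; for real sensitive data it is probably safer to leave `seed=None` so the mapping cannot be regenerated from a known seed.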