I have some customer data like this:
Name | Age | Gender | Phone Number | Email Id |
abc. | 25 | M. | 234 567 890 | example.com|
There are 60k rows of data like this, spread across multiple tables. How can I generate synthetic data for this dataset using Python?
I have no knowledge about this. Any suggestions would be helpful. Thanks!
Python's Faker library is your friend here. It can generate localized fake data for names, addresses, phone numbers, credit card numbers, and much more.
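import numpy as np
import pandas as pd
from faker import Faker

fake = Faker()
n = 1000  # number of synthetic rows to generate

# Build one row per iteration: Faker supplies the name, phone number and email;
# NumPy supplies a random age (19 to 90, upper bound exclusive) and gender.
df = pd.DataFrame([[fake.name(),
                    np.random.randint(19, 91),
                    np.random.choice(['M.', 'F.']),
                    fake.phone_number(),
                    fake.email()] for _ in range(n)],
                  columns=['Name', 'Age', 'Gender', 'Phone number', 'Email ID'])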
Output of df.head():
Name Age Gender Phone number Email ID
0 Miranda Hinton 21 F. 018.482.1404 [email protected]
1 Donald Donovan 51 F. 572.846.4120x995 [email protected]
2 Shannon Grimes 72 F. 0289879995 [email protected]
3 Heather Perez 87 F. 012-033-2318 [email protected]
4 Jacqueline Pearson 22 M. 178-913-4566x89793 [email protected]