I have some customer data that looks like this:
Name | Age | Gender | Phone Number | Email Id
abc. | 25  | M.     | 234 567 890  | example.com
There are 60k rows of data like this, and multiple tables. How can I generate synthetic data for this dataset using Python?
I have no experience with this. Any suggestions would be helpful. Thanks!
Python's Faker library is your friend here. It can generate localized fake data for names, addresses, phone numbers, credit card numbers, and much more.
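import numpy as np
import pandas as pd
from faker import Faker

fake = Faker()
n = 1000  # number of synthetic rows to generate

# Build one row per iteration: Faker supplies name, phone and email,
# NumPy supplies a random age (19 to 90 inclusive) and gender label
df = pd.DataFrame([[fake.name(),
                    np.random.randint(19, 91),
                    np.random.choice(['M.', 'F.']),
                    fake.phone_number(),
                    fake.email()] for _ in range(n)],
                  columns=['Name', 'Age', 'Gender', 'Phone number', 'Email ID'])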
Output of df.head():
   Name                Age  Gender  Phone number        Email ID
0  Miranda Hinton      21   F.      018.482.1404        meghan91@lopez.biz
1  Donald Donovan      51   F.      572.846.4120x995    jacobcarson@melton.com
2  Shannon Grimes      72   F.      0289879995          phillip93@gmail.com
3  Heather Perez       87   F.      012-033-2318        rodriguezjeffrey@hotmail.com
4  Jacqueline Pearson  22   M.      178-913-4566x89793  brianclark@hotmail.com