
How to create synthetic customer data in Python


I have some customer data in the following form:

| Name | Age | Gender | Phone Number | Email Id    |
| abc. | 25  | M.     | 234 567 890  | example.com |

There are 60k rows of data like this, and there are multiple tables. How can I generate synthetic data for this dataset using Python?

I have no experience with this. Any suggestions would be helpful. Thanks!


Solution

  • Python's Faker library is your friend here. It can generate localized fake data for names, addresses, phone numbers, credit card numbers, and much more.

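    import numpy as np
    import pandas as pd
    from faker import Faker

    fake = Faker()
    n = 1000  # number of synthetic rows to generate

    # Build one row per iteration; ages are drawn from 19 to 90 inclusive.
    df = pd.DataFrame([[fake.name(),
            np.random.randint(19, 91),
            np.random.choice(['M.', 'F.']),
            fake.phone_number(),
            fake.email()] for _ in range(n)],
            columns=['Name', 'Age', 'Gender', 'Phone number', 'Email ID'])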
    

    Example output of df.head() (values are random, so yours will differ):

                     Name  Age Gender        Phone number                      Email ID
    0      Miranda Hinton   21     F.        018.482.1404            [email protected]
    1      Donald Donovan   51     F.    572.846.4120x995        [email protected]
    2      Shannon Grimes   72     F.          0289879995           [email protected]
    3       Heather Perez   87     F.        012-033-2318  [email protected]
    4  Jacqueline Pearson   22     M.  178-913-4566x89793        [email protected]