
Masking strings and phone numbers in a pandas DataFrame


I am trying to mask a DataFrame (loaded from a CSV file) whose columns contain both integer and string values, like this:

sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,9876543212
2,Ivysaur,Grass,Poison,9876543212
3,Venusaur,Grass,Poison,9876543212
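The two kinds of column behave differently because of the dtypes pandas infers when it reads the file. A minimal sketch to check this, assuming the sample rows above are inlined through io.StringIO instead of read from a file (`csv_text` is a hypothetical name):

```python
import io

import pandas as pd

# The sample data from the question, inlined so the example runs without a file
csv_text = """sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,9876543212
2,Ivysaur,Grass,Poison,9876543212
3,Venusaur,Grass,Poison,9876543212
"""

df = pd.read_csv(io.StringIO(csv_text))

# read_csv parses the phone numbers as integers (int64),
# while the text columns get a non-numeric (object/string) dtype
print(df.dtypes)
```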

This is the code I am using. It masks string values correctly, but it does not mask integer values:

import pandas as pd

filename = "path/to/file"
columnname = "phonenumber"
valuetomask = "9876543212"

column_dataset1 = pd.read_csv(filename)

print(column_dataset1)


# if(choice == "True"):
# Full masking: replace a particular string/number in a column
column_dataset1[columnname] = column_dataset1[columnname].mask(
    column_dataset1[columnname] == valuetomask, "XXXXXXXXXX"
)
print(column_dataset1)
# Partial masking: replace the last four characters
column_dataset1[columnname] = column_dataset1[columnname].str[:-4] + "****"
print(column_dataset1)

The code works for string columns, but when I apply it to the "phonenumber" column (or any other integer column), it does not work.
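Both symptoms can be reproduced in isolation. This sketch uses a trimmed, hypothetical version of the sample data (with a second phone number added so the comparison result is easier to read): the comparison against the string "9876543212" matches nothing on an integer column, so mask() replaces nothing, and the .str accessor raises AttributeError.

```python
import io

import pandas as pd

csv_text = """sno,Name,phonenumber
1,Bulbasaur,9876543212
2,Ivysaur,1234567890
"""
df = pd.read_csv(io.StringIO(csv_text))

# Comparing an int64 column with the *string* "9876543212" never matches,
# so Series.mask() has nothing to replace
matches = df["phonenumber"] == "9876543212"
print(matches.tolist())  # [False, False]

# The .str accessor only works on string values, so this raises AttributeError
try:
    df["phonenumber"].str[:-4]
except AttributeError as err:
    print("AttributeError:", err)
```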

Note: I need both full masking (the whole value is masked) and partial masking (e.g. only the last three or the first three digits/characters) for any given file.
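One way to cover both requirements for an arbitrary column is a pair of small helpers that convert every value to text first. `mask_full`, `mask_partial` and `_mask_text` are hypothetical names, not pandas functions. The sketch assumes that missing values may be masked in their text form ("nan") and that values shorter than `n` are masked entirely.

```python
import pandas as pd


def _mask_text(text: str, n: int, side: str, char: str) -> str:
    """Mask n characters at the start or end of one string (hypothetical helper)."""
    n = min(n, len(text))  # never mask more characters than the value has
    if side == "last":
        return text[: len(text) - n] + char * n
    return char * n + text[n:]  # side == "first"


def mask_full(series: pd.Series, char: str = "X") -> pd.Series:
    """Replace every character of every value with `char` (hypothetical helper)."""
    # astype(str) lets the same code handle integer and string columns
    return series.astype(str).map(lambda text: char * len(text))


def mask_partial(series: pd.Series, n: int = 4, side: str = "last", char: str = "*") -> pd.Series:
    """Mask the first or last `n` characters of every value (hypothetical helper)."""
    if side not in ("first", "last"):
        raise ValueError("side must be 'first' or 'last'")
    return series.astype(str).map(lambda text: _mask_text(text, n, side, char))


df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur"],
        "phonenumber": [9876543212, 9876543212],
    }
)
df["phone_full"] = mask_full(df["phonenumber"])
df["phone_last3"] = mask_partial(df["phonenumber"], n=3, side="last")
df["name_first3"] = mask_partial(df["Name"], n=3, side="first")
print(df)
```

Because both helpers go through astype(str), the caller does not need to know whether a column was parsed as integers or as text.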


Solution

  • Convert the phone numbers to strings first, then apply the masking:

    mask_len = 5  # number of digits to mask, counted from the right
    column_dataset1['phonenumber'] = (
        column_dataset1['phonenumber'].astype(str)  # convert to string
        .str[:-mask_len] + "*" * mask_len  # replace the last mask_len digits
    )
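The same string conversion can also fix the full-masking step from the question. A sketch under the assumption that the column can be read as text up front with read_csv's `dtype` parameter: the `==` comparison then matches string to string, and leading zeros that an integer parse would drop are preserved. The inline data and the second phone number are hypothetical.

```python
import io

import pandas as pd

csv_text = """sno,Name,phonenumber
1,Bulbasaur,9876543212
2,Ivysaur,0123456789
"""

# Read the phone numbers as text so comparisons and .str work (keeps leading zeros)
df = pd.read_csv(io.StringIO(csv_text), dtype={"phonenumber": str})

value_to_mask = "9876543212"

# Full masking of one specific value: string compared with string now matches
full = df["phonenumber"].mask(
    df["phonenumber"] == value_to_mask, "X" * len(value_to_mask)
)

# Partial masking of the last four characters of every value
partial = df["phonenumber"].str[:-4] + "*" * 4

print(full.tolist())
print(partial.tolist())
```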