I am trying to mask a DataFrame/dataset whose columns contain both integer and string values, like this:
sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,9876543212
2,Ivysaur,Grass,Poison,9876543212
3,Venusaur,Grass,Poison,9876543212
This is the code I am using. It masks string values correctly, but it does not mask integers:
import pandas as pd
filename = "path/to/file"
columnname= "phonenumber"
valuetomask = "9876543212"
column_dataset1 = pd.read_csv(filename)
print(column_dataset1)
# if(choice == "True"):
#masking for particular string/number in a column
column_dataset1[columnname]=column_dataset1[columnname].mask(column_dataset1[columnname] == valuetomask,"XXXXXXXXXX")
print(column_dataset1)
# masking last four digits
column_dataset1[columnname]=column_dataset1[columnname].str[:-4]+"****"
print(column_dataset1)
The above code works perfectly for string columns, but when I give it the "phonenumber" column (or any integer column) it does not work.
Note: I need both full masking (the whole value is masked) and partial masking (e.g. the last three or the first three digits/characters of the values in the file above), for any file that is given.
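read_csv parses the phonenumber column as integers, so the comparison with the string "9876543212" never matches and the .str accessor is not available on that column. Convert your phone numbers to strings first and then mask: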
mask_len = 5  # number of digits to mask from the right side
column_dataset1['phonenumber'] = (
    column_dataset1['phonenumber'].astype(str)  # convert to string
    .str[:-mask_len] + "*" * mask_len  # mask the last mask_len digits
)
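If you need the same thing for any column (full masking, or partial masking from either end), something along these lines might work. This is only a sketch: mask_value and mask_column are hypothetical helper names, not pandas functions, and the CSV is inlined from the sample in the question so the snippet runs on its own. It assumes every value can be treated as text, and missing values are left untouched.

```python
import io

import pandas as pd


def mask_value(value, n=None, side="right", char="*"):
    """Hypothetical helper: mask one value after converting it to text.

    n=None masks the whole value; otherwise n characters are masked
    from the given side ("right" or "left").
    """
    text = str(value)
    if n is None or n >= len(text):
        return char * len(text)  # full masking
    if n <= 0:
        return text  # nothing to mask
    if side == "right":
        return text[:len(text) - n] + char * n
    return char * n + text[n:]


def mask_column(series, n=None, side="right", char="*"):
    """Hypothetical helper: apply mask_value to every non-missing entry."""
    return series.map(lambda v: mask_value(v, n, side, char), na_action="ignore")


# Sample data from the question, inlined instead of read from a file path.
csv_text = """sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,9876543212
2,Ivysaur,Grass,Poison,9876543212
3,Venusaur,Grass,Poison,9876543212
"""
df = pd.read_csv(io.StringIO(csv_text))

df["phonenumber"] = mask_column(df["phonenumber"], n=4)  # last four digits
df["Name"] = mask_column(df["Name"], n=3, side="left")   # first three characters
df["Type 2"] = mask_column(df["Type 2"])                 # whole value
print(df)
```

Because each value is converted with str() before slicing, the integer and string columns go through the same path. Another option, if you never need the numbers as numbers, is to pass dtype=str to read_csv so every column is already text.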