I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'ssn': [12345, 54321, 111, 47895, 222311], 'Name': ['john', 'mike', 'adam', 'doug', 'liz']})
The DataFrame contains an 'ssn' column that is supposed to hold only 5-digit values. I want to replace every value that has fewer or more than 5 digits with a blank.
The desired output is as below:
   Name    ssn
0  john  12345
1  mike  54321
2  adam
3  doug  47895
4   liz
I referred to the SO post "replace string if length is less than x". However, using the same solution with the following commands gives me an error:
mask = df['ssn'].str.len() == 5
df['ssn'] = df['ssn'].mask(mask, df['ssn'].str.replace(df['ssn'], ''))
Traceback (most recent call last):
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I would appreciate any suggestions.
Your ssn column contains numbers, not strings, which is why it is not working. Convert it to strings before checking the length. Try the following:
mask = df['ssn'].astype(str).str.len() != 5
df.loc[mask, 'ssn'] = ''
In [1]: print(df)
Out[1]:
   Name    ssn
0  john  12345
1  mike  54321
2  adam
3  doug  47895
4   liz
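As a variation, here is a minimal sketch that converts the column to strings first and then keeps only the 5-character values with Series.where. It assumes you are fine with the remaining values being stored as strings rather than integers. That may be preferable on recent pandas versions, which can warn when an empty string is assigned into an integer column. The intermediate name `ssn` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ssn': [12345, 54321, 111, 47895, 222311],
    'Name': ['john', 'mike', 'adam', 'doug', 'liz'],
})

# Convert once to strings so the column can hold both digits and ''.
# Assumption: storing the ssn values as strings is acceptable.
ssn = df['ssn'].astype(str)

# Keep values that are exactly 5 characters long; replace the rest with ''.
df['ssn'] = ssn.where(ssn.str.len() == 5, '')

print(df)
```

Note that where keeps the values for which the condition is True, so the condition here is "length equals 5", the opposite of the mask used with df.loc above.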