I have created the following dataframe (called df
):
d = {'ltv': [1, 22,45,78], 'age': [33, 43,54,65],'job': ['Salaried','Salaried','Salaried','Owner'], 'UniqueID' : ['A1','A2','A3','A4'] }
df = pd.DataFrame(data=d)
which looks like this:
print(df)
ltv age job UniqueID
1 33 Salaried A1
22 43 Salaried A2
45 54 Salaried A3
78 65 Owner A4
I have checked its columns types:
print(df.info())
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ltv 4 non-null int64
1 age 4 non-null int64
2 job 4 non-null object
3 UniqueID 4 non-null object
I only focus on the two object columns which are job
and UniqueID
.
As you can see:
job
contains only stringsUniqueID
contains both strings and numbersI want to be able to identify the column (in this case UniqueID
) that contains both strings and numbers.
If I use the following code for UniqueID
:
print(df['UniqueID'].str.isalnum())
0 True
1 True
2 True
3 True
I see that it returns True
for all records, which is great. Now, if I use the same code for job
, I get the same results:
print(df['job'].str.isalnum())
0 True
1 True
2 True
3 True
So, how can I identify in pandas which column that contains both strings and numbers (in this example: UniqueID
)?
You can def your own function
def findchrandnum(x):
try :
return all(x.str.isalnum() & ~x.str.isalpha() & ~x.str.isdigit())
except:
return False
df.apply(findchrandnum)
Out[66]:
ltv False
age False
job False
UniqueID True
dtype: bool