Search code examples
pythonpandasstringnumericalphanumeric

Identify pandas dataframe columns containing both numeric and string


I have created the following dataframe (called df):

d = {'ltv': [1, 22,45,78], 'age': [33, 43,54,65],'job': ['Salaried','Salaried','Salaried','Owner'], 'UniqueID' : ['A1','A2','A3','A4'] }
df = pd.DataFrame(data=d)

which looks like this:

print(df)

   ltv  age       job UniqueID
     1   33  Salaried       A1
    22   43  Salaried       A2
    45   54  Salaried       A3
    78   65     Owner       A4

I have checked its columns types:

print(df.info())

 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   ltv       4 non-null      int64 
 1   age       4 non-null      int64 
 2   job       4 non-null      object
 3   UniqueID  4 non-null      object

I only focus on the two object columns which are job and UniqueID. As you can see:

  • job contains only strings
  • UniqueID contains both strings and numbers

I want to be able to identify the column (in this case UniqueID) that contains both strings and numbers.

If I use the following code for UniqueID:

print(df['UniqueID'].str.isalnum())

0    True
1    True
2    True
3    True

I see that it returns True for all records, which is great. Now, if I use the same code for job, I get the same results:

print(df['job'].str.isalnum())

    0    True
    1    True
    2    True
    3    True

So, how can I identify in pandas which column that contains both strings and numbers (in this example: UniqueID)?


Solution

  • You can def your own function

    def findchrandnum(x):
        try :
            return all(x.str.isalnum() & ~x.str.isalpha() & ~x.str.isdigit())
        except:
            return False
    df.apply(findchrandnum)
    Out[66]: 
    ltv         False
    age         False
    job         False
    UniqueID     True
    dtype: bool