
How to select rows where the number of characters differs from 3, the value contains at least 1 letter, or the value is missing (NaN) in Python Pandas?


I have a DataFrame in Python Pandas like the one below (the data type of col1 is string):

col1
-------
1112224
222
111
123DDT
985T
NaN

And I need to select only the rows where:

  • the number of characters is different from 3

or

  • the value contains at least 1 letter

or

  • there is no data (NaN)

So as a result I need something like below:

col1
-------
1112224
123DDT
985T
NaN

How can I do that in Python Pandas?


Solution

  • You can easily use masks and boolean indexing:

    # not 3 chars
    m1 = df['col1'].str.len().ne(3)
    # at least one letter
    m2 = df['col1'].str.contains('[a-zA-Z]', na=False)
    # or NaN
    m3 = df['col1'].isna()
    
    # any of the above conditions
    out = df[m1|m2|m3]
    

    NB. the third condition is already covered by the first one (str.len() returns NaN for a missing value, and NaN is not equal to 3*), so you could skip m3.

    output:

          col1
    0  1112224
    3   123DDT
    4     985T
    5      NaN
    

    * although, ironically, the name "NaN" does ;)
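
For anyone who wants to try this without building the data by hand, here is a minimal self-contained sketch of the approach above. It assumes the question's sample values are stored as strings with one missing value represented by `np.nan`. The `na=False` argument is an addition on my part: it makes `str.contains` return False instead of NaN for the missing row, so the mask stays strictly boolean.

```python
import numpy as np
import pandas as pd

# Sample data from the question: strings plus one missing value (assumed np.nan).
df = pd.DataFrame({"col1": ["1112224", "222", "111", "123DDT", "985T", np.nan]})

# Length differs from 3. str.len() gives NaN for the missing row,
# and NaN != 3 evaluates to True, so that row is selected too.
m1 = df["col1"].str.len().ne(3)

# At least one ASCII letter; na=False maps the missing row to False.
m2 = df["col1"].str.contains("[a-zA-Z]", na=False)

# Explicit check for missing values (redundant with m1, kept for readability).
m3 = df["col1"].isna()

# Keep rows matching any of the three conditions.
out = df[m1 | m2 | m3]
print(out)
```

On this sample the rows kept are those at index 0, 3, 4 and 5. Dropping `m3` gives the same selection here, since `m1` is already True for the missing row.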