
How to ignore nulls while doing regex matching in Python?


I have a DataFrame and I am using a regex to check the pattern of the values in one column, but the column contains nulls. Because of the nulls, the match fails. I don't want to drop them or replace them with some other value; I just want to ignore them. Whatever I try, I either get an error or None as output. How can I ignore the null values while matching?

code:

df =
  a        b    c
0 rt-0000  abc  1
1          vb   2
2 rt-1234  abc  3
3          op   4
4 rt-123   oip  5

format = r'rt-\d\d\d\d'
if df['a'].isnull().any():
          continue
          correct_df = df[df[key].str.match(format)]
          wrong_df = df[~df[key].str.match(format)]

The output is: None

When I tried it without ignoring the nulls, I got an error: 'Cannot mask with non-boolean array containing NA / NaN values'

Expected output:

correct_df:
  a        b    c
0 rt-0000  abc  1
1          vb   2
2 rt-1234  abc  3
3          op   4

wrong_df:
  a        b    c
4 rt-123   oip  5

I tried different if conditions, but I always end up with the same output. Is there a way to ignore the null values?
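To see why both attempts fail, here is a minimal sketch that rebuilds the sample frame, assuming the blank cells in the printout are real NaN values rather than empty strings (the column is forced to object dtype so it behaves the same across pandas versions). It shows two things: isnull().any() collapses the whole column into a single True/False, so the if ... continue skips the column as a whole instead of individual rows, and str.match leaves NaN in its result for missing cells, which pandas refuses to use as a boolean mask and raises a ValueError. The name pattern is used here instead of format only to avoid shadowing the built-in.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks assumed to be NaN.
df = pd.DataFrame({
    "a": pd.Series(["rt-0000", np.nan, "rt-1234", np.nan, "rt-123"], dtype=object),
    "b": ["abc", "vb", "abc", "op", "oip"],
    "c": [1, 2, 3, 4, 5],
})
pattern = r"rt-\d\d\d\d"

# One scalar for the whole column, not one value per row.
any_null = df["a"].isnull().any()

# Without na=..., missing cells stay NaN in the match result.
raw = df["a"].str.match(pattern)

try:
    df[raw]
    raised = False
except ValueError:
    # pandas rejects a filter mask that still contains NaN
    raised = True

print(any_null, raw.tolist(), raised)
```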


Solution

  • For this DataFrame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a':['rt-0000',np.nan,'rt-1234',np.nan,'rt-123'],
                      'b':['abc','vb','abc','op','oip'],
                      'c':[1,2,3,4,5]})
    
             a    b  c
    0  rt-0000  abc  1
    1      NaN   vb  2
    2  rt-1234  abc  3
    3      NaN   op  4
    4   rt-123  oip  5
    

    You can simply use:

    correct_df = df[df.a.str.match(format, na=True)]
    wrong_df = df[~df.a.str.match(format, na=True)]
    

    That gives the expected result:

             a    b  c
    0  rt-0000  abc  1
    1      NaN   vb  2
    2  rt-1234  abc  3
    3      NaN   op  4
    

    and

            a    b  c
    4  rt-123  oip  5
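The na argument controls what str.match returns for a missing cell, and na=True counts it as a match. The sketch below checks that this is the same as building the mask by hand from isna() combined with a match using na=False. It also adds one hypothetical value, "rt-12345", to show that str.match only anchors at the start of the string, while str.fullmatch (available since pandas 1.1) requires the whole value to match the pattern.

```python
import numpy as np
import pandas as pd

# Column from the answer plus a hypothetical extra value "rt-12345".
s = pd.Series(
    ["rt-0000", np.nan, "rt-1234", np.nan, "rt-123", "rt-12345"],
    dtype=object,
)
pattern = r"rt-\d{4}"  # equivalent to 'rt-\d\d\d\d'

# na=True: missing cells are treated as matches.
with_na = s.str.match(pattern, na=True)

# The same mask built by hand: "is missing OR matches".
manual = s.isna() | s.str.match(pattern, na=False)

# str.match accepts "rt-12345" because it only checks the start;
# str.fullmatch rejects it because the whole value must match.
strict = s.str.fullmatch(pattern, na=True)

print(with_na.tolist())
print(strict.tolist())
```

Passing na=False instead of na=True would send the null rows to wrong_df rather than correct_df.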