python python-3.x pandas dataframe python-re

Only keep df column values that contain a string from a list of strings


I have a list of strings like this:

stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

And I have a dataframe that looks like this:

**date**            **value**
01MAR16                1
05FEB16                12
10jan17                5
10mar15                9
03jan05                7
04APR12                3

I only want to keep the dates that contain one of the strings in stringlist and replace all other dates with NA. The result should look like this:
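The test itself ("does this date contain any entry of stringlist?") doesn't strictly need a regex. Here is a minimal plain-Python sketch of that check on the sample dates. It assumes `None` as a stand-in for NA, purely for illustration:

```python
# Plain-Python sketch of the "contains any of these substrings" test.
# None is used here as a hypothetical stand-in for NA.
stringlist = ["JAN", "jan", "FEB", "feb", "mar"]
dates = ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"]

# A date is kept if at least one entry of stringlist occurs in it (case-sensitive).
keep = [any(s in d for s in stringlist) for d in dates]
result = [d if k else None for d, k in zip(dates, keep)]
print(result)  # [None, '05FEB16', '10jan17', '10mar15', '03jan05', None]
```

The matching is case-sensitive, which is why "01MAR16" is dropped: only lowercase "mar" is in the list.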

**date**            **value**
NA                     1
05FEB16                12
10jan17                5
10mar15                9
03jan05                7
NA                     3

I'm new to regular expressions, so I'm having some trouble wrapping my head around this. I'd appreciate some help.


Solution

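    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"date": ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"],
                       "value": [1, 12, 5, 9, 7, 3]})
    stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

    # True where the date contains at least one entry of stringlist
    m = df["date"].str.contains("|".join(stringlist))
    df.loc[~m, "date"] = np.nan
    print(df)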

    Prints:

          date  value
    0      NaN      1
    1  05FEB16     12
    2  10jan17      5
    3  10mar15      9
    4  03jan05      7
    5      NaN      3
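The key step is `"|".join(stringlist)`, which builds the regex alternation `JAN|jan|FEB|feb|mar`. That pattern matches if any one of the alternatives appears in the string. If stringlist could ever contain regex metacharacters such as `.` or `(`, it may be safer to escape each entry first. The sketch below does that with `re.escape`. It also uses `Series.where` in place of the `.loc` assignment, which is a stylistic choice that should give the same result as the answer's code:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "date": ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"],
    "value": [1, 12, 5, 9, 7, 3],
})
stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

# Escape each entry so it is matched literally, then join into an alternation.
pattern = "|".join(map(re.escape, stringlist))
print(pattern)  # JAN|jan|FEB|feb|mar

# Series.where keeps values where the mask is True and puts NaN elsewhere.
m = df["date"].str.contains(pattern, regex=True)
df["date"] = df["date"].where(m)
print(df)
```

If the match should ignore case, `str.contains` also accepts `case=False`. With this list, that would additionally keep "01MAR16" and so change the expected output.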