Tags: python, regex, pandas, dataframe, ignore-case

Pandas Python: take a subset of df by row labels while using re.IGNORECASE


I have a DataFrame that looks like this:

print(df_raw)

Name              exp1
Name                  
UnweightedBase    1364
Base              1349
BFC_q5a1        34.18%
BFC_q5a2         2.93%
BFC_q5a3         1.86%
BFC_q5a4         1.93%
BFC_q5a5         0.84%

I want to build a subset of the DataFrame above by row label. However, I would like to use re.IGNORECASE, and I'm not sure how.

Without re.IGNORECASE, the code looks like this:

subset_df = df_raw.loc[df_raw.index.isin(['BFC_q5a4', 'BFC_q5a5'])]

How can I change my code to make use of re.IGNORECASE, so that labels with different casing still match, as in the code below:

subset_df = df_raw.loc[df_raw.index.isin(['bFc_q5A4', 'BfC_Q5a5'])]

Note: I don't want to use str.lower or str.upper to do this.

Thanks!


Solution

  • I don't know of any neat way to search index labels in a case-insensitive way (df.filter is useful but doesn't appear to be able to ignore case unfortunately).

    To get around this, you could make use of the series method pd.Series.str.contains which can ignore case:

    subset_df = df_raw[pd.Series(df_raw.index).str.contains(regex, case=False).values]
    

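    The index is turned into a Series and then regex matching is applied. Here, regex is a pattern string such as 'bFc_q5A4|BfC_Q5a5'. Case is ignored because of case=False. Note that str.contains matches substrings, so if only exact labels should match, anchor the pattern, for example '^(?:bFc_q5A4|BfC_Q5a5)$'.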
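
For completeness, here is a minimal self-contained sketch of the same idea that passes re.IGNORECASE explicitly through the flags argument of str.contains, since that is what the question asks for. The frame is rebuilt by hand from the printed output in the question, and the names wanted, pattern and mask are only illustrative. It assumes a pandas version where the index exposes the .str accessor directly.

```python
import re
import pandas as pd

# Rebuild a frame shaped like df_raw from the question (values kept as strings).
df_raw = pd.DataFrame(
    {"exp1": ["1364", "1349", "34.18%", "2.93%", "1.86%", "1.93%", "0.84%"]},
    index=pd.Index(
        ["UnweightedBase", "Base", "BFC_q5a1", "BFC_q5a2",
         "BFC_q5a3", "BFC_q5a4", "BFC_q5a5"],
        name="Name",
    ),
)

# Labels to look up, deliberately written with the "wrong" casing.
wanted = ["bFc_q5A4", "BfC_Q5a5"]

# Escape each label and anchor the alternation so only whole labels match,
# mirroring the exact-match behaviour of isin().
pattern = "^(?:" + "|".join(re.escape(label) for label in wanted) + ")$"

# flags=re.IGNORECASE makes the regex match regardless of case.
mask = df_raw.index.str.contains(pattern, flags=re.IGNORECASE)
subset_df = df_raw.loc[mask]
print(subset_df)
```

Building the pattern from a list keeps the call close to the original isin() usage, and re.escape guards against labels that happen to contain regex metacharacters.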