python, regex, pandas, contains

Using a variable within a regular expression in Pandas str.contains()


I'm attempting to select rows from a dataframe using the pandas str.contains() function with a regular expression that contains a variable as shown below.

import pandas as pd
df = pd.DataFrame(["A test Case","Another Testing Case"], columns=list("A"))
variable = "test"
df[df["A"].str.contains(r'\b' + variable + '\b', regex=True, case=False)] #Returns nothing

While the above returns nothing, the following returns the appropriate row as expected:

df[df["A"].str.contains(r'\btest\b', regex=True, case=False)] #Returns values as expected

Any help would be appreciated.


Solution

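  • Both word-boundary tokens must be inside raw strings. In the question's concatenation, the trailing '\b' has no r prefix, so Python converts it to a backspace character (\x08) before the regex engine ever sees it, and nothing matches. Rather than concatenating, it is simpler to build the whole pattern in a single raw string with string formatting: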

    df[df["A"].str.contains(fr'\b{variable}\b', regex=True, case=False)] 
    # Or, 
    # df[df["A"].str.contains(r'\b{}\b'.format(variable), regex=True, case=False)] 
    
                 A
    0  A test Case
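
To make the difference concrete, here is a minimal sketch that reuses the question's dataframe. It first shows that '\b' and r'\b' are different strings, then builds the pattern with re.escape. The re.escape call is an addition not in the original answer; it is only an assumption that the variable might someday contain regex metacharacters such as a dot or parenthesis.

```python
import re
import pandas as pd

df = pd.DataFrame(["A test Case", "Another Testing Case"], columns=["A"])
variable = "test"

# Without the r prefix, '\b' is a single backspace character (\x08).
# With it, r'\b' is a backslash followed by 'b', which the regex engine
# interprets as a word boundary.
plain_len = len('\b')
raw_len = len(r'\b')
print(plain_len, raw_len)

# re.escape (an addition, not part of the original answer) neutralizes any
# regex metacharacters in the variable before it is interpolated.
pattern = rf'\b{re.escape(variable)}\b'
matches = df[df["A"].str.contains(pattern, regex=True, case=False)]
print(matches)
```

For a plain alphanumeric value like "test", re.escape leaves the string unchanged, so the pattern is identical to the hard-coded r'\btest\b' from the question.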