Search code examples
pythonpandasdataframesubstring

Find substring in pandas dataframe and save in new column


I have a dataframe with approx. 10,000 rows and 10 columns. And I have a string, which I want to search for in the dataframe, called 'atmosphere'. This string can only be found once in a row. I want to keep only the cells that contain this string, but with their whole content, and save them in a new column. I already found the following solution, but it only gives me back "True" (when cell contains string) or "False" (when it does not).:

df.apply(lambda col: col.str.contains('atmosphere', case=False), axis=1)
Output:
  col_1  col_2  col_3  col_4 ...
1 True   False  False  False
2 False  True   False  False
3 True   False  False  False 
...

How can I get from this, to this?:

   new_col
1 today**atmosphere**is
2 **atmosphere**humid
3 the**atmosphere**now

Solution

  • If you already have your result, you can simply stack it:

    df = pd.DataFrame({"a":["apple", "orange", "today atmosphere"],
                       "b":["pineapple", "atmosphere humid", "kiwi"],
                       "c":["the atmosphere now", "watermelon", "grapes"]})
    
                      a                 b                   c
    0             apple         pineapple  the atmosphere now
    1            orange  atmosphere humid          watermelon
    2  today atmosphere              kiwi              grapes
    
    
    print (df[df.apply(lambda col: col.str.contains('atmosphere', case=False), axis=1)].stack())
    
    0  c    the atmosphere now
    1  b      atmosphere humid
    2  a      today atmosphere
    dtype: object