
How to use `pd.Series.str.contains()` when the column name contains whitespace?


I want to find the first occurrence of a substring in a column.

For instance if I have the following dataframe..

# Create example dataframe
import pandas as pd
data = {
        'Tunnel ID':['Tom', 'Dick', 'Harry'],
        'State':['Grumbly', 'Very Happy', "Happy"],
        'Length':[302, 285, 297]
        }
df = pd.DataFrame(data)

.. I can find the first occurrence of 'Happy' in the 'State' column using:

# Returns index 1
first_match = df.State.str.contains('Happy').idxmax()

However if I want to find the first match of 'ic' in 'Tunnel ID':

# Raises a SyntaxError because of the space in the column name.
first_match = df.Tunnel ID.str.contains('ic').idxmax()
# Would ideally return index: 1; containing ID: 'Dick'.

So how does one use pd.Series.str.contains() when the column name contains whitespace?


Solution

  • You can also access a column by indexing into your data frame with brackets instead of using dot notation, which only works when the column name is a valid Python identifier. So just do

    first_match = df["Tunnel ID"].str.contains('ic').idxmax()
    

    and you should be good to go.
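
One thing to watch with this approach: `idxmax()` on a boolean Series that is all `False` still returns the first index label, so a missing substring looks like a match at index 0. Below is a minimal sketch that guards against that, using the dataframe from the question. The helper name `first_match_index` is hypothetical (it is not part of pandas), and `regex=False` is an assumption that you want a literal substring match rather than a regular expression.

```python
import pandas as pd

# Example dataframe from the question
df = pd.DataFrame({
    'Tunnel ID': ['Tom', 'Dick', 'Harry'],
    'State': ['Grumbly', 'Very Happy', 'Happy'],
    'Length': [302, 285, 297],
})


def first_match_index(series, substring):
    """Hypothetical helper: index label of the first row containing
    `substring`, or None when no row matches."""
    # regex=False treats `substring` literally (an assumption about intent)
    mask = series.str.contains(substring, regex=False)
    # idxmax() alone would return the first label even if nothing matched
    return mask.idxmax() if mask.any() else None


# Bracket indexing works for any column name, including ones with spaces
first_match = first_match_index(df['Tunnel ID'], 'ic')
matched_id = df.loc[first_match, 'Tunnel ID']
no_match = first_match_index(df['Tunnel ID'], 'zz')
```

Here `first_match` is 1 and `matched_id` is 'Dick', while `no_match` is `None` instead of a misleading 0.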