I want to find the first occurrence of a substring in a column.
For instance if I have the following dataframe..
# Create example dataframe
import pandas as pd
data = {
'Tunnel ID':['Tom', 'Dick', 'Harry'],
'State':['Grumbly', 'Very Happy', "Happy"],
'Length':[302, 285, 297]
}
df = pd.DataFrame(data)
.. I can find the first occurrence of 'Happy' in the 'State' column using:
# Returns index 1
first_match = df.State.str.contains('Happy').idxmax()
However, if I want to find the first match of 'ic' in 'Tunnel ID':
# Raises a SyntaxError because of the space in the column name.
first_match = df.Tunnel ID.str.contains('ic').idxmax()
# Would ideally return index: 1; containing ID: 'Dick'.
So what does one do when trying to use pd.Series.str.contains() and the column name contains whitespace?
You can also access a column by indexing into the data frame with its name instead of using dot notation, which only works when the column name is a valid Python identifier. So just do
first_match = df["Tunnel ID"].str.contains('ic').idxmax()
and you should be good to go; for the example data this returns index 1 (ID 'Dick').
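
One caveat worth knowing: idxmax() on an all-False boolean Series still returns the first index label, so when nothing matches you get 0 back rather than an error. Below is a minimal sketch of a guard for that case. The helper name first_match_index is hypothetical (it is not part of pandas), and I'm assuming you want a literal substring match, hence regex=False.

```python
import pandas as pd

df = pd.DataFrame({
    'Tunnel ID': ['Tom', 'Dick', 'Harry'],
    'State': ['Grumbly', 'Very Happy', 'Happy'],
    'Length': [302, 285, 297],
})

def first_match_index(series, substring):
    """Hypothetical helper: index label of the first row whose string
    contains `substring`, or None when no row matches."""
    # regex=False treats the pattern as a literal substring;
    # na=False counts missing values as non-matches.
    mask = series.str.contains(substring, regex=False, na=False)
    # Without this check, idxmax() on an all-False mask would
    # return the first index label, which looks like a real match.
    return mask.idxmax() if mask.any() else None

hit = first_match_index(df['Tunnel ID'], 'ic')    # row holding 'Dick'
miss = first_match_index(df['Tunnel ID'], 'zz')   # no row matches
```

If you would rather get an exception than None for the no-match case, you could raise inside the helper instead; that is a design choice, not something pandas does for you.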