thank you for any help in solving this. I'm new to python still, but know VBA for excel.
When extracting start and stop positions to new columns in dataframe on regex conditions, integers are returned and found without a problem. but when trying to slice the text with that code, the following error occurs
TypeError: slice indices must be integers or None or have an index method
I have the following dataframe row of data and lines of code that I have tried after many different variation attempts over the last few days. I feel this is the closest I have been abe to solve it.
|MENU_HINT |StartPos |EndPos
|AUS / Maitland (AUS) 28th Feb |4 |22
df2['StartPos'] = df2["MENU_HINT"].str.find('/')
df2['EndPos'] = df2["MENU_HINT"].apply(lambda x: re.search('d+', x).start() if re.search('\d+', x) else re.search('\d+', x)) + 1
df2['Track'] = df2['MENU_HINT'].apply(lambda x: x[x.find('/ ') + 1:df2['EndPos']])
I am expecting to extract the location name starting after the '/' and exclude the date into a new column from the MENU_HINT column.
I've even tried the below code which i found also provides the end position as an integer. But when either trying to use the column or code itself for the slice, it still provides the same error
df2['Track3'] = df2["MENU_HINT"].apply(lambda x: re.search(
'\d+', x).start() if re.search('\d+', x) else re.search('\d+', x)) + 1
I am trying to find the first numeric digit as this field will only have a date that i want truncated.
I'm still in learning mode but can understand concepts well, so I would like to understand why this is occurring as it seems to be a useful knowledge base for me to have as i will come across this type of extraction quite often.
Thank you for your time and efforts in helping me solve this.
Sincerely, Paul
If you wish to make the function longer than one line (as lambda functions can restrict in some cases), which would enable a clearer and commented manipulation of the string, then you would do this:
import pandas as pd
# sample data for dataframe
d = {'menu_hint':['AUS / Maitland (AUS) 28th Feb']}
df = pd.DataFrame(d)
print('the old dataframe:')
print(df)
def strip_word(s:str):
''' function to strip parts of word '''
# n is the start position
n = s.find('/')
n = n+2 # adjust for spaces
# m is the end position
m = s.find(') ')
m = m+1 # adjust for zero index
s_new = s[n:m]
return s_new
df['Track3'] = df['menu_hint'].apply(strip_word)
print('the new dataframe:')
print(df)
The output:
menu_hint
0 AUS / Maitland (AUS) 28th Feb
the new dataframe:
menu_hint Track3
0 AUS / Maitland (AUS) 28th Feb Maitland (AUS)