Here is my data set:
import pandas as pd
data = {'Name': ['Tom', 'Nick', 'Jack', 'Ann', 'Jane'],
        'group1': ['SRE_high_0101240243', 'ERS_med_140124065', 'SRE_low_110124084', 'SRE_high_05022484', 'CER_med_11022437023']}
df = pd.DataFrame(data)
df
I want to extract the first 15 or 14 characters from the 'group1' column, depending on the length of the string in 'group1'. If the string in 'group1' is 19 characters long, extract the first 15 characters; otherwise, extract the first 14 characters.
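To make the rule concrete, here is a quick sketch that rebuilds the sample frame and checks the string lengths, showing which rows fall into each case:

```python
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'Jack', 'Ann', 'Jane'],
        'group1': ['SRE_high_0101240243', 'ERS_med_140124065', 'SRE_low_110124084',
                   'SRE_high_05022484', 'CER_med_11022437023']}
df = pd.DataFrame(data)

# Length of each string in 'group1'
lengths = df['group1'].str.len()
print(lengths.tolist())  # [19, 17, 17, 17, 19]

# Number of characters each row should keep: 15 if the length is 19, else 14
n_keep = lengths.map(lambda n: 15 if n == 19 else 14)
print(n_keep.tolist())  # [15, 14, 14, 14, 15]
```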
Here's my failed attempt:
def clean_substr(df):
    if df['group1'].str.len() == 19:
        val = df['group1'].str.slice(0,15)
    elif df['group1'].str.len() != 19:
        val = df['group1'].str.slice(0,14)
    else:
        val = "issue"
    return val
df['group1_clean'] = df.apply(clean_substr)
display(df)
I'm getting an error when I run this code, and I'm not sure what's making it fail. Any help would be greatly appreciated. Thanks!
df.apply(clean_substr) applies the function to each column in turn (the default is axis=0), so inside clean_substr your df argument is actually a single column: a Series whose index is the same as df's row index. Looking up df['group1'] on that Series therefore throws a KeyError exception.
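A minimal sketch to see this for yourself, using a two-row frame trimmed from the question's data. It shows what df.apply actually passes to the function and why the 'group1' lookup fails:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick'],
                   'group1': ['SRE_high_0101240243', 'ERS_med_140124065']})

# With the default axis=0, df.apply hands each column to the function as a Series
received = df.apply(lambda col: type(col).__name__)
print(received.to_dict())  # {'Name': 'Series', 'group1': 'Series'}

# Each column Series is indexed by df's row labels (0 and 1), not by column names
column = df['Name']
print(list(column.index))  # [0, 1]

# So the lookup clean_substr performs on its argument finds no 'group1' label
error_name = None
try:
    column['group1']
except KeyError as exc:
    error_name = type(exc).__name__
print(error_name)  # KeyError
```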
You can do a conditional select with numpy.where instead:
import numpy as np

df['group1_clean'] = np.where(df['group1'].str.len() == 19,
                              df['group1'].str[:15],
                              df['group1'].str[:14])
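If you would rather keep a function like clean_substr, one option (a sketch, not the only way) is to apply it row by row with axis=1, so each call receives one row and row['group1'] is a plain Python string. The function name clean_substr_row below is hypothetical, chosen only for illustration; the sketch checks that it gives the same result as the np.where version:

```python
import numpy as np
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'Jack', 'Ann', 'Jane'],
        'group1': ['SRE_high_0101240243', 'ERS_med_140124065', 'SRE_low_110124084',
                   'SRE_high_05022484', 'CER_med_11022437023']}
df = pd.DataFrame(data)

# Vectorised version from the answer above
df['group1_clean'] = np.where(df['group1'].str.len() == 19,
                              df['group1'].str[:15],
                              df['group1'].str[:14])

# Row-wise alternative (hypothetical helper): with axis=1 the function receives
# one row at a time, so ordinary len() and slicing work on the string
def clean_substr_row(row):
    s = row['group1']
    return s[:15] if len(s) == 19 else s[:14]

df['group1_rowwise'] = df.apply(clean_substr_row, axis=1)

print(df['group1_clean'].tolist())
# ['SRE_high_010124', 'ERS_med_140124', 'SRE_low_110124', 'SRE_high_05022', 'CER_med_1102243']
```

np.where works on whole columns at once, so it is typically much faster on large frames than apply(axis=1), which calls the Python function once per row.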