python-3.x pandas function apply

Determine the length of string and extract the first 15 or 14 characters


Here is my data set:

import pandas as pd
data = {'Name': ['Tom', 'Nick','Jack', 'Ann', 'Jane'],
        'group1': ['SRE_high_0101240243', 'ERS_med_140124065', 'SRE_low_110124084' , 'SRE_high_05022484', 'CER_med_11022437023']}
  
df = pd.DataFrame(data)
df


I want to extract the first 15 or 14 characters from the 'group1' column, depending on the length of the string. If the string in 'group1' is 19 characters long, extract the first 15 characters; otherwise, extract the first 14.

Here's my failed attempt:

def clean_substr(df):
    if df['group1'].str.len() == 19:
        val = df['group1'].str.slice(0,15)
    elif df['group1'].str.len() != 19:
        val = df['group1'].str.slice(0,14)  
    else:
        val = "issue"
        return val

df['group1_clean'] = df.apply(clean_substr)
display(df)

I'm getting an error when I run this code and I'm not sure what's making it fail. Any help would be greatly appreciated. Thanks.


Solution

  • df.apply(clean_substr) passes each column of df to the function, one at a time (the default is axis=0). So inside clean_substr, the df argument is actually a Series whose index is df's row index, not a DataFrame. That means df['group1'] looks for a row label 'group1' and raises a KeyError.

    You can do a conditional select:

    import numpy as np

    df['group1_clean'] = np.where(df['group1'].str.len() == 19,
                                  df['group1'].str[:15],
                                  df['group1'].str[:14])
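
If you would rather keep a function like in the question, one possible fix is to apply it to the 'group1' column alone, so that it receives one string at a time instead of a whole column. This is only a sketch: it rebuilds the sample data from the question, reuses the clean_substr name with a different argument (a plain string, which is my assumption, not the original signature), and assumes the column has no missing values, since len() fails on NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Jack', 'Ann', 'Jane'],
    'group1': ['SRE_high_0101240243', 'ERS_med_140124065', 'SRE_low_110124084',
               'SRE_high_05022484', 'CER_med_11022437023'],
})

def clean_substr(value):
    # value is a single string from 'group1', not a DataFrame or Series
    return value[:15] if len(value) == 19 else value[:14]

# Series.apply calls clean_substr once per element of the column
df['group1_clean'] = df['group1'].apply(clean_substr)

# Vectorized np.where version, computed only to compare the two results
expected = np.where(df['group1'].str.len() == 19,
                    df['group1'].str[:15],
                    df['group1'].str[:14])

print(df)
```

Both approaches should give the same column here. The np.where version is vectorized and is usually the better choice on large frames; the apply version is mostly useful when the per-value logic grows beyond a simple two-way condition.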