Search code examples
pythonpandasdataframesubstring

python pandas substring based on columns values


Given the following df:

data = {'Description':  ['with lemon', 'lemon', 'and orange', 'orange'],
        'Start': ['6', '1', '5', '1'],
       'Length': ['5', '5', '6', '6']}
df = pd.DataFrame(data)
print (df)

I would like to substring the "Description" based on what is specified in the other columns as start and length, here the expected output:

data = {'Description':  ['with lemon', 'lemon', 'and orange', 'orange'],
        'Start': ['6', '1', '5', '1'],
       'Length': ['5', '5', '6', '6'],
       'Res':  ['lemon', 'lemon', 'orange', 'orange']}
df = pd.DataFrame(data)
print (df)

Is there a way to make it dynamic or another compact way?

df['Res'] = df['Description'].str[1:2]

Solution

  • You need to loop, a list comprehension will be the most efficient (python ≥3.8 due to the walrus operator, thanks @I'mahdi):

    df['Res'] = [s[(start:=int(a)-1):start+int(b)] for (s,a,b)
                 in zip(df['Description'], df['Start'], df['Length'])]
    

    Or using pandas for the conversion (thanks @DaniMesejo):

    df['Res'] = [s[a:a+b] for (s,a,b) in 
                 zip(df['Description'],
                     df['Start'].astype(int)-1,
                     df['Length'].astype(int))]
    

    output:

      Description Start Length     Res
    0  with lemon     6      5   lemon
    1       lemon     1      5   lemon
    2  and orange     5      6  orange
    3      orange     1      6  orange
    

    handling non-integers / NAs

    df['Res'] = [s[a:a+b] if pd.notna(a) and pd.notna(b) else 'NA'
                 for (s,a,b) in 
                 zip(df['Description'],
                     pd.to_numeric(df['Start'], errors='coerce').convert_dtypes()-1,
                     pd.to_numeric(df['Length'], errors='coerce').convert_dtypes()
                    )]
    

    output:

        Description Start Length     Res
    0    with lemon     6      5   lemon
    1         lemon     1      5   lemon
    2    and orange     5      6  orange
    3        orange     1      6  orange
    4  pinapple xxx    NA     NA      NA
    5      orangiie    NA     NA      NA