python pandas list-comprehension pipeline

Using list comprehension while creating a pandas pipeline throws function object not iterable error


I have to create a pandas DataFrame from a CSV file using a pipeline. The source CSV file may contain any number of columns whose header contains the string 'SLA'. Sample data below: [screenshot: sample data]

While creating the pandas pipeline I have to extract and store only the string before the first delimiter ('|') for all the SLA columns. For example, for ID=1 the SLA1 column in the CSV contains the value '24h|0h|13h' and I have to store only 24h in the DataFrame (similarly for the other SLA columns).

My code is as follows:

import pandas as pd


def get_sla_cols(df):
    return [col for col in df.columns if 'SLA' in col]


def split(df, cols, split_str):
    for col in cols:
        df[col] = df[col].str.split(split_str, expand=True, n=1)[0]
    return df


csv_path = r"C:\Users\daryl\Downloads\svc.csv"
svc_df = (pd.read_csv(csv_path)
          .pipe(split, lambda x: x.pipe(get_sla_cols), '|'))
 

I'm getting the error below: [screenshot: error traceback]

But if I run:

print(pd.read_csv(csv_path).pipe(lambda x: x.pipe(get_sla_cols)))

I'm getting the output below, as expected:

[screenshot: list of SLA column names]

Since lambda x: x.pipe(get_sla_cols) generates the list of column names, why does split(df, cols, split_str) raise an error saying it cannot iterate over the list of columns in the for loop? (Refer to the error screenshot.)

Note: If I replace lambda x: x.pipe(get_sla_cols) with a hardcoded list, say ['SLA1', 'SLA2', 'SLA3', 'SLA4', 'SLA5'], the split() function raises no error and works as expected.


Solution

  • This should work:

    svc_df = (pd.read_csv(csv_path)
              .pipe(lambda df: split(df, get_sla_cols(df), '|')))
    

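    Use a lambda function for the whole pipe step. pipe passes extra arguments to the function unchanged, so in the original call split() received the lambda itself as cols rather than the list it returns, and the for loop failed because a function is not iterable.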
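
A minimal sketch of the difference, for anyone who wants to reproduce it without the CSV file. The make_sample() helper and its values are hypothetical stand-ins for svc.csv (only the '24h|0h|13h' value comes from the question); get_sla_cols and split are the functions from the question.

```python
import pandas as pd


def get_sla_cols(df):
    # Names of all columns whose header contains 'SLA'.
    return [col for col in df.columns if 'SLA' in col]


def split(df, cols, split_str):
    # Keep only the text before the first delimiter in each listed column.
    for col in cols:
        df[col] = df[col].str.split(split_str, expand=True, n=1)[0]
    return df


def make_sample():
    # Hypothetical stand-in for the CSV contents (assumed, not the real file).
    return pd.DataFrame({
        'ID': [1, 2],
        'SLA1': ['24h|0h|13h', '12h|1h|2h'],
        'SLA2': ['48h|5h', '36h|6h'],
    })


# Lambda passed as an extra argument: pipe hands it to split() uncalled,
# so the for loop tries to iterate over a function object.
try:
    make_sample().pipe(split, lambda x: x.pipe(get_sla_cols), '|')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Lambda wrapping the whole step: get_sla_cols(df) is evaluated before
# split() is called, so cols is a real list of column names.
svc_df = make_sample().pipe(lambda df: split(df, get_sla_cols(df), '|'))

print(error_message)
print(svc_df)
```

The first call should fail with the same TypeError as in the question, while the second keeps only the part before the first '|' in every SLA column.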