Search code examples
pandaslambda

Extract 1st Column data and update 2ndColumn based on 1st Column data


I have an excel file with the following data:

LogID
T-1111
P-09899
P-09189,T-0011
T-111,T-2111
P-09099,P-7897
RCT-0989,RCT-099

I need to extract the first column LogID before the delimiter "-" and then populate a second column 'LogType' based on the string extracted (T is Tank LogType, P is Pump LogType)

For the above input, the output should be

LogID LogType
T-1111 Tank
P-09899 Pump
P-09189,T-0011 Multiple
T-111,T-2111 Tank
P-09099,P-7897 Pump
RCT-0989,RCT-099 Reactor

I have written a function to do this in python:

def log_parser(log_string):
    log_dict =  { "T":"Tank","P":"Pump" }
    log_list = log_string.split(",")
    for i in log_list:
        str_extract = i.upper().split("-",1)
        if len(log_list) ==1:
            result = log_dict[str_extract[0]]
            return result
            break
        else:
            idx = log_list.index(i)
            for j in range(len(log_list)):
                if (idx == j):
                    continue
                str_extract_j = log_list[j].upper().split("-",1)
                if str_extract_j[0] != str_extract[0]:
                    result = "Multiple"
                    return result
                    break
                else:
                    result = log_dict[str_extract[0]]
    return result

I am not sure how to implement this function in pandas..

Can i define the function in pandas and then use the lamba apply funtion like this:

test_df['LogType'] = test_df[['LogID']].apply(lambda x:log_parser(x), axis=1)

Solution

  • You can use:

    # mapping dictionary for types
    d = {'T': 'Tank', 'P': 'Pump'}
    
    # extract letters before -
    s = df['LogID'].str.extractall('([A-Z])-')[0]
    # group by index 
    g = s.groupby(level=0)
    
    df['LogType'] = (g.first() # get first match
                     .map(d)   # map type name
                     # mask if several types 
                     .mask(g.nunique().gt(1),
                           'Multiple')
                     )
    

    Output:

                LogID   LogType
    0          T-1111      Tank
    1         P-09899      Pump
    2  P-09189,T-0011  Multiple