I have an excel file with the following data:
LogID |
---|
T-1111 |
P-09899 |
P-09189,T-0011 |
T-111,T-2111 |
P-09099,P-7897 |
RCT-0989,RCT-099 |
I need to extract the first column LogID before the delimiter "-" and then populate a second column 'LogType' based on the string extracted (T is Tank LogType, P is Pump LogType)
For the above input, the output should be
LogID | LogType |
---|---|
T-1111 | Tank |
P-09899 | Pump |
P-09189,T-0011 | Multiple |
T-111,T-2111 | Tank |
P-09099,P-7897 | Pump |
RCT-0989,RCT-099 | Reactor |
I have written a function to do this in python:
def log_parser(log_string):
log_dict = { "T":"Tank","P":"Pump" }
log_list = log_string.split(",")
for i in log_list:
str_extract = i.upper().split("-",1)
if len(log_list) ==1:
result = log_dict[str_extract[0]]
return result
break
else:
idx = log_list.index(i)
for j in range(len(log_list)):
if (idx == j):
continue
str_extract_j = log_list[j].upper().split("-",1)
if str_extract_j[0] != str_extract[0]:
result = "Multiple"
return result
break
else:
result = log_dict[str_extract[0]]
return result
I am not sure how to implement this function in pandas..
Can i define the function in pandas and then use the lamba apply funtion like this:
test_df['LogType'] = test_df[['LogID']].apply(lambda x:log_parser(x), axis=1)
You can use:
# mapping dictionary for types
d = {'T': 'Tank', 'P': 'Pump'}
# extract letters before -
s = df['LogID'].str.extractall('([A-Z])-')[0]
# group by index
g = s.groupby(level=0)
df['LogType'] = (g.first() # get first match
.map(d) # map type name
# mask if several types
.mask(g.nunique().gt(1),
'Multiple')
)
Output:
LogID LogType
0 T-1111 Tank
1 P-09899 Pump
2 P-09189,T-0011 Multiple