I need a pandas apply function to iterate through sublists and return the first value found for each sublist.
I have a dataframe like this:
import pandas as pd

data = {'project_names': [['Datalabor', 'test', 'tpuframework'], ['regETU', 'register', 'tpuframework'], [], ['gpuframework', 'cpuframework']]}
df = pd.DataFrame(data)
df
I also have a nested project list with sublists like this:
project_list_1 = [
    ['labor', 'DataLab', 'Anotherdatalabor'],
    ['reg', 'register'],
    ['gpu'],
    ['tpu']
]
project_list_1
The final output should look like this:
data = {'matches': [['labor', 'tpu'], ['reg', 'tpu'], [None], ['gpu']]}
final_df = pd.DataFrame(data)
final_df
I tried something like this:
df['matches'] = df['project_names'].apply(lambda row: next((project for project in project_list_2 if any(project.lower() in word.lower() for word in row)), None))
df
This approach works only for a flat list like the one below. I use next() rather than a list comprehension so that only the first match is returned.
project_list_2 = ['labor', 'DataLab', 'register', 'gpu', 'reg', 'tpu']
I need to apply the same idea to project_list_1, taking the first match from each sublist, to get the desired output shown above.
Try:
project_list_1 = [
    ["labor", "DataLab", "Anotherdatalabor"],
    ["reg", "register"],
    ["gpu"],
    ["tpu"],
]
def fn(v, project_list):
    out = []
    for project in project_list:
        for p in map(str.lower, project):
            if any(w for w in map(str.lower, v) if (rv := p) in w):
                out.append(rv)
                break
    return out or [None]
df["matches"] = df["project_names"].apply(fn, project_list=project_list_1)
print(df)
Prints:
                      project_names       matches
0   [Datalabor, test, tpuframework]  [labor, tpu]
1  [regETU, register, tpuframework]    [reg, tpu]
2                                []        [None]
3      [gpuframework, cpuframework]         [gpu]
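
Note that fn lowercases the project names before appending them, so a match on "DataLab" would come back as "datalab". If the original spelling matters, here is a possible variant (a sketch, not the only way to do it) that keeps the next() idiom from the question and applies it once per sublist. The names first_matches and project_groups are made up for this example.

```python
import pandas as pd

project_list_1 = [
    ["labor", "DataLab", "Anotherdatalabor"],
    ["reg", "register"],
    ["gpu"],
    ["tpu"],
]


def first_matches(names, project_groups):
    """Return the first matching project of each group, or [None] if no group matches."""
    # Lowercase the row's names once so every comparison is case-insensitive.
    lowered = [name.lower() for name in names]
    found = []
    for group in project_groups:
        # next() stops at the first project in this group that is a
        # substring of any name in the row; None means the group had no match.
        hit = next(
            (p for p in group if any(p.lower() in name for name in lowered)),
            None,
        )
        if hit is not None:
            found.append(hit)  # original spelling from the project list
    return found or [None]


df = pd.DataFrame(
    {
        "project_names": [
            ["Datalabor", "test", "tpuframework"],
            ["regETU", "register", "tpuframework"],
            [],
            ["gpuframework", "cpuframework"],
        ]
    }
)
# Extra keyword arguments to Series.apply are forwarded to the function.
df["matches"] = df["project_names"].apply(first_matches, project_groups=project_list_1)
print(df)
```

For the sample data this gives the same matches column as above, because every matched project name is already lowercase.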