python pandas list-comprehension apply

How to iterate through a pandas column and return the first match found in each sublist


I need a pandas apply function to iterate through sublists and return the first value found for each sublist.

I have a dataframe like this:

import pandas as pd

data = {'project_names': [['Datalabor', 'test', 'tpuframework'], ['regETU', 'register', 'tpuframework'], [], ['gpuframework', 'cpuframework']]}
df = pd.DataFrame(data)

df

I have a nested project list with sublists like this:

project_list_1 = [
    ['labor', 'DataLab', 'Anotherdatalabor'],
    ['reg', 'register'],
    ['gpu'],
    ['tpu']
]

project_list_1

The final output should look like this:

data = {'matches': [['labor', 'tpu'], ['reg', 'tpu'], [None], ['gpu']]}
final_df = pd.DataFrame(data)

final_df 

I tried something like this, which works only for a flat list such as project_list_2. To get just the first element found, I am using next() instead of a list comprehension:

project_list_2 = ['labor', 'DataLab', 'register', 'gpu', 'reg', 'tpu']

df2 = df.copy()
df2['matches'] = df['project_names'].apply(lambda row: next((project for project in project_list_2 if any(project.lower() in word.lower() for word in row)), None))
df2

I need to run the method on project_list_1 and get the desired output described above.


Solution

  • Try:

    project_list_1 = [
        ["labor", "DataLab", "Anotherdatalabor"],
        ["reg", "register"],
        ["gpu"],
        ["tpu"],
    ]
    
    
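    def fn(v, project_list):
        # Lowercase the row's words once so matching is case-insensitive.
        words = [w.lower() for w in v]
        out = []
        for project in project_list:
            # Keep only the first name in this sublist that occurs in any word.
            # Matches are returned lowercased.
            for p in map(str.lower, project):
                if any(p in w for w in words):
                    out.append(p)
                    break
        return out or [None]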
    
    
    df["matches"] = df["project_names"].apply(fn, project_list=project_list_1)
    print(df)
    

    Prints:

                          project_names       matches
    0   [Datalabor, test, tpuframework]  [labor, tpu]
    1  [regETU, register, tpuframework]    [reg, tpu]
    2                                []        [None]
    3      [gpuframework, cpuframework]         [gpu]
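
A possible variation, closer to the next() approach from the question: call next() once per sublist, so each sublist contributes at most its first match. This is only a sketch. The helper name first_matches and its parameter project_groups are made up for illustration, and unlike fn above it returns each match in its original casing (for example 'DataLab' rather than 'datalab'). On the sample data the result is the same.

```python
import pandas as pd

project_list_1 = [
    ["labor", "DataLab", "Anotherdatalabor"],
    ["reg", "register"],
    ["gpu"],
    ["tpu"],
]

df = pd.DataFrame(
    {
        "project_names": [
            ["Datalabor", "test", "tpuframework"],
            ["regETU", "register", "tpuframework"],
            [],
            ["gpuframework", "cpuframework"],
        ]
    }
)


def first_matches(words, project_groups):
    """Hypothetical helper: first matching project of each group, original casing kept."""
    lowered = [w.lower() for w in words]
    hits = []
    for group in project_groups:
        # next() stops at the first project in this group that is a
        # case-insensitive substring of any word; None means no match.
        hit = next(
            (p for p in group if any(p.lower() in w for w in lowered)),
            None,
        )
        if hit is not None:
            hits.append(hit)
    # Rows without any match get [None], as in the desired output.
    return hits or [None]


df["matches"] = df["project_names"].apply(first_matches, project_groups=project_list_1)
print(df)
```

Keyword arguments given to Series.apply are forwarded to the function, which is why project_groups can be passed this way.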