Can I fill NaN-Values of one column with specific list elements of another column?

I have following dataframe (called items) for example:

| index | itemID | maintopic | subtopics          |
|:----- |:------:|:---------:| ------------------:|
| 1     | 235    | FBR       | [FZ, 1RH, FL]      |
| 2     | 1787   | NaN       | [1RH, YRS, FZ, FL] |
| 3     | 2454   | NaN       | [FZX, 1RH, FZL]    |
| 4     | 3165   | NaN       | [YHS]              |

I would like to fill the NaN-Values in the maintopic-column with the first element of the subtopics list which starts with a letter. Does someone has an idea? (Question No 1)

I tried this, but it didn´t work:

import pandas as pd
import string
alphabet = list(string.ascii_lowercase)
    
items['maintopic'] = items['maintopic'].apply(lambda x : items['maintopic'].fillna(items['subtopics'][x][0]) if items['subtopics'][x][0].lower().startswith(tuple(alphabet)) else x)

Advanced (Question No 2): Even better would be to have a look at all elements of the subtopics list and if there are more elements which have the first letter or even the first and the second letter in common, then I would like to take this. For example in line 2 there is FZ and FL, so i would like to fill the maintopic in this row with an F. And in line 3 there is FZX and FZL, then I would like to fill the maintopic with FZ. But if this is way too complicated then I would be also very happy with an answer to Question No 1.

I appreciate any help!

Solution

Try:

from itertools import chain, combinations


def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m:
        return ""
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1


def powerset(iterable, n=0):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(n, len(s) + 1))


def choose(x):
    if not isinstance(x, list):
        return x

    if len(x) == 1:
        return x[0]

    filtered = [v for v in x if not v[0].isdigit()]
    if not filtered:
        return np.nan

    longest = ""
    for s in powerset(filtered, 2):
        pref = commonprefix(s)
        if len(pref) > len(longest):
            longest = pref

    return filtered[0] if longest == "" else longest


m = df["maintopic"].isna()
df.loc[m, "maintopic"] = df.loc[m, "subtopics"].apply(choose)
print(df)

Prints:

   index  itemID maintopic           subtopics
0      1     235       FBR       [FZ, 1RH, FL]
1      2    1787         F  [1RH, YRS, FZ, FL]
2      3    2454        FZ     [FZX, 1RH, FZL]
3      4    3165       YHS               [YHS]

EDIT: Added checks for a list/float.