Is there a better way to do a segmented fillna with method 'ffill' in pandas?


Let me explain the situation. I'm currently working with data in which some rows have a category and others don't, so I decided to use pandas' fillna with 'ffill' as the method. I just don't feel this is the optimal or cleanest solution. If someone could help me with a better approach, I'd be very grateful. Here is some code to demonstrate the point:

import numpy as np
import pandas as pd

data = {
    "detail": ['apple mac', 'apple iphone x', 'samsumg galaxy s10', 'samsumg galaxy s10', 'hp computer'],
    "category": ['computer', 'phone', 'phone', np.nan, np.nan]  # np.nan: the np.NaN alias was removed in NumPy 2.0
}

df = pd.DataFrame(data)

Returns

    detail              category
0   apple mac           computer
1   apple iphone x      phone
2   samsumg galaxy s10  phone
3   samsumg galaxy s10  NaN
4   hp computer         NaN

First I filtered the detail values that have no category:

details_without_cats = df[df.category.isnull()].detail.unique()

Then I loop through these values and fill where applicable:

for detail_wc in details_without_cats:
    # .ffill() is equivalent to fillna(method='ffill'), which is deprecated in recent pandas
    df[df.detail == detail_wc] = df[df.detail == detail_wc].ffill()
print(df)

This returns exactly what I want:

    detail              category
0   apple mac           computer
1   apple iphone x      phone
2   samsumg galaxy s10  phone
3   samsumg galaxy s10  phone
4   hp computer         NaN

The dilemma is this: what happens when I have thousands or millions of samples? Is there a better way? Please help.


Solution

  • If you want to build a dict that maps each detail to its category, to use later, you can do this:

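    # Build a detail -> category lookup from the rows that already have a category
    maps = df.dropna(subset=['category']).set_index('detail')['category'].to_dict()
    # Look up every row's detail in that mapping; unknown details stay NaN
    df['category'] = df['detail'].map(maps)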
    

    maps

    {'apple mac': 'computer',
     'apple iphone x': 'phone',
     'samsumg galaxy s10': 'phone'}
    

    output:

                   detail  category
    0           apple mac  computer
    1      apple iphone x     phone
    2  samsumg galaxy s10     phone
    3  samsumg galaxy s10     phone
    4         hp computer       NaN
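
If the goal is specifically the "segmented ffill" from the question, rather than a reusable lookup dict, a groupby-based forward fill may be worth trying. This is only a sketch built on the sample frame from the question, and it is not part of the original answer. It assumes the rows are already in the order in which values should be carried forward.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "detail": ["apple mac", "apple iphone x", "samsumg galaxy s10",
               "samsumg galaxy s10", "hp computer"],
    "category": ["computer", "phone", "phone", np.nan, np.nan],
})

# Forward-fill category separately within each group of identical detail values.
# Values are never carried across groups, so 'hp computer' stays NaN.
df["category"] = df.groupby("detail")["category"].ffill()

print(df)
```

This replaces the Python-level loop with a single grouped operation. Unlike the dict approach, it only fills from earlier rows of the same detail. If a category that appears later should also fill earlier rows, `df.groupby("detail")["category"].transform("first")` is one possible variation.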