Tags: python, pandas, numpy, apply, mode

Python pandas apply: problem with None in first row


I need some help. I am writing code that finds the mode of each group and replaces None with that mode. When None is in the first row, it doesn't work:

df = pd.DataFrame([[16, None, 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])

dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()

df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]

gives the error

TypeError                                 Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-129-0f7009f92c25> in <module>
----> 1 df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    765         key = com._apply_if_callable(key, self)
    766         try:
--> 767             result = self.index.get_value(self, key)
    768 
    769             if not is_scalar(result):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   3116         try:
   3117             return self._engine.get_value(s, k,
-> 3118                                           tz=getattr(series.dtype, 'tz', None))
   3119         except KeyError as e1:
   3120             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'B'

But without None in the first row it works great!

df = pd.DataFrame([[16, "y", 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])

dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()

df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]

0    y
1    v
2    v
3    z
4    a
5    a
Name: B, dtype: object

How can I fix it?


Solution
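
A likely reason for the KeyError, offered as a sketch rather than a confirmed diagnosis: the lambda returns a scalar when B is missing but the whole row s otherwise. apply(axis=1) appears to decide the shape of its result from what the first row returns, so a scalar in the first row produces a plain Series indexed by row number, and the trailing ["B"] then fails. Returning s["B"] in the else branch keeps both branches scalar. The name mode_by_c below is illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [[16, None, 3], [17, None, 30], [10, "v", 30],
     [10, "z", 3], [None, "a", 23], [2, "a", 23]],
    columns=["A", "B", "C"],
)

# Most frequent non-missing "B" per group of "C", as a plain dict.
mode_by_c = df.groupby("C")["B"].agg(lambda x: x.mode().iat[0]).to_dict()

# Return a scalar in BOTH branches (s["B"], not the whole row s),
# so apply(axis=1) yields a Series and no trailing ["B"] is needed.
filled = df.apply(
    lambda s: mode_by_c[s["C"]] if pd.isnull(s["B"]) else s["B"],
    axis=1,
)
print(filled.tolist())  # ['z', 'v', 'v', 'z', 'a', 'a']
```

This keeps the row-wise apply from the question; it still assumes every group has at least one non-missing B, since iat[0] fails on an empty mode.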

  • Use GroupBy.transform, which returns a Series the same size as the original, so the None or NaN values can be replaced with Series.fillna.

    For a more general solution, wrap the result in next with iter so that None is returned when mode returns an empty Series, where iat[0] would fail:

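    import pandas as pd

    df = pd.DataFrame([[16, None, 3], [17, None, 30], 
                       [10, "v", 30], [10, "z", 3], 
                       [None, "a", 23], [2, "a", 23]], 
                       columns=['A', 'B', 'C'])
    # per-group mode of B, aligned to the original rows (None if a group has no mode)
    s = df.groupby('C')['B'].transform(lambda x: next(iter(x.mode()), None))
    # fill only the missing values of B with the group mode
    df['B'] = df['B'].fillna(s)
    print(df)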
    
          A  B   C
    0  16.0  z   3
    1  17.0  v  30
    2  10.0  v  30
    3  10.0  z   3
    4   NaN  a  23
    5   2.0  a  23
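
To illustrate why the next(iter(...), None) fallback matters, here is a minimal sketch on hypothetical data (not from the question) in which every B in one group is missing. The helper name first_mode_or_none is made up for this example.

```python
import pandas as pd

# Hypothetical data: every "B" in group C == 3 is missing.
df = pd.DataFrame(
    [[None, 3], [None, 3], ["v", 30], [None, 30]],
    columns=["B", "C"],
)

def first_mode_or_none(x):
    # x.mode() drops missing values by default, so an all-missing
    # group gives an empty Series; next(..., None) then returns None
    # where .iat[0] would raise an IndexError.
    return next(iter(x.mode()), None)

s = df.groupby("C")["B"].transform(first_mode_or_none)
df["B"] = df["B"].fillna(s)

# Rows of group 3 stay missing; the gap in group 30 is filled with "v".
print(df["B"].isna().tolist())
```

Groups with no usable mode are simply left unfilled, which is usually easier to handle afterwards than an exception raised in the middle of the groupby.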