I need some help. I'm writing code to find the mode of each group and replace None with that mode. When "None" is in the first row, it doesn't work:
df = pd.DataFrame([[16, None, 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])
dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()
df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]
gives the error
TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-129-0f7009f92c25> in <module>
----> 1 df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
765 key = com._apply_if_callable(key, self)
766 try:
--> 767 result = self.index.get_value(self, key)
768
769 if not is_scalar(result):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3116 try:
3117 return self._engine.get_value(s, k,
-> 3118 tz=getattr(series.dtype, 'tz', None))
3119 except KeyError as e1:
3120 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: 'B'
But without "None" in the first row it works great!
df = pd.DataFrame([[16, "y", 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])
dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()
df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]
0 y
1 v
2 v
3 z
4 a
5 a
Name: B, dtype: object
How can I fix it?
Use GroupBy.transform to get a Series the same size as the original, so that missing values (None or NaN) can be replaced with Series.fillna.
For a more general solution, wrap the result of mode in next with iter, so that None is returned when mode gives an empty Series (the case where iat[0] fails):
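import pandas as pd

df = pd.DataFrame([[16, None, 3], [17, None, 30],
                   [10, "v", 30], [10, "z", 3],
                   [None, "a", 23], [2, "a", 23]],
                  columns=['A', 'B', 'C'])
# per-group mode of B, broadcast back to the original index;
# next(iter(...), None) returns None when mode() is empty
s = df.groupby('C')['B'].transform(lambda x: next(iter(x.mode()), None))
# fill only the missing values of B with the group's mode
df['B'] = df['B'].fillna(s)
print(df)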
      A  B   C
0  16.0  z   3
1  17.0  v  30
2  10.0  v  30
3  10.0  z   3
4   NaN  a  23
5   2.0  a  23
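If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name fill_with_group_mode and the extra row with C == 99 (a group where every B is missing) are my own additions for illustration, not part of your data. The extra row shows why next(iter(...), None) is there: mode() is empty for that group, so the value simply stays missing instead of raising.

```python
import pandas as pd


def fill_with_group_mode(df, group_col, value_col):
    """Return a copy of df where missing value_col entries are
    replaced by the mode of value_col within each group_col group."""
    # mode() ignores missing values by default; for an all-missing group it
    # returns an empty Series, so fall back to None instead of using .iat[0].
    group_mode = df.groupby(group_col)[value_col].transform(
        lambda x: next(iter(x.mode()), None)
    )
    out = df.copy()
    # fillna aligns on the index, so only the missing entries are touched.
    out[value_col] = out[value_col].fillna(group_mode)
    return out


df = pd.DataFrame(
    [[16, None, 3], [17, None, 30], [10, "v", 30],
     [10, "z", 3], [None, "a", 23], [2, "a", 23],
     [5, None, 99]],  # hypothetical extra row: group 99 has no known B
    columns=["A", "B", "C"],
)

result = fill_with_group_mode(df, "C", "B")
print(result)
```

The first six rows of B come out as in the output above, and the row in group 99 keeps its missing value because there is no mode to fill it with. The helper works on a copy, so the original df is left unchanged.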