I have tried severally to perform some numeric aggregation methods on numeric data with pandas
. However, I have received a NotImplementedError
, which then throws a TypeError
, whenever I do so. I hypothesize that pandas
is refusing to ignore the string columns when performing said numerical tasks. How do I prevent this?
Given a pivot table named matrix_data
, and with pandas
imported as pan
:
Account Number Company Contact Account Manager Product Licenses
0 2123398 Google Larry Pager Edward Thorp Analytics 150
1 2123398 Google Larry Pager Edward Thorp Prediction 150
2 2123398 Google Larry Pager Edward Thorp Tracking 300
3 2192650 BOBO Larry Pager Edward Thorp Analytics 150
4 420496 IKEA Elon Tusk Edward Thorp Analytics 300
Sale Price Status
0 2100000 Presented
1 700000 Presented
2 350000 Under Review
3 2450000 Lost
4 4550000 Won
Trying to aggregate all numerical values by company:
pan.pivot_table(matrix_data, index = "Company", aggfunc="mean");
throws an exception like so:
NotImplementedError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general..array_func(values)
1489 try:
-> 1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
1493 how,
1494 axis=data.ndim - 1,
1495 min_count=min_count,
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
...
1698 # e.g. "foo"
-> 1699 raise TypeError(f"Could not convert {x} to numeric") from err
1700 return x
TypeError: Could not convert Larry PagerLarry PagerLarry Pager to numeric
dataframe.groupby(["col_name1"]).mean()
will throw an identical error
I'm on windows 10, python 3.11, with pandas version 2.0.1. All this was performed on Jupyter Notebook with VScode
This has been deprecated in Pandas 2.0. This is the warning pandas 1.5.3 gives:
FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated.
You now have to select the specific columns you want to aggregate.
cols = ['Licenses', 'Sale Price']
pd.pivot_table(matrix_data, values=cols, index="Company", aggfunc="mean")