Search code examples
python-3.xpandasgroup-bypivot-table

How do I prevent 'NotImplementedError' and 'TypeError' when using numeric aggregate functions in Pandas pivot tables with string columns?


I have tried severally to perform some numeric aggregation methods on numeric data with pandas. However, I have received a NotImplementedError, which then throws a TypeError, whenever I do so. I hypothesize that pandas is refusing to ignore the string columns when performing said numerical tasks. How do I prevent this?

Given a pivot table named matrix_data, and with pandas imported as pan:

  Account Number  Company      Contact Account Manager     Product  Licenses   
0         2123398   Google  Larry Pager    Edward Thorp   Analytics       150  
1         2123398   Google  Larry Pager    Edward Thorp  Prediction       150   
2         2123398   Google  Larry Pager    Edward Thorp    Tracking       300   
3         2192650     BOBO  Larry Pager    Edward Thorp   Analytics       150   
4          420496     IKEA    Elon Tusk    Edward Thorp   Analytics       300   

   Sale Price        Status  
0     2100000     Presented  
1      700000     Presented  
2      350000  Under Review  
3     2450000          Lost  
4     4550000           Won  

Trying to aggregate all numerical values by company:

pan.pivot_table(matrix_data, index = "Company", aggfunc="mean");

throws an exception like so:

NotImplementedError                       Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python311\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general..array_func(values)
   1489 try:
-> 1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   1493         how,
   1494         axis=data.ndim - 1,
   1495         min_count=min_count,
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
...
   1698             # e.g. "foo"
-> 1699             raise TypeError(f"Could not convert {x} to numeric") from err
   1700 return x

TypeError: Could not convert Larry PagerLarry PagerLarry Pager to numeric

dataframe.groupby(["col_name1"]).mean() will throw an identical error

I'm on windows 10, python 3.11, with pandas version 2.0.1. All this was performed on Jupyter Notebook with VScode


Solution

  • This has been deprecated in Pandas 2.0. This is the warning pandas 1.5.3 gives:

    FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated.

    You now have to select the specific columns you want to aggregate.

    cols = ['Licenses', 'Sale Price']
    pd.pivot_table(matrix_data, values=cols, index="Company", aggfunc="mean")