Search code examples
pythonpandasdataframepivot-table

DataFrame pivot_table sum returning zero on some columns when there is data


I have a DataFrame that looks like this:

|id|name|start_date|clicks|conversions|installs|downloads|
|--|----|----------|------|-----------|--------|-----------|
|101|India|2023-06-06|1|4|0|NaN|
|101|India|2023-06-06|2|5|0|NaN|
|101|India|2023-06-06|1|6|0|NaN|
|201|Kenya|2023-09-15|5|8|0|NaN|
|201|Kenya|2023-09-15|2|1|0|NaN|
|201|NaN|NaN|NaN|NaN|NaN|10|
|201|NaN|NaN|NaN|NaN|NaN|5|
|201|NaN|NaN|NaN|NaN|NaN|4|

My code is this:

    pivot_df = pd.pivot_table(
        df,
        index=['id','name','start_date'],
        aggfunc={'clicks': 'sum','conversions': 'sum','installs': 'sum', 'downloads': 'sum'})  

And the output is:

|id|name|start_date|clicks|conversions|installs|downloads|
|--|----|----------|------|-----------|--------|-----------|
|101|India|2023-06-06|4|15|0|0|
|201|Kenya|2023-09-15|7|9|0|0|

The output is I expected to get

|id|name|start_date|clicks|conversions|installs|downloads|
|--|----|----------|------|-----------|--------|-----------|
|101|India|2023-06-06|4|15|0|0|
|201|Kenya|2023-09-15|7|9|0|19|

The sum of conversions comes out 0 and not 19. What have I done incorrectly?


Solution

  • You probably don't want a pivot_table but rather a groupby.agg:

    out = (df.groupby('id', as_index=False)
             .agg({'name': 'first',
                   'start_date': 'first',
                   'clicks': 'sum',
                   'conversions': 'sum',
                   'installs': 'sum',
                   'downloads': 'sum'})
          )
    

    Output:

        id   name  start_date  clicks  conversions  installs  downloads
    0  101  India  2023-06-06     4.0         15.0       0.0        0.0
    1  201  Kenya  2023-09-15     7.0          9.0       0.0       19.0