Search code examples
pythonpandasdataframegroup-bypivot-table

Difference between groupby and pivot_table for pandas dataframes


I just started learning Pandas and was wondering if there is any difference between groupby and pivot_table functions. Can anyone help me understand the difference between them?


Solution

  • Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

    Using pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum) a table is created where a is on the row axis, b is on the column axis, and the values are the sum of c.

    Example:

    df = pd.DataFrame({"a": [1,2,3,1,2,3], "b":[1,1,1,2,2,2], "c":np.random.rand(6)})
    pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)
    
    b         1         2
    a                    
    1  0.528470  0.484766
    2  0.187277  0.144326
    3  0.866832  0.650100
    

    Using groupby, the dimensions given are placed into columns, and rows are created for each combination of those dimensions.

    In this example, we create a series of the sum of values c, grouped by all unique combinations of a and b.

    df.groupby(['a','b'])['c'].sum()
    
    a  b
    1  1    0.528470
       2    0.484766
    2  1    0.187277
       2    0.144326
    3  1    0.866832
       2    0.650100
    Name: c, dtype: float64
    

    A similar usage of groupby is if we omit the ['c']. In this case, it creates a dataframe (not a series) of the sums of all remaining columns grouped by unique values of a and b.

    print df.groupby(["a","b"]).sum()
                c
    a b          
    1 1  0.528470
      2  0.484766
    2 1  0.187277
      2  0.144326
    3 1  0.866832
      2  0.650100