Search code examples
pandasdataframegroup-by

Change different row data into columns


I am a beginner in Pandas and I have been struggling with this problem for several days now.

I am using pandas, plotly and streamlit in order to create more customized charts

Below is the table I have in Pandas, which data is taken from an excel sheet

      Project    Status  PF_Invoice_Amount
0  FS Revadim  Not Paid              30000
1   FS Segula  Not Paid              55000
2   FS Segula      Paid              15000
3   Payita II  Not Paid             100000
4     Solfarm      Paid              32000
5    Tropigas  Not Paid              20000
6      Trufps      Paid              23000

I have tried to groupby but hasn’t worked

My aim is to get an output similar to Excel Pivot Tables, in pandas, as shown below

   Project   Paid  Not Paid
FS Revadim            30000
 FS Segula  15000     55000
 Payita II           100000
   Solfarm  32000
  Tropigas            20000
    Trufps  23000

You guidance would be highly appreciated.


Solution

  • Use pivot if you have only one instance of ('Project', 'Status') or pivot_table with an aggregate function like 'sum' if you have multiple instances of ('Project', 'Status'):

    out = df.pivot_table(index='Project', columns='Status', 
                         values='PF_Invoice_Amount', aggfunc='sum', fill_value=0)
            .reset_index().rename_axis(columns=None))
    print(out)
    
    # Output
          Project  Not Paid   Paid
    0  FS Revadim     30000      0
    1   FS Segula     55000  15000
    2   Payita II    100000      0
    3     Solfarm         0  32000
    4    Tropigas     20000      0
    5      Trufps         0  23000
    

    Minimal Working Example:

    data = {'Project': ['FS Revadim', 'FS Segula', 'FS Segula', 'Payita II', 'Solfarm', 'Tropigas', 'Trufps'],
            'Status': ['Not Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid', 'Not Paid', 'Paid'],
            'PF_Invoice_Amount': [30000, 55000, 15000, 100000, 32000, 20000, 23000]}
    df = pd.DataFrame(data)
    print(df)
    
    # Output
          Project    Status  PF_Invoice_Amount
    0  FS Revadim  Not Paid              30000
    1   FS Segula  Not Paid              55000
    2   FS Segula      Paid              15000
    3   Payita II  Not Paid             100000
    4     Solfarm      Paid              32000
    5    Tropigas  Not Paid              20000
    6      Trufps      Paid              23000
    

    More information: How can I pivot a dataframe?