Search code examples
pythonpandassqlitesqlite-browser

Using Pandas and sqlite3


Try to implement the privote_table of pandas to produce a table for each of party and each state shows how much the party receievd in total contributions from the state.

Is this the right way to do or i has to get into the data base and get fectched out. However the code below gives error.

party_and_state = candidates.merge(contributors, on='id')
party_and_state.pivot_table(df,index=["party","state"],values=["amount"],aggfunc=[np.sum])

The expected result could be something like the table below. The first coulmn is the state name then the party D underneath the party D is the total votes from each state, the same applies with the party R

+-----------------+---------+--------+
|     state       | D       | R      |
+-----------------+---------+--------+
|      AK         | 500     | 900    |
|      IL         | 600     | 877    |
|      FL         | 200     | 400    |
|      UT         | 300     | 300    |
|      CA         | 109     | 90     |
|      MN         | 800     | 888    |

Solution

  • Consider the generalized pandas merge with pd as qualifier instead of a dataframe since the join fields are differently named hence requiring left_on and right_on args. Additionally, do not pass in df if running pivot_table as method of a dataframe since the called df is passed into the function.

    Below uses the contributors and contributors_with_candidates text files. Also, per your desired results, you may want to use the values arg of pivot_table:

    import numpy as np
    import pandas as pd
    
    contributors = pd.read_table('contributors_with_candidate_id.txt', sep="|")
    candidates = pd.read_table('candidates.txt', sep="|")
    
    party_and_state = pd.merge(contributors, candidates, 
                               left_on=['candidate_id'], right_on=['id'])
    party_and_state.pivot_table(index=["party", "state"],
                                values=["amount"], aggfunc=np.sum)    
    #                amount
    # party state          
    # D     CA      1660.80
    #       DC       200.09
    #       FL      4250.00
    #       IL       200.00
    #       MA       195.00
    # ...
    # R     AK      1210.00
    #       AR     14200.00
    #       AZ       120.00
    #       CA     -6674.53
    #       CO     -5823.00
    
    party_and_state.pivot_table(index=["state"], columns=["party"],
                                values=["amount"], aggfunc=np.sum)
    #         amount          
    # party        D         R
    # state                   
    # AK         NaN   1210.00
    # AR         NaN  14200.00
    # AZ         NaN    120.00
    # CA     1660.80  -6674.53
    # CO         NaN  -5823.00
    # CT         NaN   2300.00
    

    Do note, you can do the merge as an inner join in SQL with read_sql:

    party_and_state = pd.read_sql("SELECT c.*, n.* FROM contributors c " +
                                  "INNER JOIN candidates n ON c.candidate_id = n.id", 
                                  con = db)
    
    party_and_state.pivot_table(index=["state"], columns=["party"],
                                values=["amount"], aggfunc=np.sum)