Search code examples
pythonpandasmatplotlibcounterspyder

How do I groupby, count or sum and then plot two lines in Pandas?


Say I have the following dataframes:

Earthquakes:

        latitude    longitude   place       year
0       36.087000   -106.168000 New Mexico  1973
1       33.917000   -90.775000  Mississippi 1973
2       37.160000   -104.594000 Colorado    1973
3       37.148000   -104.571000 Colorado    1973
4       36.500000   -100.693000 Oklahoma    1974
…         …             …          …         …    
13941   36.373500   -96.818700  Oklahoma    2016
13942   36.412200   -96.882400  Oklahoma    2016
13943   37.277167   -98.072667  Kansas      2016
13944   36.939300   -97.896000  Oklahoma    2016
13945   36.940500   -97.906300  Oklahoma    2016

and Wells:

           LAT         LONG     BBLS    Year
0       36.900324   -98.218260  300.0   1977
1       36.896636   -98.177720  1000.0  2002
2       36.806113   -98.325840  1000.0  1988
3       36.888589   -98.318530  1000.0  1985
4       36.892128   -98.194620  2400.0  2002
…          …            …          …     …
11117   36.263285   -99.557631  1000.0  2007
11118   36.263220   -99.548647  1000.0  2007
11119   36.520160   -99.334183  19999.0 2016
11120   36.276728   -99.298563  19999.0 2016
11121   36.436857   -99.137391  60000.0 2012

How do I manage to make a line graph showing the number of BBLS per year (from Wells), and the number of Earthquakes that occurred in a year (from Earthquakes), where the x-axis shows the year since 1980 and the y1-axis shows the sum of BBLS per year, while y2-axis shows the number of earthquakes.

I believe I need to make a groupby, count(for earthquakes) and sum(for BBLS) in order to make the plot but I really tried so many codings and I just don't get how to do it.

The only one that kinda worked was the line graph for earthquakes as follows:

Earthquakes.pivot_table(index=['year'],columns='type',aggfunc='size').plot(kind='line')

enter image description here

Still, for the line graph for BBLS nothing has worked

Wells.pivot_table(index=['Year'],columns='BBLS',aggfunc='count').plot(kind='line')

This one either:

plt.plot(Wells['Year'].values, Wells['BBL'].values, label='Barrels Produced')
plt.legend() # Plot legends (the two labels)
plt.xlabel('Year') # Set x-axis text
plt.ylabel('Earthquakes') # Set y-axis text
plt.show() # Display plot

This one from another thread either:

fig, ax = plt.subplots(figsize=(10,8))
Earthquakes.plot(ax = ax, marker='v')
ax.title.set_text('Earthquakes and Injection Wells')
ax.set_ylabel('Earthquakes')
ax.set_xlabel('Year')
ax.set_xticks(Earthquakes['year'])

ax2=ax.twinx()
ax2.plot(Wells.Year, Wells.BBL, color='c', 
        linewidth=2.0, label='Number of Barrels', marker='o')
ax2.set_ylabel('Annual Number of Barrels')
lines_1, labels_1 = ax.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()

lines = lines_1 + lines_2
labels = labels_1 + labels_2

ax.legend(lines, labels, loc='upper center')

Solution

  • Input data:

    >>> df2  # Earthquakes
         year
    0    2007
    1    1974
    2    1979
    3    1992
    4    2006
    ..    ...
    495  2002
    496  2011
    497  1971
    498  1977
    499  1985
    
    [500 rows x 1 columns]
    
    >>> df1  # Wells
          BBLS  year
    0    16655  1997
    1     7740  1998
    2    37277  2000
    3    20195  2014
    4    11882  2018
    ..     ...   ...
    495  30832  1981
    496  24770  2018
    497  14949  1980
    498  24743  1975
    499  46933  2019
    
    [500 rows x 2 columns]
    

    Prepare data to plot:

    data1 = df1.value_counts("year").sort_index().rename("Earthquakes")
    data2 = df2.groupby("year")["BBLS"].sum()
    

    Simple plot:

    ax1 = data1.plot(legend=data1.name, color="blue")
    ax2 = data2.plot(legend=data2.name, color="red", ax=ax1.twinx())
    

    Now, you can do whatever with the 2 axes.

    Earthquakes / BBLS plot

    A more controlled chart

    # Figure and axis
    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()
    
    # Data
    line1, = ax1.plot(data1.index, data1.values, label="Earthquakes", color="b")
    line2, = ax2.plot(data2.index, data2.values / 10**6, label="Barrels", color="r")
    
    # Legend
    lines = [line1, line2]
    ax1.legend(lines, [line.get_label() for line in lines])
    
    # Titles
    ax1.set_title("")
    ax1.set_xlabel("Year")
    ax1.set_ylabel("Earthquakes")
    ax2.set_ylabel("Barrels Produced (MMbbl)")