pandas matplotlib histogram

Area-elevation distribution histogram from pandas dataframe, python


I have a pandas DataFrame with 'area' and 'ele' (elevation) columns. I want to show how area is distributed with respect to elevation. I thought this would be simple, but I could not find a way to do it.

My dataframe looks like:

df:

    area              ele
0   97395.147254    6017.877745
1   69704.264405    5974.124316
2   71490.518833    5838.256686
3   23235.692475    5793.837788
4   65254.056661    5787.050911
5   17407.853780    5734.049234
6   17556.149643    6106.984128
7   33557.481232    5716.453589
8   37932.703188    5870.938016
9   19417.303768    5987.567275
10  26290.210275    6232.612380
11  45211.104174    5777.812375
12  35457.722243    5707.920921
13  83353.269135    5778.740416
14  68906.869455    5951.361295
15  66991.699542    6146.242249
16  43415.962994    6041.594263
17  74985.484055    5835.818736
18  53145.779672    5952.993800
19  36893.436921    6008.634508
20  59647.991246    5883.823537
21  53032.278932    5771.375295

Obviously df.hist() did not work. I am looking for a histogram of elevation in which the y-axis shows area rather than a count.


Solution

  • import matplotlib.pyplot as plt
    # Bin elevation in 100 m steps and sum the area that falls into each bin
    plt.hist(df.ele, bins=[100*x+5500 for x in range(0,10)], weights=df.area)
    plt.show()
    


    Explanation:
    [100*x+5500 for x in range(0,10)] gives the bin edges for the histogram, one every 100 m of elevation: 5500, 5600, ..., 6400. This is not essential, but it makes the histogram look nicer; otherwise the bins will not end at round numbers.
    The 'summing up' is done with the weights keyword, as already mentioned in the comments to the question above: each row adds its area to its elevation bin instead of adding 1, so the bar heights are the total area per bin. See the pyplot.hist documentation for further details.
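To check what the weights keyword does without drawing anything, the same per-bin sums can be computed with numpy.histogram and compared against an explicit group-and-sum. This is only a sketch: it uses the first five rows of the question's data, rounded to two decimals, and the names edges, area_per_bin and area_by_group are made up for illustration.

```python
import numpy as np
import pandas as pd

# First five rows of the question's DataFrame, rounded for brevity (illustrative subset).
df = pd.DataFrame({
    "area": [97395.15, 69704.26, 71490.52, 23235.69, 65254.06],
    "ele": [6017.88, 5974.12, 5838.26, 5793.84, 5787.05],
})

# Bin edges every 100 m: 5500, 5600, ..., 6400 (10 edges, so 9 bins).
edges = np.arange(5500, 6500, 100)

# Weighted histogram: each row contributes its area instead of a count of 1.
area_per_bin, _ = np.histogram(
    df["ele"].to_numpy(), bins=edges, weights=df["area"].to_numpy()
)

# Cross-check: assign each row to a bin, then sum the area per bin.
# right=False gives half-open bins [left, right), like the histogram's
# bins (only the histogram's last bin also includes its right edge).
binned = pd.cut(df["ele"], bins=edges, right=False)
area_by_group = df.groupby(binned, observed=False)["area"].sum()

print(area_per_bin)
print(area_by_group)
```

Both results should agree bin by bin, and the sum over all bins should equal the total area as long as every elevation lies inside the bin range.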