Search code examples
pythonpandasdataframehistogramsubtraction

Dataframe, sum positive and negative value on specific date interval


here's the context : I have 2 dataframes, one from a database that generate a one month series, and the second one is a download from another database, but on one year.

I have make a plot where df1 and df2 are subtracted, creating a filled curve between, and I force the plot on just one month, so visually it's perfect.

But I want to plot a simple bar histogram to synthesize the filled curve with a superposed bar for positive values and a bar with negative value.

Here the plot (with a factice bar plot just to show where I want the bar plot) :

So, df1 (one month) is purple, df2 (one year) is blue, filled is the subtraction between them, and the bar should display all positive and negative values of the filled curves :

enter image description here

And, it could be great if in the bar there were the text with the summed values.

I tried this command to have the sum of the positive and negative value :

df[df>0].sum()
df[df<0].sum()

but the result is not usable because of the difference of series size (one month / one year). Maybe if I'm using a mask ? But not sure how to do this in this case.


Solution

  • It seem I succeeded to find a way, don't know if it's the better but it's working.

    Here is a part on the (very long) code. In the other part (not posted) I'm acquiring data on some webservice and fill dataframe that doesn't have data, etc ...

    #subtraction of the dataframes, and changing unit (divide by 1000):
    test1 = (inter['P']/1000).subtract(tsweek1, fill_value=0)
    test2 = (inter['P']/1000).subtract(tsweek2, fill_value=0)
    
    
    # creation of the mask for the dataframe that have 1 year of data
    mask_fill = (test1.index >= (pd.to_datetime('2021-04-01 00:00:00'))) & (test1.index <= (pd.to_datetime('2021-04-01 00:00:00')+timedelta(days=+7)))
    test1_resiz = test1.loc[mask_fill]
    
    # creation fo the data sums, positive and negative
    bar1P = test1_resiz[test1_resiz>0].sum()
    bar1N = test1_resiz[test1_resiz<0].sum()
    
    # creation of the plots
    fig, ax = plt.subplots(2,2,sharex=False,sharey=False,figsize=(18,10), dpi=200, gridspec_kw={'width_ratios': [10, 1]})
    
    plt.subplots_adjust(top=0.92)
    plt.subplots_adjust(left=0.08)
    plt.subplots_adjust(right=0.85)
    plt.subplots_adjust(bottom=0.05)
    
    # first graph
    ax[0,0].plot(dfweek1.index, tsweek1, '#CC0066' , linewidth=2, alpha=0.3)
    ax[0,0].plot(inter.index, (inter['P']/1000), '#3333FF' , linewidth=2, alpha=0.3)
    ax[0,0].fill_between(test1.index, test1, linewidth=3, where=test1 >= 0, facecolor='#00CC00', interpolate=True, alpha=0.7)
    ax[0,0].fill_between(test1.index, test1, linewidth=3, where=test1 <= 0, facecolor='#FF3333', interpolate=True, alpha=0.7)
    ax[0,0].set_xlim([(pd.to_datetime('2021-04-01 00:00:00')),(pd.to_datetime('2021-04-01 00:00:00')+timedelta(days=+7))])
    ax[0,0].xaxis.grid(True, which='major', color='g', linestyle='dotted', linewidth=0.3)
    ax[0,0].yaxis.grid(True, which='major', color='g', linestyle='dotted', linewidth=0.3)
    ax[0,0].set_yticks(np.arange(-6, 8.1, 1))
    
    # second graph
    ax[1,0].plot(dfweek2.index, tsweek2, '#CC0066' , linewidth=2, alpha=0.3)
    ax[1,0].plot(inter.index, (inter['P']/1000), '#3333FF' , linewidth=2, alpha=0.3)
    ax[1,0].fill_between(test2.index, test2, linewidth=3, where=test2 >= 0, facecolor='#00CC00', interpolate=True, alpha=0.7)
    ax[1,0].fill_between(test2.index, test2, linewidth=3, where=test2 <= 0, facecolor='#FF3333', interpolate=True, alpha=0.7)
    ax[1,0].set_xlim([(pd.to_datetime('2021-04-08 00:00:00')),(pd.to_datetime('2021-04-08 00:00:00')+timedelta(days=+7))])
    ax[1,0].xaxis.grid(True, which='major', color='g', linestyle='dotted', linewidth=0.3)
    ax[1,0].yaxis.grid(True, which='major', color='g', linestyle='dotted', linewidth=0.3)
    ax[1,0].set_yticks(np.arange(-6, 8.1, 1))
    
    # first bar plot
    
    x=1
    ax[0,1].bar(x, bar1P, color='#00CC00', edgecolor='g', alpha=0.7)
    ax[0,1].bar(x, bar1N, color='#FF3333', edgecolor='r', alpha=0.7)
    ax[0,1].text(x, float(bar1P)/2, '+' + str("%.2f" % bar1P), ha='center', fontweight='bold')
    ax[0,1].text(x, float(bar1N)/2, str("%.2f" % bar1N), ha='center', fontweight='bold')
    
    # second bar plot
    
    ax[0,1].axes.get_xaxis().set_visible(False)
    ax[1,1].bar(x, test2.mean(), color='b', alpha=0.7)
    ax[1,1].axes.get_xaxis().set_visible(False)
    

    The result is here (I just make the first bar plot, to validate the code and implement it on the first part of my entire code) :

    enter image description here