Tags: python, matplotlib, plot, histogram, bin

How can I split this vertical binned histogram into one histogram every n seconds?


I have a CSV file with two columns: the first is time in seconds, and the second is a value between -1 and 1 recorded at each time step. The first rows of the file are:

0,0.04408189999999999
1000,0.017673066666666678
2000,0.05512853333333334
3000,0.04731979999999998
4000,0.007375333333333322
5000,-0.0173186
6000,-0.030183500000000016
7000,-0.09746066666666667
8000,-0.11819146666666666
9000,-0.1189849333333333
10000,-0.10441406666666667
11000,-0.09025903333333336
12000,-0.14047866666666667
13000,-0.09634883333333336
14000,-0.09841593333333337
15000,-0.10307009999999997
16000,-0.08617349999999996
17000,-0.09265753333333335
18000,-0.11357536666666662
19000,-0.0669533666666667
20000,-0.05702283333333334
21000,-0.018528333333333317
22000,-0.0845192666666667
23000,-0.11929543333333334
24000,-0.12107416666666668

Using Python, I plotted a frequency histogram of the values with the code below:

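import pandas as pd

# hypothetical file name; the time column is used as the index,
# so column 0 of the DataFrame holds the values
data = pd.read_csv("data.csv", header=None, index_col=0)

data.iloc[:, 0:1].hist(bins=[-1.0, -0.9, -0.8, -0.7, -0.6, -0.5,
                             -0.4, -0.3, -0.2, -0.1, 0.0, 0.1,
                             0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                             0.9, 1.0],
                       color='b', edgecolor='white', xlabelsize=8, ylabelsize=8,
                       grid=False, figsize=(10, 8), orientation="horizontal")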

This produced a single horizontal histogram of all the values (image not available).

This histogram covers the whole time range. Instead, I want a separate frequency histogram for each time window (t = 100 s, 200 s, 300 s, etc.), with all of them side by side in the same plot, as in the example graph (image not available).

How can I achieve that in python?


Solution

  • To my knowledge, matplotlib has no built-in tool for this, but you can emulate it with shifted axes. The trick is to use twin axes: each histogram is drawn on its own axes (created with twiny, so all of them share the value axis), and each axes' x range is offset so that its histogram starts one shift to the right of the previous one. The x ticks of these axes are hidden, and the original (empty) axes is used to display the "time" variable.

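    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # dummy data: 10 time groups (0-9) with 10000 random values
    df = pd.DataFrame({'time': np.random.randint(0, 10, size=10000),
                       'value': np.random.normal(scale=0.3, size=10000)*2 - 1
                      })

    # plotting
    shift = 1                         # horizontal distance between histograms
    ax = plt.subplot()                # base axes, only used for the "time" axis
    n = df['time'].nunique()
    ax.set_xlim(0, n*shift)
    ax.set_xlabel('time group')
    for i, (name, d) in enumerate(df.groupby('time')):
        ax2 = ax.twiny()              # new x axis sharing the y (value) axis
        # offset the range so this axes' x=0 sits at i*shift on the base axes
        ax2.set_xlim(-i*shift, (n - i)*shift)
        ax2.hist(d['value'], orientation='horizontal', density=True)
        ax2.xaxis.set_visible(False)  # hide the per-histogram x ticks
    plt.show()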
    

    Output (image not available): multiple shifted histogram plots, one per time group.

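    NB. I set density=True so that each histogram has unit area, which keeps all of them on a comparable scale. This does not strictly cap the bar length at 1 (with narrow bins the density can exceed 1), so increase shift if neighbouring histograms overlap. You can also plot raw counts instead; you then need to increase shift to the maximum expected count and relabel the "time" axis.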
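A minimal sketch of the count-based variant, adapted to data shaped like the question's CSV (the 25 sample rows from the question, in columns named `time` and `value`). The window width `window=5000` and the function name `plot_windowed_hists` are hypothetical choices. Instead of twin axes, this swaps in a simpler technique: every histogram is drawn on a single axes with `Axes.barh`, offset through its `left` argument, using counts from `np.histogram` over the question's 0.1-wide bins. `shift` is set to the largest count so neighbouring histograms cannot overlap, and the x ticks are relabelled with each window's start time.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The sample rows from the question: time in steps of 1000, one value per step
values = [0.04408189999999999, 0.017673066666666678, 0.05512853333333334,
          0.04731979999999998, 0.007375333333333322, -0.0173186,
          -0.030183500000000016, -0.09746066666666667, -0.11819146666666666,
          -0.1189849333333333, -0.10441406666666667, -0.09025903333333336,
          -0.14047866666666667, -0.09634883333333336, -0.09841593333333337,
          -0.10307009999999997, -0.08617349999999996, -0.09265753333333335,
          -0.11357536666666662, -0.0669533666666667, -0.05702283333333334,
          -0.018528333333333317, -0.0845192666666667, -0.11929543333333334,
          -0.12107416666666668]
df = pd.DataFrame({'time': np.arange(0, 25000, 1000), 'value': values})


def plot_windowed_hists(df, window, bins):
    """Hypothetical helper: one horizontal count histogram per time window."""
    # Label each row with the start time of its window (0, 5000, 10000, ...)
    starts = (df['time'] // window) * window
    groups = df.groupby(starts)['value']
    # Counts per window over the same bin edges, keyed by window start
    counts = {start: np.histogram(v, bins=bins)[0] for start, v in groups}
    # Offset between histograms: the largest count, so neighbours never overlap
    shift = max(c.max() for c in counts.values())
    fig, ax = plt.subplots(figsize=(10, 8))
    for i, c in enumerate(counts.values()):
        # Bars grow rightwards from x = i*shift; bar length = count in that bin
        ax.barh(bins[:-1], c, height=np.diff(bins), left=i * shift,
                align='edge', color='b', edgecolor='white')
    # Relabel the x axis with window start times instead of raw counts
    ax.set_xticks([i * shift for i in range(len(counts))])
    ax.set_xticklabels([str(s) for s in counts])
    ax.set_xlim(0, len(counts) * shift)
    ax.set_xlabel('window start time')
    ax.set_ylabel('value')
    return fig, ax, counts, shift


bins = np.linspace(-1.0, 1.0, 21)  # the question's 0.1-wide bins
fig, ax, counts, shift = plot_windowed_hists(df, window=5000, bins=bins)
plt.show()
```

Change `window` to the interval you need (for example 100 for one histogram every 100 s, if the time column is in seconds). Because all bars live on one axes, there are no twin axes to keep in sync, at the cost of the x axis no longer showing the counts themselves.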