Tags: python, bash, gnuplot

Probability density of data matrix


I have a data file like this:

1 volume 318.31
2 volume 317.59
3 volume 316.71
4 volume 315.78
5 volume 315.07
6 volume 314.55
7 volume 314.38
8 volume 314.40
9 volume 314.86
10 volume 315.74
11 volume 316.83
12 volume 318.08
13 volume 319.44
14 volume 320.70
15 volume 322.04
16 volume 323.17
17 volume 323.77
18 volume 324.16
19 volume 324.02
20 volume 323.59
...
10000 volume 322.72
10001 volume 321.69
10002 volume 320.34
10003 volume 318.87

I would like to plot volume vs. probability, like the following: [image]

Is there an effective Python module, or a bash + gnuplot approach, for this problem?


Solution

  • Almost any plotting package in Python can generate a kernel density estimate (KDE) plot. If you are using pandas, the simplest route is to pass kind='density' to the plot method (pandas needs SciPy installed for this).

    I was not sure whether the indices on the left are part of the file. Here is the solution if they are not. I am assuming the file is named "data.txt".

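    import pandas as pd
    import matplotlib.pyplot as plt
    # Rows look like "volume 318.31"; sep=r'\s+' splits on any run of whitespace
    data = pd.read_csv('data.txt', sep=r'\s+', header=None, names=['string', 'volume'])
    data['volume'].plot(kind='density')  # KDE of the volume column
    plt.show()  # needed when run as a script rather than in a notebook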
    

    If the indices are included in the data file, then you could modify the code above to look like this:

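    import pandas as pd
    import matplotlib.pyplot as plt
    # Rows look like "1 volume 318.31"; sep=r'\s+' splits on any run of whitespace
    data = pd.read_csv('data.txt', sep=r'\s+', header=None, names=['index', 'string', 'volume'])
    data['volume'].plot(kind='density')  # KDE of the volume column
    plt.show()  # needed when run as a script rather than in a notebook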
    

    All of the following produce the same chart:

    data['volume'].plot(kind = 'density')
    
    data['volume'].plot.kde()
    
    data['volume'].plot.density()
    

    Hope this helps.
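
If you want the density values themselves rather than only the chart, here is a minimal sketch of what a Gaussian KDE computes, using only the standard library. The helper names read_volumes and kde_gaussian are hypothetical, the five rows are just the first lines from the question used as stand-in data, and the bandwidth uses Scott's rule, which I believe is the default pandas and SciPy fall back to. Treat it as an illustration under those assumptions, not as the exact code pandas runs.

```python
import math
import statistics


def read_volumes(lines):
    """Return the last whitespace-separated field of each data line as a float."""
    volumes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            volumes.append(float(fields[-1]))
        except ValueError:
            continue  # skip lines that do not end in a number
    return volumes


def kde_gaussian(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    n = len(samples)
    # Assumption: Scott's rule for 1-D data, bandwidth = sample std * n^(-1/5)
    h = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]


# Stand-in data: the first rows shown in the question
sample_lines = [
    "1 volume 318.31",
    "2 volume 317.59",
    "3 volume 316.71",
    "4 volume 315.78",
    "5 volume 315.07",
]
volumes = read_volumes(sample_lines)

# Evenly spaced grid extending 5 volume units past the data on each side
lo, hi = min(volumes) - 5, max(volumes) + 5
grid = [lo + i * (hi - lo) / 200 for i in range(201)]
density = kde_gaussian(volumes, grid)
```

For the real file you would pass open('data.txt') to read_volumes instead of sample_lines, and draw the curve with matplotlib's plt.plot(grid, density). The double loop costs one exponential per sample per grid point, so for large files the pandas one-liner above is the more practical choice.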