Tags: python, pandas, matplotlib, violin-plot

Violin plot of a list of arrays


I have some data in the format:

[array([[0, 1, 2]], dtype=int64), array([[1, 2, 3]], dtype=int64)]

My data can be generated using:

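import pandas as pd

# Build one single-row DataFrame per group.
di_DFs = {}
groups = [1,2]
for grp in groups:
    di_DFs[grp] = pd.DataFrame({'A' : [grp-1],
                                'B' : [grp],
                                'C' : [grp+1]})
# Collect the first row of each DataFrame as a (1, 3) array.
data = []
for k in di_DFs:
    data.append(di_DFs[k].iloc[[0]].values)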

I can plot it:

import matplotlib.pyplot as plt

for v in data:
    plt.scatter(range(len(v[0])),v[0])


I would like a violin plot with three vertical violins, one at each x position where my pairs of points sit in the scatter plot, so that I can compare the distributions across my arrays. I tried:

for v in data:
    plt.violinplot(v)

But that did not give the plot I wanted.


Solution

  • I needed to reformat my data so that each array becomes one column of a DataFrame:

    # Put each array in its own column: 'Z' is the first array, 'Y' the second.
    df_Vi = pd.DataFrame({'Z' : data[0][0],
                          'Y' : data[1][0]}, index=range(len(data[0][0])))

    plt.violinplot(df_Vi)
    


    Or, a version that works with any number of groups:

    
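    import pandas as pd
    import matplotlib.pyplot as plt

    # Build one single-row DataFrame per group.
    di_DFs = {}
    groups = [1,2,0,7]
    for grp in groups:
        di_DFs[grp] = pd.DataFrame({'A' : [grp-1],
                                    'B' : [grp],
                                    'C' : [grp+1]})
    # Collect the first row of each DataFrame as a (1, 3) array.
    data = []
    for k in di_DFs:
        data.append(di_DFs[k].iloc[[0]].values)

    Indexes = range(len(groups))

    # Add one column per array, so each array gets its own violin.
    df_Vi = pd.DataFrame()

    for inD in Indexes:
        df_Po = pd.DataFrame({inD : data[inD][0]},
                              index=range(len(data[0][0])))

        df_Vi = pd.concat([df_Vi, df_Po], axis=1)

    plt.violinplot(df_Vi)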
    

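The reformatting is needed because `violinplot` reads a 2D input column by column and draws one violin per column. Each `(1, 3)` array in the question is therefore read as three datasets of one value each. The following is only a sketch of a NumPy alternative to the DataFrame loop, not the method used above. The names `stacked`, `per_column` and `per_group` are illustrative, and the `Agg` backend line is an assumption so that the code runs without a display. Stacking the arrays as rows gives one violin per position A, B, C; transposing gives one violin per group, which should match the DataFrame built above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Same structure as in the question: a list of (1, 3) arrays, one per group.
groups = [1, 2, 0, 7]
data = [np.array([[grp - 1, grp, grp + 1]]) for grp in groups]

# Stack the row vectors: rows are groups, columns are the positions A, B, C.
stacked = np.vstack(data)  # shape (4, 3)

fig, (ax_cols, ax_groups) = plt.subplots(1, 2, figsize=(8, 3))

# One violin per column: the distribution of A, B and C across the groups.
per_column = ax_cols.violinplot(stacked)
ax_cols.set_xticks([1, 2, 3])
ax_cols.set_xticklabels(["A", "B", "C"])

# Transposed: one violin per group, each built from that group's three values.
per_group = ax_groups.violinplot(stacked.T)
ax_groups.set_xticks(list(range(1, len(groups) + 1)))
ax_groups.set_xticklabels([str(g) for g in groups])
```

Which orientation is right depends on what is being compared: the untransposed call yields the three violins described in the question, while the transposed call yields one violin per array.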