ValueError: Wrong number of items passed 2, placement implies 1 when graphing a facet_grid with plotnine


I have the following DataFrame built with pandas:

    SampleSize      Mean  StandardDeviation
0            5  0.134151           0.739142
1           25 -0.111257           1.154803
2           45 -0.049999           0.918167
3           65 -0.162783           1.179178
4           85 -0.097452           0.966980
5          105 -0.050559           1.161751
6          125 -0.038383           1.018117
7          145  0.086192           1.028177
8          165  0.045295           1.090246
9          185 -0.107837           1.101610
10         205  0.088160           0.967483
...
40         805  0.020641           1.007389
41         825  0.022781           0.991498
42         845 -0.027429           0.962288
43         865 -0.105373           1.007109
44         885 -0.054397           1.015499
45         905 -0.023729           0.989168
46         925  0.025044           0.989950
47         945  0.021345           1.035740
48         965  0.023404           0.963122
49         985  0.020648           1.000148

It holds the sample size, mean, and standard deviation of 50 random normal samples. I am trying to draw a facet_grid that shows the mean and the standard deviation side by side, each plotted against the sample size.

The code I am using currently is:

df1 = pd.DataFrame({'SampleSize': range(5, SAMPLE_SIZE, 20), 'Mean': means, 'StandardDeviation': stdev})

df1_melted = pd.melt(df1, id_vars=['SampleSize'], var_name='SampleSize', value_name='Value')

ggplot(df1_melted, aes(x='SampleSize', y='Value', color='SampleSize')) + \
    geom_line() + \
    geom_point() + \
    facet_grid('SampleSize ~ .') + \
    labs(x='SampleSize', y='Mean and StandardDeviation')

This results in:

...
/usr/lib/python3.7/site-packages/pandas/core/internals/blocks.py in new_block(values, placement, ndim, klass)
   1935 
   1936     values, _ = extract_pandas_array(values, None, ndim)
-> 1937     check_ndim(values, placement, ndim)
   1938 
   1939     if klass is None:

/usr/lib/python3.7/site-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
   1978         if len(placement) != len(values):
   1979             raise ValueError(
-> 1980                 f"Wrong number of items passed {len(values)}, "
   1981                 f"placement implies {len(placement)}"
   1982             )

ValueError: Wrong number of items passed 2, placement implies 1

I am confused about where this is going wrong, as it worked when I plotted each of the two graphs separately.


Solution

  • The issue is with your melt statement. You have:

    df1_melted = pd.melt(df1, id_vars=['SampleSize'], var_name='SampleSize', value_name='Value')
    

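    which produces a frame with two columns named 'SampleSize'. Note that neither of them actually contains the sample size: passing var_name='SampleSize' reuses the name of the id column, so the variable names ('Mean', 'StandardDeviation') end up under that label. With a duplicated label, selecting 'SampleSize' returns two columns where plotnine expects one, which is most likely what raises "Wrong number of items passed 2, placement implies 1".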

    Now consider:

    melted = pd.melt(df1, id_vars=['SampleSize'], value_vars=['Mean','StandardDeviation'])
    

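    which produces a long-format frame with three distinct columns: 'SampleSize', 'variable' (holding either 'Mean' or 'StandardDeviation'), and 'value'.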
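
    To check the reshaped frame without the screenshots, here is a minimal sketch. The data is a synthetic stand-in (the seeded random values and the upper bound of 1000 on the range are my assumptions, not your actual means and stdev lists); only the melt call itself is taken from above.

```python
import random

import pandas as pd

# Synthetic stand-in for the question's data (hypothetical values, seeded).
random.seed(0)
sizes = list(range(5, 1000, 20))  # 5, 25, ..., 985: 50 sample sizes
df1 = pd.DataFrame({
    'SampleSize': sizes,
    'Mean': [random.gauss(0, 0.1) for _ in sizes],
    'StandardDeviation': [random.gauss(1, 0.1) for _ in sizes],
})

# Leave var_name and value_name at their defaults so that no new column
# collides with the existing 'SampleSize' id column.
melted = pd.melt(df1, id_vars=['SampleSize'],
                 value_vars=['Mean', 'StandardDeviation'])

print(melted.columns.tolist())   # ['SampleSize', 'variable', 'value']
print(melted.columns.is_unique)  # True
print(len(melted))               # 100 (50 rows per melted column)
```

    Every column label is now unique, so aes(x='SampleSize', ...) refers to exactly one column.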

    Given that 'SampleSize' appears twice in your melted dataframe, it wasn't clear to me whether you intended to have a different coloured line for the mean and the standard deviation graphs, or whether you wanted the line to change colour based on the sample size. I went with the latter.

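    from plotnine import (ggplot, aes, geom_line, geom_point, facet_grid,
                          labs, theme, theme_light, element_text)

    # Colour encodes the sample size; the facets split Mean from StandardDeviation.
    p = (ggplot(melted, aes(x='SampleSize', y='value', color='SampleSize'))
        + theme_light(9)
        + geom_line()
        + geom_point()
        + facet_grid('variable ~ .')  # one row of panels per variable
        + labs(x='Sample size', y='', color='Sample\n  size\n')
        + theme(
            # legend title, spacing, and key size tweaks
            legend_title=element_text(size=8.5),
            legend_title_align='center',
            legend_box_spacing=0.025,
            legend_key_height=34,
            legend_key_width=9,
          )
    )
    p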
    
