Python: ANOVA with dictionaries of different lengths


I have the following data:

data = {'treatment_1': [80, 0, 0, 8],
        'treatment_2': [78, 62],
        'treatment_3': [85, 62, 10, 3, 18, 18, 98, 71, 78, 12, 52, 39, 24, 13],
        'treatment_4': [78, 33, 78, 40, 47, 32]
       }

I am trying to run an ANOVA comparing these four treatments. As you can see, there are different numbers of data points in each treatment. Now, this shouldn't be a problem in theory, because ANOVA does not assume equal sample sizes. First, I tried to create a DataFrame. The code:

import pandas as pd
df = pd.DataFrame(data)

Gives me the error message:

ValueError: All arrays must be of the same length

So a DataFrame will not work when built this way. But however I search for "ANOVA with unequal sample sizes", all I find are examples that use lists (whose code does not work with dictionaries) and/or equal sample sizes (which do not explain how to handle unequal ones). How should I approach an ANOVA with a dictionary whose lists have different lengths? Or am I going about this wrong by using a dictionary in the first place?


Solution

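  • Wrap each list in a pd.Series before building the DataFrame. pandas aligns the Series on their index and pads the shorter columns with NaN:

    import pandas as pd

    data = {'treatment_1': [80, 0, 0, 8],
            'treatment_2': [78, 62],
            'treatment_3': [85, 62, 10, 3, 18, 18, 98, 71, 78, 12, 52, 39, 24, 13],
            'treatment_4': [78, 33, 78, 40, 47, 32]
            }

    # Each Series keeps its own length; missing positions become NaN.
    df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})
    print(df)
    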

    Prints:

        treatment_1  treatment_2  treatment_3  treatment_4
    0          80.0         78.0           85         78.0
    1           0.0         62.0           62         33.0
    2           0.0          NaN           10         78.0
    3           8.0          NaN            3         40.0
    4           NaN          NaN           18         47.0
    5           NaN          NaN           18         32.0
    6           NaN          NaN           98          NaN
    7           NaN          NaN           71          NaN
    8           NaN          NaN           78          NaN
    9           NaN          NaN           12          NaN
    10          NaN          NaN           52          NaN
    11          NaN          NaN           39          NaN
    12          NaN          NaN           24          NaN
    13          NaN          NaN           13          NaN
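
The DataFrame above only solves the construction error; it does not run the ANOVA itself. As a minimal sketch of why unequal group sizes are not a problem, the one-way F statistic can be computed straight from the dictionary using only the standard library. The helper name `one_way_f` is hypothetical and introduced here for illustration; it is not part of the original answer, and it returns only the F statistic and degrees of freedom, not a p-value.

```python
from statistics import mean

data = {'treatment_1': [80, 0, 0, 8],
        'treatment_2': [78, 62],
        'treatment_3': [85, 62, 10, 3, 18, 18, 98, 71, 78, 12, 52, 39, 24, 13],
        'treatment_4': [78, 33, 78, 40, 47, 32]}


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper for illustration; groups may have different lengths.
    """
    groups = [list(g) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: each group is weighted by its own size,
    # which is exactly where unequal sample sizes are accounted for.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_f(data.values())
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

In practice, if SciPy is installed, `scipy.stats.f_oneway(*data.values())` accepts groups of different lengths directly and also returns a p-value, so the dictionary can be used as-is. When starting from the NaN-padded DataFrame instead, drop the NaN values from each column first, for example `f_oneway(*(df[c].dropna() for c in df))`.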