I have a dataframe where each row is a dictionary on which I'd like to use seaborn's horizontal box plot.
'dialog'
I'm thinking I'll have to do a pd.melt first to restructure the data, so that the new columns would be 'dialog_num', 'model_type', and 'value' (the automatic variable name after a melt, which here holds the rows of dictionaries). After that, perhaps break up the 'value' column so that each column is a part of speech ('ADV', 'INTJ', 'VERB', etc.); this part seems tricky to me. Past this point, do a for loop over all of the columns and apply the horizontal boxplot?
import pandas as pd
pos =\
{'dialog_num': {0: 0, 1: 1, 2: 2},
'model1': {0: {'ADV': 0.072, 'INTJ': 0.03, 'PRON': 0.133, 'VERB': 0.109},
1: {'ADJ': 0.03, 'NOUN': 0.2, 'PRON': 0.13},
2: {'ADV': 0.083, 'PRON': 0.125, 'VERB': 0.0625}},
'model2': {0: {'ADJ': 0.1428, 'ADV': 0.1428, 'AUX': 0.1428, 'INTJ': 0.285},
1: {'ADJ': 0.1, 'DET': 0.1, 'NOUN': 0.1, 'PROPN': 0.1, 'VERB': 0.2},
2: {'CCONJ': 0.166, 'NOUN': 0.333, 'SPACE': 0.166, 'VERB': 0.3333}},
'model3': {0: {'ADJ': 0.06, 'CCONJ': 0.06, 'NOUN': 0.2, 'PRON': 0.266, 'SPACE': 0.066, 'VERB': 0.333},
1: {'AUX': 0.15, 'PRON': 0.25, 'PUNCT': 0.15, 'VERB': 0.15},
2: {'ADP': 0.125, 'PRON': 0.0625, 'PUNCT': 0.0625, 'VERB': 0.25}},
'model4': {0: {'ADJ': 0.25, 'ADV': 0.08, 'CCONJ': 0.083, 'PRON': 0.166},
1: {'AUX': 0.33, 'PRON': 0.2, 'VERB': 0.0667},
2: {'CCONJ': 0.125, 'NOUN': 0.125, 'PART': 0.125, 'PRON': 0.125, 'SPACE': 0.125, 'VERB': 0.375}}}
df = pd.DataFrame.from_dict(pos)
display(df)
dialog_num model1 model2 model3 model4
0 0 {'INTJ': 0.03, 'ADV': 0.072, 'PRON': 0.133, 'VERB': 0.109} {'INTJ': 0.285, 'AUX': 0.1428, 'ADV': 0.1428, 'ADJ': 0.1428} {'PRON': 0.266, 'VERB': 0.333, 'ADJ': 0.06, 'NOUN': 0.2, 'CCONJ': 0.06, 'SPACE': 0.066} {'PRON': 0.166, 'ADV': 0.08, 'ADJ': 0.25, 'CCONJ': 0.083}
1 1 {'PRON': 0.13, 'ADJ': 0.03, 'NOUN': 0.2} {'PROPN': 0.1, 'VERB': 0.2, 'DET': 0.1, 'ADJ': 0.1, 'NOUN': 0.1} {'PRON': 0.25, 'AUX': 0.15, 'VERB': 0.15, 'PUNCT': 0.15} {'PRON': 0.2, 'AUX': 0.33, 'VERB': 0.0667}
2 2 {'PRON': 0.125, 'ADV': 0.083, 'VERB': 0.0625} {'VERB': 0.3333, 'CCONJ': 0.166, 'NOUN': 0.333, 'SPACE': 0.166} {'PRON': 0.0625, 'VERB': 0.25, 'PUNCT': 0.0625, 'ADP': 0.125} {'PRON': 0.125, 'VERB': 0.375, 'PART': 0.125, 'CCONJ': 0.125, 'NOUN': 0.125, 'SPACE': 0.125}
- sns.boxplot expects data to be supplied in a long form when specifying x= and y=.
- sns.catplot will be used because there is a col= parameter, which can be used to create separate plots for speech types.
- Use .melt to unpivot the wide dataframe.
- .json_normalize can be used to convert the 'value' column (dict type) into a flat table.
- Combine the flat table (vals) to dfm with .join. vals and dfm have matching indices.
- .melt the dataframe again.
- Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
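To illustrate the json_normalize step on its own, here is a minimal sketch on a made-up two-row Series of dicts (the toy values are not from the question's data). It passes a plain list of dicts via .tolist(), which is an assumption made here for safety; the answer's code passes the Series directly. Keys missing from a row come back as NaN, and the result gets a fresh 0..n-1 index.

```python
import pandas as pd

# toy Series of dicts (hypothetical values, for illustration only)
s = pd.Series([{'ADV': 0.1, 'VERB': 0.2}, {'NOUN': 0.3}])

# one column per key; keys absent from a row become NaN
flat = pd.json_normalize(s.tolist())
print(flat)
```

Because the flat table has the same default index as a freshly melted dataframe, a plain .join lines the two up row by row.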
import pandas as pd
import seaborn as sns
# load the dict into a dataframe
df = pd.DataFrame(pos)
# unpivot the dataframe
dfm = df.melt(id_vars='dialog_num', var_name='model')
# convert the 'value' column of dicts to a flat table
vals = pd.json_normalize(dfm['value'])
# combine vals to dfm, without the 'value' column
dfm = dfm.iloc[:, 0:-1].join(vals)
# unpivot the dataframe again
dfm = dfm.melt(id_vars=['dialog_num', 'model'])
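# single horizontal boxplot of all values, grouped by model
p = sns.boxplot(data=dfm, x='value', y='model')
# one horizontal boxplot per part of speech, each in its own subplot
p = sns.catplot(kind='box', data=dfm, x='value', y='model', col='variable', col_wrap=4, height=4)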
dfm.head()  # after the first melt
dialog_num model value
0 0 model1 {'INTJ': 0.03, 'ADV': 0.072, 'PRON': 0.133, 'VERB': 0.109}
1 1 model1 {'PRON': 0.13, 'ADJ': 0.03, 'NOUN': 0.2}
2 2 model1 {'PRON': 0.125, 'ADV': 0.083, 'VERB': 0.0625}
3 0 model2 {'INTJ': 0.285, 'AUX': 0.1428, 'ADV': 0.1428, 'ADJ': 0.1428}
4 1 model2 {'PROPN': 0.1, 'VERB': 0.2, 'DET': 0.1, 'ADJ': 0.1, 'NOUN': 0.1}
vals.head()
INTJ ADV PRON VERB ADJ NOUN AUX PROPN DET CCONJ SPACE PUNCT ADP PART
0 0.030 0.0720 0.133 0.1090 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN 0.130 NaN 0.0300 0.2 NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN 0.0830 0.125 0.0625 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 0.285 0.1428 NaN NaN 0.1428 NaN 0.1428 NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN 0.2000 0.1000 0.1 NaN 0.1 0.1 NaN NaN NaN NaN NaN
dfm.head()  # after joining vals
dialog_num model INTJ ADV PRON VERB ADJ NOUN AUX PROPN DET CCONJ SPACE PUNCT ADP PART
0 0 model1 0.030 0.0720 0.133 0.1090 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1 model1 NaN NaN 0.130 NaN 0.0300 0.2 NaN NaN NaN NaN NaN NaN NaN NaN
2 2 model1 NaN 0.0830 0.125 0.0625 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 0 model2 0.285 0.1428 NaN NaN 0.1428 NaN 0.1428 NaN NaN NaN NaN NaN NaN NaN
4 1 model2 NaN NaN NaN 0.2000 0.1000 0.1 NaN 0.1 0.1 NaN NaN NaN NaN NaN
dfm.head()  # after the second melt
dialog_num model variable value
0 0 model1 INTJ 0.030
1 1 model1 INTJ NaN
2 2 model1 INTJ NaN
3 0 model2 INTJ 0.285
4 1 model2 INTJ NaN
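The long-form result contains a NaN row for every model/dialog pair that has no value for a given part of speech. As optional tidying, the reshaping steps can be wrapped in a helper that drops those rows. This is only a sketch: to_long and the 'pos' column name are hypothetical names introduced here, and the input is a trimmed-down subset of the question's values.

```python
import pandas as pd


def to_long(df, id_col='dialog_num'):
    """Reshape a frame whose model columns hold dicts into long form.

    Hypothetical helper: same melt / json_normalize / join / melt steps
    as above, followed by dropping rows with no value.
    """
    # unpivot: one row per (dialog, model), dicts end up in 'value'
    dfm = df.melt(id_vars=id_col, var_name='model')
    # flatten the dicts: one column per part of speech
    vals = pd.json_normalize(dfm['value'].tolist())
    # replace the dict column with the flat columns (indices match)
    dfm = dfm.drop(columns='value').join(vals)
    # unpivot again: one row per (dialog, model, part of speech)
    dfm = dfm.melt(id_vars=[id_col, 'model'], var_name='pos')
    # drop combinations that had no entry in the original dicts
    return dfm.dropna(subset=['value']).reset_index(drop=True)


# trimmed-down subset of the question's data
small = pd.DataFrame({
    'dialog_num': [0, 1],
    'model1': [{'ADV': 0.072, 'PRON': 0.133}, {'ADJ': 0.03, 'PRON': 0.13}],
    'model2': [{'ADJ': 0.1428, 'ADV': 0.1428}, {'ADJ': 0.1, 'VERB': 0.2}],
})
long_df = to_long(small)
print(long_df)
```

The returned frame can be passed to the plotting calls in the same way, using col='pos' in place of col='variable'.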