How to convert a dict into a pandas multiindex dataframe

I´m handling distances between elements (e.g. a1,a2, ...) of different attributes (e.g. A, B, ...) and I choose a dict to store the distances. The dict has the following form:

mydict = {('A', ('a1','a2')): 1.0,
          ('A', ('a1','a3')): 0.5,
          ('A', ('a2','a1')): 1.1,
          ('A', ('a2','a3')): 0.8,
          ('A', ('a3','a1')): 1.2,
          ('A', ('a3','a2')): 1.2,
          ('B', ('b1','b2')): 1.0,
          ('B', ('b1','b3')): 0.5,
          ('B', ('b2','b1')): 1.1,
          ('B', ('b2','b3')): 0.8,
          ('B', ('b3','b1')): 1.2,
          ('B', ('b3','b2')): 1.2,
          }

So the keys of the dict are tuples with a first element giving the attribute and the second element beeing a tuple itself giving the two elements, which distance is given in the corresponding value.

Now I want to display the data in form of crosstables which should look somewhat like this:

A   a1   a2   a3
a1    0  1.0   0.5
a2  1.1    0   0.8
a3  1.2  1.2    0

B   b1   b2   b3
b1    0  1.0   0.5
b2  1.1    0   0.8
b3  1.2  1.2    0

and so on for each attribut.

I tried to to convert the data to a DataFrame in oder to maybe use the casstab function of pandas. I tried to convert the keys of the dict to a list and use pandas.MultiIndex.from_tuples and then MultiIndex.to_frame but I did´t get a usable format.

Any suggestions how to deal with this or to store the distance data differently to begin with?

Solution

I think the format of the data is fine. You just need to unpack it correctly to get a useable dataframe.

Construct a dataframe from the dict values and dict keys.
Unpack the level of the index with the element names to two columns.
Apply crosstab with the appropriate index and columns (apply .fillna(0) to the result to get the exact same structure as in your question).


df = pd.DataFrame(my_dict.values(), pd.MultiIndex.from_tuples(my_dict.keys()))
df[['first_element', 'second_element']] = df.index.get_level_values(1).to_list()
pd.crosstab(
            [df.index.get_level_values(0), df.first_element],
            df.second_element,
            values=df[0],
            aggfunc='sum'
            )