I´m handling distances between elements (e.g. a1,a2, ...) of different attributes (e.g. A, B, ...) and I choose a dict
to store the distances. The dict
has the following form:
mydict = {('A', ('a1','a2')): 1.0,
('A', ('a1','a3')): 0.5,
('A', ('a2','a1')): 1.1,
('A', ('a2','a3')): 0.8,
('A', ('a3','a1')): 1.2,
('A', ('a3','a2')): 1.2,
('B', ('b1','b2')): 1.0,
('B', ('b1','b3')): 0.5,
('B', ('b2','b1')): 1.1,
('B', ('b2','b3')): 0.8,
('B', ('b3','b1')): 1.2,
('B', ('b3','b2')): 1.2,
}
So the keys of the dict
are tuple
s with a first element giving the attribute and the second element beeing a tuple
itself giving the two elements, which distance is given in the corresponding value.
Now I want to display the data in form of crosstables which should look somewhat like this:
A a1 a2 a3
a1 0 1.0 0.5
a2 1.1 0 0.8
a3 1.2 1.2 0
B b1 b2 b3
b1 0 1.0 0.5
b2 1.1 0 0.8
b3 1.2 1.2 0
and so on for each attribut.
I tried to to convert the data to a DataFrame
in oder to maybe use the casstab
function of pandas. I tried to convert the keys of the dict to a list and use pandas.MultiIndex.from_tuples
and then MultiIndex.to_frame
but I did´t get a usable format.
Any suggestions how to deal with this or to store the distance data differently to begin with?
I think the format of the data is fine. You just need to unpack it correctly to get a useable dataframe.
.fillna(0)
to the result to get the exact same structure as in your question).
df = pd.DataFrame(my_dict.values(), pd.MultiIndex.from_tuples(my_dict.keys()))
df[['first_element', 'second_element']] = df.index.get_level_values(1).to_list()
pd.crosstab(
[df.index.get_level_values(0), df.first_element],
df.second_element,
values=df[0],
aggfunc='sum'
)