To document the restructuring of a data table from "wide" (one criterion column per score) to "long" (one score column and one criterion column), my first reaction was to use a UML class diagram.
I am aware that changing the structure of the data table does not change the class attributes.
My first question is whether the wide or the long version is the more correct representation of the data table?
My second question is whether it would make sense to relate the two representations - and if so, by which relationship?
My third question is whether something other than a UML class diagram would be more suitable for documenting the reshaping (data preprocessing before showing the distribution as a box plot in R).
You jumped a little too fast from the table to the UML. This makes your question very confusing, because what is wide as a table is represented long as a class, and vice versa.
Reformulating your problem, it appears that you are refactoring some tables. The wide table shows several values for the same student in the same row. This means that the maximum number of exercises is fixed by the table structure:
ID Ex1 Ex2 Ex3 .... Ex N
-----------------------------
111 A A A ... A
119 A C - ... D
127 B F B ... F
The long table has fewer columns, and each row shows only 1 specific score of 1 specific student:
ID # Score
---------------
111 1 A
111 2 A
111 3 A
...
111 N A
119 1 A
119 2 C
...
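To make the difference between the two layouts concrete, here is a minimal sketch of the reshaping in plain Python (not the R code of the original preprocessing). The sample rows mirror the tables above, truncated to three exercises; the function name `wide_to_long` and the column prefix `Ex` are assumptions made for illustration only.

```python
# Hypothetical sample data mirroring the wide table above ("-" marks a missing score).
wide_table = [
    {"ID": 111, "Ex1": "A", "Ex2": "A", "Ex3": "A"},
    {"ID": 119, "Ex1": "A", "Ex2": "C", "Ex3": "-"},
    {"ID": 127, "Ex1": "B", "Ex2": "F", "Ex3": "B"},
]


def wide_to_long(rows, id_key="ID", prefix="Ex"):
    """Turn one row per student into one row per (student, exercise) pair."""
    long_rows = []
    for row in rows:
        for column, score in row.items():
            if column == id_key:
                continue  # the ID is repeated on every long row, not unpivoted
            exercise_number = int(column[len(prefix):])  # "Ex2" -> 2
            long_rows.append({"ID": row[id_key], "#": exercise_number, "Score": score})
    return long_rows


long_table = wide_to_long(wide_table)
for r in long_table:
    print(r["ID"], r["#"], r["Score"])
```

The information content is the same in both layouts; only the number of rows and columns changes, which is why the reshaping leaves the class attributes untouched.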
You can model this structure in a UML class diagram. But in UML, the table layout doesn't matter: that's an issue of the ORM mapping, and you could perfectly well have one class model (with an attribute or an association having a multiplicity of 1..N) that could be implemented using either the wide or the long version. If the multiplicity were 1..*, only the long option would work.
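The point that one class model can back either layout can be sketched as follows. This is an illustrative Python sketch, not an actual ORM mapping: the class `Student` and the helpers `to_wide_row` and `to_long_rows` are hypothetical names, and the bound `n` stands for the N of the 1..N multiplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    """Single class model: a student with an ordered collection of scores."""
    id: int
    scores: list = field(default_factory=list)


def to_wide_row(student, n):
    """Wide layout: one fixed-width row, so the number of scores must be bounded by n."""
    if len(student.scores) > n:
        raise ValueError("the wide layout cannot hold more than n scores")
    padding = ["-"] * (n - len(student.scores))  # unused columns stay empty
    return [student.id, *student.scores, *padding]


def to_long_rows(student):
    """Long layout: one row per score, so any number of scores fits."""
    return [(student.id, number, score)
            for number, score in enumerate(student.scores, start=1)]


student = Student(119, ["A", "C"])
print(to_wide_row(student, 3))
print(to_long_rows(student))
```

With an unbounded multiplicity there is no `n` to size the wide row with, which is the reason only the long mapping remains available in that case.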
Now to your questions:
«table» stereotype, to clarify that you are modelling a table (so a low-level view of your design).