python pandas matplotlib parallel-coordinates

Problems representing non-numerical (positional) rankings with pandas.plotting.parallel_coordinates


This problem has kept me going around like a headless chicken for longer than I want to admit.

I have a ranking in a DataFrame with the following format (this is a simplified example):

+---------+-------+-------+-------+-------+-------+
| ranking | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
+---------+-------+-------+-------+-------+-------+
| 1       | adria | adria | marta | marta | adria |
+---------+-------+-------+-------+-------+-------+
| 2       | marta | marta | dani  | dani  | marta |
+---------+-------+-------+-------+-------+-------+
| 3       | dani  | dani  | adria | adria | dani  |
+---------+-------+-------+-------+-------+-------+
| 4       | abel  | abel  | abel  | abel  | abel  |
+---------+-------+-------+-------+-------+-------+
| 5       |       | joan  | joan  |       |       |
+---------+-------+-------+-------+-------+-------+
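To make the example reproducible, here is a sketch that rebuilds the table above as a pandas DataFrame. It assumes the empty cells are missing values (NaN) rather than empty strings; that assumption matters for the reshaping ideas further down.

```python
import numpy as np
import pandas as pd

# The ranking table above; empty cells are assumed to be NaN
df = pd.DataFrame({
    "ranking": [1, 2, 3, 4, 5],
    "Day 1": ["adria", "marta", "dani", "abel", np.nan],
    "Day 2": ["adria", "marta", "dani", "abel", "joan"],
    "Day 3": ["marta", "dani", "adria", "abel", "joan"],
    "Day 4": ["marta", "dani", "adria", "abel", np.nan],
    "Day 5": ["adria", "marta", "dani", "abel", np.nan],
})
print(df)
```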

In short, there are several players who move up and down the ranking. There is also one player (Joan) who only plays on two days and then disappears.

My first impulse was to use pandas.plotting.parallel_coordinates (https://pandas.pydata.org/docs/reference/api/pandas.plotting.parallel_coordinates.html)

With the following code:

import matplotlib.pyplot as plt
import pandas as pd

plt.style.use('fivethirtyeight')  # The style; set it before plotting so it applies
plt.figure(figsize=(20, 5))  # Plot width & height
ax = pd.plotting.parallel_coordinates(
  df, 'ranking',
  axvlines=False,
  marker='o',      # Show markers
  markersize=12,   # Marker size
  linewidth=6,     # Line width
  alpha=0.9,       # Opacity of lines
  )

ax.invert_yaxis()  # Invert the y-axis so position #1 is at the top
ax.get_legend().remove()  # Hide the legend
plt.show()

But the result is not at all what I expected.

Note that I have inverted the y-axis so that position #1 appears at the top.

Problem Nº 1: The lines do not follow the order of the table. For example, player "dani" never reaches first position, yet in the plot the line for dani climbs two positions to the top. Comparing the table with the plot, the same thing happens with other players: the lines do not follow the positions in the table.
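This seems to come from how `parallel_coordinates` works: it draws one line per row of the DataFrame and colors it by `class_column`. With `'ranking'` as the class column, each line is a ranking position (1st, 2nd, ...), not a player, and the player names become the y values, which matplotlib puts on a categorical axis in the order they first appear (adria, marta, dani, abel). So the line that seems to carry "dani" to the top is really the line for position 3 (dani, dani, adria, adria, dani). A sketch of reshaping the data so that each row is a player and each value is a position, assuming empty cells are NaN (`long` and `by_player` are just illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ranking": [1, 2, 3, 4, 5],
    "Day 1": ["adria", "marta", "dani", "abel", np.nan],
    "Day 2": ["adria", "marta", "dani", "abel", "joan"],
    "Day 3": ["marta", "dani", "adria", "abel", "joan"],
    "Day 4": ["marta", "dani", "adria", "abel", np.nan],
    "Day 5": ["adria", "marta", "dani", "abel", np.nan],
})

# Long format: one row per (ranking, day) cell; empty cells are dropped
long = df.melt(id_vars="ranking", var_name="day", value_name="player").dropna()

# One row per player: columns are the days, values are that player's position
by_player = long.pivot(index="player", columns="day", values="ranking")
print(by_player)
```

In this shape "dani" has positions 3, 3, 2, 2, 3, so a line drawn from this row never reaches position 1.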

Problem Nº 2: I don't know how to represent Joan. His line should only be drawn on the days he played.
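Once there is one row per player, the days a player did not play become NaN, and matplotlib leaves a gap wherever a line's y value is NaN, so Joan's line should only span Day 2 and Day 3. A sketch under these assumptions: empty cells are NaN, the non-interactive Agg backend is used so it runs headless, and `ranking.png` is a hypothetical output file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ranking": [1, 2, 3, 4, 5],
    "Day 1": ["adria", "marta", "dani", "abel", np.nan],
    "Day 2": ["adria", "marta", "dani", "abel", "joan"],
    "Day 3": ["marta", "dani", "adria", "abel", "joan"],
    "Day 4": ["marta", "dani", "adria", "abel", np.nan],
    "Day 5": ["adria", "marta", "dani", "abel", np.nan],
})

# One row per player, one column per day; days a player did not play are NaN
by_player = (
    df.melt(id_vars="ranking", var_name="day", value_name="player")
    .dropna()
    .pivot(index="player", columns="day", values="ranking")
    .reset_index()
)

fig, ax = plt.subplots(figsize=(10, 4))
# 'player' is now the class column, so each line is one player
pd.plotting.parallel_coordinates(
    by_player, "player", ax=ax, axvlines=False, marker="o", linewidth=4
)
ax.invert_yaxis()  # position 1 at the top
ax.set_ylabel("ranking")
fig.savefig("ranking.png")  # hypothetical output file
```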

Problem Nº 3: This is a very simple visualization, but imagine we have hundreds of players over many days. That would make the colors hard to follow. I have thought about labeling each point on the lines with the player's name, but I have not been able to find a way to do it...
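One option for labels is matplotlib's `ax.annotate`, called once per played day after `parallel_coordinates` has drawn the lines. This is only a sketch: it assumes empty cells are NaN, reshapes to one row per player, relies on `parallel_coordinates` placing column i at x = i (which is its default when `use_columns` and `xticks` are not given), and writes to a hypothetical file `ranking_labeled.png`. The offset and font size are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ranking": [1, 2, 3, 4, 5],
    "Day 1": ["adria", "marta", "dani", "abel", np.nan],
    "Day 2": ["adria", "marta", "dani", "abel", "joan"],
    "Day 3": ["marta", "dani", "adria", "abel", "joan"],
    "Day 4": ["marta", "dani", "adria", "abel", np.nan],
    "Day 5": ["adria", "marta", "dani", "abel", np.nan],
})

by_player = (
    df.melt(id_vars="ranking", var_name="day", value_name="player")
    .dropna()
    .pivot(index="player", columns="day", values="ranking")
    .reset_index()
)
day_cols = [c for c in by_player.columns if c != "player"]

fig, ax = plt.subplots(figsize=(10, 4))
pd.plotting.parallel_coordinates(
    by_player, "player", ax=ax, axvlines=False, marker="o", linewidth=4
)
ax.invert_yaxis()  # position 1 at the top
ax.get_legend().remove()  # the point labels replace the legend

# Column i is drawn at x = i; label every day the player actually played
for _, row in by_player.iterrows():
    for x, day in enumerate(day_cols):
        if pd.notna(row[day]):
            ax.annotate(row["player"], (x, row[day]), textcoords="offset points",
                        xytext=(0, 8), ha="center", fontsize=8)

fig.savefig("ranking_labeled.png")  # hypothetical output file
```

With hundreds of players, labeling every point will probably get crowded; labeling only the last non-missing point of each line, or using hover tooltips in an interactive plotting library, may scale better.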

My hypotheses range from the simple fact that I am useless (Ockham's razor prevailing) to the possibility that this data just cannot be represented this way with this library.

I've been tempted to try something akin to a Sankey diagram, but I don't think that's exactly what I need either, and it would greatly complicate the code.

I would appreciate any help with this; after many attempts, I still haven't managed to solve the problem.

Any ideas will be welcome.

Thanks!


Solution

  • I don't know how to do this in pandas itself, but it's possible to do something like that in Altair if you first melt your DataFrame:

    import altair as alt
    
    alt.Chart(
        # Long format: one row per (ranking, day, player); empty cells dropped
        df.melt("ranking", var_name="day", value_name="player").dropna(),
        width=500,
    ).mark_line(
        strokeWidth=5,
        opacity=0.5,
    ).encode(
        alt.X('day:N', title=""),
        # reverse=True puts position 1 at the top
        alt.Y('ranking:Q', scale=alt.Scale(domain=[1, 5], reverse=True)),
        color='player:N',  # one colored line per player
        tooltip='player:N',
    )
    

    This draws one line per player, with the days along the x-axis and position 1 at the top.

    Or you can add a text label to each point in the plot, like this:

    import altair as alt
    
    # Shared data and x/y encodings for both layers
    base = alt.Chart(
        df.melt("ranking", var_name="day", value_name="player").dropna(),
        width=500,
    ).encode(
        alt.X('day:N', title=""),
        alt.Y('ranking:Q', scale=alt.Scale(domain=[1, 5], reverse=True)),
    )
    
    # Layer a line per player with the player's name drawn at each point
    base.mark_line(
        strokeWidth=5,
        opacity=0.5,
    ).encode(
        color='player:N',
        tooltip='player:N',
    ) + base.mark_text(
        fontSize=16,
    ).encode(
        text='player:N',
    )
    

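The key step in the answer is the `melt`: it turns the wide table into long format with one row per (ranking, day, player), and `dropna()` removes the empty cells, so Joan only has rows for Day 2 and Day 3 and his line is drawn only there. A small sketch to inspect that intermediate table, assuming empty cells are NaN (`long` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ranking": [1, 2, 3, 4, 5],
    "Day 1": ["adria", "marta", "dani", "abel", np.nan],
    "Day 2": ["adria", "marta", "dani", "abel", "joan"],
    "Day 3": ["marta", "dani", "adria", "abel", "joan"],
    "Day 4": ["marta", "dani", "adria", "abel", np.nan],
    "Day 5": ["adria", "marta", "dani", "abel", np.nan],
})

# Long format: one row per (ranking, day, player); empty cells are dropped
long = df.melt("ranking", var_name="day", value_name="player").dropna()
print(long.head())
print(long[long["player"] == "joan"])
```

The y-axis domain `[1, 5]` is hard-coded in the answer; with more players it could instead be derived from the data, for example `[1, int(long["ranking"].max())]`. Note also that if the empty cells are empty strings rather than NaN, `dropna()` will not remove them.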