Tags: python, pandas, matplotlib, parallel-coordinates

Order of plotting in pandas.plotting.parallel_coordinates


I have a series of measurements that I want to plot with pandas.plotting.parallel_coordinates, where the color of each line is given by the value of one DataFrame column.

The code looks like this:

... data retrieval and preparation from a couple of Excel files
---> output = 'largeDataFrame'

theColormap: ListedColormap = cm.get_cmap('some cmap name')

# Attempt to stack the lines in the right order (doesn't work)
largeDataFrame.sort_values(column_for_line_color_derivation, inplace=True, ascending=True)

# here comes the actual plotting of data
sns.set_style('ticks')
sns.set_context('paper')
plt.figure(figsize=(10, 6))
thePlot: plt.Axes = parallel_coordinates(largeDataFrame, class_column=column_for_line_color_derivation, cols=[columns to plot], color=theColormap.colors)
plt.title('My Title')
thePlot.get_legend().remove()
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

This works quite well and yields the following result:

[Image: result plot]

Now I would like the yellow lines (high values of "column_for_line_color_derivation") to be plotted in front of the green and darker lines, so that they become more prominent. In other words, I want to control the stacking order of the lines by the values of "column_for_line_color_derivation". So far I haven't found a way to do that.


Solution

  • I ran some tests with pandas versions 1.1.2 and 1.0.3. In both cases the lines were drawn from low to high value of the coloring column, independent of the dataframe order.

    Temporarily adding lw=5, as in parallel_coordinates(...., lw=5), makes this very clear. With thin lines the order is harder to see, because the yellow lines have less contrast.

    The sort_labels parameter seems to do the opposite of what its name suggests: when False (the default), the lines are drawn in sorted order; when True, they keep the dataframe order.

    Here is a small reproducible example:

    import numpy as np
    import pandas as pd
    from pandas.plotting import parallel_coordinates
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({ch: np.random.randn(100) for ch in 'abcde'})
    df['coloring'] = np.random.randn(len(df))
    
    fig, axes = plt.subplots(ncols=2, figsize=(14, 6))
    for ax, lw in zip(axes, [1, 5]):
        parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax, lw=lw)
        ax.set_title(f'linewidth={lw}')
        ax.get_legend().remove()
    plt.show()
    

    [Image: example plot]
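To check the stacking order without relying on the eye, the lines can be read back from the axes. As far as I know, matplotlib draws artists with equal zorder in the order they were added, so the order of ax.lines is the drawing order. The sketch below is only an illustration: the helper name data_lines_in_draw_order, the seed and the 20-row dataframe are assumptions, not part of the tests described above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Small random dataframe with the same layout as the example above (assumed sizes and seed)
rng = np.random.default_rng(1)
df = pd.DataFrame({ch: rng.standard_normal(20) for ch in 'abcde'})
df['coloring'] = rng.standard_normal(len(df))

fig, ax = plt.subplots()
parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax)
ax.get_legend().remove()


def data_lines_in_draw_order(ax):
    """Hypothetical helper: return the data lines in the order the axes holds them."""
    result = []
    for line in ax.lines:
        xs = line.get_xdata()
        if xs[0] != xs[-1]:  # skip the vertical lines representing axes
            result.append(line)
    return result


lines = data_lines_in_draw_order(ax)

# Locate the row with the highest coloring value among the drawn lines
top_row = df.loc[df['coloring'].idxmax(), df.columns[:-1]].to_numpy(dtype=float)
positions = [i for i, line in enumerate(lines) if np.allclose(line.get_ydata(), top_row)]
print(f'{len(lines)} data lines; the row with the highest coloring value is at position(s) {positions}')
```

A position close to the end of the list means that line is drawn late and therefore ends up near the top of the stack.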

    One idea is to vary the line width depending on the class:

    fig, ax = plt.subplots(figsize=(8, 6))
    
    parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax)
    num_lines = len(ax.lines)
    for ind, line in enumerate(ax.lines):
        xs = line.get_xdata()
        if xs[0] != xs[-1]:  # skip the vertical lines representing axes
            line.set_linewidth(1 + 3 * ind / num_lines)
    ax.set_title('linewidth depending on class_column')
    ax.get_legend().remove()
    plt.show()
    

    [Image: linewidth depending on class_column]
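If the goal is full control over the stacking, another option is to bypass parallel_coordinates and draw the lines with plain matplotlib, giving each line a zorder derived from the coloring value. This is a different technique from the pandas helper, shown only as a minimal sketch: it assumes the same df layout as above and a coloring column with at least two distinct values, it skips the legend, and the names value_cols, scaled and data_lines are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same layout as the example above (assumed seed and size)
rng = np.random.default_rng(0)
df = pd.DataFrame({ch: rng.standard_normal(100) for ch in 'abcde'})
df['coloring'] = rng.standard_normal(len(df))

value_cols = [c for c in df.columns if c != 'coloring']
x = np.arange(len(value_cols))

# Scale the coloring values to 0..1 so they can drive both color and zorder
values = df['coloring'].to_numpy()
scaled = (values - values.min()) / (values.max() - values.min())

cmap = plt.get_cmap('viridis')
fig, ax = plt.subplots(figsize=(8, 6))

data_lines = []
for y, s in zip(df[value_cols].to_numpy(), scaled):
    # A higher coloring value gives a higher zorder, so that line is drawn later (on top)
    (line,) = ax.plot(x, y, color=cmap(s), linewidth=1, zorder=2 + s)
    data_lines.append(line)

# Vertical lines representing the axes, drawn above the data lines
for xi in x:
    ax.axvline(xi, color='black', linewidth=1, zorder=4)

ax.set_xticks(x)
ax.set_xticklabels(value_cols)
ax.set_title('zorder depending on the coloring column')
```

Calling plt.show() afterwards displays the figure. Because the zorder is set explicitly, the yellow lines end up on top whatever the row order of the dataframe is.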