How to do a line plot with vertical lines indicating changes in group


I have a data set with the following structure (the code at the end of this question reproduces it):

Group2 holds the individual identifiers, and Group1 holds the identifiers of the larger groups to which each individual belongs. I need to plot the Value column on the y-axis in the order in which the rows appear in the Group2 column, but with the x-axis labels taken from the Group1 column instead of Group2. Then I need to add a vertical line at each change of value in Group1, so that I can identify which part of the plot belongs to group A, group B and so on, similar to this example (ignore the yellow and purple lines, I just need a single line): [image]

The blue vertical line marks the point where all the individuals from group A have been plotted and the individuals from group B start, and so on. Any ideas on how to achieve this?

import pandas as pd

a = {'Group1':['A', 'A', 'A', 'A', 'B', 'B','C', 'C', 'C'], 'Group2': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'C1', 'C2', 'C3'],
    'Value':[0.06, 0.12, 0.11, 0.04, 0.09, 0.2, 0.1, 0.08, 0.2]}

df = pd.DataFrame(a)
df.set_index(['Group1', 'Group2'], inplace=True)

Solution

  • You can use matplotlib for this. Add the vertical lines with plt.axvline and customize the x ticks with plt.xticks:

    # init
    import pandas as pd
    import matplotlib.pyplot as plt

    data = {'Group1': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
            'Group2': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'C1', 'C2', 'C3'],
            'Value': [0.06, 0.12, 0.11, 0.04, 0.09, 0.2, 0.1, 0.08, 0.2]}

    df = pd.DataFrame(data)

    # plot
    # cumulative group sizes give the row position at which each group ends;
    # sort=False keeps the groups in their order of appearance
    cumsum = df.groupby('Group1', sort=False)['Value'].count().cumsum()
    edges = [0] + cumsum.to_list()
    # midpoint between consecutive edges, used as the tick position of each group
    centers = [0.5*(edges[i] + edges[i+1]) for i in range(len(edges)-1)]
    plt.plot(range(len(df)), df['Value'])

    # update x ticks
    plt.xticks(centers, cumsum.index)

    # add vertical lines
    for x in cumsum - 0.5:
        plt.axvline(x=x, color='red')

    plt.show()


    Output: a line plot of Value with x-axis ticks labeled A, B and C and a red vertical line at the end of each group.
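
    A variation on the approach above, as a minimal sketch rather than a definitive implementation: it finds the boundaries by scanning for consecutive runs of the same Group1 label, so it still works if a label reappears later in the data. The function names group_boundaries and plot_with_group_lines are hypothetical, and the sketch assumes Group1 and Value are ordinary columns (if you ran set_index as in the question, call df.reset_index() first). Unlike the code above, it places each tick at the center of the group's points and draws no line after the last group.

```python
from itertools import groupby


def group_boundaries(groups):
    """Return (labels, tick_positions, line_positions) for consecutive runs in `groups`."""
    labels, ticks, lines = [], [], []
    start = 0
    for label, run in groupby(groups):
        n = sum(1 for _ in run)          # number of rows in this run
        end = start + n - 1              # x position of the run's last point
        labels.append(label)
        ticks.append((start + end) / 2)  # center of the run's points
        lines.append(end + 0.5)          # halfway to the next run's first point
        start += n
    return labels, ticks, lines[:-1]     # drop the line after the final run


def plot_with_group_lines(df, group_col="Group1", value_col="Value"):
    """Plot value_col in row order, labeling the x-axis by runs of group_col."""
    # Imported here so that group_boundaries has no third-party dependencies.
    import matplotlib.pyplot as plt

    labels, ticks, lines = group_boundaries(df[group_col])
    fig, ax = plt.subplots()
    ax.plot(range(len(df)), df[value_col])
    ax.set_xticks(ticks)
    ax.set_xticklabels(labels)
    for x in lines:
        ax.axvline(x=x, color="red")
    return ax
```

    With the data frame from the answer, plot_with_group_lines(df) followed by plt.show() draws the plot; group_boundaries(df['Group1']) on its own returns the tick and line positions if you want to style the plot yourself.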