I have a data set with the following structure:
Group2 holds the individual identifiers, and Group1 holds the identifiers of the larger groups to which each individual belongs.
I need to plot the Value column on the y-axis in the order in which the individuals appear in the Group2 column, but with the x-axis labels taken from the Group1 column rather than from Group2. I also need to add a vertical line at each change of value in Group1, so that I can tell which part of the plot belongs to group A, group B, and so on, similar to this example (ignore the yellow and purple lines; I just need a single line):
The blue vertical line marks the point where all the individuals from group A have been plotted and the individuals from group B begin, and so on. Any ideas on how to achieve this?
import pandas as pd
a = {'Group1':['A', 'A', 'A', 'A', 'B', 'B','C', 'C', 'C'], 'Group2': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'C1', 'C2', 'C3'],
'Value':[0.06, 0.12, 0.11, 0.04, 0.09, 0.2, 0.1, 0.08, 0.2]}
df = pd.DataFrame(a)
df.set_index(['Group1', 'Group2'], inplace=True)
You can use matplotlib for this: vertical lines can be added with plt.axvline, and you'll need to customize the x ticks with plt.xticks:
# init
import pandas as pd
import matplotlib.pyplot as plt
a = {'Group1': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
     'Group2': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'C1', 'C2', 'C3'],
     'Value': [0.06, 0.12, 0.11, 0.04, 0.09, 0.2, 0.1, 0.08, 0.2]}
df = pd.DataFrame(a)
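# plot each individual (Group2) at x = 0, 1, 2, ... in row order
plt.plot(range(len(df)), df['Value'])
# number of individuals per Group1, accumulated; sort=False keeps the order of
# appearance (this assumes the rows of each group are contiguous)
cumsum = df.groupby('Group1', sort=False)['Value'].count().cumsum()
# group boundaries, e.g. [0, 4, 6, 9]; point i is drawn at x = i,
# so the group spanning rows [start, end) is centred at (start + end - 1) / 2
bounds = [0] + cumsum.to_list()
centers = [0.5 * (bounds[i] + bounds[i + 1] - 1) for i in range(len(bounds) - 1)]
# update x ticks: one Group1 label at the centre of each group
plt.xticks(centers, cumsum.index)
# add vertical lines halfway between the last point of a group and the first
# point of the next one (no line after the last group)
for x in cumsum.iloc[:-1] - 0.5:
    plt.axvline(x=x, color='red')
plt.show()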
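If you want to keep the MultiIndex from the question (df.set_index(['Group1', 'Group2'])) instead of rebuilding the frame, a minimal sketch along the following lines should work. It reads Group1 straight from the index with get_level_values and locates the boundaries wherever the label changes from one row to the next, so each consecutive run of equal labels is treated as one group and no sorting is assumed. The helper name group_ticks is hypothetical, and the object-oriented Axes API is used here in place of the plt.* calls above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def group_ticks(labels):
    """Hypothetical helper: given the Group1 label of every plotted point
    (in plot order), return tick positions, tick labels and separator
    positions. Each consecutive run of equal labels counts as one group."""
    labels = np.asarray(labels)
    # index of the first point of every run except the first one
    starts = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    # run boundaries: run k covers points [edges[k], edges[k + 1])
    edges = np.concatenate(([0], starts, [len(labels)]))
    # point i is drawn at x = i, so run [s, e) is centred at (s + e - 1) / 2
    centers = (edges[:-1] + edges[1:] - 1) / 2
    names = labels[edges[:-1]]
    # separators sit halfway between the last point of one run and the next
    separators = starts - 0.5
    return centers, names, separators


a = {'Group1': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
     'Group2': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'C1', 'C2', 'C3'],
     'Value': [0.06, 0.12, 0.11, 0.04, 0.09, 0.2, 0.1, 0.08, 0.2]}
df = pd.DataFrame(a)
df.set_index(['Group1', 'Group2'], inplace=True)

# x = 0, 1, 2, ... follows the row order, i.e. the order of Group2
x = np.arange(len(df))
centers, names, separators = group_ticks(df.index.get_level_values('Group1'))

fig, ax = plt.subplots()
ax.plot(x, df['Value'], marker='o')
ax.set_xticks(centers)
ax.set_xticklabels(names)
for sep in separators:
    ax.axvline(sep, color='red')
ax.set_xlabel('Group1')
ax.set_ylabel('Value')
# call plt.show() (or fig.savefig(...)) to display or save the figure
```

For the example data the labels end up centred at x = 1.5, 4.5 and 7.0, with red lines at 3.5 and 5.5. Because a group that reappears later (e.g. A, A, B, A) is treated as a new run, it gets its own label and separator rather than being merged with the earlier block.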