
Plotly Gantt Chart: Remove Duplicate Y-Axis Labels and Stack Parallel Tasks


I'm planning to use the Python library Plotly to build a Gantt chart, specifically following this example: https://plotly.com/python/gantt/#group-tasks-together.

However, each "Job" can have multiple tasks, and these tasks can run in parallel. From what I have observed, Plotly doesn't stack parallel tasks on top of each other, which makes the chart very hard to read. Here is an example where "Job A" has two tasks running in parallel but only one is visible:

import plotly.figure_factory as ff

data = [dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28'),
        dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28'),
        dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15'),
        dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30')]

# Without group_tasks=True, there would be two separate "Job A" labels
fig = ff.create_gantt(data, group_tasks=True)
fig.show()

[Image: the resulting chart, in which only one "Job A" bar is visible]

What I want is for both "Job A" tasks to be visible, stacked vertically, with the "Job A" label centered in the vertical space taken up by its tasks. Something like this, but without two "Job A" labels:

[Image: the two "Job A" tasks drawn as separate stacked bars, each with its own "Job A" label]
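The layout described above comes down to two small computations: give each task its own row (a "lane") within its job whenever it overlaps another task of the same job, then place the job's label at the midpoint of that job's rows. Here is a minimal plain-Python sketch, assuming tasks are (job, start, finish) tuples with ISO date strings and that dates are inclusive (a task ending on the day another starts counts as overlapping). `assign_lanes` and `layout_rows` are hypothetical helpers, not part of Plotly.

```python
from datetime import date


def assign_lanes(tasks):
    """Hypothetical helper: give each (job, start, finish) task a lane index.

    Tasks of the same job whose inclusive date ranges overlap never share a
    lane (greedy interval partitioning on tasks sorted by start date).
    Returns (job, start, finish, lane) tuples in the original order.
    """
    order = sorted(range(len(tasks)), key=lambda i: (tasks[i][0], tasks[i][1]))
    lane_ends = {}  # job -> finish date of the last task placed in each lane
    lanes = [0] * len(tasks)
    for i in order:
        job, start, finish = tasks[i]
        s, f = date.fromisoformat(start), date.fromisoformat(finish)
        ends = lane_ends.setdefault(job, [])
        for lane, end in enumerate(ends):
            if end < s:  # lane is free before this task starts: reuse it
                ends[lane] = f
                lanes[i] = lane
                break
        else:  # every existing lane is busy: open a new one
            ends.append(f)
            lanes[i] = len(ends) - 1
    return [(*tasks[i], lanes[i]) for i in range(len(tasks))]


def layout_rows(tasks):
    """Hypothetical helper: map lanes to global y rows and center each label."""
    placed = assign_lanes(tasks)
    jobs = list(dict.fromkeys(job for job, _, _ in tasks))  # first-appearance order
    n_lanes = {job: 1 + max(lane for j, _, _, lane in placed if j == job) for job in jobs}
    base, offset = {}, 0
    for job in jobs:
        base[job] = offset  # first row used by this job
        offset += n_lanes[job]
    rows = [(job, s, f, base[job] + lane) for job, s, f, lane in placed]
    label_y = {job: base[job] + (n_lanes[job] - 1) / 2 for job in jobs}
    return rows, label_y


data = [("Job A", "2009-01-01", "2009-02-28"),
        ("Job A", "2009-01-01", "2009-02-28"),
        ("Job B", "2009-03-05", "2009-04-15"),
        ("Job C", "2009-02-20", "2009-05-30")]
rows, label_y = layout_rows(data)
print([r[3] for r in rows])  # [0, 1, 2, 3]
print(label_y)               # {'Job A': 0.5, 'Job B': 2.0, 'Job C': 3.0}
```

For the example data, the two "Job A" tasks land on rows 0 and 1 and the label sits at 0.5, between them. These numeric rows could then be drawn as horizontal bars, with the label positions passed to the y-axis `tickvals` and `ticktext` settings; that plotting step is not shown here.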

If anyone has library recommendations I should consider for my Gantt chart project, please feel free to share! Thank you!


Solution

  • A starting point is to use fig.add_shape to draw each duplicate task as a rectangle in the lower half of the original task's row.

    To do this, we need the y-coordinate of each bar. Conveniently, the first bar is at y=0, the second at y=1, and so on, so a task's index in the ordered list of unique tasks is also its y-coordinate (the unique tasks are [Job A, Job B, Job C], so the Job C bar is centered at y=2). The default width of each bar is 0.8, so a bar centered at y spans y-0.4 to y+0.4; a rectangle from y0=y to y1=y-0.4 therefore covers the lower half of that band.

    Note that, as written, the added shapes don't show any hover information (there is no hovertemplate for them), and every duplicate bar gets the same fill color.

    import pandas as pd
    import plotly.express as px
    
    ## added additional duplicate Task to demonstrate generalizability
    data = [dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28'),
            dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28'),
            dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15'),
            dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30'),
            dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30')]
    
    df = pd.DataFrame(data)
    
    ## y-coordinate of each task's bar = its position among the unique tasks
    y_positions = {task: i for i, task in enumerate(df['Task'].unique())}
    
    ## plot the first occurrence of each task as a regular timeline bar
    fig = px.timeline(df.loc[~df['Task'].duplicated()], x_start="Start", x_end="Finish", y="Task")
    
    ## draw each duplicate row as a rectangle over the lower half of its task's bar
    for row in df.loc[df['Task'].duplicated()].itertuples():
        y_val = y_positions[row.Task]
        fig.add_shape(type="rect",
                      xref="x", yref="y",
                      x0=row.Start, x1=row.Finish,
                      y0=y_val, y1=y_val - 0.4,
                      line_width=0,
                      fillcolor="salmon")
    fig.show()
    

    [Image: the resulting chart, with each duplicate task drawn as a salmon rectangle in the lower half of its task's row]
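The rectangle arithmetic above only separates two tasks per job: with three parallel tasks, every duplicate would be drawn over the same lower half. One way to generalize it, assuming (as above) that each bar is centered on its category index and is 0.8 units tall, is to split that band into n equal slices, one per parallel task. `band_slices` is a hypothetical helper name, not a Plotly function.

```python
def band_slices(center, n, width=0.8):
    """Hypothetical helper: split the bar band around `center` into n slices.

    Returns n (y0, y1) pairs, bottom to top, that together tile
    [center - width/2, center + width/2].
    """
    bottom = center - width / 2
    step = width / n
    return [(bottom + k * step, bottom + (k + 1) * step) for k in range(n)]


# Two tasks on the first category (y=0): the lower slice is the same band as
# the y0=y_val, y1=y_val-0.4 rectangle used above (endpoints swapped).
print(band_slices(0, 2))  # [(-0.4, 0.0), (0.0, 0.4)]
```

In the loop above, a job with n rows could use `band_slices(y_val, n)`, drawing its k-th duplicate (k from 0 to n-2) in slice k and leaving the top slice to the original px.timeline bar, since shapes are drawn above traces by default. Giving each slice its own fillcolor would keep the parallel tasks distinguishable.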