Tags: python-3.x, graphviz, amazon-personalize

Graphviz - optimising a graph representing AWS Personalize model training pipeline


I have written a Graphviz graph describing a pipeline for an AWS Personalize workflow. The workflow logic is as follows:

  1. create dataset import jobs for the interactions, items and users data (load from S3 bucket)
  2. check if a solution already exists
  3. if not, create a solution; else, create a solution version (the training)
  4. check if a campaign already exists
  5. if not, create a campaign, with the new solution version added; else, update the campaign with the solution version

The resulting graph looks OK, but isn't quite right. I'd like edges, representing triggers, between the dataset import job creation nodes and the data flows from the S3 bucket to AWS Personalize. The S3 bucket and AWS Personalize nodes should be at the top of the graph in landscape mode (L -> R), with the logic nodes below them.

I'd also like to center the S3 bucket and AWS Personalize nodes with respect to the pipeline nodes: viewed vertically, they should lie on an axis passing through the midline of the graph.

The existing graph code and the rendered graph are below. Can someone please suggest how I can implement these changes? I've tried playing around with the orientation of the resource_flows subgraph, but it doesn't quite work.

digraph pipeline {

    edge [fontsize=24]

    subgraph resource_flows {
        {
            orientation=L
            rankdir=LR
            node [style=filled]
            s3_bucket [label="S3 bucket"
                       fillcolor=orange
                       margin=0.25
                       penwidth=2.0
                       shape=cylinder
                       width=0.5]
            aws_personalize [label="AWS Personalize"
                             fillcolor=aquamarine
                             margin=0.25
                             penwidth=2.0
                             shape=box3d
                             width=0.5]
        }
        s3_bucket -> aws_personalize [color=white]
    }
    
    subgraph actions {
        { 
            rankdir=TB
            node [fillcolor=lightgreen
                  fontsize=32
                  margin=0.25
                  penwidth=2.0
                  shape=rectangle
                  width=0.5
                  style=filled]
            create_interactions_dataset_import_job [label="Create interactions dataset import job"]
            create_items_dataset_import_job [label="Create items dataset import job"]
            create_users_dataset_import_job [label="Create users dataset import job"]
            solution_exists [label="Solution exists?" fillcolor=lightblue shape=diamond]
            create_solution [label="Create solution"]
            create_solution_version [label="Create solution version"]
            campaign_exists [label="Campaign exists?" fillcolor=lightblue shape=diamond]
            create_campaign [label="Create campaign"]
            update_campaign [label="Update campaign"]
            terminal_node1 [label="" fillcolor=white pencolor=white shape=plaintext]
            terminal_node2 [label="" fillcolor=white pencolor=white shape=plaintext]
        }
        create_interactions_dataset_import_job -> solution_exists [label="interactions dataset import job ARN"]
        create_items_dataset_import_job -> solution_exists [label="items dataset import job ARN"]
        create_users_dataset_import_job -> solution_exists [label="users dataset import job ARN"]
        solution_exists -> create_solution [arrowhead=None label="No"]
        solution_exists -> create_solution_version [arrowhead=None label="Yes"]
        create_solution -> create_solution_version [label="solution ARN"]
        create_solution_version -> campaign_exists [label="solution version ARN"]
        campaign_exists -> create_campaign [arrowhead=None label="No"]
        campaign_exists -> update_campaign [arrowhead=None label="Yes"]
        create_campaign -> terminal_node1 [label="campaign ARN"]
        update_campaign -> terminal_node2 [label="campaign ARN"]
    }
 
    subgraph actions_and_resource_flows {
        {
            orientation=L
            rankdir=LR
            interactions_dataset_import_job_trigger [shape=point width=0]
            items_dataset_import_job_trigger [shape=point width=0]
            users_dataset_import_job_trigger [shape=point width=0]
            dummy [color=white shape=point]
        }

        {s3_bucket aws_personalize} -> dummy [arrowhead=None penwidth=0.0]
        dummy -> create_interactions_dataset_import_job [arrowhead=None penwidth=0.0]
        s3_bucket -> interactions_dataset_import_job_trigger [arrowhead=None]
        interactions_dataset_import_job_trigger -> create_interactions_dataset_import_job [dir=back label="triggers" style=dashed]
        interactions_dataset_import_job_trigger -> aws_personalize [label="Interactions dataset import"]
        
        {s3_bucket aws_personalize} -> dummy [arrowhead=None penwidth=0.0]
        dummy -> create_items_dataset_import_job [arrowhead=None penwidth=0.0]
        s3_bucket -> items_dataset_import_job_trigger [arrowhead=None]
        items_dataset_import_job_trigger -> create_items_dataset_import_job [dir=back label="triggers" style=dashed]
        items_dataset_import_job_trigger -> aws_personalize [label="Items dataset import"]
        
        {s3_bucket aws_personalize} -> dummy [arrowhead=None penwidth=0.0]
        dummy -> create_users_dataset_import_job [arrowhead=None penwidth=0.0]
        s3_bucket -> users_dataset_import_job_trigger [arrowhead=None]
        users_dataset_import_job_trigger -> create_users_dataset_import_job [dir=back label="triggers" style=dashed]
        users_dataset_import_job_trigger -> aws_personalize [label="Users dataset import"]
   }
}

[image: rendered graph]


Solution

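  • Is this what you were after, or at least closer? I used rank=same to put the S3 bucket and AWS Personalize nodes on the same rank (horizontally aligned). I also added comments on some incorrect attribute use: orientation and rankdir only apply to the entire graph, so setting them inside a subgraph has no effect, and arrowhead=None has been replaced with the valid value arrowhead=none.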

    digraph pipeline {
    //
    // replaced arrowhead=None with arrowhead=none
    //
        edge [fontsize=24]
    
        subgraph resource_flows {
            {rank=same   //  horizontally align (same rank)
                orientation=L   // only applies to entire graph
                rankdir=LR      // only applies to entire graph
                node [style=filled]
                s3_bucket [label="S3 bucket"
                           fillcolor=orange
                           margin=0.25
                           penwidth=2.0
                           shape=cylinder
                           width=0.5]
                aws_personalize [label="AWS Personalize"
                                 fillcolor=aquamarine
                                 margin=0.25
                                 penwidth=2.0
                                 shape=box3d
                                 width=0.5]
            }
            s3_bucket -> aws_personalize [color=white]
        }
        
        subgraph actions {
            { 
                rankdir=TB
                node [fillcolor=lightgreen
                      fontsize=32
                      margin=0.25
                      penwidth=2.0
                      shape=rectangle
                      width=0.5
                      style=filled]
                create_interactions_dataset_import_job [label="Create interactions dataset import job"]
                create_items_dataset_import_job [label="Create items dataset import job"]
                create_users_dataset_import_job [label="Create users dataset import job"]
                solution_exists [label="Solution exists?" fillcolor=lightblue shape=diamond]
                create_solution [label="Create solution"]
                create_solution_version [label="Create solution version"]
                campaign_exists [label="Campaign exists?" fillcolor=lightblue shape=diamond]
                create_campaign [label="Create campaign"]
                update_campaign [label="Update campaign"]
                terminal_node1 [label="" fillcolor=white pencolor=white shape=plaintext]
                terminal_node2 [label="" fillcolor=white pencolor=white shape=plaintext]
            }
            create_interactions_dataset_import_job -> solution_exists [label="interactions dataset import job ARN"]
            create_items_dataset_import_job -> solution_exists [label="items dataset import job ARN"]
            create_users_dataset_import_job -> solution_exists [label="users dataset import job ARN"]
            solution_exists -> create_solution [arrowhead=none label="No"]
            solution_exists -> create_solution_version [arrowhead=none label="Yes"]
            create_solution -> create_solution_version [label="solution ARN"]
            create_solution_version -> campaign_exists [label="solution version ARN"]
            campaign_exists -> create_campaign [arrowhead=none label="No"]
            campaign_exists -> update_campaign [arrowhead=none label="Yes"]
            create_campaign -> terminal_node1 [label="campaign ARN"]
            update_campaign -> terminal_node2 [label="campaign ARN"]
        }
     
        subgraph actions_and_resource_flows {
            {
                orientation=L   // only applies to entire graph
                rankdir=LR      // only applies to entire graph
                interactions_dataset_import_job_trigger [shape=point width=0]
                items_dataset_import_job_trigger [shape=point width=0]
                users_dataset_import_job_trigger [shape=point width=0]
                dummy [color=white shape=point]
            }
    
            {s3_bucket aws_personalize} -> dummy [arrowhead=none penwidth=0.0]
            dummy -> create_interactions_dataset_import_job [arrowhead=none penwidth=0.0]
            s3_bucket -> interactions_dataset_import_job_trigger [arrowhead=none]
            interactions_dataset_import_job_trigger -> create_interactions_dataset_import_job [dir=back label="triggers" style=dashed]
            interactions_dataset_import_job_trigger -> aws_personalize [label="Interactions dataset import"]
            
            {s3_bucket aws_personalize} -> dummy [arrowhead=none penwidth=0.0]
            dummy -> create_items_dataset_import_job [arrowhead=none penwidth=0.0]
            s3_bucket -> items_dataset_import_job_trigger [arrowhead=none]
            items_dataset_import_job_trigger -> create_items_dataset_import_job [dir=back label="triggers" style=dashed]
            items_dataset_import_job_trigger -> aws_personalize [label="Items dataset import"]
            
            {s3_bucket aws_personalize} -> dummy [arrowhead=none penwidth=0.0]
            dummy -> create_users_dataset_import_job [arrowhead=none penwidth=0.0]
            s3_bucket -> users_dataset_import_job_trigger [arrowhead=none]
            users_dataset_import_job_trigger -> create_users_dataset_import_job [dir=back label="triggers" style=dashed]
            users_dataset_import_job_trigger -> aws_personalize [label="Users dataset import"]
       }
    }
    

    Giving:
    [image: rendered graph]
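
The two layout ideas used in the answer can be seen in isolation in a much smaller graph. The following is only a minimal sketch, not the pipeline itself: the graph name `rank_same_sketch` and the node `import_job` are made up for illustration, and `style=invis` is shown as one possible alternative to the `penwidth=0.0` / `color=white` helper edges used above.

```dot
digraph rank_same_sketch {
    // rankdir is a graph-level attribute, so it is set once at the root
    rankdir=TB

    // an anonymous subgraph with rank=same keeps its nodes on one rank,
    // which means side by side in a top-to-bottom layout
    {
        rank=same
        s3_bucket [label="S3 bucket" shape=cylinder]
        aws_personalize [label="AWS Personalize" shape=box3d]
    }

    // hypothetical stand-in for the pipeline nodes
    import_job [label="Create import job" shape=rectangle]

    // visible data flow between the two same-rank nodes
    s3_bucket -> aws_personalize [label="dataset import"]

    // invisible edge: affects ranking only, nothing is drawn
    aws_personalize -> import_job [style=invis]
}
```

Rendering this with `dot -Tpng sketch.dot -o sketch.png` (assuming the file is saved as `sketch.dot`) should place the two resource nodes next to each other on the top rank, with the job node on the rank below.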