
Init CodeCommit repository with seed-code stored in S3 using CDK


I'm trying to convert the CloudFormation template behind the "MLOps template for model building, training, and deployment" into a CDK project, so that I can easily update the definitions, synth the template, and upload it to AWS Service Catalog for use as a project template in SageMaker Studio.

I'm quite new to CDK, though, and I'm having trouble initializing a CodeCommit repository with the SageMaker pipeline seed code stored in S3. The original template does it as follows:

  'ModelBuildCodeCommitRepository':
    'Type': 'AWS::CodeCommit::Repository'
    'Properties':
      'RepositoryName':
        'Fn::Sub': 'sagemaker-${SageMakerProjectName}-${SageMakerProjectId}-modelbuild'
      'RepositoryDescription':
        'Fn::Sub': 'SageMaker Model building workflow infrastructure as code for the
          Project ${SageMakerProjectName}'
      'Code':
        'S3':
          'Bucket': 'sagemaker-servicecatalog-seedcode-sa-east-1'
          'Key': 'toolchain/model-building-workflow-v1.0.zip'
        'BranchName': 'main'

The CDK API docs do mention the code parameter of codecommit.Repository as an initialization option, but it only covers local files, which are zipped and uploaded to S3 as assets. That assumes the CDK project itself gets deployed, whereas I only want the template generated by cdk synth.

Of course, I can always use codecommit.CfnRepository and its code parameter to point to S3, but then I cannot pass it as the repository parameter of codepipeline_actions.CodeCommitSourceAction in the pipeline's stage, because that parameter expects an IRepository object.

I also want to stick to aws-cdk-lib.aws_codepipeline to grasp the fundamental logic of CodePipeline (which I'm also quite new to) and avoid the higher-level aws-cdk-lib.pipelines.

Any ideas on how I can accomplish this?


Solution

  • Construct a Repository without a code prop. Get an escape-hatch reference to its underlying L1 CfnRepository. Then override the CfnRepository's Code property manually so that it points to the existing S3 bucket:

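    import * as codecommit from 'aws-cdk-lib/aws-codecommit';

    // L2 construct, created without the `code` prop (inside a Stack, hence `this`)
    const repo = new codecommit.Repository(this, 'Repo', { repositoryName: 'my-great-repo' });
    // Escape hatch: the L1 resource that the L2 construct wraps
    const cfnRepo = repo.node.defaultChild as codecommit.CfnRepository;

    // Keys use CloudFormation (PascalCase) names, not CDK prop names
    cfnRepo.addPropertyOverride('Code', {
      S3: {
        Bucket: 'sagemaker-servicecatalog-seedcode-sa-east-1',
        Key: 'toolchain/model-building-workflow-v1.0.zip',
      },
      BranchName: 'main',
    });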
    

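    The code above synthesizes the same Code block as the YAML in the question. Because repo is still an L2 Repository, and therefore an IRepository, you can pass it as the repository of the pipeline's CodeCommitSourceAction.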

    Don't forget that whoever deploys the template needs the IAM permissions required to read the seed-code object from the S3 bucket.
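
To tie the pieces together, here is a minimal sketch of the overridden repository wired into a CodePipeline source action, with the synthesized template inspected in-process rather than deployed. The stack id 'SeedRepoStack', the construct ids, and the 'Approve' stage are placeholders that are not part of the original answer; the manual approval stage is there only because a pipeline needs at least two stages to synthesize. Treat it as an illustration under those assumptions, not a complete MLOps pipeline.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as codecommit from 'aws-cdk-lib/aws-codecommit';
import * as codepipeline from 'aws-cdk-lib/aws-codepipeline';
import * as codepipeline_actions from 'aws-cdk-lib/aws-codepipeline-actions';

const app = new App();
const stack = new Stack(app, 'SeedRepoStack'); // hypothetical stack id

// L2 repository, created without the `code` prop.
const repo = new codecommit.Repository(stack, 'Repo', {
  repositoryName: 'my-great-repo',
});

// Escape hatch: override the raw CloudFormation `Code` property so it
// points to the seed-code archive that already exists in S3.
const cfnRepo = repo.node.defaultChild as codecommit.CfnRepository;
cfnRepo.addPropertyOverride('Code', {
  S3: {
    Bucket: 'sagemaker-servicecatalog-seedcode-sa-east-1',
    Key: 'toolchain/model-building-workflow-v1.0.zip',
  },
  BranchName: 'main',
});

// The L2 `repo` is an IRepository, so the source action accepts it.
const sourceOutput = new codepipeline.Artifact();
new codepipeline.Pipeline(stack, 'Pipeline', {
  stages: [
    {
      stageName: 'Source',
      actions: [
        new codepipeline_actions.CodeCommitSourceAction({
          actionName: 'Source',
          repository: repo,
          branch: 'main', // match the BranchName seeded above
          output: sourceOutput,
        }),
      ],
    },
    {
      // Placeholder second stage (assumption); replace with build/deploy actions.
      stageName: 'Approve',
      actions: [
        new codepipeline_actions.ManualApprovalAction({ actionName: 'Approve' }),
      ],
    },
  ],
});

// Inspect the synthesized template in-process, without deploying anything.
const template = Template.fromStack(stack).toJSON();
const repoResources = Object.values(template.Resources).filter(
  (r: any) => r.Type === 'AWS::CodeCommit::Repository',
) as any[];
console.log(JSON.stringify(repoResources[0].Properties.Code, null, 2));
```

Setting branch explicitly matters here because the source action does not know which branch the seed code was committed to; it should match the BranchName in the override.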