Tags: amazon-web-services, aws-sam, aws-sam-cli

AWS SAM: sam build -u gets stuck


I have a GitLab pipeline that builds and deploys my SAM application. The application contains about 30 Lambda functions, mostly Python with some Node.js. Running sam build -u locally has never caused a problem. In the pipeline, however, the build hangs on the last function and gets stuck at "Mounting /builds/<rest_of_path>/ /tmp/samcli/source:ro,delegated, inside runtime container"

To recover, I have to delete all artifacts in my repo and then clear the runner caches. The pipeline then works once with sam build and gets stuck again on subsequent runs. I've tried modifying my .gitlab-ci.yml in many ways without success.

Any ideas on what I can try to get sam build to execute consistently?

Here is my .gitlab-ci.yml:

variables:
  SAM_TEMPLATE: Lambdas/template.yaml
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

services:
  - docker:23.0.6-dind

# Should always specify a specific version of the image. If using a tag like docker:stable,
# there will be no control over which version is used. Unpredictable behavior can result.
image: docker:23.0.6

before_script:
  - apk add --update python3 py-pip python3-dev build-base libffi-dev
  - pip install --upgrade pip
  - pip install awscli aws-sam-cli
  
stages:
  - preview
  - deploy 

preview:
  stage: preview
  script:
    - chmod 755 aws-variables.sh
    - ./aws-variables.sh
    - export AWS_DEFAULT_REGION=$AWS_REGION
    - cd Lambdas
    - sam build -u
    - sam deploy --region $AWS_REGION --no-execute-changeset --no-fail-on-empty-changeset
    - cd ..
    - changeset_id=$(aws cloudformation describe-change-set --stack-name Lambdas --change-set-name $(aws cloudformation list-change-sets --stack-name Lambdas --query "sort_by(Summaries, &CreationTime)[-1].ChangeSetName" --output text) --query "ChangeSetId" --output text)
    - echo $changeset_id > changeset.txt
  artifacts:
    paths:
      - changeset.txt

deploy-prod:
  stage: deploy
  script:
    - chmod 755 aws-variables.sh
    - ./aws-variables.sh
    - changeset_id=$(cat changeset.txt)
    - export AWS_DEFAULT_REGION=$AWS_REGION
    - aws cloudformation execute-change-set --change-set-name $changeset_id
  only:
    - main
    - develop
  when: manual
  environment:
    name: production

Here is the end of the output from sam build -u --debug:

pip stderr: b"WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.\nPlease see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.\nTo avoid this problem you can invoke Python with '-m pip' instead of running pip directly.\n\n[notice] A new release of pip is available: 23.0.1 -> 24.2\n[notice] To update, run: pip install --upgrade pip\n"
Full dependency closure: {jmespath==1.0.1(wheel), python-dateutil==2.9.0.post0(wheel), six==1.16.0(wheel), boto3==1.35.2(wheel), botocore==1.35.2(wheel), s3transfer==0.10.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel)}
initial compatible: {jmespath==1.0.1(wheel), python-dateutil==2.9.0.post0(wheel), boto3==1.35.2(wheel), six==1.16.0(wheel), botocore==1.35.2(wheel), s3transfer==0.10.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel)}
initial incompatible: set()
Downloading missing wheels: set()
compatible wheels after second download pass: {jmespath==1.0.1(wheel), python-dateutil==2.9.0.post0(wheel), six==1.16.0(wheel), boto3==1.35.2(wheel), botocore==1.35.2(wheel), s3transfer==0.10.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel)}
Build missing wheels from sdists (C compiling True): set()
compatible after building wheels (no C compiling): {jmespath==1.0.1(wheel), python-dateutil==2.9.0.post0(wheel), six==1.16.0(wheel), boto3==1.35.2(wheel), botocore==1.35.2(wheel), s3transfer==0.10.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel)}
Build missing wheels from sdists (C compiling False): set()
compatible after building wheels (C compiling): {jmespath==1.0.1(wheel), python-dateutil==2.9.0.post0(wheel), six==1.16.0(wheel), boto3==1.35.2(wheel), botocore==1.35.2(wheel), s3transfer==0.10.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel)}
Final compatible: {jmespath==1.0.1(wheel), python-dateutil==2.9.0.post0(wheel), six==1.16.0(wheel), boto3==1.35.2(wheel), botocore==1.35.2(wheel), s3transfer==0.10.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel)}
Final incompatible: set()
Final missing wheels: set()
PythonPipBuilder:ResolveDependencies succeeded
 Running PythonPipBuilder:CopySource
Copying source file (/tmp/samcli/source/requirements.txt) to destination (/tmp/samcli/artifacts/requirements.txt)
Copying source file (/tmp/samcli/source/lambda_function.py) to destination (/tmp/samcli/artifacts/lambda_function.py)
PythonPipBuilder:CopySource succeeded
Full dependency closure: {jmespath==1.0.1(wheel), boto3==1.35.2(wheel), python-dateutil==2.9.0.post0(wheel), s3transfer==0.10.2(wheel), botocore==1.35.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel), six==1.16.0(wheel)}
initial compatible: {jmespath==1.0.1(wheel), boto3==1.35.2(wheel), python-dateutil==2.9.0.post0(wheel), s3transfer==0.10.2(wheel), botocore==1.35.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel), six==1.16.0(wheel)}
initial incompatible: set()
Downloading missing wheels: set()
compatible wheels after second download pass: {jmespath==1.0.1(wheel), boto3==1.35.2(wheel), python-dateutil==2.9.0.post0(wheel), s3transfer==0.10.2(wheel), botocore==1.35.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel), six==1.16.0(wheel)}
Build missing wheels from sdists (C compiling True): set()
compatible after building wheels (no C compiling): {jmespath==1.0.1(wheel), boto3==1.35.2(wheel), python-dateutil==2.9.0.post0(wheel), s3transfer==0.10.2(wheel), botocore==1.35.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel), six==1.16.0(wheel)}
Build missing wheels from sdists (C compiling False): set()
compatible after building wheels (C compiling): {jmespath==1.0.1(wheel), boto3==1.35.2(wheel), python-dateutil==2.9.0.post0(wheel), s3transfer==0.10.2(wheel), botocore==1.35.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel), six==1.16.0(wheel)}
Final compatible: {jmespath==1.0.1(wheel), boto3==1.35.2(wheel), python-dateutil==2.9.0.post0(wheel), s3transfer==0.10.2(wheel), botocore==1.35.2(wheel), aws-psycopg2==1.3.8(wheel), urllib3==1.26.19(wheel), six==1.16.0(wheel)}
Final incompatible: set()
Final missing wheels: set()
PythonPipBuilder:ResolveDependencies succeeded
 Running PythonPipBuilder:CopySource
Copying source file (/tmp/samcli/source/requirements.txt) to destination (/tmp/samcli/artifacts/requirements.txt)
Copying source file (/tmp/samcli/source/lambda_function.py) to destination (/tmp/samcli/artifacts/lambda_function.py)
2024-08-20 22:54:08,878 | Build inside container returned response {"jsonrpc": "2.0", "id": 1, "result": {"artifacts_dir": "/tmp/samcli/artifacts"}}
2024-08-20 22:54:08,878 | Build inside container was successful. Copying artifacts from container to host
PythonPipBuilder:CopySource succeeded
2024-08-20 22:54:08,881 | Build inside container returned response {"jsonrpc": "2.0", "id": 1, "result": {"artifacts_dir": "/tmp/samcli/artifacts"}}
2024-08-20 22:54:08,882 | Build inside container was successful. Copying artifacts from container to host
2024-08-20 22:54:16,214 | Copying from container: /tmp/samcli/artifacts/. -> /builds/nroc/aws-lambdas/Lambdas/.aws-sam/build/DBRotatePasswordDevelopment
2024-08-20 22:54:16,234 | Copying from container: /tmp/samcli/artifacts/. -> /builds/nroc/aws-lambdas/Lambdas/.aws-sam/build/DBRotatePasswordProduction
2024-08-20 22:54:20,172 | Build inside container succeeded
2024-08-20 22:54:20,188 | Build inside container succeeded
Terminated
WARNING: step_script could not run to completion because the timeout was exceeded. For more control over job and script timeouts see: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-script-and-after_script-timeouts
ERROR: Job failed: execution took longer than 1h0m0s seconds
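The log shows both container builds reporting success, after which the job sits idle until GitLab kills it at the one-hour limit. As a stopgap, a hung build can be made to fail fast instead of burning the whole hour. Here is a minimal Python sketch, assuming Python 3 is available in the job image (the before_script installs it). run_with_deadline is a hypothetical helper, and the 124 exit code only mirrors the convention used by GNU timeout. It kills only the direct child process, not any containers that process started on the Docker daemon. GitLab's job-level timeout: keyword gives a similar cut-off without extra code.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd and return its exit code, or 124 if it exceeds deadline_seconds.

    Hypothetical helper: subprocess.run() kills the child process when the
    timeout expires and then re-raises TimeoutExpired, which is caught here.
    """
    try:
        completed = subprocess.run(cmd, timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        # The hung child has already been killed by subprocess.run().
        print(f"command exceeded {deadline_seconds}s: {' '.join(cmd)}", file=sys.stderr)
        return 124
    return completed.returncode
```

For example, a small wrapper script in the preview job could call run_with_deadline(["sam", "build", "-u"], 20 * 60) and exit with the returned code. This only shortens the wait; it does not address why the build hangs.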

Solution

  • I opened an issue on the aws-sam-cli GitHub repository and was told that this is likely a bug and that the team is working on a fix. This is what they told me:

    Looks like I was able to reproduce the issue, this is likely because one of the containers didn't return to the main thread after finishing the build. We are working on the fix and hopefully it could be available in the next release.

    In the meantime, if you could install Python (with the same version as defined in template.yaml) to the host and run sam build without -u, I believe it will help you unblock the pipeline.

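Following the maintainer's suggestion means the host interpreter has to match the Python runtimes declared in template.yaml. Here is a minimal Python sketch of that check. It assumes each function (or the Globals section) declares its runtime as a literal Runtime: pythonX.Y line rather than through a parameter or !Ref. It scans the template text with a regular expression instead of a YAML parser, because CloudFormation short-form tags such as !Ref make yaml.safe_load raise an error. The helper names python_runtimes, host_runtime and missing_runtimes are hypothetical.

```python
import re
import sys

# Matches lines such as "Runtime: python3.11", "Runtime: 'python3.9'" or
# "Runtime: python3.12  # comment"; non-Python runtimes are ignored.
RUNTIME_RE = re.compile(
    r"^[ \t]*Runtime:[ \t]*['\"]?(python\d+\.\d+)['\"]?[ \t]*(?:#.*)?$",
    re.MULTILINE,
)


def python_runtimes(template_text):
    """Return the set of pythonX.Y runtimes declared in a SAM template."""
    return set(RUNTIME_RE.findall(template_text))


def host_runtime():
    """Return the current interpreter as a Lambda-style name, e.g. 'python3.11'."""
    return f"python{sys.version_info.major}.{sys.version_info.minor}"


def missing_runtimes(template_text, host=None):
    """Sorted runtimes declared in the template that the host interpreter does not match."""
    host = host or host_runtime()
    return sorted(python_runtimes(template_text) - {host})
```

For example, running missing_runtimes(open("Lambdas/template.yaml").read()) in the job and failing when the list is non-empty would surface a version mismatch before sam build runs. Standardizing on one Python runtime means only one interpreter version has to be installed on the host.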

    I will update this answer if they release a fix that resolves the issue.

    Update 08/23/2024

    I ended up moving all my Python Lambda functions to the same runtime version, installing Python and Node.js in the job container, and removing the -u flag. This brought two benefits:

    • Consistently successful builds
    • Faster build time

    I may check one day whether the team fixes the reported bug, but I doubt I would ever switch back to the flag, since building without it has been quite convenient.

    The only benefit I can see to using the flag is that you don't need to install each runtime version explicitly, so there is less pipeline maintenance. Running every function's build in its own container is probably also what made the builds slower, though.

    Here is how I modified my pipeline file:

    variables:
      SAM_TEMPLATE: Lambdas/template.yaml
      DOCKER_DRIVER: overlay2
      DOCKER_TLS_CERTDIR: "/certs"
    
    services:
      - docker:23.0.6-dind
    
    image: docker:23.0.6
    
    before_script:
      - apk add --update python3 py3-pip python3-dev build-base libffi-dev util-linux procps
      - apk add nodejs npm
      - if ! python3 --version | grep -q "3.11"; then apk add --repository=http://dl-cdn.alpinelinux.org/alpine/edge/community python3=3.11*; fi
      - ln -sf /usr/bin/python3 /usr/local/bin/python
      - ln -sf /usr/bin/node /usr/local/bin/node
      - pip install --upgrade pip
      - pip install awscli aws-sam-cli
      
    stages:
      - preview
      - deploy 
    
    preview:
      stage: preview
      timeout: 20m
      script:
        - chmod 755 aws-variables.sh
        - ./aws-variables.sh
        - export AWS_DEFAULT_REGION=$AWS_REGION
        - cd Lambdas
        - sam build
        - sam deploy --region $AWS_REGION --no-execute-changeset --no-fail-on-empty-changeset
        - cd ..
        - changeset_id=$(aws cloudformation describe-change-set --stack-name Lambdas --change-set-name $(aws cloudformation list-change-sets --stack-name Lambdas --query "sort_by(Summaries, &CreationTime)[-1].ChangeSetName" --output text) --query "ChangeSetId" --output text)
        - echo $changeset_id > changeset.txt
      artifacts:
        paths:
          - changeset.txt
    
    deploy-prod:
      stage: deploy
      script:
        - chmod 755 aws-variables.sh
        - ./aws-variables.sh
        - changeset_id=$(cat changeset.txt)
        - export AWS_DEFAULT_REGION=$AWS_REGION
        - aws cloudformation execute-change-set --change-set-name $changeset_id
      only:
        - main
        - develop
      when: manual
      environment:
        name: production