Search code examples
pythonazureazure-devopspython-packagingazure-artifacts

Best practice for Python package on Azure Artifacts feed


I've developed some Python packages that I've uploaded on Azure DevOps Artifacts with a DevOps pipeline. It works well, but the pipeline stores on Artifacts not only my packages, but even their dependencies on the setup.cfg file!

They are normal dependencies, pandas and similar, but is it a best practice to store a copy of these libraries on Artifacts? For my logic I would say no... How can I prevent this behaviour?

These are my pipeline and my cfg file:

pipeline

trigger:
  tags:
    include:
      - 'v*.*'
  branches:
    include: 
    - main
    - dev-release

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: 'Stage_Test'
    variables:
    - group: UtilsDev
    jobs:
    - job: 'Job_Test'
      steps:
      - task: UsePythonVersion@0
        inputs:
          versionSpec: '$(pythonVersion)'
        displayName: 'Use Python $(pythonVersion)'

      - script: |
          python -m pip install --upgrade pip
        displayName: 'Upgrade PIP'

      - script: |
          pip install pytest pytest-azurepipelines
        displayName: 'Install test dependencies'

      - script: |
          pytest
        displayName: 'Execution of PyTest'

  - stage: 'Stage_Build'
    variables:
    - group: UtilsDev
    jobs:
    - job: 'Job_Build'
      steps:
        - task: UsePythonVersion@0
          inputs:
            versionSpec: '$(pythonVersion)'
          displayName: 'Use Python $(pythonVersion)'

        - script: |
            python -m pip install --upgrade pip
          displayName: 'Upgrade PIP'

        - script: |
            pip install build wheel
          displayName: 'Install build dependencies'

        - script: |
            python -m build
          displayName: 'Artifact creation'

        - publish: '$(System.DefaultWorkingDirectory)'
          artifact: package

  - stage: 'Stage_Deploy_DEV'
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/dev-release'))
    variables:
    - group: UtilsDev
    jobs:
    - deployment: Build_Deploy
      displayName: Build Deploy
      environment: [OMIT]-artifacts-dev
      strategy:
        runOnce:
          deploy:
            steps:
            - download: current
              artifact: package

            - task: UsePythonVersion@0
              inputs:
                versionSpec: '$(pythonVersion)'
              displayName: 'Use Python $(pythonVersion)'

            - script: |
                pip install twine
              displayName: 'Install build dependencies'

            - task: TwineAuthenticate@1
              displayName: 'Twine authentication'
              inputs:
                pythonUploadServiceConnection: 'PythonPackageUploadDEV'

            - script: |
                python -m twine upload --skip-existing --verbose -r $(feedName) --config-file  $(PYPIRC_PATH) dist/*
              workingDirectory: '$(Pipeline.Workspace)/package'              
              displayName: 'Artifact upload'

  - stage: 'Stage_Deploy_PROD'
    dependsOn: 'Stage_Build'
    condition: and(succeeded(), or(eq(variables['Build.SourceBranch'], 'refs/heads/main'), startsWith(variables['Build.SourceBranch'], 'refs/tags/v')))
    variables:
    - group: UtilsProd
    jobs:
    - job: 'Approval_PROD_Release'
      pool: server
      steps:
      - task: ManualValidation@0
        timeoutInMinutes: 1440 # task times out in 1 day
        inputs:
          notifyUsers: |
            [USER]@[OMIT].com
          instructions: 'Please validate the build configuration and resume'
          onTimeout: 'resume'
    - deployment: Build_Deploy
      displayName: Build Deploy
      environment: [OMIT]-artifacts-prod
      strategy:
        runOnce:
          deploy:
            steps:
            - download: current
              artifact: package

            - task: UsePythonVersion@0
              inputs:
                versionSpec: '$(pythonVersion)'
              displayName: 'Use Python $(pythonVersion)'

            - script: |
                pip install twine
              displayName: 'Install build dependencies'

            - task: TwineAuthenticate@1
              displayName: 'Twine authentication'
              inputs:
                pythonUploadServiceConnection: 'PythonPackageUploadPROD'

            - script: |
                python -m twine upload --skip-existing --verbose -r $(feedName) --config-file  $(PYPIRC_PATH) dist/*
              workingDirectory: '$(Pipeline.Workspace)/package'    
              displayName: 'Artifact upload'

setup file

[metadata]
name = [OMIT]_azure
version = 0.2
author = [USER]
author_email = [USER]@[OMIT].com
description = A package containing utilities for interacting with Azure
long_description = file: README.md
long_description_content_type = text/markdown
project_urls =
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.7
install_requires =
    azure-storage-file-datalake>="12.6.0"
    pyspark>="3.2.1"
    openpyxl>="3.0.9"
    pandas>="1.4.2"
    pyarrow>="8.0.0"
    fsspec>="2022.3.0"
    adlfs>="2022.4.0"
    [OMIT]-utils>="0.4"

[options.packages.find]
where = src

I've noticed that the pipeline has this behavior only in the production stage (Stage_Deploy_PROD) and not in the dev-release one (Stage_Deploy_DEV) and that the stored dependencies are much more than the 8 specified in the setup.cfg file.

Has anyone ever dealt with this?

Thanks in advance!!


Solution

  • According to this doc, once you've enabled an upstream source, every time you install a package from the public registry, Azure Artifacts will save a copy of that package in your feed.

    One of the reasons why there are more packages in Artifact than in your setup.cfg file is that when you download some packages, the necessary dependencies of these packages will also be downloaded together. Take PySpark as an example, when you download PySpark, since Py4J is required, it will also be downloaded together. enter image description here

    This is my test result, when I only download PySpark in the pipeline, Py4J is also downloaded and a copy is saved to Artifact. enter image description here