I've developed some Python packages that I upload to Azure DevOps Artifacts with a DevOps pipeline. It works well, but the pipeline stores in Artifacts not only my packages but also the dependencies listed in their setup.cfg file!
They are ordinary dependencies (pandas and similar), but is it best practice to store a copy of these libraries in Artifacts? My reasoning says no... How can I prevent this behaviour?
Here are my pipeline and my setup.cfg file:
pipeline
trigger:
  tags:
    include:
    - 'v*.*'
  branches:
    include:
    - main
    - dev-release
pool:
  vmImage: 'ubuntu-latest'
stages:
- stage: 'Stage_Test'
  variables:
  - group: UtilsDev
  jobs:
  - job: 'Job_Test'
    steps:
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '$(pythonVersion)'
      displayName: 'Use Python $(pythonVersion)'
    - script: |
        python -m pip install --upgrade pip
      displayName: 'Upgrade PIP'
    - script: |
        pip install pytest pytest-azurepipelines
      displayName: 'Install test dependencies'
    - script: |
        pytest
      displayName: 'Execution of PyTest'
- stage: 'Stage_Build'
  variables:
  - group: UtilsDev
  jobs:
  - job: 'Job_Build'
    steps:
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '$(pythonVersion)'
      displayName: 'Use Python $(pythonVersion)'
    - script: |
        python -m pip install --upgrade pip
      displayName: 'Upgrade PIP'
    - script: |
        pip install build wheel
      displayName: 'Install build dependencies'
    - script: |
        python -m build
      displayName: 'Artifact creation'
    - publish: '$(System.DefaultWorkingDirectory)'
      artifact: package
- stage: 'Stage_Deploy_DEV'
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/dev-release'))
  variables:
  - group: UtilsDev
  jobs:
  - deployment: Build_Deploy
    displayName: Build Deploy
    environment: [OMIT]-artifacts-dev
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: package
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'
            displayName: 'Use Python $(pythonVersion)'
          - script: |
              pip install twine
            displayName: 'Install build dependencies'
          - task: TwineAuthenticate@1
            displayName: 'Twine authentication'
            inputs:
              pythonUploadServiceConnection: 'PythonPackageUploadDEV'
          - script: |
              python -m twine upload --skip-existing --verbose -r $(feedName) --config-file $(PYPIRC_PATH) dist/*
            workingDirectory: '$(Pipeline.Workspace)/package'
            displayName: 'Artifact upload'
- stage: 'Stage_Deploy_PROD'
  dependsOn: 'Stage_Build'
  condition: and(succeeded(), or(eq(variables['Build.SourceBranch'], 'refs/heads/main'), startsWith(variables['Build.SourceBranch'], 'refs/tags/v')))
  variables:
  - group: UtilsProd
  jobs:
  - job: 'Approval_PROD_Release'
    pool: server
    steps:
    - task: ManualValidation@0
      timeoutInMinutes: 1440 # task times out in 1 day
      inputs:
        notifyUsers: |
          [USER]@[OMIT].com
        instructions: 'Please validate the build configuration and resume'
        onTimeout: 'resume'
  - deployment: Build_Deploy
    displayName: Build Deploy
    environment: [OMIT]-artifacts-prod
    strategy:
      runOnce:
        deploy:
          steps:
          - download: current
            artifact: package
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'
            displayName: 'Use Python $(pythonVersion)'
          - script: |
              pip install twine
            displayName: 'Install build dependencies'
          - task: TwineAuthenticate@1
            displayName: 'Twine authentication'
            inputs:
              pythonUploadServiceConnection: 'PythonPackageUploadPROD'
          - script: |
              python -m twine upload --skip-existing --verbose -r $(feedName) --config-file $(PYPIRC_PATH) dist/*
            workingDirectory: '$(Pipeline.Workspace)/package'
            displayName: 'Artifact upload'
setup file
[metadata]
name = [OMIT]_azure
version = 0.2
author = [USER]
author_email = [USER]@[OMIT].com
description = A package containing utilities for interacting with Azure
long_description = file: README.md
long_description_content_type = text/markdown
project_urls =
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent
[options]
package_dir =
    = src
packages = find:
python_requires = >=3.7
install_requires =
    azure-storage-file-datalake>="12.6.0"
    pyspark>="3.2.1"
    openpyxl>="3.0.9"
    pandas>="1.4.2"
    pyarrow>="8.0.0"
    fsspec>="2022.3.0"
    adlfs>="2022.4.0"
    [OMIT]-utils>="0.4"
[options.packages.find]
where = src
I've noticed that the pipeline shows this behaviour only in the production stage (Stage_Deploy_PROD), not in the dev-release one (Stage_Deploy_DEV), and that there are far more stored dependencies than the 8 listed in the setup.cfg file.
Has anyone ever dealt with this?
Thanks in advance!!
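According to the Azure Artifacts documentation on upstream sources, once you've enabled an upstream source on a feed, every package you install from the public registry through that feed is saved as a copy in the feed. In other words, the extra packages are not published by twine upload (which only uploads the files in dist/); they are cached whenever something installs packages through a feed that has an upstream source enabled.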
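Whether an install goes "through the feed" depends on the index URL pip is configured with (for example via PIP_INDEX_URL or pip.conf). The sketch below checks that. It assumes the usual Azure Artifacts PyPI URL layout (https://pkgs.dev.azure.com/<org>[/<project>]/_packaging/<feed>/pypi/simple/); please compare it with the URL shown on your feed's "Connect to feed" page. The helper names feed_simple_index_url and resolves_through_feed are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def feed_simple_index_url(organization: str, feed: str, project: Optional[str] = None) -> str:
    """Build the (assumed) pip 'simple' index URL of an Azure Artifacts feed.

    Project-scoped feeds include the project segment; organization-scoped feeds omit it.
    """
    scope = f"{organization}/{project}" if project else organization
    return f"https://pkgs.dev.azure.com/{scope}/_packaging/{feed}/pypi/simple/"


def resolves_through_feed(
    index_url: str, organization: str, feed: str, project: Optional[str] = None
) -> bool:
    """Return True if `index_url` points at the given feed.

    Credentials embedded in the URL (user:token@host) are ignored because
    urlparse().hostname strips them; a trailing slash is also ignored.
    """
    expected = urlparse(feed_simple_index_url(organization, feed, project))
    actual = urlparse(index_url)
    return (
        actual.hostname == expected.hostname
        and actual.path.rstrip("/") == expected.path.rstrip("/")
    )
```

For example, resolves_through_feed(os.environ.get("PIP_INDEX_URL", ""), "myorg", "myfeed", "myproject") tells you whether pip installs in that environment would be served (and, with an upstream enabled, cached) by the feed; "myorg", "myfeed" and "myproject" are placeholders.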
One reason there are more packages in Artifacts than in your setup.cfg file is that installing a package also installs its own dependencies, and those are cached too. Take PySpark as an example: PySpark requires Py4J, so installing PySpark also downloads Py4J.
In my test, when I installed only PySpark in the pipeline, Py4J was downloaded as well and a copy was saved to Artifacts.
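To see how 8 direct requirements can turn into many cached packages, you can walk the dependency graph of locally installed packages. The following is a minimal sketch using importlib.metadata.requires (Python 3.8+), which returns the requirement strings of an installed distribution or None. The function name transitive_deps is hypothetical, optional extras are skipped, and other environment markers are not evaluated, so the result may over-count platform-specific dependencies.

```python
import re
from importlib import metadata
from typing import Callable, Iterable, List, Optional, Set

# Distribution name at the start of a requirement string, e.g. "py4j==0.10.9.5" -> "py4j".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
# Requirements only pulled in by an optional extra, e.g. 'pyarrow>=1.0.1; extra == "parquet"'.
_EXTRA = re.compile(r";.*\bextra\s*==")


def _normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def transitive_deps(
    roots: Iterable[str],
    requires: Callable[[str], Optional[List[str]]] = metadata.requires,
) -> Set[str]:
    """Return the normalized names of `roots` plus everything they pull in."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        key = _normalize(name)
        if key in seen:
            continue  # already visited; also protects against dependency cycles
        seen.add(key)
        try:
            reqs = requires(name) or []
        except metadata.PackageNotFoundError:
            reqs = []  # not installed locally, so its own requirements are unknown
        for req in reqs:
            if _EXTRA.search(req):
                continue  # optional extra, not installed by default
            match = _NAME.match(req)
            if match:
                stack.append(match.group(1))
    return seen
```

Running, say, sorted(transitive_deps(["pyspark"])) in an environment where PySpark is installed should list py4j next to pyspark, which matches the test result above. The requires parameter is injectable so the traversal can be tried on a hand-written dependency map without installing anything.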