I have a src-layout package with pyproject.toml and setup.cfg, which I'm building using python -m build.
It builds and installs fine, but when I open the archive file it includes the contents of several additional folders that I don't want.
My project has the following structure:
project_root_directory
├── pyproject.toml  # AND/OR setup.cfg, setup.py
├── datasets/
├── model/
├── ...
└── src/
    └── mypkg/
        ├── __init__.py
        ├── ...
        ├── module.py
My setup.cfg is
[options]
packages = find:
package_dir =
    =src
zip_safe = False
install_requires =
    torch==2.0.0
    ...
[options.packages.find]
where = src
include = mypkg
pyproject.toml
[build-system]
requires = ["setuptools>=40.8.0", "wheel", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"
[tool.setuptools_scm]
write_to = "src/warpspeed_multiclass/_version.py"
setup.py
from setuptools import setup
if __name__ == '__main__':
    setup()
In addition to the package, all the files and folders from project_root_directory are included in the archive, e.g. model, data, etc. I don't want this: they're large, and I'm deploying to SageMaker, so I only want the source. The model is loaded from S3 and the data is no longer required (and in general might be sensitive).
I've tried adding exclude to setup.cfg, but my attempt failed. How do I ensure I only get the contents of mypkg and the associated metadata in the tar.gz produced by python -m build?
I discovered that setuptools_scm includes all files tracked by the SCM (i.e. git in this case). I'm using dvc to run a machine learning pipeline, and it adds hash files in the /data and /model folders in order to track which version was used to train the model. Because these files are added to git, they're also added to the source package by setuptools_scm.
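To check which top-level folders actually ended up in the sdist, the archive can be inspected with the standard library. This is only a minimal sketch: the function name top_level_counts is hypothetical, and it assumes the usual sdist layout where everything sits under a single "<name>-<version>/" root folder.

```python
import tarfile
from collections import Counter


def top_level_counts(sdist_path):
    """Count the files under each entry directly below the sdist root folder.

    Assumes the archive is a gzip-compressed tarball whose members all live
    under one "<name>-<version>/" directory, as sdists normally do.
    """
    counts = Counter()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directory entries, links, etc.
            parts = member.name.split("/")
            # parts[0] is the "<name>-<version>" root; parts[1] is the
            # top-level file or folder inside the project.
            key = parts[1] if len(parts) > 1 else parts[0]
            counts[key] += 1
    return counts
```

Calling it on the built archive, for example top_level_counts("dist/mypkg-0.1.0.tar.gz") (a placeholder path), returns a Counter keyed by top-level entry, so unexpected keys such as model or datasets stand out immediately.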
A solution is to exclude them using a MANIFEST.in file.
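As a rough sketch, a MANIFEST.in placed next to pyproject.toml could look like the following. The directory names are taken from the tree in the question and are an assumption about the actual repository, so adjust them to match the folders git is tracking.

```
# MANIFEST.in
# Drop everything under these directories from the sdist,
# even though the files are tracked by git.
prune datasets
prune model
```

prune removes every file under the named directory. For individual files or patterns, the exclude and global-exclude commands can be used instead. Rebuilding with python -m build and reopening the tar.gz should then show whether the unwanted folders are gone.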