
python -m build including additional folders


I have a src-layout package with pyproject.toml and setup.cfg, which I build using python -m build.

It builds and installs fine, but when I open the archive file, it includes the contents of several additional folders that I don't want.

My project has the following structure:

project_root_directory
├── pyproject.toml  # AND/OR setup.cfg, setup.py
├── datasets/
├── model/
├── ...
└── src/
    └── mypkg/
        ├── __init__.py
        ├── ...
        ├── module.py

My setup.cfg is

[options]
packages = find:
package_dir =
    =src
zip_safe = False
install_requires =
    torch==2.0.0
    ...
[options.packages.find]
where = src
include = mypkg

pyproject.toml

[build-system]
requires = ["setuptools>=40.8.0", "wheel", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
write_to = "src/warpspeed_multiclass/_version.py"

setup.py

from setuptools import setup
if __name__ == '__main__':
    setup()

As well as the package, all the files and folders from project_root_directory are included, e.g. model, data, etc. I don't want this: they're large, and I'm deploying to SageMaker, so I only want the source. The model is loaded from S3, and the data is no longer required (and in general might be sensitive).

I've tried to add exclude to setup.cfg but my attempt failed. How do I ensure I only get the contents of mypkg and the associated metadata in the tar.gz produced by python -m build?


Solution

  • I discovered that setuptools_scm includes all files tracked by the SCM (Git in this case). I'm using DVC to run a machine learning pipeline, and it adds hash files to the /data and /model folders to track which version was used to train the model. Because these files are committed to Git, setuptools_scm also adds them to the source package.

    A solution is to exclude them using a MANIFEST.in file.
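
As a rough sketch, a MANIFEST.in next to pyproject.toml could prune the unwanted folders. The folder names below are taken from the tree in the question and are assumptions about the real layout (the solution text mentions /data rather than datasets/), so adjust them as needed:

prune datasets
prune model

To confirm that a rebuilt archive no longer contains those folders, something like the following Python check might help. The function name unwanted_members and its default folder names are hypothetical, chosen only for illustration; the check relies on sdist entries being stored under a single top-level "<name>-<version>/" directory.

```python
import tarfile
from pathlib import PurePosixPath


def unwanted_members(sdist_path, unwanted=("datasets", "model")):
    """Return archive members that sit under an unwanted top-level project folder."""
    hits = []
    with tarfile.open(sdist_path, "r:gz") as archive:
        for name in archive.getnames():
            # Entries look like "<name>-<version>/<path inside the project>",
            # so parts[1] is the first path component inside the project root.
            parts = PurePosixPath(name).parts
            if len(parts) > 1 and parts[1] in unwanted:
                hits.append(name)
    return hits
```

Calling unwanted_members on the .tar.gz that python -m build writes to dist/ should return an empty list once the MANIFEST.in is in place; any paths it returns are files that are still being packaged.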