
"ModuleNotFoundError" after installing a python package


Problem summary

I am very new to Python package development. I developed a package and published it on TestPyPI. I can install this package through pip with no errors. However, Python gives me a "ModuleNotFoundError" when I try to import it, and I have no idea why. Can someone help me?

Repro steps

First, I install the package with:

pip install -i https://test.pypi.org/simple/ spark-map==0.2.76

Then, I open a new terminal, start the python interpreter, and try to import this package, but python gives me a ModuleNotFoundError:

>>> import spark_map
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'spark_map'

What I discovered

  • When I cd to the root folder of the package, open the Python interpreter, and run import spark_map, it works fine with no errors;

  • I suspected that pip did not install the package successfully, so I checked this. I got no error messages when installing the package, and when I run pip list after the pip install command, I see spark-map in the list of installed packages:

> pip list
... many packages
spark-map                0.2.76
... more packages
  • I also suspected that the folder where spark_map was installed might be outside Python's module search path, so I checked this as well. pip installs the package into a folder called Python310\lib\site-packages, and this folder is included in sys.path:
>>> import sys
>>> for path in sys.path:
...     print(path)

C:\Users\pedro\AppData\Local\Programs\Python\Python310\python310.zip
C:\Users\pedro\AppData\Local\Programs\Python\Python310\DLLs
C:\Users\pedro\AppData\Local\Programs\Python\Python310\lib
C:\Users\pedro\AppData\Local\Programs\Python\Python310
C:\Users\pedro\AppData\Local\Programs\Python\Python310\lib\site-packages
C:\Users\pedro\AppData\Local\Programs\Python\Python310\lib\site-packages\win32
C:\Users\pedro\AppData\Local\Programs\Python\Python310\lib\site-packages\win32\lib
C:\Users\pedro\AppData\Local\Programs\Python\Python310\lib\site-packages\Pythonwin
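
A further check that can complement pip list and sys.path is to ask Python which files the installed distribution actually recorded. The sketch below is only an illustration: the helper name describe_install is hypothetical, and it assumes the distribution is named spark-map and the import name is spark_map, as in the steps above. If the distribution shows up as installed but has no .py files, the install itself is probably empty.

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name, module_name):
    """Summarize what pip recorded for a distribution (hypothetical helper)."""
    report = {"installed": False, "importable": False, "python_files": []}
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # pip has no record of this distribution in the current environment
        return report
    report["installed"] = True
    # dist.files can be None when the installer recorded no file list
    files = dist.files or []
    report["python_files"] = [str(f) for f in files if str(f).endswith(".py")]
    # find_spec returns None when no importable top-level module is found
    report["importable"] = importlib.util.find_spec(module_name) is not None
    return report


# Names assumed from the question: distribution "spark-map", module "spark_map"
print(describe_install("spark-map", "spark_map"))
```

A result with installed set to True and an empty python_files list would point at the build rather than at the search path.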

Information about the system

I am on Windows 10, with Python 3.10.9, trying to install and import the spark_map package, version 0.2.76 (https://test.pypi.org/project/spark-map/).

Information about the code

The package source code is hosted at GitHub, and the folder structure of this package is essentially this:

root
│
├───spark_map
│   ├───__init__.py
│   ├───functions.py
│   └───mapping.py
│
├───tests
│   ├───functions
│   └───mapping
│
├───.gitignore
├───LICENSE
├───pyproject.toml
├───README.md
└───README.rst

The pyproject.toml file of the package:

[build-system]
requires = ["setuptools>=61.0", "toml"]
build-backend = "setuptools.build_meta"

[project]
name = "spark_map"
version = "0.2.76"
authors = [
  { name="Pedro Faria", email="[email protected]" }
]
description = "Pyspark implementation of `map()` function for spark DataFrames"
readme = "README.md"
requires-python = ">=3.7"
license = { file = "LICENSE.txt" }
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]

dependencies = [
    "pyspark",
    "setuptools",
    "toml"
]

[project.urls]
Homepage = "https://pedropark99.github.io/spark_map/"
Repo = "https://github.com/pedropark99/spark_map"
Issues = "https://github.com/pedropark99/spark_map/issues"


[tool.pytest.ini_options]
pythonpath = [
  "."
]


[tool.setuptools]
py-modules = []

What I tried

As @Dorian Turba suggested, I moved the source code into a src folder. Now, the structure of the package is this:

root
├───src
│   └───spark_map
│       ├───__init__.py
│       ├───functions.py
│       └───mapping.py
│
├───tests
├───.gitignore
├───LICENSE
├───pyproject.toml
├───README.md
└───README.rst

After that, I executed python -m pip install -e . (the log of this command is in the image below). The package was built and installed successfully. However, when I open a new terminal in a different location and try to run python -c "import spark_map", I still get the same error.

[Image: log of the python -m pip install -e . command]

I also tried creating a virtual environment (with python -m venv env) and installing the package inside it (with pip install -e .). Then I executed python -c "import spark_map", but the problem remains. I also ran pip list to check whether the package was installed. The full log of commands is in the image below:

[Image: full log of the commands run inside the virtual environment]


Solution

  • The source of the problem

    The source of the problem is in the build process of the package. In other words, pip install was installing an invalid package.

    Basically, I use setuptools to build the package. When I built the package with python -m build, the source code of the package (that is, all contents of the src directory) was not included in the resulting TAR archive (the source distribution).

    Fix using setuptools

    The setuptools documentation covers this issue of locating the source code of your project. In essence, setuptools was not finding the source code of the package, so I needed to help it find these files by adding these two options to my pyproject.toml file:

    [tool.setuptools]
    packages = ["spark_map"]
    package-dir = {"" = "src"}
    

    How can you identify this problem?

    If you are having trouble installing and importing your own package, you might have the same problem I did. To check, build your project with python -m build. Then open the source distribution of your package (that is, the TAR archive) and check whether the source code is inside it. If it is not, then you have this exact problem.
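
The manual inspection described above can also be scripted. This is a minimal sketch rather than an official tool: the function name python_sources_in_sdist is hypothetical, and it assumes the source distribution is a gzip-compressed TAR archive, which is what python -m build normally writes to the dist folder.

```python
import tarfile
from pathlib import PurePosixPath


def python_sources_in_sdist(sdist_path, package_name):
    """List .py members of an sdist that sit under a folder named package_name."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = archive.getnames()
    return [
        name
        for name in names
        # TAR member names use forward slashes, hence PurePosixPath;
        # parts[:-1] drops the file name and keeps only the folders
        if name.endswith(".py") and package_name in PurePosixPath(name).parts[:-1]
    ]


# Example call (the archive file name here is an assumption; use the one in dist):
# print(python_sources_in_sdist("dist/spark_map-0.2.76.tar.gz", "spark_map"))
```

An empty list means the archive contains no source files for the package, which matches the situation described in this answer.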