I have a Python program that is sort of a wrapper around pip that I use to assist with development of Python packages. Basically, the problem I face is how to read metadata such as the Name and Version of a package (generally '.tar.gz' and '.whl' archives) without installing it. Can distutils or some other tool do this?
Just a few notes: the code is written for Python 3, but I am working with all sorts of Python packages, such as sdist and bdist_wheel builds for both Py2 and Py3. Also, I'm only concerned with local packages that I have the path to, not theoretical packages available on PyPI.
What I'm doing now works just fine, but it seems pretty messy and I'm wondering if there is a better tool that can abstract this. Right now I am reading the metadata text file within the archive and manually parsing out the fields I need. If that fails, I am stripping the name and version out of the package's file name (really terrible). Is there a better way to do this? Here are the two functions that I am using to parse package Name and Version.
Simeon, thank you for the suggestion to use the metadata.json file contained within wheel archives. I'm not familiar with all of the files contained within archives, but I had hoped there was a nice way to parse some of them. metadata.json certainly meets that criterion for wheels. I'm going to leave the question open for a little longer just to see if there are any other suggestions before accepting.
Anyway, in case anyone encounters this issue in the future, I've attached my updated code. It could probably be written more cleanly as a class, but this is what I have for now. It isn't hardened against edge cases, so buyer beware and all that.
import tarfile, zipfile

def getmetapath(afo):
    """
    Return path to the metadata file within a tarfile or zipfile object.

    tarfile: PKG-INFO
    zipfile: metadata.json
    """
    if isinstance(afo, tarfile.TarFile):
        pkgname = afo.fileobj.name
        for path in afo.getnames():
            if path.endswith('/PKG-INFO'):
                return path
    elif isinstance(afo, zipfile.ZipFile):
        pkgname = afo.filename
        for path in afo.namelist():
            if path.endswith('.dist-info/metadata.json'):
                return path

    try:
        raise AttributeError("Unable to identify metadata file for '{0}'".format(pkgname))
    except NameError:
        raise AttributeError("Unable to identify archive's metadata file")

def getmetafield(pkgpath, field):
    """
    Return the value of a field from package metadata file.
    Whenever possible, version fields are returned as a version object.

    e.g. getmetafield('/path/to/archive-0.3.tar.gz', 'name') ==> 'archive'
    """
    wrapper = str
    if field.casefold() == 'version':
        try:
            # attempt to use version object (able to perform comparisons);
            # distutils was removed in Python 3.12, so str is used there
            from distutils.version import LooseVersion as wrapper
        except ImportError:
            pass

    # package is a tar archive
    if pkgpath.endswith('.tar.gz'):
        with tarfile.open(pkgpath) as tfo:
            with tfo.extractfile(getmetapath(tfo)) as mfo:
                metalines = mfo.read().decode().splitlines()
        for line in metalines:
            if line.startswith(field.capitalize() + ': '):
                # split only once so values containing ': ' are kept whole
                return wrapper(line.split(': ', 1)[1])

    # package is a wheel (zip) archive
    elif pkgpath.endswith('.whl'):
        import json
        with zipfile.ZipFile(pkgpath) as zfo:
            metadata = json.loads(zfo.read(getmetapath(zfo)).decode())
        try:
            return wrapper(metadata[field.lower()])
        except KeyError:
            pass

    raise Exception("Unable to extract field '{0}' from package '{1}'".format(
        field, pkgpath))

The situation here is not great, which is why wheel files were created. If you only had to support wheel files, you could clean up the code, but your approach will remain a bit messy as long as you have to support *.tar.gz source packages.
The file format of wheels is specified in PEP 427, so you can both parse the filename for certain information and read the contents of the <package>-<version>.dist-info directory inside. In particular, metadata.json and METADATA are very useful. In fact, reading metadata.json would be sufficient, and this would lead to clean code to access that information without installing.
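The two sources of information mentioned here can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the helper names `parse_wheel_filename` and `read_wheel_metadata` are hypothetical, the filename layout is the one described in PEP 427, and METADATA is assumed to use email-style `Key: value` headers. Because a given wheel may not ship metadata.json, the sketch falls back to METADATA.

```python
import json
import os
import zipfile
from email.parser import Parser


def parse_wheel_filename(whl_path):
    """Return (name, version) taken from a PEP 427 style wheel filename.

    Hypothetical helper. Expected layout:
    {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    filename = os.path.basename(whl_path)
    if not filename.endswith('.whl'):
        raise ValueError("not a wheel filename: {0!r}".format(filename))
    parts = filename[:-len('.whl')].split('-')
    # 5 components without a build tag, 6 with one
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: {0!r}".format(filename))
    return parts[0], parts[1]


def read_wheel_metadata(whl_path):
    """Return (name, version) read from the wheel's .dist-info directory.

    Hypothetical helper; nothing is installed or extracted to disk.
    """
    with zipfile.ZipFile(whl_path) as zfo:
        names = zfo.namelist()
        # Prefer metadata.json when the wheel happens to include it.
        for path in names:
            if path.endswith('.dist-info/metadata.json'):
                data = json.loads(zfo.read(path).decode('utf-8'))
                return data['name'], data['version']
        # Otherwise parse METADATA, assumed to hold email-style headers.
        for path in names:
            if path.endswith('.dist-info/METADATA'):
                msg = Parser().parsestr(zfo.read(path).decode('utf-8'))
                return msg['Name'], msg['Version']
    raise ValueError("no metadata found in {0!r}".format(whl_path))
```

Note that the name taken from the filename may be an escaped form of the project name, so the values read from the .dist-info directory are probably the safer choice when both are available.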
I would refactor the code to work with metadata.json and implement a best-effort approach for PKG-INFO of source packages. The long-term plan would be to convert all your tar.gz source packages to wheels and remove the then-outdated code for PKG-INFO parsing.
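As a rough idea of what a best-effort PKG-INFO reader could look like, here is a minimal sketch. It assumes PKG-INFO uses email-style `Key: value` headers, so the standard library's `email.parser` can read it instead of hand-written line splitting. The function name `read_sdist_metadata` is hypothetical, and picking the shallowest PKG-INFO is a heuristic of this sketch, not something the source archive format guarantees.

```python
import tarfile
from email.parser import Parser


def read_sdist_metadata(sdist_path):
    """Best-effort (name, version) from the PKG-INFO of a source archive.

    Hypothetical helper; either value is None if its header is missing.
    """
    # Mode 'r' lets tarfile detect the compression (e.g. gzip) itself.
    with tarfile.open(sdist_path) as tfo:
        candidates = [n for n in tfo.getnames() if n.endswith('/PKG-INFO')]
        if not candidates:
            raise ValueError("no PKG-INFO found in {0!r}".format(sdist_path))
        # Heuristic: the shallowest PKG-INFO is taken as the package's own,
        # which skips copies nested deeper, such as inside *.egg-info.
        path = min(candidates, key=lambda n: n.count('/'))
        with tfo.extractfile(path) as mfo:
            msg = Parser().parsestr(mfo.read().decode('utf-8'))
    return msg['Name'], msg['Version']
```

Calling `read_sdist_metadata('/path/to/archive-0.3.tar.gz')` would return the name and version as plain strings; wrapping the version in a comparable version object is left out of this sketch.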