
How to get the short and long description of a not installed pip package?


To my great disappointment, the pip package manager does not show any information for packages that are not already installed. The only way to get anything seems to be to grep the short description out of the search output, with pip search XXX | grep -i XXX.

  • Q: Is there an easy way to get the long description for a pip package XXX?
    (From command line and without having to install it.)

Perhaps there is a smart way of using wget or curl against PyPI?


EDIT: I managed to get it with a Bash one-liner using curl:

curl -sG -H 'Host: pypi.org' -H 'Accept: application/json' https://pypi.org/pypi/numpy/json | awk -F "description\":\"" '{ print $2 }' |cut -d ',' -f 1

# NumPy is a general-purpose array-processing package designed to...

However, a different and more robust way would be preferable.


Solution

  • PyPI offers an API to access package metadata:

    • Simple: a response from https://pypi.org/simple/<pkgname> is an HTML page listing the download URLs for the package; it can be parsed with any HTML parser, such as BeautifulSoup or lxml.

    • JSON: a response from https://pypi.org/pypi/<pkgname>/json is a JSON document that can be processed with any JSON tool. Example from the comments, using requests:

        In [1]: import requests
    
        In [2]: data = requests.get('https://pypi.org/pypi/lxml/json').json()
    
        In [3]: data['info']['summary']
        Out[3]: 'Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.'
    
        In [4]: data['info']['description']
        Out[4]: 'lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries.  It\nprovides safe and convenient access to these libraries using the ElementTree\nAPI.\n\nIt extends the ElementTree API significantly to offer support for XPath,\nRelaxNG, XML Schema, XSLT, C14N and much more.\n\nTo contact the project, go to the `project home page\n<http://lxml.de/>`_ or see our bug tracker at\nhttps://launchpad.net/lxml\n\nIn case you want to use the current in-development version of lxml,\nyou can get it from the github repository at\nhttps://github.com/lxml/lxml .  Note that this requires Cython to\nbuild the sources, see the build instructions on the project home\npage.  To the same end, running ``easy_install lxml==dev`` will\ninstall lxml from\nhttps://github.com/lxml/lxml/tarball/master#egg=lxml-dev if you have\nan appropriate version of Cython installed.\n\n\nAfter an official release of a new stable series, bug fixes may become\navailable at\nhttps://github.com/lxml/lxml/tree/lxml-4.2 .\nRunning ``easy_install lxml==4.2bugfix`` will install\nthe unreleased branch state from\nhttps://github.com/lxml/lxml/tarball/lxml-4.2#egg=lxml-4.2bugfix\nas soon as a maintenance branch has been established.  Note that this\nrequires Cython to be installed at an appropriate version for the build.\n\n4.2.5 (2018-09-09)\n==================\n\nBugs fixed\n----------\n\n* Javascript URLs that used URL escaping were not removed by the HTML cleaner.\n  Security problem found by Omar Eissa.\n\n\n\n\n'
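
If installing requests is not an option, the same lookup can probably be done with the standard library alone. The following is only a minimal sketch: the function names fetch_metadata and extract_descriptions are made up for this illustration, and it assumes the response keeps the info/summary and info/description layout shown in the session above.

```python
import json
import urllib.request

# URL pattern of the JSON endpoint described above.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_metadata(name, timeout=10):
    """Download and parse the JSON metadata for one package (hypothetical helper).

    Raises urllib.error.HTTPError if the request fails, for example
    when the package name does not exist.
    """
    url = PYPI_JSON_URL.format(name=name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_descriptions(metadata):
    """Return (summary, description) from a parsed response (hypothetical helper)."""
    info = metadata.get("info") or {}
    # Either field may be missing or null; normalise both to strings.
    return info.get("summary") or "", info.get("description") or ""
```

Calling extract_descriptions(fetch_metadata("lxml")) should then give the same two strings as In [3] and In [4] above. Keeping the network call separate from the extraction makes the second function easy to try on a saved response.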
    

    A command-line alternative is yolk. Install it with:

    $ pip install yolk3k
    

    The same query for the summary and description of lxml, this time with yolk:

    $ yolk -M lxml -f summary,description
    summary: Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
    description: lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries.  It
    provides safe and convenient access to these libraries using the ElementTree
    API.
    
    It extends the ElementTree API significantly to offer support for XPath,
    RelaxNG, XML Schema, XSLT, C14N and much more.
    
    ...
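
For completeness, the Simple endpoint mentioned above can be parsed without third-party packages as well. Note that this page only lists files, so it does not help with descriptions. The sketch below uses the standard library's html.parser instead of BeautifulSoup or lxml; LinkCollector and parse_simple_index are hypothetical names, and the code assumes the page is plain HTML whose download links are ordinary a elements.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_simple_index(html):
    """Return the list of (url, filename) pairs found in a Simple index page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The html argument would be the decoded body of a response from https://pypi.org/simple/<pkgname>, fetched with whatever HTTP client is at hand.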