pip install boilerpipe failed with tarfile.ReadError: empty file


I'm trying to install boilerpipe through pip, but it fails.

Here is the log:

Complete output from command python setup.py egg_info:

Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "/tmp/pip-build-J2gFYC/boilerpipe/setup.py", line 27, in <module>
    download_jars(datapath=DATAPATH)
  File "/tmp/pip-build-J2gFYC/boilerpipe/setup.py", line 21, in download_jars
    tar = tarfile.open(tgz_name, mode='r:gz')
  File "/usr/lib/python2.7/tarfile.py", line 1678, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1727, in gzopen
    **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python2.7/tarfile.py", line 2334, in next
    raise ReadError("empty file")
tarfile.ReadError: empty file

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-J2gFYC/boilerpipe


Solution

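  • During installation, setup.py downloads a .tar.gz archive containing the boilerpipe JARs. Sometimes the URL it downloads from returns 404, which leaves an empty file on disk, and tarfile.open then raises ReadError: empty file. In that case, the most reliable way to install boilerpipe is to fix the URL in a local copy of the source: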

    • git clone https://github.com/ptwobrussell/python-boilerpipe.git
    • Open setup.py.
    • Find the line where the download link is defined. It usually looks like tgz_url = 'https://boilerpipe.googlecode.com/files/boilerpipe-{0}-bin.tar.gz'.format(version)
    • Find a valid download link at https://code.google.com/archive/p/boilerpipe/downloads
    • Replace the old link with the working one, for example tgz_url = 'https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/boilerpipe/boilerpipe-1.2.0-bin.tar.gz'
    • Install from the modified checkout, for example by running pip install . inside the cloned directory.
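
The manual edit above can also be scripted. The following is only a minimal sketch in Python 3, not part of boilerpipe or its setup.py: the helper names patch_tgz_url and is_readable_tgz are made up for illustration, and the sketch assumes the URL is assigned on a single line beginning with tgz_url =, as in the example above. The second helper checks whether a downloaded archive is usable before tarfile gets a chance to fail with "empty file".

```python
import re
import tarfile

# Working archive URL quoted in the answer above; change it for another version.
WORKING_URL = (
    "https://storage.googleapis.com/google-code-archive-downloads/"
    "v2/code.google.com/boilerpipe/boilerpipe-1.2.0-bin.tar.gz"
)


def patch_tgz_url(setup_source, new_url=WORKING_URL):
    """Return the setup.py text with its tgz_url assignment pointing at new_url.

    Hypothetical helper: assumes the assignment sits on one line.
    """
    pattern = re.compile(r"^([ \t]*)tgz_url\s*=.*$", flags=re.MULTILINE)

    def replacement(match):
        # Keep the original indentation, swap the whole right-hand side.
        return "{}tgz_url = {!r}".format(match.group(1), new_url)

    patched, count = pattern.subn(replacement, setup_source, count=1)
    if count == 0:
        raise ValueError("no tgz_url assignment found in setup.py")
    return patched


def is_readable_tgz(path):
    """Return True if path is a gzip-compressed tar with at least one member."""
    try:
        with tarfile.open(path, mode="r:gz") as tar:
            return tar.next() is not None
    except (tarfile.ReadError, OSError, EOFError):
        # Covers the empty file from a failed download and non-gzip content.
        return False
```

One possible way to use it: read setup.py from the cloned directory, write the result of patch_tgz_url back to the same file, and then install from that directory. If the installation still fails, running is_readable_tgz on the downloaded archive shows whether the new URL actually delivered a valid file.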