
Dryscrape session can't load any site


I've installed dryscrape on pythonanywhere.com, yet the session object can't load any site. Why?

import dryscrape
# as in demo: http://dryscrape.readthedocs.io/en/latest/usage.html#first-demonstration
dryscrape.start_xvfb() 

sess = dryscrape.Session()
sess.visit('https://www.pythonanywhere.com/')

The result is an error:

sess.visit('https://www.pythonanywhere.com/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/igorsavinkin/.local/lib/python3.5/site-packages/dryscrape/session.py", line 33, in visit
    return self.driver.visit(self.complete_url(url))
  File "/home/igorsavinkin/.local/lib/python3.5/site-packages/webkit_server.py", line 235, in visit
    self.conn.issue_command("Visit", url)
  File "/home/igorsavinkin/.local/lib/python3.5/site-packages/webkit_server.py", line 520, in issue_command
    return self._read_response()
  File "/home/igorsavinkin/.local/lib/python3.5/site-packages/webkit_server.py", line 530, in _read_response
    raise InvalidResponseError(msg)
webkit_server.InvalidResponseError: {"class":"InvalidResponseError","message":"Unable to load URL: https://www.pythonanywhere.com/ because of error loading https://www.pythonanywhere.com/: Unknown error"}
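The exception text raised by webkit_server is a JSON payload. Here is a minimal sketch that uses only the standard library to pull the class and message out of it, for example when logging failures. The helper name `parse_webkit_error` is hypothetical and not part of dryscrape. It assumes the payload has the shape shown in the traceback above.

```python
import json


def parse_webkit_error(text):
    """Split a webkit_server error payload into (class, message).

    Assumes `text` is the JSON string shown in the traceback, i.e. the
    str() of a webkit_server.InvalidResponseError. Falls back to
    (None, text) if the text isn't a JSON object.
    """
    try:
        payload = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None, text
    if not isinstance(payload, dict):
        return None, text
    return payload.get("class"), payload.get("message", text)


# The payload copied from the traceback above.
raw = ('{"class":"InvalidResponseError","message":"Unable to load URL: '
       'https://www.pythonanywhere.com/ because of error loading '
       'https://www.pythonanywhere.com/: Unknown error"}')
cls, msg = parse_webkit_error(raw)
print(cls)
print(msg)
```

In a real script you would wrap `sess.visit(...)` in `try`/`except webkit_server.InvalidResponseError as err:` and pass `str(err)` to the helper. Note that "Unknown error" carries no further detail, so the payload alone won't reveal the underlying cause.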

The issue is the same no matter which whitelisted site I make the session visit.

I've read about dryscrape's installation prerequisites:

Before installing dryscrape, you need to install some software it depends on:

  • Qt, QtWebKit
  • lxml
  • pip
  • xvfb (necessary only if no other X server is available)

So neither Qt nor QtWebKit is among PythonAnywhere's default modules.

When I tried to install Qt, I got an error (the same happens with QtWebKit):

$ pip install --user Qt
Collecting Qt
  Could not find a version that satisfies the requirement Qt (from versions: )
No matching distribution found for Qt
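Qt and QtWebKit are C++ libraries rather than ordinary pip-installable Python packages, which fits the pip error above. Here is a minimal sketch for checking which prerequisites are visible from the current Python. Python packages are checked by import name. The non-Python pieces are only approximated by looking for executables on PATH. It is an assumption that `qmake` on PATH suggests the Qt development tools are installed and that `Xvfb` is the virtual X server's binary name. The helper name `check_prereqs` is hypothetical.

```python
import importlib.util
import shutil


def check_prereqs():
    """Report which dryscrape prerequisites are visible from this Python.

    Returns a dict mapping each checked name to True (found) or False.
    """
    report = {}
    # Python-level pieces, checked by import name without importing them.
    for module in ("lxml", "webkit_server", "dryscrape"):
        report[module] = importlib.util.find_spec(module) is not None
    # Non-Python pieces, approximated by executables on PATH (assumption).
    for exe in ("pip", "qmake", "Xvfb"):
        report[exe] = shutil.which(exe) is not None
    return report


for name, found in check_prereqs().items():
    print("{:<14} {}".format(name, "found" if found else "missing"))
```

A "found" result only means that a piece is installed. It doesn't mean that the piece works in the hosting environment.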

The dryscrape setup file, setup.py:

from distutils.core import setup, Command

setup(name='dryscrape',
      version='0.9.1',
      description='a lightweight Javascript-aware, headless web scraping library for Python',
      author='Niklas Baumstark',
      author_email='[email protected]',
      license='MIT',
      url='https://niklasb.github.com/dryscrape',
      packages=['dryscrape', 'dryscrape.driver'],
      requires=['webkit_server', 'lxml'],
      )
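Note that this setup.py declares only `webkit_server` and `lxml`, and Qt isn't listed at all. Also, with plain distutils, `requires` is descriptive metadata rather than something pip installs (setuptools' `install_requires` is what pip acts on). Here is a small sketch that reads the declared list straight from the source with the standard `ast` module. The `declared_requires` helper is hypothetical, and the embedded source is a trimmed copy of the setup() call above.

```python
import ast

# Trimmed copy of the setup() call quoted above (author fields omitted).
SETUP_SRC = """
from distutils.core import setup
setup(name='dryscrape',
      version='0.9.1',
      packages=['dryscrape', 'dryscrape.driver'],
      requires=['webkit_server', 'lxml'],
      )
"""


def declared_requires(source):
    """Return the literal `requires=` list from the first setup() call."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "setup"):
            for kw in node.keywords:
                if kw.arg == "requires":
                    return ast.literal_eval(kw.value)
    return []


reqs = declared_requires(SETUP_SRC)
print(reqs)
print("Qt declared:", any(r.lower().startswith("qt") for r in reqs))
```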

Any help is appreciated.


Solution

  • PythonAnywhere dev here. Unfortunately, dryscrape depends on WebKit, and WebKit doesn't work with our virtualisation system. If you need to do web scraping with a browser that can handle JavaScript, you can use selenium and Firefox; there's more information on our blog. Be warned, though, that we only have Firefox version 17: more recent versions have the same issues as WebKit.
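Here is a minimal sketch of the selenium and Firefox route suggested above. It assumes that the `selenium` and `pyvirtualdisplay` packages are installed. pyvirtualdisplay isn't mentioned in the answer; it's one common way to give Firefox a virtual X display. Because only Firefox 17 is available, an older selenium release may be needed, so check the blog post mentioned above. The helper names `fetch_title` and `fetch_title_with_firefox` are hypothetical.

```python
def fetch_title(url, make_driver):
    """Open `url` in a browser from `make_driver()` and return the page title.

    `make_driver` is any zero-argument callable returning a selenium-style
    driver (with .get(), .title and .quit()); the browser is always closed,
    even if loading the page raises.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()


def fetch_title_with_firefox(url):
    """Run fetch_title with Firefox inside a virtual display (assumed setup)."""
    # Imported lazily so fetch_title above works without these packages.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    with Display():
        return fetch_title(url, webdriver.Firefox)
```

On a server with those packages and Firefox available, `print(fetch_title_with_firefox('https://www.pythonanywhere.com/'))` would print the page title. Passing the driver factory in as a parameter keeps the page logic separate from the browser setup.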