Tags: python, pyqt, web-crawler, ghost.py

screen scraping using Ghost.py


Here is a simple program that does not work:

from ghost import Ghost

# Allow slow pages up to 40 seconds before Ghost.py gives up with a timeout
ghost = Ghost(wait_timeout=40)
page, extra_resources = ghost.open("http://samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/")
ghost.wait_page_loaded()
# Try to get all <a> elements of the loaded page
links = ghost.evaluate("alist=document.getElementsByTagName('a');alist")
print(links)

The error is:

raise Exception(timeout_message)
Exception: Unable to load requested page

Is there a problem with the program?


Solution

  • It seems other people have reported similar issues, without getting any real explanation (for example: https://github.com/jeanphix/Ghost.py/issues/26).

    Change the evaluate line to the following, which is based on an example in the Ghost.py documentation:

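    # evaluate() returns a (result, resources) tuple; the script below builds
    # a plain JavaScript array of href strings, which becomes the result.
    links, resources = ghost.evaluate("""
                            var links = document.querySelectorAll("a");
                            var listRet = [];
                            for (var i = 0; i < links.length; i++) {
                                listRet.push(links[i].href);
                            }
                            listRet;
                        """)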
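If the links you need are already present in the page's static HTML, the same list of href values can be collected without a browser engine. The following is a swapped-in, standard-library-only technique (not part of Ghost.py): it parses an HTML string with `html.parser` and resolves each href against a base URL with `urllib.parse.urljoin`, roughly mirroring what `links[i].href` returns in the JavaScript above. `LinkCollector` and `extract_links` are hypothetical names for illustration. This sketch does not run JavaScript, so links added by scripts will be missed, and fetching the HTML is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values from <a> tags in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            # Anchors without an href are skipped; the JavaScript version
            # would push an empty string for them instead.
            if name == "href" and value is not None:
                # Resolve relative URLs against the page URL, similar to
                # reading the anchor's .href property in the browser.
                self.links.append(urljoin(self.base_url, value))
                break  # use only the first href attribute on the tag


def extract_links(html, base_url):
    """Return the resolved href of every <a> element in a static HTML string.

    Assumption: the links exist in the HTML source; no JavaScript is run.
    """
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/in/">Home</a>', "http://samsung.com/")` returns `['http://samsung.com/in/']`. For pages that build their links with JavaScript, Ghost.py (or another headless browser) is still needed.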