PyQt QWebKit frame bug?


I'm using Python, PyQt4, and QtWebKit to load a web page into a bare-bones browser to examine the data.

However, there is a small issue. I'm trying to get the contents and src of every iframe on the loaded page, and I'm using webView.page().mainFrame().childFrames() to get the frames. The problem is that childFrames() picks up the frames ONLY if they're visible in the browser. For example, when the browser is positioned at the top of the page, childFrames() will not return the iframes that are in the footer of the page. Is there a way, or a setting I could tweak, to get all of the ads? I've attached the source of my "browser". Try scrolling down once the page has finished loading. Watch the console and you will see that the iframes load dynamically. Please help.

from PyQt4 import QtGui, QtCore, QtWebKit
import sys
import unicodedata


class Sp():
    def Main(self):
        self.webView = QtWebKit.QWebView()
        # Connect the slot before loading so loadFinished cannot be missed.
        QtCore.QObject.connect(self.webView,QtCore.SIGNAL("loadFinished(bool)"),self.Load)
        self.webView.load(QtCore.QUrl("http://www.msnbc.msn.com/id/41197838/ns/us_news-environment/"))
        self.webView.show()

    def Load(self):
        frame = self.webView.page().mainFrame()
        children = frame.childFrames()
        fT = []  # one [url, html, nested frames] entry per top-level frame

        # First pass: the direct child frames of the main frame.
        for x in children:
            print "=========================================="
            print unicodedata.normalize('NFKD', unicode(x.url().toString())).encode('ascii','ignore')
            print "=========================================="
            fT.append([unicode(x.url().toString()),unicode(x.toHtml()),[]])

        # Second pass: the frames nested one level inside each child frame.
        for x in range(len(fT)):
            f = children[x]
            tl = []
            for fx in f.childFrames():
                print "___________________________________________"
                print unicodedata.normalize('NFKD', unicode(fx.url().toString())).encode('ascii','ignore')
                print "___________________________________________"
                tl.append([unicode(fx.url().toString()),unicode(fx.toHtml()),[]])
            fT[x][2] = tl


app = QtGui.QApplication(sys.argv)
s = Sp()
s.Main()
app.exec_()

Solution

  • Not sure why you're doing what you're doing, but if only the visible frames are being loaded, you can set the page's viewport size to the contents size, and that should load everything:

    def Load(self):
        self.webView.page().setViewportSize(
            self.webView.page().mainFrame().contentsSize())
    

    However, this has a weird effect in the GUI, so this solution may be unacceptable for what you are trying to do.
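
Once the frames are there, the two hard-coded loops in the question only reach two levels of nesting. A recursive helper could walk the whole frame tree instead. This is a minimal sketch, not tested against PyQt4 itself: collect_frames and its to_text parameter are hypothetical names, and the only assumption about each frame is that it offers childFrames(), url().toString() and toHtml() the way QWebFrame does.

```python
def collect_frames(frame, to_text=type(u"")):
    """Return a nested [url, html, children] list for every frame below `frame`.

    Assumption: `frame` behaves like a QWebFrame, i.e. it provides
    childFrames(), url().toString() and toHtml().
    `to_text` (hypothetical parameter) converts Qt strings to Python text:
    unicode on Python 2, str on Python 3.
    """
    collected = []
    for child in frame.childFrames():
        collected.append([
            to_text(child.url().toString()),
            to_text(child.toHtml()),
            collect_frames(child, to_text),  # recurse into nested iframes
        ])
    return collected
```

Inside Load, after the viewport has been resized, this would be called as fT = collect_frames(self.webView.page().mainFrame()). It only sees frames that exist at the moment of the call, so frames created later would need another call.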