I have an application written in Python using GTK3 through GObject introspection (Python 2.7 and PyGObject 3.14). I am trying to load a web page using WebKit and access the contents of all the resources it loads. I'm able to accomplish this by connecting to the resource-load-finished signal of the WebKitWebView object I am using to load the page.
Within my signal handler I use the WebKitWebResource object in the web_resource parameter to access the loaded data. Everything works fine with the GLib.GString returned from get_data() when it does not contain a NULL byte; I can access what I need using data.str. However, when the data does contain a NULL byte, which is often the case when the MIME type of the loaded resource is an image, data.len is correct but data.str only contains the data up to the first NULL byte. I can access the raw bytes by calling data.free_to_bytes(), which returns a GLib.GBytes instance, but the application then segfaults when the signal handler returns. I'm trying to access all the data within the loaded resource.
I hope the following code helps demonstrate the issue.
from gi.repository import Gtk
from gi.repository import WebKit

def signal_resource_load_finished(webview, frame, resource):
    gstring = resource.get_data()
    print(resource.get_mime_type())
    desired_len = gstring.len
    # gstring.str is missing data because it returns the data up to the first NULL byte
    assert(len(gstring.str) == desired_len) # this assertion fails
    # calling this causes a segfault after the handler returns, but the data is accessible from gbytes.get_data()
    #gbytes = gstring.free_to_bytes()
    #assert(len(gbytes.get_data()) == desired_len) # this assertion succeeds before the segfault
    return

webview = WebKit.WebView()
webview.connect('resource-load-finished', signal_resource_load_finished)
webview.connect('load-finished', Gtk.main_quit)
# lol cat for demo purposes of a resource containing NULL bytes (mime type: image/png)
webview.load_uri('http://images5.fanpop.com/image/photos/30600000/-Magical-Kitty-lol-cats-30656645-1280-800.png')
Gtk.main()
You don't want to use free_to_bytes, as this will not only give you the bytes you want but also release the string from memory without Python knowing about it, which, as you discovered, crashes your program. Unfortunately there isn't a corresponding get_bytes method, as GLib.String wasn't really designed to hold binary data.
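To illustrate why a string-style accessor loses data while the length stays correct, here is a minimal sketch that uses only the standard ctypes module, not GLib or WebKit. The payload and variable names are made up for the demonstration; it mimics, rather than reproduces, what happens when a buffer with embedded zero bytes is read as a NUL-terminated C string.

```python
import ctypes

# Hypothetical payload: the first bytes of a PNG file, which contain zero bytes.
payload = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

# Copy the payload into a C buffer (ctypes appends one trailing NUL).
buf = ctypes.create_string_buffer(payload)

# Reading the buffer as a NUL-terminated C string stops at the first zero byte,
# which is analogous to what happens with data.str in the question.
as_c_string = ctypes.cast(buf, ctypes.c_char_p).value

# Reading with an explicit length returns every byte, which is what a
# bytes-oriented accessor would have to do.
as_bytes = ctypes.string_at(buf, len(payload))

print(len(payload), len(as_c_string), len(as_bytes))
```

The stored length and the explicit-length read agree, while the C-string view is cut short at the first zero byte, so any API that only exposes the string view cannot return binary payloads intact.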
In fact, I'd consider it a mistake in the WebKit API that the resource payload is only available as a GLib.String. They seem to have corrected this mistake in WebKit2: http://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebResource.html
Consider switching to WebKit2 if you can (from gi.repository import WebKit2).