I have a setup where a web page served from a local server (localhost:8080) is modified dynamically: messages arriving over sockets load scripts (mainly d3 code) that change the page. In Chrome I can inspect the rendered HTML state of the page, i.e., the HTML that results from the loaded d3/JavaScript code. Now I need to save that full HTML snapshot of the rendered page so that I can view it later as a static page. I have tried many solutions in Python. They work well for loading a page and saving its content as processed by d3/JavaScript at load time, but they do NOT capture what is generated after the load. I could also use JavaScript for this if there is no Python solution.
Remember that I need to retrieve the full rendered HTML, as it has been dynamically modified over time, at a moment of my choosing.
Here is a list of related Stack Overflow questions that do not answer this one:
Not answered: How to save dynamically changed HTML?
Answered, but not for dynamically changed HTML: Using PyQt4 to return Javascript generated HTML
Not answered: How to save dynamically added data to update the page (using jQuery)
Not dynamic: Python to Save Web Pages
The problem can be solved with the Python bindings for Selenium (thanks to @Juca for suggesting Selenium).
Once it is installed (pip install selenium), this code does the trick:
from selenium import webdriver

# Start the browser. Once it has opened the URL, we can access
# all of the page's content and perform actions on it.
browser = webdriver.Firefox()
url = 'http://localhost:8080/test.html'
browser.get(url)
# The page test.html constantly changes its content as it receives socket
# messages, so we capture its state at the moment we choose, for later retrieval.
# Wait until we want to save the content (this could be a UI button action, etc.):
input("Press Enter to print the web page")
# Grab the rendered HTML at that moment:
html_source = browser.page_source
# Display it to check:
print(html_source)
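Since the goal was to keep the snapshot for later viewing rather than just print it, here is a minimal sketch of writing the captured HTML to a file. The names save_snapshot and capture, the timestamped file name, and the output directory are my own assumptions for illustration and are not part of Selenium. The only Selenium features used are webdriver.Firefox(), get(), page_source and quit().

```python
import os
import time


def save_snapshot(html, directory=".", prefix="snapshot"):
    """Write html to a timestamped file and return its path.

    Hypothetical helper: the naming scheme is an assumption. Two snapshots
    taken within the same second would share a name and overwrite each other.
    """
    os.makedirs(directory, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(directory, "{}-{}.html".format(prefix, stamp))
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path


def capture(url, directory="."):
    """Open url, wait for the user, then save the current rendered HTML."""
    # Imported here so that save_snapshot works without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        input("Press Enter to save the current state of the page")
        # page_source reflects the DOM as it is at this moment,
        # including changes made by scripts after the initial load.
        return save_snapshot(browser.page_source, directory)
    finally:
        browser.quit()
```

Calling capture('http://localhost:8080/test.html') would then leave a file such as snapshot-<date>-<time>.html in the chosen directory. Note that scripts and stylesheets the page references by relative URL are not copied, so the saved file may look different when opened away from the server.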