python curl html-parsing lxml

Python HTML parsing: can I parse a page once it reaches the ready state?


I am new to Python. Can I make a call that fetches the HTML content once the page has reached the ready state? I need to parse a site where some of the HTML is only visible after the page is ready. Is there any way to do this? Thanks, and sorry for my English. Here is my piece of code:

import lxml.html as html
from lxml.html import tostring
import string
import re

letters = list(string.ascii_lowercase)
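# Base URL without a trailing slash, so the formatted path below has no "//".
main_domain_stat = 'http://www.copyright.gov/onlinesp/list'

# Parse the raw HTML of the "a" agents page as the server returns it.
page = html.parse('%s/a_agents.html' % main_domain_stat)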

Solution

  • You cannot determine the page's ready state just by looking at the HTML. The html module is only an HTML parser (you could also try BeautifulSoup), so you can't access that kind of parameter; you just get the HTML code returned by the server.

    I see two solutions. First, look for something that only appears at the end of loading; if you find it, the page has finished loading. Second, use Selenium WebDriver (a Python module) to check that the page has completely loaded.
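Both ideas can be sketched roughly as follows. This is only an illustration under assumptions, not a tested recipe for the site in the question. The helper names `has_marker` and `fetch_rendered_html`, the marker id `footer`, and the choice of Firefox are all hypothetical. The marker check uses the standard-library `html.parser` to stay dependency-free; the same check could be done with lxml. The second function needs the third-party `selenium` package plus a browser driver, so it is imported lazily.

```python
import time
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_marker(html_text, marker_id):
    """First approach: check for an element expected only at the end of loading.

    marker_id is an assumption; pick an element the target page really
    renders last.
    """
    finder = _IdFinder(marker_id)
    finder.feed(html_text)
    return finder.found


def fetch_rendered_html(url, timeout=10.0, poll=0.25):
    """Second approach: let a real browser load the page, then return its HTML.

    Requires selenium and a Firefox driver (assumed to be installed).
    """
    from selenium import webdriver  # third-party, imported only when needed

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        # Poll the browser's own readyState until it reports "complete".
        while driver.execute_script("return document.readyState") != "complete":
            if time.monotonic() > deadline:
                raise TimeoutError("page did not reach readyState 'complete'")
            time.sleep(poll)
        return driver.page_source
    finally:
        driver.quit()
```

The string returned by `fetch_rendered_html` can then be handed to a parser such as `lxml.html.fromstring`. If the content is injected by scripts after the load event, `readyState` alone may not be enough, and combining it with a marker check like `has_marker` on `page_source` is probably the safer option.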