Python Parse single line of XML


I'm trying to write a scraper for a site with a login page. The login form needs three values, and I can already fill in two of them.

The scraper needs a username, a password, and a token.

I'm autofilling the username and password, and in Python I've narrowed the HTML response down to the single input tag that holds the token.

The tag's code is:

<input type="hidden" name="licence[_csrf_token]" value="SOME RANDOM CHECKSUM" id="licence__csrf_token" />

Is there any way of extracting the value from this tag? Note that the checksum is dynamic, in that its length changes.


Solution

  • BeautifulSoup is one good way to parse arbitrary HTML:

    from bs4 import BeautifulSoup
    
    html_doc = '''<input type="hidden" 
                         name="licence[_csrf_token]" 
                         value="SOME RANDOM CHECKSUM"
                         id="licence__csrf_token" />'''
    
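    # Name the parser explicitly; otherwise bs4 picks one itself and emits a warning.
    soup = BeautifulSoup(html_doc, 'html.parser')
    # soup.input is the first <input> tag; index it to read an attribute.
    print(soup.input['value'])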
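
If installing a third-party package is not an option, the same value can probably be pulled out with the standard library's html.parser module. The following is only a minimal sketch, not part of the original answer: the names HiddenInputParser and extract_hidden_fields are hypothetical, and it assumes the token always arrives in an input tag with type="hidden" and a name attribute, as in the question. Looking the field up by name rather than taking the first input tag should keep it working if the page contains other inputs.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names are lowercased.
        # Self-closing tags such as <input ... /> also end up here by default.
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        if input_type == "hidden" and attributes.get("name"):
            # A valueless attribute is reported as None, so fall back to "".
            self.hidden[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields found in the given HTML string."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.hidden


html_doc = '''<input type="hidden"
                     name="licence[_csrf_token]"
                     value="SOME RANDOM CHECKSUM"
                     id="licence__csrf_token" />'''

fields = extract_hidden_fields(html_doc)
token = fields["licence[_csrf_token]"]
print(token)
```

Because the value is read as a whole attribute, its length does not matter. The resulting dict can be merged with the username and password fields before the login form is submitted.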