javascript python web-scraping urllib

Submitting a JavaScript Form and Scraping with Python


I have the following HTML/JavaScript code from a website. It represents a form with two fields: a) name="N", a select field where you choose the letter "V"; b) name="ID", a text field where you enter a number of at most 8 characters.

<tr>
    <td>
        <form name="form" method="post" action="javascript:BuscaR(document.form.N.value, document.form.ID.value)">
<table class="aux">
    <tr>
        <td>
            <select name="N" class="form">
            <option value="V">V</option>
            </select>
        </td>
        <td>
            <input name="ID" type="text" class="form"  maxlength="8" size="8" value="ID" onfocus="javascript:clear_textbox3();" onblur="javascript:Valid(document.form.ID);"/>
        </td>
    </tr>
    <tr>
        <td>
            <input type="submit" value="Buscar" class="boton"/>
        </td>
    </tr>
    </table>
    </form>
</td>

I have written web scrapers with BeautifulSoup and urllib before. My idea is to write a script that inputs and submits these ID numbers (from a huge database) and retrieves the data the website responds with (it returns HTML).

However, I can't find where this form "leads" to. I mean, how do I enter the input? How do I "press" submit in Python?

In most posts, the PHP URL that the form submits to is known, so people can change the ID in something like .php?N=V&ID=x and "brute force" different numbers. Yet I cannot find this URL on the website. What do I do?

The original website is http://www.cne.gob.ve/web/index.php. On the right side there is a box that says "Consulte sus Datos. Proceso de validación y exclusión de registros presentados por el partido MUD." and contains a search button.

Thank you all!


Solution

  • It is a simple GET request that passes two parameters, nacionalidad and cedula.

    So with requests:

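    import requests

    url = "http://www.cne.gob.ve/web/registro_electoral/firmantes.php"

    # Query-string parameters: nationality letter and ID number
    params = {"nacionalidad": "V",
              "cedula": "12345678"}

    # requests URL-encodes params and appends them to the URL
    page = requests.get(url, params=params)
    print(page.content)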
    

    If you pass a valid ID, you get a table of data back. With 12345678 you still get a table, along with the message "Esta Cédula de Identidad no se encuentra en la base de datos de los registros presentados por el partido MUD" ("This ID number is not in the database of records submitted by the MUD party"), since it is obviously not a valid ID.
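
Since the question mentions running many IDs from a database, the single request above could be wrapped in a loop. The following is only a minimal sketch using the standard library's urllib instead of requests. The names `build_url`, `fetch_html`, `lookup_ids` and `NOT_FOUND_MARKER` are hypothetical helpers, not part of the site or of any library. It assumes the "not found" text quoted above appears verbatim in the returned HTML, that decoding the response as UTF-8 (with replacement of bad bytes) is good enough, and that a one-second pause between requests is acceptable to the server.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.cne.gob.ve/web/registro_electoral/firmantes.php"

# Assumption: this ASCII fragment of the site's "not found" message
# appears unchanged in the response for unknown IDs.
NOT_FOUND_MARKER = "no se encuentra en la base de datos"


def build_url(cedula, nacionalidad="V"):
    """Return the GET URL for one ID, with the two parameters URL-encoded."""
    query = urlencode({"nacionalidad": nacionalidad, "cedula": str(cedula)})
    return f"{BASE_URL}?{query}"


def fetch_html(url, timeout=10):
    """Download a page and decode it (the UTF-8 encoding is an assumption)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def lookup_ids(cedulas, fetch=fetch_html, delay=1.0):
    """Map each ID to its HTML, or to None when the site reports no record.

    `fetch` can be swapped out, e.g. for a fake function in tests.
    """
    results = {}
    for cedula in cedulas:
        html = fetch(build_url(cedula))
        results[cedula] = None if NOT_FOUND_MARKER in html else html
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return results
```

Calling `lookup_ids(["12345678"])` would perform the same request as the snippet above. The HTML returned for the IDs that are found could then be handed to BeautifulSoup to extract the table.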