I am trying to get data from a table that is updated with JavaScript inside a web page (http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28063799), and I am using dryscrape. I have code that works well with the tables that are generated by default when the page loads, but I need to update one of them by clicking a radio button (the radio button labeled "Primary" in the second table).
I changed my code, and it looks like this:
from bs4 import BeautifulSoup
import pandas as pd
import dryscrape
render = dryscrape.Session()
render.visit("http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28063799")
radiob = render.at_css('#nivEd12\.grafica3')
radiob.click()
source = render.body()
school_card = BeautifulSoup(source, "lxml")
school_tables = school_card.findAll('table', class_="tablaGraficaDatos")
table = list(school_tables)[1]
pd.read_html(table.prettify())
But I get the following error:
InvalidResponseError: {"class":"ClickFailed","message":"Failed to find position for element /html/body/div[@id='contenedor']/div[@id='solapas']/div[10]/table/tbody/tr[1]/td[1]/div[@id='solapaspanel1']/div[@id='cuerpoL']/div/div[@id='capaSelGrafica']/div[@id='display.grafica3']/table/tbody/tr[2]/td[2]/input[@id='nivEd12.grafica3'] because it is not visible"}
I have also tried with XPath:
radiob = render.at_xpath('//*[(@id = "nivEd12.grafica3")]')
But I get the same error.
I used SelectorGadget to get the CSS and XPath selectors. I imagine there is some error in the path to the radio button, but I do not know how to fix it. Any ideas?
Thanks in advance.
UPDATE
@CtheSky has given me a solution that works fine with single URLs, but when I try to loop over multiple URLs I get an error. This is the script:
schools_urls2 = ['http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28077865',
                 'http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28063751',
                 'http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28004989',
                 'http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28004990']
school_tables_collection = {}
school_name_collection = []
render = dryscrape.Session()
for z, school in enumerate(schools_urls2[:5]):
    render.visit(school)
    render.driver.exec_script('document.getElementById("nivEd12.grafica3").click();')
    source = render.body()
    school_card = BeautifulSoup(source, "lxml")
    school_tables = school_card.findAll('table', class_="tablaGraficaDatos")
    school_name = school_card.find(style="text-transform:uppercase").next.next
    for i, table in enumerate(school_tables):
        if i <= 1:
            school_tables_collection[school_name + "_" + str(i)] = \
                pd.read_html(table.prettify())
    school_name_collection.append(school_name)
    print "Tables of school %s extracted" % schools_urls2[z]
Any idea about what I'm doing wrong?
SOLUTION
I've finally managed to fix it. It was a silly mistake on my part: the first URL I called did not have the button element I was looking for, so it returned an error. I've included a try/except in the loop and now it works.
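For reference, this is roughly what the guarded loop body looks like now. Which exception class dryscrape raises when the element is missing is an assumption on my part, so I catch broadly and skip the school:

for z, school in enumerate(schools_urls2[:5]):
    render.visit(school)
    try:
        # The radio button does not exist on every page, so guard the click.
        render.driver.exec_script('document.getElementById("nivEd12.grafica3").click();')
    except Exception:
        print "No 'nivEd12.grafica3' element on %s, skipping" % school
        continue
    source = render.body()
    school_card = BeautifulSoup(source, "lxml")
    # ... parse the tables exactly as in the script above ...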
Thank you very much for your help @CtheSky
As the error message says, there's no problem with the CSS selection: the click fails because the radio button is invisible. Its parent node is not displayed:
<div id="solapaspanel1" style="display: none;">...</div>
You can run a piece of JavaScript to trigger that click event:
render.driver.exec_script('document.getElementById("nivEd12.grafica3").click();')
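Put together with your parsing code, a minimal sketch of the full sequence looks like this (the short sleep is an assumption on my part, in case the page redraws the table asynchronously):

import time
import dryscrape
import pandas as pd
from bs4 import BeautifulSoup

render = dryscrape.Session()
render.visit("http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28063799")

# Trigger the click from JavaScript because the input itself is not visible.
render.driver.exec_script('document.getElementById("nivEd12.grafica3").click();')
time.sleep(1)  # give the page's own JavaScript a moment to redraw the table

# Re-read the page source after the click so the updated table gets parsed.
source = render.body()
school_card = BeautifulSoup(source, "lxml")
table = school_card.findAll('table', class_="tablaGraficaDatos")[1]
df_list = pd.read_html(table.prettify())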
Notice that in your second example there is no element with id="nivEd12.grafica3" in the URL http://www.madrid.org/wpad_pub/run/j/MostrarFichaCentro.icm?cdCentro=28077865, so the script fails because it calls the click() method on null, which is not allowed and raises an error.
Maybe the target element does not exist on some pages, or they use a different id or name. You should use a more general rule to specify what you want. To avoid this error, you can check whether the element exists with school_card.find_XX(...) before clicking, or use eval_script to run a JavaScript statement and inspect its result, if you like.
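For instance, a minimal sketch of both checks, reusing the id from the question (find is one of the find_XX methods mentioned above; whether eval_script returns a Python boolean here is an assumption on my part):

# Option 1: parse the page first and only click if the input is present,
# then re-read the body so the updated table is included.
school_card = BeautifulSoup(render.body(), "lxml")
if school_card.find('input', id='nivEd12.grafica3') is not None:
    render.driver.exec_script('document.getElementById("nivEd12.grafica3").click();')
    school_card = BeautifulSoup(render.body(), "lxml")

# Option 2: ask the page itself whether the element exists before clicking.
if render.driver.eval_script('document.getElementById("nivEd12.grafica3") != null'):
    render.driver.exec_script('document.getElementById("nivEd12.grafica3").click();')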