
Python 3 web scraping with Selenium: trouble switching to a ui-dialog


I'm a student and new to Python. I would like to download PDF files (financial reports from different organizations) from a website, but first I have to go through several steps. This is the website I'm dealing with: http://sprawozdaniaopp.mpips.gov.pl/ There are many organizations listed, so I thought it would be good to download the PDFs with a script. First, my script clicks the Search button (without any criteria, to find everything), and as a result the whole list of links loads. When I click a link, a smaller window appears on the same page (this window refers only to the organization I clicked). And here's the problem: my script can't switch to that window. I searched the internet and found the driver.switch_to.window and driver.switch_to.frame functions, but they didn't work, or I didn't use them correctly. I suspect this is not a frame but a ui-dialog(?). When I right-clicked this window and inspected it, I found something like this:

<div class="ui-dialog ui-widget ui-widget-content ui-corner-all" tabindex="-1" role="dialog" aria-labelledby="ui-dialog-title-2" style="display: block; z-index: 1002; outline: 0px; height: auto; width: 600px; top: 234.5px; left: 328px;"><div class="ui-dialog-titlebar ui-widget-header ui-corner-all ui-helper-clearfix"><span class="ui-dialog-title" id="ui-dialog-title-2">Szczegółowe informacje o organizacji</span><a href="#" class="ui-dialog-titlebar-close ui-corner-all" role="button"><span class="ui-icon ui-icon-closethick">close</span></a></div><div style="width: auto; min-height: 0px; height: 401.896px;" class="ui-dialog-content ui-widget-content" scrolltop="0" scrollleft="0"> (...)

I don't know how to tell my script to switch to this kind of dialog window (?) so that it can search for the "Sprawozdanie merytoryczne" link for the year 2016 only.

A strange thing about this site is that a link such as http://sprawozdaniaopp.mpips.gov.pl/Search/Details/0000000168 can only be opened by left-clicking it. When I try to open it in a new tab, it fails (why?). The result is: "Server Error in '/' Application. The resource cannot be found. Description: HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable. Please review the following URL and make sure that it is spelled correctly."

Here is my script in Python:

import urllib
import urllib.request
import requests
import re

url = "http://sprawozdaniaopp.mpips.gov.pl/Search/Print/13313?reporttypeId=13"


r = requests.get(url)
#with open(r'C:\Users\username\Desktop\financialreport1.pdf', 'wb') as f:
#       f.write(r.content)

from selenium import webdriver

chrome_path= r"C:\Users\username\AppData\Local\Programs\Python\Python35-32\Scripts\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("http://sprawozdaniaopp.mpips.gov.pl/")

#Button Search called here in polish "Znajdź"
elem = driver.find_element_by_xpath("//*[@id='btnsearch']/span") 
elem.click()

#testing if I'm able to find links on this website 
#elems = driver.find_elements_by_xpath("//a[@href]")
#for elem in elems:
    #print (elem.get_attribute("href"))

#Clicking on the first link (in the future I want to do this in a loop for every link)
#elem1 = driver.find_element_by_xpath("//*[@id='form1']/div/div[4]/table/tbody/tr[1]/td[3]/a")
elem1 = driver.find_element_by_css_selector("#form1 > div > div.grid > table > tbody > tr:nth-child(1) > td:nth-child(3) > a")
elem1.click()

#doesn't work
#driver.switch_to.window("#form1 > div > div.grid > table > tbody > tr:nth-child(1) > td:nth-child(3) > a")

#below doesn't work because I can't switch to window where elem2 is placed
elem2 = driver.find_element_by_css_selector("body > div.ui-dialog.ui-widget.ui-widget-content.ui-corner-all > div.ui-dialog-content.ui-widget-content > table:nth-child(4) > tbody > tr:nth-child(7) > td:nth-child(1) > a")
elem2.click()

I've attached some screenshots to illustrate my problem. I would be very grateful for any advice or keywords that I should look for (maybe the case is obvious and I just don't understand it as a newbie). Greetings!

[Screenshots: partial list of organizations; the wanted PDF document, which opens in a new tab after clicking the yellow link]


Solution

  • On the website http://sprawozdaniaopp.mpips.gov.pl/, after clicking the Search button and then the first link, we need to wait for the modal box to open and then click the Sprawozdanie merytoryczne link. The modal is a plain div in the same document, not a separate window or frame, so no switch_to call is needed. Here is your own code with a simple tweak:

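    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    elem1 = driver.find_element_by_css_selector("#form1 > div > div.grid > table > tbody > tr:nth-child(1) > td:nth-child(3) > a")
    elem1.click()
    # Wait up to 10 seconds for the dialog's div to be present in the DOM
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR,".ui-dialog.ui-widget.ui-widget-content.ui-corner-all")))
    driver.find_element_by_link_text("Sprawozdanie merytoryczne").click()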
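To make the waiting step less of a black box, here is a minimal sketch of the idea behind an explicit wait: poll a condition until it returns something truthy or a timeout expires. It is not Selenium's implementation, only an illustration. The names wait_until, WaitTimeout and FakePage are hypothetical, and FakePage merely simulates a page whose dialog shows up on the third lookup, so the snippet runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll)


class FakePage:
    """Stand-in for a page whose dialog appears on the third lookup."""

    def __init__(self):
        self.lookups = 0

    def find_dialogs(self):
        # Simulates the dialog's div being added to the DOM after a delay.
        self.lookups += 1
        return ["<div class='ui-dialog'>"] if self.lookups >= 3 else []


page = FakePage()
dialogs = wait_until(page.find_dialogs, timeout=2.0, poll=0.01)
print(len(dialogs), page.lookups)
```

In the real script, WebDriverWait(driver, 10).until(...) plays the role of wait_until, with a default polling interval of 0.5 seconds. Since the dialog is just another element of the same page, the lookup that follows the wait needs no window or frame switch.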