Project: saving the URLs and titles of all the sites linked from https://theuselessweb.com/
Code to test (only 3 pages, printing instead of saving):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep
PATH = r"C:\Users\XXX\Documents\scraping\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://theuselessweb.com/")
driver.switch_to.window(driver.window_handles[-1])
button = driver.find_element_by_id("button")
for i in range(3):
    button.click()
    sleep(2)
    driver.switch_to.window(driver.window_handles[-1])
    print(driver.current_url)
    print(driver.title)
    driver.close()
Error(s):
DevTools listening on ws://127.0.0.1:60235/devtools/browser/a5ea4ab0-fba6-4a34-b0ee-8926876c554f
[11636:4168:0626/143411.535:ERROR:device_event_log_impl.cc(214)] [14:34:11.535] USB: usb_device_handle_win.cc:1058 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[11636:4168:0626/143411.552:ERROR:device_event_log_impl.cc(214)] [14:34:11.552] USB: usb_device_handle_win.cc:1058 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[11636:4168:0626/143411.555:ERROR:device_event_log_impl.cc(214)] [14:34:11.555] USB: usb_device_handle_win.cc:1058 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
https://thatsthefinger.com/ #this is what I want
The finger, deal with it. #this is what I want
Traceback (most recent call last):
  File "C:\Users\XXX\Documents\scraping\programs\linkscraping.py", line 16, in <module>
    button.click()
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\XXX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: no such window: target window already closed
from unknown error: web view not found
  (Session info: chrome=91.0.4472.124)
It prints the URL and title of the first website and then crashes. Also, every time I run the driver.get(ANYURL) command, it opens the link AND the Chrome settings page (chrome://settings/triggeredResetProfileSettings). Maybe this is what breaks it; either way, it would be really helpful if I could get rid of this unwanted window too.
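The traceback points at the second button.click(). driver.close() closes the tab the driver is currently focused on, and Selenium does not move the focus anywhere afterwards, so the next command is sent to a window that no longer exists (NoSuchWindowException). Below is a minimal sketch of the question's loop with that fixed: after reading and closing each new tab, it switches back to the original tab that owns the button. The function name visit_random_sites and the pause parameter are hypothetical; driver and button are assumed to be the objects created by the question's code. New tabs are found by comparing window handles before and after each click instead of assuming the newest tab is last in window_handles, whose order is not guaranteed; as a side effect, the chrome://settings tab is left alone if it is already open before the first click.

```python
import time


def visit_random_sites(driver, button, count, pause=2.0):
    """Click `button` `count` times and return (url, title) for each new tab.

    After reading a tab, it is closed and the driver switches back to the
    original tab, so the next click goes to a window that still exists.
    """
    original = driver.current_window_handle
    results = []
    for _ in range(count):
        before = set(driver.window_handles)
        button.click()
        time.sleep(pause)  # crude wait for the new tab to appear
        new_handles = [h for h in driver.window_handles if h not in before]
        if not new_handles:
            continue  # this click did not open a tab; try the next one
        driver.switch_to.window(new_handles[0])
        results.append((driver.current_url, driver.title))
        driver.close()                     # closes only the tab we switched to
        driver.switch_to.window(original)  # re-focus the tab that owns `button`
    return results


# Usage with the question's driver and button (not run here):
# for url, title in visit_random_sites(driver, button, 3):
#     print(url)
#     print(title)
```

A fixed sleep mirrors the question's sleep(2); an explicit wait (for example WebDriverWait on the number of window handles) would be more robust on slow connections.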
Here is a solution. It still opens every link, but because Chrome runs headless, none of the windows are visible to the user.
In this code, x is the number of random websites you want to extract.
The code opens the site, clicks the button x times (each click opens a random site in a new tab), then switches to each new tab in turn and prints its URL and title. At the end, it quits Chrome.
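from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window

# webdriver_manager downloads a chromedriver that matches the installed Chrome
driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)

x = 10  # number of random websites to collect

driver.get('https://theuselessweb.com/')
original = driver.current_window_handle
button = driver.find_element(By.ID, "button")

# Each click opens a random site in a new tab
for i in range(x):
    button.click()

# Visit every tab except the original one (handle order is not guaranteed)
for handle in driver.window_handles:
    if handle == original:
        continue
    driver.switch_to.window(handle)
    print(driver.current_url)
    print(driver.title)

driver.quit()  # closes all tabs and ends the session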
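Since the project goal is to save the URLs and titles rather than print them, here is a minimal sketch using Python's standard csv module. It assumes the loop above is changed to append (driver.current_url, driver.title) tuples to a list instead of printing; the names save_sites and rows and the file name useless_sites.csv are hypothetical. Because each click picks a site at random, the same site can come up more than once, so duplicates are dropped while keeping the first-seen order.

```python
import csv


def save_sites(rows, path):
    """Write unique (url, title) pairs to a CSV file with a header row.

    Returns the number of data rows written.
    """
    unique = list(dict.fromkeys(rows))  # drop repeats, keep first-seen order
    # newline="" lets the csv module control line endings (no blank lines on Windows);
    # utf-8 keeps non-ASCII page titles intact
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "title"])
        writer.writerows(unique)
    return len(unique)


# Usage with the answer's loop (not run here):
# rows = []
# ...inside the loop: rows.append((driver.current_url, driver.title))
# save_sites(rows, "useless_sites.csv")
```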