I have a simple web crawler that I use in a loop to scrape information from YouTube videos, as shown below:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

def Scrap(url):
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
    driver.get(url)
    time.sleep(6)
    # I will do some operations with the page source here
    driver.close()

urls = ["https://www.youtube.com/watch?v=FWMIPukvdsQ", "https://www.youtube.com/watch?v=Ot4qdCs54ZE"]

for url in urls:
    Scrap(url)
Everything works fine, but it is annoying that I have to install the driver twice. I think it significantly slows down the program when I crawl data from hundreds of websites, and it feels wasteful. I have tried two methods to install the driver only once and reuse it across various functions and loops.
Method 1: Manually assign the path:

def update_driver():
    driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
Then, the output includes the path of the installed driver, which I manually copy and assign to a variable so that other crawler functions can use it.
Problem with Method 1: I have to copy and paste it by hand. Is there any way to automate this? Maybe I can capture the output of the installation and filter it?
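Something like the sketch below is what I have in mind; I am assuming here that the installer returns the path it downloaded to, so I can store it in a variable instead of copying it from the console:

from webdriver_manager.chrome import ChromeDriverManager

# Assumption: install() returns the driver path as a string,
# so it can be stored directly instead of copied from the log output.
driver_path = ChromeDriverManager().install()
print(driver_path)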
Method 2: Make the driver a global variable.
Problem with Method 2: It reports errors when the driver is used for more than one URL.
The Problem
You are creating the driver inside the Scrap function, which launches the browser once for every URL in urls.
You are also closing the browser with driver.close() inside Scrap, when that should be done only after the loop.
The Solution
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())

def Scrap(url):
    driver.get(url)
    time.sleep(1)
    # I will do some operations with the page source here

urls = ["https://www.youtube.com/watch?v=FWMIPukvdsQ", "https://www.youtube.com/watch?v=Ot4qdCs54ZE"]

for url in urls:
    Scrap(url)

driver.close()
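If you prefer to avoid the global, a small variation along the same lines is to pass the driver into Scrap as an argument; this is just a sketch, not the only way to do it:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

def Scrap(driver, url):
    driver.get(url)
    time.sleep(1)
    # I will do some operations with the page source here

driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
urls = ["https://www.youtube.com/watch?v=FWMIPukvdsQ", "https://www.youtube.com/watch?v=Ot4qdCs54ZE"]

for url in urls:
    Scrap(driver, url)

driver.quit()  # quit() ends the whole browser session, not just the current window

Either way, the driver is installed and the browser is launched only once for the whole loop.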