I'm trying to find certain keywords in the HTML source of multiple websites. I want my crawler to find these keywords regardless of whether they are written in uppercase or lowercase in the page source. To do this, I tried using the .lower() method in this script:
from selenium import webdriver
import csv

def keywords():
    with open('urls.csv') as csv_file:
        csv_reader = csv.reader(csv_file)
        driver = webdriver.Chrome(executable_path=r'C:\Users\Peter\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')
        list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
        list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
        list_3 = ['keyword 7', 'keyword 8']
        keywords = [list_1, list_2, list_3]
        for row in csv_reader:
            driver.get(row[0])
            html = driver.page_source
            for searchstring in keywords:
                if searchstring.lower() in html.lower():
                    print (row[0], searchstring, 'found')
                else:
                    print (row[0], searchstring, 'not found')

print keywords()
Error:
AttributeError: 'list' object has no attribute 'lower'
So I found out that .lower() works only on strings, not on lists.
I've googled the error and my issue but didn't find a solution to my problem. Any suggestions on how I can solve this with my current script?
You can make keywords a flat list of strings instead of a list of lists of strings. Here I also lowercase the keywords up front, so .lower() is only ever called on strings.
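The flatten-and-lowercase idea can be tried without a browser first. The following is only a minimal sketch: the helper name find_keywords, the sample HTML string, and the sample keyword groups are made up for illustration and are not part of the original script.

```python
from itertools import chain


def find_keywords(html, keyword_groups):
    """Return {keyword: bool} for a case-insensitive substring search.

    Hypothetical helper, not part of the original script.
    """
    # Flatten the list of lists into one list of lowercase strings,
    # so .lower() is called on each string and never on a list.
    keywords = [kw.lower() for kw in chain.from_iterable(keyword_groups)]
    # Lowercase the page source once instead of once per keyword.
    html_lower = html.lower()
    return {kw: kw in html_lower for kw in keywords}


# Made-up page source and keyword groups, for illustration only.
sample_html = "<html><body><h1>Free SHIPPING</h1><p>Contact us</p></body></html>"
groups = [["Free Shipping", "refund"], ["CONTACT"]]

for keyword, found in find_keywords(sample_html, groups).items():
    print(keyword, "found" if found else "not found")
```

In the real crawler, driver.page_source would take the place of sample_html. The full script with the same change applied: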
from selenium import webdriver
import csv

def keywords():
    with open('urls.csv') as csv_file:
        csv_reader = csv.reader(csv_file)
        driver = webdriver.Chrome(executable_path=r'C:\Users\Peter\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')
        list_1 = ['keyword 1', 'keyword 2', 'keyword 3']
        list_2 = ['keyword 4', 'keyword 5', 'keyword 6']
        list_3 = ['keyword 7', 'keyword 8']
        # Flatten the three lists and lowercase every keyword once.
        # A real list is used instead of map(): in Python 3, map() returns
        # a one-shot iterator that would be empty after the first URL.
        keywords = [keyword.lower() for keyword in list_1 + list_2 + list_3]
        for row in csv_reader:
            driver.get(row[0])
            html = driver.page_source