I am trying to write a Python script to export the Product Attributes table from the URL below as an Excel (or CSV) file.
I wrote a script and tried a different class name, but I get an error.
The URL: https://www.digikey.com/en/products/detail/texas-instruments/uln2003aidre4/1912622
I don't know what causes this message, because the same approach exported tables from other websites, but my code fails on this website (and also on Mouser.com).
My theory is that these two websites block scripts like mine to prevent their data from being exported, but I'm not sure.
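One way to test the blocking theory before changing the locator is to look at what the browser actually received. The sketch below is a hypothetical helper, not part of Selenium: `diagnose_page` and `PRODUCT_ATTRIBUTES_MARKER` are illustrative names, and the marker is borrowed from the attribute used in the answer's selector and assumed to appear in the served HTML. With Selenium you would pass it `driver.page_source`.

```python
# Hypothetical marker: borrowed from the attribute in the answer's CSS selector and
# assumed (not verified) to be present in the served HTML of the real product page.
PRODUCT_ATTRIBUTES_MARKER = 'data-evg="product-details-product-attributes"'


def diagnose_page(html: str, marker: str = PRODUCT_ATTRIBUTES_MARKER) -> str:
    """Return a rough hint about why a locator may have found nothing in this HTML."""
    if not html.strip():
        return "empty page: nothing was rendered"
    if marker in html:
        # The container exists, so the locator or the timing is the likely problem.
        return "container present: check the locator or wait longer"
    # No container: the page was not rendered yet, or a different page
    # (for example an error or bot-check page) was served instead.
    return "container missing: inspect the served page"


if __name__ == "__main__":
    # With Selenium you would call diagnose_page(driver.page_source); stub strings are used here.
    print(diagnose_page('<div data-evg="product-details-product-attributes"><table></table></div>'))
    print(diagnose_page("<html><body>Access denied</body></html>"))
```

If the container is present, the site served the real page and the problem is the locator or timing rather than blocking.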
[Screenshots: the table I want to export and its inspection]
Here is my code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

def get_specifications_table(url):
    options = Options()
    options.add_argument('--headless')  # Run the browser in headless mode (no visible window)
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    time.sleep(5)  # Add a delay to allow the webpage to load (adjust the time as needed)
    try:
        # Find the element with the class names "MuiTable-root css-u6unfi" and extract the table
        class_name = "MuiTable-root.css-u6unfi"
        table_element = driver.find_element("css selector", f".{class_name}")
        table_html = table_element.get_attribute('outerHTML')
        df = pd.read_html(table_html)[0]
        return df
    except Exception as e:
        print("Error:", e)
    finally:
        driver.quit()  # Always close the browser, whether or not the lookup succeeded
    return None

def export_to_excel(df, output_file):
    # ExcelWriter.save() was removed in pandas 2.0; the context manager saves and closes the file
    with pd.ExcelWriter(output_file, engine='xlsxwriter') as writer:
        df.to_excel(writer, index=False)

if __name__ == '__main__':
    url = "https://www.digikey.com/en/products/detail/texas-instruments/uln2003aidre4/1912622"
    output_excel_file = "Specifications_Table_Digikey.xlsx"
    print("Fetching the webpage and extracting the table...")
    specifications_df = get_specifications_table(url)
    if specifications_df is not None:
        print("Exporting the table to Excel...")
        export_to_excel(specifications_df, output_excel_file)
        print(f"Table 'Specifications' exported to '{output_excel_file}' successfully.")
    else:
        print("Table extraction or export failed.")
But I get this error:
Fetching the webpage and extracting the table...
Error: Message: no such element: Unable to locate element: {"method":"css selector","selector":".MuiTable-root.css-u6unfi"}
(Session info: headless chrome=115.0.5790.110); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
Backtrace:
GetHandleVerifier [0x004BA813+48355]
(No symbol) [0x0044C4B1]
(No symbol) [0x00355358]
(No symbol) [0x003809A5]
(No symbol) [0x00380B3B]
(No symbol) [0x003AE232]
(No symbol) [0x0039A784]
(No symbol) [0x003AC922]
(No symbol) [0x0039A536]
(No symbol) [0x003782DC]
(No symbol) [0x003793DD]
GetHandleVerifier [0x0071AABD+2539405]
GetHandleVerifier [0x0075A78F+2800735]
GetHandleVerifier [0x0075456C+2775612]
GetHandleVerifier [0x005451E0+616112]
(No symbol) [0x00455F8C]
(No symbol) [0x00452328]
(No symbol) [0x0045240B]
(No symbol) [0x00444FF7]
BaseThreadInitThunk [0x772500C9+25]
RtlGetAppContainerNamedObjectPath [0x77BC7B4E+286]
RtlGetAppContainerNamedObjectPath [0x77BC7B1E+238]
Table extraction or export failed.
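The message means `find_element` matched nothing for `.MuiTable-root.css-u6unfi` at the moment it ran; a fixed `time.sleep(5)` only helps if the table happens to be rendered by then. An explicit wait instead polls until a condition holds or a timeout expires. The sketch below shows that polling idea in plain Python; `wait_until` is a hypothetical helper, and in real Selenium code `WebDriverWait` provides this behavior.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    # Simulated lookup that "finds" the table on the third attempt. With Selenium, a
    # condition such as lambda: driver.find_elements(By.CSS_SELECTOR, selector)
    # returns an empty (falsy) list until a matching element exists.
    attempts = {"count": 0}

    def fake_lookup():
        attempts["count"] += 1
        return ["<table>"] if attempts["count"] >= 3 else []

    print(wait_until(fake_lookup, timeout=5, poll=0.1))  # prints ['<table>']
    print(attempts["count"])  # prints 3
```

Polling returns as soon as the element appears, so it is both faster and more robust than a fixed sleep of a guessed length.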
To scrape the data from the Product Attributes table on the page ULN2003AIDRE4 Texas Instruments | Integrated Circuits (ICs) | DigiKey, you need to induce WebDriverWait for visibility_of_element_located() on the <table> element and then load its HTML into a Pandas DataFrame.
Rather than depending on the css-u6unfi class, you can use the following locator strategy, which anchors on the container's data-evg attribute:
Code Block:
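import time
from io import StringIO

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import pandas as pd

options = Options()
options.add_argument('--headless=new')
options.add_argument("start-maximized")
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.digikey.com/en/products/detail/texas-instruments/uln2003aidre4/1912622")
    time.sleep(10)  # Extra fixed pause; the explicit wait below already polls for the table
    # Wait up to 20 s for the table inside the Product Attributes container to become visible
    table_data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "div[data-evg='product-details-product-attributes'] table.MuiTable-root")
    )).get_attribute("outerHTML")
    # Wrapping the HTML string in StringIO avoids the literal-HTML deprecation in pandas 2.1+
    df = pd.read_html(StringIO(table_data))
    print(df)
finally:
    driver.quit()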
Console Output:
[ Type Description Select
0 Category Integrated Circuits (ICs)Power Management (PMI... NaN
1 Mfr Texas Instruments NaN
2 Series ULx200xA NaN
3 Package Tape & Reel (TR) NaN
4 Product Status Discontinued at Digi-Key NaN
5 Switch Type Relay, Solenoid Driver NaN
6 Number of Outputs 7 NaN
7 Ratio - Input:Output 1:1 NaN
8 Output Configuration Low Side NaN
9 Output Type Darlington NaN
10 Interface Parallel NaN
11 Voltage - Load 50V (Max) NaN
12 Voltage - Supply (Vcc/Vdd) Not Required NaN
13 Current - Output (Max) 500mA NaN
14 Rds On (Typ) - NaN
15 Input Type Inverting NaN
16 Features - NaN
17 Fault Protection - NaN
18 Operating Temperature -40°C ~ 105°C (TA) NaN
19 Mounting Type Surface Mount NaN
20 Supplier Device Package 16-SOIC NaN
21 Package / Case 16-SOIC (0.154", 3.90mm Width) NaN
22 Base Product Number ULN2003 NaN]
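The Select column in this output holds only NaN, and the question asked for an Excel or CSV file. A minimal sketch for that last step, assuming `df` is `pd.read_html(...)[0]` (read_html returns a list of DataFrames, so take the first one); `clean_attributes_table` and `export_table` are hypothetical helper names.

```python
import pandas as pd


def clean_attributes_table(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that are entirely empty, such as the all-NaN 'Select' column."""
    return df.dropna(axis=1, how="all")


def export_table(df: pd.DataFrame, path: str) -> None:
    """Write the table as CSV or Excel depending on the file extension."""
    if path.lower().endswith(".csv"):
        df.to_csv(path, index=False)
    else:
        # Excel output needs an engine such as openpyxl or xlsxwriter installed.
        df.to_excel(path, index=False)


if __name__ == "__main__":
    # Stand-in for pd.read_html(...)[0]; values copied from the console output above.
    sample = pd.DataFrame({
        "Type": ["Mfr", "Series"],
        "Description": ["Texas Instruments", "ULx200xA"],
        "Select": [float("nan"), float("nan")],
    })
    cleaned = clean_attributes_table(sample)
    print(cleaned.columns.tolist())  # prints ['Type', 'Description']
    export_table(cleaned, "Specifications_Table_Digikey.csv")
```

Dropping only columns that are empty in every row keeps any attribute column that has at least one value.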