I want to create a dataframe by scraping the table here, which has a different class
name for each row and contains nested elements.
table_rows = driver.find_elements(By.CLASS_NAME, "bgColor-white")
for row in table_rows:
    print(row.text)
The printed output of the above code is a single string per row, and I could not split it into the appropriate columns.
Identify the table element and then get the outerHTML of the table element.
Use the pandas read_html() method to get the dataframe.
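import time
from io import StringIO
import pandas as pd
from selenium.webdriver.common.by import By

driver.get("https://www.egp.gov.bt/resources/common/TenderListing.jsp?lang=en_US&langForMenu=en_US&h=t")
time.sleep(3)  # crude fixed wait so the table has time to render
table = driver.find_element(By.CSS_SELECTOR, "table#resultTable").get_attribute("outerHTML")
df = pd.read_html(StringIO(table))[0]  # StringIO: recent pandas deprecates passing raw HTML strings
print(df)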
console output:
Sl. No. Tender ID, Reference No, Public Status ... Type, Method Publishing Date & Time | Closing Date & Time
0 1 15183, TSHA-6/Engineering/9/2022-2023/769, Live ... NCB, OTM 03-Mar-2023 15:00 | 14-Mar-2023 15:10
1 2 15180, STCB/PD/TS/Samtse/2023/213, Live ... NCB, OTM 03-Mar-2023 10:00 | 14-Mar-2023 11:10
2 3 15160, JNEC/Adm-33/2022-2023, Cancelled ... NCB, OTM 02-Mar-2023 22:00 | 10-Mar-2023 10:30
3 4 15179, DAG/DEHSS(07)/2022-2023/148, Live ... NCB, OTM 02-Mar-2023 15:00 | 16-Mar-2023 09:00
4 5 15181, DCHS/PRP-01/2022-2023/244, Amendment/Co... ... NCB, OTM 02-Mar-2023 09:00 | 13-Mar-2023 10:30
5 6 15174, NBC/Adm/06/2022/1198, Live ... NCB, OTM 01-Mar-2023 09:00 | 20-Mar-2023 11:30
6 7 15161, PDA/adm -35/2022-2023/, Live ... NCB, OTM 27-Feb-2023 16:00 | 10-Mar-2023 11:00
7 8 15169, MD/Dz.EHSS-20/2022-2023/5179, Amendmen... ... NCB, OTM 27-Feb-2023 14:30 | 10-Mar-2023 14:00
8 9 15157, nofp2, Live ... NCB, OTM 21-Feb-2023 09:00 | 08-Mar-2023 11:30
9 10 15158, MD/DES-20/2022-2023/5095, Being processed ... NCB, OTM 21-Feb-2023 02:00 | 02-Mar-2023 10:00
[10 rows x 6 columns]
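Some cells in this dataframe still hold several values in one string (for example "Tender ID, Reference No, Public Status"). If you need those as separate columns, something along these lines might work. This is only a sketch: the column names are copied from the printed output above and may not match the real headers exactly, the two sample rows are taken from that output, and the function name split_combined_columns and the new column names are made up for illustration. It also assumes the tender ID is always numeric, the status contains no comma, and month abbreviations are parsed in an English locale.

```python
import pandas as pd

# Column names as they appear in the printed output (assumed to match the real headers).
ID_COL = "Tender ID, Reference No, Public Status"
DATE_COL = "Publishing Date & Time | Closing Date & Time"

# Two sample rows copied from the console output; the other columns are left out.
df = pd.DataFrame({
    ID_COL: [
        "15183, TSHA-6/Engineering/9/2022-2023/769, Live",
        "15160, JNEC/Adm-33/2022-2023, Cancelled",
    ],
    DATE_COL: [
        "03-Mar-2023 15:00 | 14-Mar-2023 15:10",
        "02-Mar-2023 22:00 | 10-Mar-2023 10:30",
    ],
})


def split_combined_columns(frame):
    """Return a copy of frame with the combined columns split into separate ones."""
    out = frame.copy()

    # First field is the numeric ID, last field is the status,
    # and everything in between is treated as the reference number.
    parts = out[ID_COL].str.extract(
        r"^(?P<tender_id>\d+),\s*(?P<reference_no>.*),\s*(?P<public_status>[^,]+)$"
    )

    # Split the two timestamps on the "|" separator (escaped, since "|" is a regex operator).
    dates = out[DATE_COL].str.split(r"\s*\|\s*", n=1, expand=True)
    dates.columns = ["publishing_time", "closing_time"]
    for col in dates.columns:
        dates[col] = pd.to_datetime(dates[col], format="%d-%b-%Y %H:%M")

    # Replace the combined columns with the split ones.
    return out.drop(columns=[ID_COL, DATE_COL]).join(parts).join(dates)


result = split_combined_columns(df)
print(result)
```

Rows that do not match the pattern end up as NaN in the extracted columns, so it is worth checking for missing values after running this on the full table.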