I am a beginner in web scraping with Python. I am trying to parse the table of places of worship on Langkawi Island. This is the website I am referring to: http://www.jaik.gov.my/?page_id=658
I've entered the following in Python:
import requests
import lxml.html as lh
import pandas as pd
langkawi_url = 'http://www.jaik.gov.my/?page_id=658'
page = requests.get(langkawi_url)
doc = lh.fromstring(page.content)
tr_elements = doc.xpath('//td')
[len(T) for T in tr_elements[:12]]
tr_elements = doc.xpath('//tr')
col = []
i = 0
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print("%d:%s" % (i, name))
    col.append((name, []))
The output I got is this:
1:Sun
2:Mon
3:Tue
4:Wed
5:Thu
6:Fri
7:Sat
I was hoping to get this:
1:BIL
2:KARIAH MASJID
3:ALAMAT
4:MUKIM
Your advice and guidance are much appreciated.
Thank you!
Try changing your code to something like:
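# The header labels are wrapped in <strong> inside <td> cells,
# so select those elements directly instead of taking the first <tr>.
header_cells = doc.xpath('//td/strong')
col = []
for t in header_cells:
    col.append(t.text)
print(col)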
Output:
['BIL', 'KARIAH MASJID', 'ALAMAT', 'MUKIM']
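The weekday names in your output suggest that the first `<tr>` matched by `//tr` belongs to a different table on the page (probably a calendar widget), not the one you want. If you also need the data rows and not only the header, one option is to loop over every `<table>` and keep the one whose first cell is `BIL`. Below is a minimal sketch of that idea. It runs on a made-up HTML snippet (`SAMPLE_HTML`, with placeholder row values) because I am only assuming how the real page is laid out; `extract_table` and its `first_header` parameter are names I chose for illustration.

```python
import lxml.html as lh

# Hypothetical stand-in for the page: an unrelated calendar table comes
# first, then the table of interest. The real markup may differ, and the
# row values below are placeholders, not real data.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><td>Sun</td><td>Mon</td><td>Tue</td></tr>
  <tr><td>1</td><td>2</td><td>3</td></tr>
</table>
<table>
  <tr>
    <td><strong>BIL</strong></td><td><strong>KARIAH MASJID</strong></td>
    <td><strong>ALAMAT</strong></td><td><strong>MUKIM</strong></td>
  </tr>
  <tr><td>1</td><td>Masjid A</td><td>Alamat A</td><td>Mukim X</td></tr>
  <tr><td>2</td><td>Masjid B</td><td>Alamat B</td><td>Mukim Y</td></tr>
</table>
</body></html>
"""


def extract_table(html, first_header="BIL"):
    """Return (header, rows) for the first table whose first cell equals first_header."""
    doc = lh.fromstring(html)
    for table in doc.xpath("//table"):
        # Collect the stripped text of every cell, row by row.
        rows = [
            [cell.text_content().strip() for cell in tr.xpath("./td|./th")]
            for tr in table.xpath(".//tr")
        ]
        rows = [r for r in rows if r]  # drop rows that have no cells
        if rows and rows[0][0] == first_header:
            return rows[0], rows[1:]
    return None, []  # no matching table found


header, rows = extract_table(SAMPLE_HTML)
print(header)  # ['BIL', 'KARIAH MASJID', 'ALAMAT', 'MUKIM']
```

On the real page you would pass `page.content` instead of `SAMPLE_HTML`. If a table is found, `pd.DataFrame(rows, columns=header)` should give you a DataFrame, provided every data row has the same number of cells as the header.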