Tags: python, web-scraping, html-parsing

How to parse a specific HTML table from a website in Python


I am a beginner at web scraping with Python. I am trying to parse the table of places of worship on Langkawi Island from this page: http://www.jaik.gov.my/?page_id=658

I've entered the following in Python:

import requests
import lxml.html as lh
import pandas as pd

langkawi_url = 'http://www.jaik.gov.my/?page_id=658'
page = requests.get(langkawi_url)
doc = lh.fromstring(page.content)

tr_elements = doc.xpath('//td')
[len(T) for T in tr_elements[:12]]

tr_elements = doc.xpath('//tr')

col = []
i = 0

for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print("%d:%s" % (i, name))
    col.append((name, []))

The output I get is:

1:Sun
2:Mon
3:Tue
4:Wed
5:Thu
6:Fri
7:Sat

I was hoping to get this:

1:BIL
2:KARIAH MASJID
3:ALAMAT
4:MUKIM

Your advice and guidance are much appreciated.

Thank you!


Solution

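  • Your loop reads tr_elements[0], which is the first <tr> on the whole page. Judging by your output, that row holds the weekday names (presumably a calendar widget), not the table you want. The headers you are after sit in <strong> tags inside <td> cells, so try changing your code to something like: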

    # Select the <strong> elements inside table cells; on this page they hold the column headers
    header_cells = doc.xpath('//td/strong')
    col = [cell.text for cell in header_cells]
    print(col)
    

    Output:

    ['BIL', 'KARIAH MASJID', 'ALAMAT', 'MUKIM']
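
    If you also want the rows under those headers, one option is to pick the table by its header text rather than by its position on the page. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is a made-up stand-in for the real page (a calendar table followed by the target table, with invented row values), and parse_table_with_header is a hypothetical helper name, not part of lxml. It also assumes the target table is not nested inside another table.

```python
import lxml.html as lh

# Hypothetical stand-in for the real page: a calendar table first,
# then the table we actually want. The data rows are invented.
SAMPLE_HTML = """
<html><body>
<table id="calendar">
  <tr><th>Sun</th><th>Mon</th><th>Tue</th></tr>
  <tr><td>1</td><td>2</td><td>3</td></tr>
</table>
<table>
  <tr>
    <td><strong>BIL</strong></td><td><strong>KARIAH MASJID</strong></td>
    <td><strong>ALAMAT</strong></td><td><strong>MUKIM</strong></td>
  </tr>
  <tr><td>1</td><td>Masjid A</td><td>Jalan A</td><td>Kuah</td></tr>
  <tr><td>2</td><td>Masjid B</td><td>Jalan B</td><td>Padang Matsirat</td></tr>
</table>
</body></html>
"""


def parse_table_with_header(html, header_text):
    """Return the rows of the first table that has a bold header cell
    equal to header_text, as a list of dicts keyed by column header."""
    doc = lh.fromstring(html)
    # $h is an XPath variable; lxml fills it from the keyword argument.
    tables = doc.xpath('//table[.//td/strong[normalize-space()=$h]]',
                       h=header_text)
    if not tables:
        return []
    rows = tables[0].xpath('.//tr')
    # The first row of the matched table supplies the column names.
    headers = [c.text_content().strip() for c in rows[0].xpath('./td|./th')]
    result = []
    for row in rows[1:]:
        cells = [c.text_content().strip() for c in row.xpath('./td|./th')]
        result.append(dict(zip(headers, cells)))
    return result


rows = parse_table_with_header(SAMPLE_HTML, 'KARIAH MASJID')
for row in rows:
    print(row)
```

    With the real page you would pass page.content in place of SAMPLE_HTML. Since pandas is already imported in the question, pd.DataFrame(rows) should turn the result into a DataFrame; pandas.read_html with its match argument may also be worth trying as a shortcut.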