I've scraped the table on the Wikipedia page for the districts of Hong Kong using Python and BeautifulSoup (https://en.wikipedia.org/wiki/Districts_of_Hong_Kong). Besides the data the table already provides (population, area, density and region), I would like to get the location coordinates of each district. These have to come from each district's own page, which the table links to.
Take the first district, 'Central and Western District', as an example: the DMS coordinates (22°17′12″N 114°09′18″E) can be found on its page. Clicking the coordinates link there gives the decimal coordinates (22.28666, 114.15497).
So, is it possible to create a table with Latitude and Longitude for each district?
I'm new to programming, so sorry if the question is stupid...
Reference:
DMS coordinates: https://en.wikipedia.org/wiki/Central_and_Western_District
Decimal coordinates: https://tools.wmflabs.org/geohack/geohack.php?pagename=Central_and_Western_District&params=22.28666_N_114.15497_E_type:adm2nd_region:HK
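The relationship between the two coordinate formats mentioned above can be sketched as follows. This is only an illustration: the helper name `dms_to_decimal` is hypothetical, and it assumes the DMS string uses the same `°`, `′` and `″` characters as the Wikipedia page. Because the DMS value on the page is rounded to whole seconds, the converted longitude comes out at about 114.155 rather than exactly 114.15497.

```python
import re


def dms_to_decimal(dms: str) -> float:
    """Convert a DMS string such as 22°17′12″N to signed decimal degrees."""
    # Hypothetical helper: expects degrees, minutes, seconds and a hemisphere letter.
    match = re.fullmatch(r"\s*(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])\s*", dms)
    if match is None:
        raise ValueError(f"Unrecognised DMS string: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative in decimal notation.
    return -value if hemisphere in "SW" else value


lat = dms_to_decimal("22°17′12″N")
lon = dms_to_decimal("114°09′18″E")
print(lat, lon)
```

Scraping the decimal values directly, where the page provides them, avoids this rounding.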
import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://en.wikipedia.org'

res = requests.get(BASE_URL + '/wiki/Districts_of_Hong_Kong')
soup = BeautifulSoup(res.content, 'lxml')
result = {}

# The first wikitable on the page is the district table.
table = soup.find_all('table', {'class': 'wikitable'})[0]

for link in table.find_all('a', href=True):
    district = link.attrs.get('title', '')
    text = link.getText()
    # Keep only links whose title matches their text; these are the district links.
    if not district or not (text in district or district in text):
        continue

    # href is assumed to be a site-relative path such as /wiki/Wan_Chai_District.
    try:
        res = requests.get(BASE_URL + link['href'])
    except requests.RequestException:
        continue

    soup = BeautifulSoup(res.content, 'lxml')
    infoboxes = soup.find_all('table', {'class': 'infobox geography vcard'})
    if not infoboxes:
        continue

    for row in infoboxes[0].find_all('tr', {'class': 'mergedbottomrow'}):
        geo = row.find('span', {'class': 'geo'})  # text looks like '22.28666; 114.15497'
        if geo is None:
            continue
        latitude, longitude = geo.getText().split('; ')
        result[district] = {'Latitude': latitude, 'Longitude': longitude}

print(result)
Result:
{'Central and Western District': {'Latitude': '22.28666', 'Longitude': '114.15497'}, 'Eastern District, Hong Kong': {'Latitude': '22.28411', 'Longitude': '114.22414'}, 'Southern District, Hong Kong': {'Latitude': '22.24725', 'Longitude': '114.15884'}, 'Wan Chai District': {'Latitude': '22.27968', 'Longitude': '114.17168'}, 'Sham Shui Po District': {'Latitude': '22.33074', 'Longitude': '114.16220'}, 'Kowloon City District': {'Latitude': '22.32820', 'Longitude': '114.19155'}, 'Kwun Tong District': {'Latitude': '22.31326', 'Longitude': '114.22581'}, 'Wong Tai Sin District': {'Latitude': '22.33353', 'Longitude': '114.19686'}, 'Yau Tsim Mong District': {'Latitude': '22.32138', 'Longitude': '114.17260'}, 'Islands District, Hong Kong': {'Latitude': '22.26114', 'Longitude': '113.94608'}, 'Kwai Tsing District': {'Latitude': '22.35488', 'Longitude': '114.08401'}, 'North District, Hong Kong': {'Latitude': '22.49471', 'Longitude': '114.13812'}, 'Sai Kung District': {'Latitude': '22.38143', 'Longitude': '114.27052'}, 'Sha Tin District': {'Latitude': '22.38715', 'Longitude': '114.19534'}, 'Tai Po District': {'Latitude': '22.45085', 'Longitude': '114.16422'}, 'Tsuen Wan District': {'Latitude': '22.36281', 'Longitude': '114.12907'}, 'Tuen Mun District': {'Latitude': '22.39163', 'Longitude': '113.9770885'}, 'Yuen Long District': {'Latitude': '22.44559', 'Longitude': '114.02218'}}
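Since the question asks for a table, one possible way to turn a dictionary shaped like the result above into rows is sketched below, using only the standard library. The function name `coordinates_to_csv` is made up for this example, and the sample data is just two entries copied from the output. Keys are stripped of surrounding whitespace before lookup, so a key with a stray trailing space is handled as well.

```python
import csv
import io


def coordinates_to_csv(result: dict) -> str:
    """Return CSV text with one row per district: District, Latitude, Longitude."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["District", "Latitude", "Longitude"])
    for district, coords in result.items():
        # Normalise key names so 'Latitude ' and 'Latitude' are treated alike.
        cleaned = {key.strip(): value for key, value in coords.items()}
        # The scraped values are strings; convert them to numbers for later use.
        writer.writerow([district, float(cleaned["Latitude"]), float(cleaned["Longitude"])])
    return buffer.getvalue()


# Two entries taken from the printed result, for illustration only.
sample = {
    'Central and Western District': {'Latitude': '22.28666', 'Longitude': '114.15497'},
    'Eastern District, Hong Kong': {'Latitude': '22.28411', 'Longitude': '114.22414'},
}
print(coordinates_to_csv(sample))
```

District names that contain a comma, such as 'Eastern District, Hong Kong', are quoted automatically by the `csv` module, so the columns stay aligned when the text is saved to a file and opened in a spreadsheet.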