I have this code that outputs coordinates for a port:
import re
import urllib.request

a = input("What country is your port in?: ")
b = input("What is the name of the port?: ")
url = "http://ports.com/"
# Sample country and port slugs (not used below)
country = ["united-kingdom", "greece"]
ports = ["port-of-eleusis", "portsmouth-continental-ferry-port", "poole-harbour"]
totalurl = url + a + "/" + b + "/"
# Fetch the page once and decode the response bytes to text
with urllib.request.urlopen(totalurl) as response:
    html = response.read().decode()
# Capture whatever sits between the "Coordinates:" label and the closing </span>
regex = '<strong>Coordinates:</strong>(.*?)</span>'
pattern = re.compile(regex)
num = re.findall(pattern, html)
print(num)
The output is correct and readable, but I need the coordinates in a format like 39°09'24.6''N 175°37'55.8''W instead of:
>>> [' 50°48′41.04″N 1°5′31.31″W']
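The output looks that way because HTML represents certain Unicode characters with character references (codes such as &#176; for the degree sign), and Python does not decode them when it reads the page as text. To fix this, replace print(num) with:
print(list(i.replace('&#176;', "°").replace('&#8242;', "′").replace('&#8243;', "″") for i in num))
This replaces &#176; with °, &#8242; with ′, and &#8243; with ″. The codes shown here are the numeric forms; if your output contains named forms such as &deg;, put those in the first argument of each replace() call instead.
>>> print(list(i.replace('&#176;', "°").replace('&#8242;', "′").replace('&#8243;', "″") for i in num))
[' 50°48′41.04″N 1°5′31.31″W']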
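As an alternative to replacing codes one at a time, the standard library's html.unescape() decodes every HTML character reference in a string, whether numeric or named. The sketch below also reformats the result toward the layout asked for in the question. The helper names clean_coordinates and to_ascii_dms, the sample input string, and the choice of zero-padded minutes with one decimal place of seconds are assumptions made for illustration, not something the page or the original code defines.

```python
import html
import re


def clean_coordinates(raw: str) -> str:
    """Decode HTML character references and trim surrounding whitespace."""
    return html.unescape(raw).strip()


# One degrees/minutes/seconds component, e.g. 50°48′41.04″N.
# Accepts either the Unicode prime marks or ASCII quote marks.
DMS = re.compile(r"(\d+)°(\d+)[′'](\d+(?:\.\d+)?)(?:″|'')([NSEW])")


def to_ascii_dms(text: str) -> str:
    """Rewrite each component as D°MM'SS.S''H with ASCII quote marks.

    The padding and rounding here are an assumed reading of the
    format requested in the question.
    """
    parts = []
    for deg, minute, sec, hemi in DMS.findall(text):
        parts.append(f"{int(deg)}°{int(minute):02d}'{float(sec):.1f}''{hemi}")
    return " ".join(parts)


# Hypothetical scraped value that still contains numeric references
raw = " 50&#176;48&#8242;41.04&#8243;N 1&#176;5&#8242;31.31&#8243;W"
decoded = clean_coordinates(raw)
print(decoded)
print(to_ascii_dms(decoded))
```

In the original script this would amount to calling clean_coordinates on each item of num before printing. Because html.unescape() handles any reference form, the code does not need to know in advance which codes the page uses.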