I am writing a simple script to print my IP address in the terminal, but I am having trouble removing the HTML tags from the printed output.
I have tried using .strip(), but that is a str method (not part of urllib) and it only removes leading and trailing whitespace, so the tags remain. I do not understand regex well enough to use it in this code.
import re
import urllib.request, urllib.parse, urllib.error
import json
data = urllib.request.urlopen('http://checkip.dyndns.org')
for line in data:
    print(line.decode().strip())
I expect the output to be only my IP address (xxx.xx.xx.xxx), but instead I get the following:
<html><head><title>Current IP Check</title></head><body>Current IP Address: XXX.XX.XX.XXX</body></html>
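For reference, a quick check of why .strip() cannot help here: it only trims characters from the two ends of a string (whitespace by default), so tags in the middle survive. The sample line below is made up, and 203.0.113.42 is a documentation-only address, not real output.

```python
# Made-up response line; 203.0.113.42 is reserved for documentation.
line = "  <html><body>Current IP Address: 203.0.113.42</body></html>\r\n"

# str.strip() removes only leading/trailing whitespace here,
# so every tag inside the string is left untouched.
stripped = line.strip()
print(stripped)
```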
If you want to use regex, you don't need to strip the tags at all: just match the part you are interested in and capture it with parentheses (a capturing group). For example:
import re
import urllib.request
data = urllib.request.urlopen('http://checkip.dyndns.org').read().decode()
print(re.search(r'Current IP Address: ([\d\.]+)', data).group(1))
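One caveat with the snippet above: re.search returns None when the pattern does not match (for example, if the page format ever changes), and calling .group(1) on None raises AttributeError. A minimal sketch that guards against that, tested offline against a hypothetical sample page (extract_ip and SAMPLE are illustrative names, and the stricter four-number pattern is my own choice, not part of the original answer):

```python
import re

# Hypothetical sample page; 203.0.113.42 is a documentation-only address.
SAMPLE = ("<html><head><title>Current IP Check</title></head>"
          "<body>Current IP Address: 203.0.113.42</body></html>")

def extract_ip(html):
    """Return the address after 'Current IP Address:', or None if it is absent."""
    # Group 1 captures four dot-separated runs of digits.
    match = re.search(r'Current IP Address:\s*(\d+(?:\.\d+){3})', html)
    return match.group(1) if match else None

print(extract_ip(SAMPLE))
print(extract_ip("<html>unexpected page</html>"))
```

With the live page you would pass in the decoded response text (the `data` variable above) instead of SAMPLE.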
You can find more information and examples at https://docs.python.org/3/library/re.html#match-objects
To remove HTML tags in general with re, you can use something like this:
print(re.sub('<[^<]+?>', '', '<html>foo</html>'))
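Applied to this particular page, stripping tags leaves the title text glued to the body text, so you still need one more step to isolate the address. A sketch on a hypothetical sample page (SAMPLE is illustrative; splitting on the label is one simple option, assuming the page keeps that exact wording). Regex tag stripping works for small, simple markup like this but is not a general HTML parser.

```python
import re

# Hypothetical sample page; 203.0.113.42 is a documentation-only address.
SAMPLE = ("<html><head><title>Current IP Check</title></head>"
          "<body>Current IP Address: 203.0.113.42</body></html>")

# Delete every '<...>' run: '<', one or more non-'<' characters (lazily), then '>'.
text = re.sub(r'<[^<]+?>', '', SAMPLE)
print(text)

# The title text survives too, so keep only what follows the label.
ip = text.split('Current IP Address:')[-1].strip()
print(ip)
```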
Or, even easier, use BeautifulSoup instead of re:
from bs4 import BeautifulSoup
print(BeautifulSoup('<html>foo</html>', 'html.parser').get_text())
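If you would rather not install a third-party package, the standard library's html.parser.HTMLParser can do the same text extraction. This swaps in a different technique from the BeautifulSoup answer; TextExtractor and html_to_text are hypothetical names used only for this sketch.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so '&amp;' becomes '&'
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text found outside of tags.
        self.parts.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.parts)

print(html_to_text('<html>foo</html>'))
```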