I'm sure everyone will groan and tell me to look at the documentation (which I have), but I just don't understand how to do the same thing as the following in Python 3:
curl -s http://www.maxmind.com/app/locate_my_ip | awk '/align="center">/{getline;print}'
All I have in Python 3 so far is:
import urllib.request
f = urllib.request.urlopen('http://www.maxmind.com/app/locate_my_ip')
for lines in f.readlines():
    print(lines)
f.close()
Seriously, any suggestions? (Please don't tell me to read http://docs.python.org/release/3.0.1/library/html.parser.html; I have been learning Python for one day and get confused easily.) A simple example would be amazing!!!
This is based on larsmans's answer above.
import urllib.request

f = urllib.request.urlopen('http://www.maxmind.com/app/locate_my_ip')
for line in f:
    if b'align="center">' in line:
        # Like awk's getline: print the line *after* the matching one
        print(next(f).decode().rstrip())
f.close()
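To try the same logic without hitting the network, here is a minimal sketch that wraps it in a function (the name lines_after_marker is hypothetical, not from the answer) and feeds it an io.BytesIO object standing in for the urlopen response. Like the answer, it assumes the page is UTF-8 encoded and that the line you want comes immediately after the one containing the marker.

```python
import io

def lines_after_marker(f, marker=b'align="center">'):
    """Return the decoded line that follows each line containing marker.

    f is any binary file-like object (for example the result of
    urllib.request.urlopen, or io.BytesIO for offline testing).
    """
    results = []
    for line in f:
        if marker in line:
            # next(f, b'') returns b'' instead of raising StopIteration
            # if the marker happens to be on the very last line.
            following = next(f, b'')
            results.append(following.decode().rstrip())
    return results

# Fake "page" standing in for the HTTP response body (made-up content).
page = io.BytesIO(
    b'<html>\n'
    b'<td align="center">\n'
    b'203.0.113.7\n'
    b'</td>\n'
)
print(lines_after_marker(page))  # ['203.0.113.7']
```

One deliberate difference from the answer: a bare next(f) inside the loop body is not protected by the for loop, so if the marker were on the final line it would raise StopIteration. Passing a default to next avoids that.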
Explanation:
for line in f
iterates over the lines in the file-like object f. Python lets you iterate over the lines of a file just as you would over the items in a list.
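A small offline illustration, using io.BytesIO as a stand-in for the response object (an assumption for demonstration; the real object comes from urlopen). Each iteration yields one bytes object with its newline still attached:

```python
import io

# io.BytesIO is a binary file-like object, similar to what urlopen returns.
f = io.BytesIO(b'first\nsecond\nthird\n')
for line in f:
    print(line)  # prints b'first\n', then b'second\n', then b'third\n'

# Collecting the lines into a list gives the same result as f.readlines().
f = io.BytesIO(b'first\nsecond\nthird\n')
lines = list(f)
print(lines)  # [b'first\n', b'second\n', b'third\n']
```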
if b'align="center">' in line
looks for 'align="center">' in the current line. The b prefix makes this a bytes literal rather than a string. urllib.request.urlopen returns the response body as bytes, not as Unicode strings, and an unadorned 'align="center">' would be a str. Python 3 refuses to search for a str inside bytes and raises a TypeError instead. (That was the source of the TypeError above.)
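Here is a quick sketch of the bytes-versus-str difference, using a made-up line of HTML in place of real response data:

```python
line = b'<td align="center">\n'  # a bytes line, as read from the response

# bytes searched inside bytes: works
print(b'align="center">' in line)  # True

# str searched inside bytes: raises TypeError in Python 3
try:
    'align="center">' in line
except TypeError as exc:
    print('TypeError:', exc)

# Alternative: decode the line first, then compare str with str
print('align="center">' in line.decode())  # True
```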
next(f)
takes the next line of the file, because your original awk script printed the line after the one containing 'align="center">' rather than the matching line itself. The decode() method (bytes objects have methods in Python, just as strings do) converts the binary data into a printable Unicode str, using UTF-8 by default. The rstrip() method strips any trailing whitespace (in particular, the newline at the end of each line).
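The three steps can be seen one at a time on a fake response. Again io.BytesIO stands in for urlopen, and 203.0.113.7 is a documentation example address, not real output from the site:

```python
import io

f = io.BytesIO(b'<td align="center">\n203.0.113.7  \n</td>\n')

header = next(f)            # the matching line: b'<td align="center">\n'
following = next(f)         # the line after it: b'203.0.113.7  \n'
text = following.decode()   # bytes -> str (UTF-8 by default): '203.0.113.7  \n'
clean = text.rstrip()       # strip trailing spaces and the newline
print(repr(clean))          # '203.0.113.7'
```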