I am writing Python code to parse the tracker information in a torrent file.
import bencoder
import sys
target = './'+sys.argv[1]
with open(target, 'rb') as torrent_file:
    torrent = bencoder.decode(torrent_file.read())
i=0
while True:
    try:
        print(torrent[b'announce-list'][i])
        i+=1
    except:
        break
The output is as follows.
[b'udp://tracker.openbittorrent.com:80/announce']
[b'udp://tracker.opentrackr.org:1337/announce']
I want to parse each value into the form below.
["tracker.openbittorrent.com", 80]
["tracker.opentrackr.org", 1337]
How should I parse it?
You can use urllib.parse.urlparse for this, as follows:
from urllib.parse import urlparse
url1 = b'udp://tracker.openbittorrent.com:80/announce'
url2 = b'udp://tracker.opentrackr.org:1337/announce'
c1 = urlparse(url1)
c2 = urlparse(url2)
hostport1 = c1.netloc.rsplit(b':',1)
hostport2 = c2.netloc.rsplit(b':',1)
hostport1[0] = hostport1[0].decode()
hostport1[1] = int(hostport1[1])
hostport2[0] = hostport2[0].decode()
hostport2[1] = int(hostport2[1])
print(hostport1)
print(hostport2)
Output:
['tracker.openbittorrent.com', 80]
['tracker.opentrackr.org', 1337]
Explanation: I extract netloc, split it at most once at b':' starting from the right, then apply .decode() to the host to convert bytes into str, and int() to the port to convert bytes into int.
EDIT: After more careful reading, I noticed that you can access .hostname and .port directly, which allows much more concise code for this task:
from urllib.parse import urlparse
url1 = b'udp://tracker.openbittorrent.com:80/announce'
url2 = b'udp://tracker.opentrackr.org:1337/announce'
c1 = urlparse(url1)
c2 = urlparse(url2)
hostport1 = [c1.hostname.decode(), c1.port]
hostport2 = [c2.hostname.decode(), c2.port]
print(hostport1)
print(hostport2)
This gives the same output as the code above. Note that .port is already an int, so only the hostname needs decoding.
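To tie this back to the loop in the question, here is a minimal sketch that applies the .hostname/.port approach to a whole announce-list. The function name tracker_endpoints and the sample list are made up for illustration. The sketch assumes the decoded value is a list of tiers, each holding tracker URLs as bytes, which matches the output shown in the question.

```python
from urllib.parse import urlparse


def tracker_endpoints(announce_list):
    """Return [host, port] pairs for every tracker URL in a tiered announce-list.

    Hypothetical helper: assumes each element of announce_list is a list
    (a tier) of tracker URLs given as bytes.
    """
    endpoints = []
    for tier in announce_list:
        for url in tier:
            parts = urlparse(url)
            if parts.hostname is None:
                continue  # skip entries that have no host part
            # For bytes input, .hostname is bytes; .port is an int or None
            endpoints.append([parts.hostname.decode(), parts.port])
    return endpoints


# Sample data shaped like torrent[b'announce-list'] in the question
sample = [
    [b'udp://tracker.openbittorrent.com:80/announce'],
    [b'udp://tracker.opentrackr.org:1337/announce'],
]
for endpoint in tracker_endpoints(sample):
    print(endpoint)
```

With the real file you would pass torrent[b'announce-list'] instead of sample. A URL without an explicit port yields None as the port, so you may want to handle that case separately.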