
How can I get the tracker information from a torrent file in a desired format using Python?


I am writing Python code to parse the tracker information in a torrent file.

import bencoder
import sys

target = './'+sys.argv[1]

with open(target, 'rb') as torrent_file:
    torrent = bencoder.decode(torrent_file.read())

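# Each entry of announce-list is a tier: a list of tracker URLs (bytes).
# .get() with a default avoids a KeyError when the key is missing.
for tier in torrent.get(b'announce-list', []):
    print(tier)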
    

The output is as follows:

[b'udp://tracker.openbittorrent.com:80/announce']

[b'udp://tracker.opentrackr.org:1337/announce']

I want to parse each value into the form below:

["tracker.openbittorrent.com", 80]

["tracker.opentrackr.org", 1337]

How should I parse it?


Solution

  • You can use urllib.parse.urlparse for this, as follows:

    from urllib.parse import urlparse
    url1 = b'udp://tracker.openbittorrent.com:80/announce'
    url2 = b'udp://tracker.opentrackr.org:1337/announce'
    c1 = urlparse(url1)
    c2 = urlparse(url2)
    hostport1 = c1.netloc.rsplit(b':',1)
    hostport2 = c2.netloc.rsplit(b':',1)
    hostport1[0] = hostport1[0].decode()
    hostport1[1] = int(hostport1[1])
    hostport2[0] = hostport2[0].decode()
    hostport2[1] = int(hostport2[1])
    print(hostport1)
    print(hostport2)
    

    Output:

    ['tracker.openbittorrent.com', 80]
    ['tracker.opentrackr.org', 1337]
    

    Explanation: I extract netloc, split it at most once from the right on b':', then call .decode() on the host to convert bytes to str and int() on the port to convert bytes to int.

    EDIT: On a more careful reading, I noticed that you can access .hostname and .port directly, which makes the code much more concise (.port is already an int, so only the host needs decoding):

    from urllib.parse import urlparse
    url1 = b'udp://tracker.openbittorrent.com:80/announce'
    url2 = b'udp://tracker.opentrackr.org:1337/announce'
    c1 = urlparse(url1)
    c2 = urlparse(url2)
    hostport1 = [c1.hostname.decode(), c1.port]
    hostport2 = [c2.hostname.decode(), c2.port]
    print(hostport1)
    print(hostport2)
    

    This gives the same output as the code above.
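
To connect this back to the question, here is a minimal sketch that applies the .hostname/.port approach to the whole announce-list instead of to hard-coded URLs. The function name tracker_endpoints and the sample list are hypothetical, introduced only for illustration. The sketch assumes the structure shown in the question's output: a list of tiers, each tier being a list of bytes URLs.

```python
from urllib.parse import urlparse


def tracker_endpoints(announce_list):
    """Flatten announce-list tiers into [host, port] pairs.

    Hypothetical helper: assumes each tier is a list of bytes URLs,
    as in the output shown in the question.
    """
    endpoints = []
    for tier in announce_list:
        for raw_url in tier:
            parsed = urlparse(raw_url)
            if parsed.hostname is None:
                # Skip entries that have no host part at all.
                continue
            # hostname is bytes for bytes input; port is an int or None.
            endpoints.append([parsed.hostname.decode(), parsed.port])
    return endpoints


# Sample data mirroring the question's output (an assumption, not a real file).
sample = [
    [b'udp://tracker.openbittorrent.com:80/announce'],
    [b'udp://tracker.opentrackr.org:1337/announce'],
]

for endpoint in tracker_endpoints(sample):
    print(endpoint)
```

With a decoded torrent, you would pass torrent[b'announce-list'] in place of sample. Note that .port is None when a URL carries no explicit port, so callers may want to handle that case.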