Search code examples
pythonlibtorrentutorrentmagnet-urilibtorrent-rasterbar

Why is utorrents Magnet to Torrent file fetching is faster than my python script?


i am trying to convert torrent magnet urls in .torrent files using python script. python script connects to dht and waits for meta data then creates torrent file from it.

e.g.

#!/usr/bin/env python
'''
Created on Apr 19, 2012
@author: dan, Faless

    GNU GENERAL PUBLIC LICENSE - Version 3

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    http://www.gnu.org/licenses/gpl-3.0.txt

'''

import shutil
import tempfile
import os.path as pt
import sys
import libtorrent as lt
from time import sleep


def magnet2torrent(magnet, output_name=None):
    if output_name and \
            not pt.isdir(output_name) and \
            not pt.isdir(pt.dirname(pt.abspath(output_name))):
        print("Invalid output folder: " + pt.dirname(pt.abspath(output_name)))
        print("")
        sys.exit(0)

    tempdir = tempfile.mkdtemp()
    ses = lt.session()
    params = {
        'save_path': tempdir,
        'duplicate_is_error': True,
        'storage_mode': lt.storage_mode_t(2),
        'paused': False,
        'auto_managed': True,
        'duplicate_is_error': True
    }
    handle = lt.add_magnet_uri(ses, magnet, params)

    print("Downloading Metadata (this may take a while)")
    while (not handle.has_metadata()):
        try:
            sleep(1)
        except KeyboardInterrupt:
            print("Aborting...")
            ses.pause()
            print("Cleanup dir " + tempdir)
            shutil.rmtree(tempdir)
            sys.exit(0)
    ses.pause()
    print("Done")

    torinfo = handle.get_torrent_info()
    torfile = lt.create_torrent(torinfo)

    output = pt.abspath(torinfo.name() + ".torrent")

    if output_name:
        if pt.isdir(output_name):
            output = pt.abspath(pt.join(
                output_name, torinfo.name() + ".torrent"))
        elif pt.isdir(pt.dirname(pt.abspath(output_name))):
            output = pt.abspath(output_name)

    print("Saving torrent file here : " + output + " ...")
    torcontent = lt.bencode(torfile.generate())
    f = open(output, "wb")
    f.write(lt.bencode(torfile.generate()))
    f.close()
    print("Saved! Cleaning up dir: " + tempdir)
    ses.remove_torrent(handle)
    shutil.rmtree(tempdir)

    return output


def showHelp():
    print("")
    print("USAGE: " + pt.basename(sys.argv[0]) + " MAGNET [OUTPUT]")
    print("  MAGNET\t- the magnet url")
    print("  OUTPUT\t- the output torrent file name")
    print("")


def main():
    if len(sys.argv) < 2:
        showHelp()
        sys.exit(0)

    magnet = sys.argv[1]
    output_name = None

    if len(sys.argv) >= 3:
        output_name = sys.argv[2]

    magnet2torrent(magnet, output_name)


if __name__ == "__main__":
    main()

above script takes around 1+ minutes to fetch metadata and create .torrent file while utorrent client only takes few seconds , why is that ?

How can i make my script faster ?

i would like to fetch metadata for around 1k+ torrents.

e.g. Magnet Link

magnet:?xt=urn:btih:BFEFB51F4670D682E98382ADF81014638A25105A&dn=openSUSE+13.2+DVD+x86_64.iso&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80

update :

i have specified known dht router urls like this in my script.

session = lt.session()
session.listen_on(6881, 6891)

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

but it still slow and sometimes i get errors like

DHT error [hostname lookup] (1) Host not found (authoritative)
could not map port using UPnP: no router found

update :

i have wrote this scmall script which fetches hex info hash from DB and tries to fetch meta data from dht and then inserts the torrent file in DB.

i have made it run indefinitely as i did not know how to save state , so keeping it running will get more peers and fetching meta data will quicker .

#!/usr/bin/env python
# this file will run as client or daemon and fetch torrent meta data i.e. torrent files from magnet uri

import libtorrent as lt # libtorrent library
import tempfile # for settings parameters while fetching metadata as temp dir
import sys #getting arguiments from shell or exit script
from time import sleep #sleep
import shutil # removing directory tree from temp directory 
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity 
import os
from datetime import date, timedelta

#create lock file to make sure only single instance is running
lock_file_name = "/daemon.lock"

if(os.path.isfile(lock_file_name)):
    sys.exit('another instance running')
#else:
    #f = open(lock_file_name, "w")
    #f.close()

session = lt.session()
session.listen_on(6881, 6891)

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

alive = True
while alive:

    db_conn = MySQLdb.connect(  host = 'localhost',     user = '',  passwd = '',    db = 'basesite',    unix_socket='') # Open database connection
    #print('reconnecting')
    #get all records where enabled = 0 and uploaded within yesterday 
    subset_count = 5 ;

    yesterday = date.today() - timedelta(1)
    yesterday = yesterday.strftime('%Y-%m-%d %H:%M:%S')
    #print(yesterday)

    total_count_query = ("SELECT COUNT(*) as total_count FROM content WHERE upload_date > '"+ yesterday +"' AND enabled = '0' ")
    #print(total_count_query)
    try:
        total_count_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
        total_count_cursor.execute(total_count_query) # Execute the SQL command
        total_count_results = total_count_cursor.fetchone() # Fetch all the rows in a list of lists.
        total_count = total_count_results[0]
        print(total_count)
    except:
            print "Error: unable to select data"

    total_pages = total_count/subset_count
    #print(total_pages)

    current_page = 1
    while(current_page <= total_pages):
        from_count = (current_page * subset_count) - subset_count

        #print(current_page)
        #print(from_count)

        hashes = []

        get_mysql_data_query = ("SELECT hash FROM content WHERE upload_date > '" + yesterday +"' AND enabled = '0' ORDER BY record_num ASC LIMIT "+ str(from_count) +" , " + str(subset_count) +" ")
        #print(get_mysql_data_query)
        try:
            get_mysql_data_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
            get_mysql_data_cursor.execute(get_mysql_data_query) # Execute the SQL command
            get_mysql_data_results = get_mysql_data_cursor.fetchall() # Fetch all the rows in a list of lists.
            for row in get_mysql_data_results:
                hashes.append(row[0].upper())
        except:
            print "Error: unable to select data"

        print(hashes)

        handles = []

        for hash in hashes:
            tempdir = tempfile.mkdtemp()
            add_magnet_uri_params = {
                'save_path': tempdir,
                'duplicate_is_error': True,
                'storage_mode': lt.storage_mode_t(2),
                'paused': False,
                'auto_managed': True,
                'duplicate_is_error': True
            }
            magnet_uri = "magnet:?xt=urn:btih:" + hash.upper() + "&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80"
            #print(magnet_uri)
            handle = lt.add_magnet_uri(session, magnet_uri, add_magnet_uri_params)
            handles.append(handle) #push handle in handles list

        #print("handles length is :")
        #print(len(handles))

        while(len(handles) != 0):
            for h in handles:
                #print("inside handles for each loop")
                if h.has_metadata():
                    torinfo = h.get_torrent_info()
                    final_info_hash = str(torinfo.info_hash())
                    final_info_hash = final_info_hash.upper()
                    torfile = lt.create_torrent(torinfo)
                    torcontent = lt.bencode(torfile.generate())
                    tfile_size = len(torcontent)
                    try:
                        insert_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
                        insert_cursor.execute("""INSERT INTO dht_tfiles (hash, tdata) VALUES (%s, %s)""",  [final_info_hash , torcontent] )
                        db_conn.commit()
                        #print "data inserted in DB"
                    except MySQLdb.Error, e:
                        try:
                            print "MySQL Error [%d]: %s" % (e.args[0], e.args[1])
                        except IndexError:
                            print "MySQL Error: %s" % str(e)    

                    shutil.rmtree(h.save_path())    #   remove temp data directory
                    session.remove_torrent(h) # remove torrnt handle from session   
                    handles.remove(h) #remove handle from list

                else:
                    if(h.status().active_time > 600):   # check if handle is more than 10 minutes old i.e. 600 seconds
                        #print('remove_torrent')
                        shutil.rmtree(h.save_path())    #   remove temp data directory
                        session.remove_torrent(h) # remove torrnt handle from session   
                        handles.remove(h) #remove handle from list
                sleep(1)        
                #print('sleep1')

        print('sleep10')
        sleep(10)
        current_page = current_page + 1
    #print('sleep20')
    sleep(20)

os.remove(lock_file_name);

now i need to implement new things as suggested by Arvid.


UPDATE

i have managed to implement what Arvid suggested. and some more extension i found in deluge support forums http://forum.deluge-torrent.org/viewtopic.php?f=7&t=42299&start=10

#!/usr/bin/env python

import libtorrent as lt # libtorrent library
import tempfile # for settings parameters while fetching metadata as temp dir
import sys #getting arguiments from shell or exit script
from time import sleep #sleep
import shutil # removing directory tree from temp directory 
import os.path # for getting pwd and other things
from pprint import pprint # for debugging, showing object data
import MySQLdb # DB connectivity 
import os
from datetime import date, timedelta

def var_dump(obj):
  for attr in dir(obj):
    print "obj.%s = %s" % (attr, getattr(obj, attr))

session = lt.session()
session.add_extension('ut_pex')
session.add_extension('ut_metadata')
session.add_extension('smart_ban')
session.add_extension('metadata_transfer')  

#session = lt.session(lt.fingerprint("DE", 0, 1, 0, 0), flags=1)

session_save_filename = "/tmp/new.client.save_state"

if(os.path.isfile(session_save_filename)):

    fileread = open(session_save_filename, 'rb')
    session.load_state(lt.bdecode(fileread.read()))
    fileread.close()
    print('session loaded from file')
else:
    print('new session started')

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

alerts = [] 

alive = True
while alive:
    a = session.pop_alert()
    alerts.append(a)
    print('----------')
    for a in alerts:
        var_dump(a)
        alerts.remove(a)


    print('sleep10')
    sleep(10)
    filewrite = open(session_save_filename, "wb")
    filewrite.write(lt.bencode(session.save_state()))
    filewrite.close()

kept it running for a minute and received alert

obj.msg = no router found 

update :

after some testing looks like

session.add_dht_router("router.bitcomet.com", 6881)

causing

('%s: %s', 'alert', 'DHT error [hostname lookup] (1) Host not found (authoritative)')

update: i added

session.start_dht()
session.start_lsd()
session.start_upnp()
session.start_natpmp()

and got alert

('%s: %s', 'portmap_error_alert', 'could not map port using UPnP: no router found')

Solution

  • as MatteoItalia points out, bootstrapping the DHT is not instantaneous, and can sometimes take a while. There's no well defined point in time when the bootstrap process is complete, it's a continuum of being increasingly connected to the network.

    The more connected and the more good, stable nodes you know about, the quicker lookups will be. One way to factor out most of the bootstrapping process (to get more of an apples-to-apples comparison) would be to start timing after you get the dht_bootstrap_alert (and also hold off on adding the magnet link until then).

    Adding the dht bootstrap nodes will primarily make it possible to bootstrap, it's still not necessarily going to be particularly quick. You typically want about 270 nodes or so (including replacement nodes) to be considered bootstrapped.

    One thing you can do to speed up the bootstrap process is to make sure you save and load the session state, which includes the dht routing table. This will reload all the nodes from the previous session into the routing table and (assuming you haven't changed IP and that everything works correctly) bootstrapping should be quicker.

    Make sure you don't start the DHT in the session constructor (as the flags argument, just pass in add_default_plugins), load the state, add the router nodes and then start the dht.

    Unfortunately there are a lot of moving parts involved in getting this to work internally, the ordering is important and there may be subtle issues.

    Also, note that keeping the DHT running continuously would be quicker, as reloading the state still will go through a bootstrap, it will just have more nodes up front to ping and try to "connect" to.

    disabling the start_default_features flag also means UPnP and NAT-PMP won't be started, if you use those, you'll have to start those manually as well.