Best way to retrieve external torrent statistics in an active website


I'm making a BitTorrent tracker/website similar to The Pirate Bay, Kickass.to, etc. I need to show torrent statistics (seeders, leechers, downloads) on both the index page and the individual torrent pages. Example:

http://kat.cr/ubuntu-15-04-vivid-vervet-desktop-amd64-iso-final-t10550003.html
Seeders: 3442 Leechers: 148

If the torrent is using my tracker, it's easy to quickly retrieve the data for both pages. However, if the torrent is using a different tracker, I would need to scrape its statistics from said tracker (making requests to it), but that usually takes a few seconds for each torrent and obviously, I can't make the users wait that long to see the listing.

I wrote a script that runs in the background and scrapes the latest 90 torrents, but I'm afraid that's not enough. The website will grow, and the total number of torrents will probably exceed 5000. I don't think scraping that many torrents one by one in the background will work.

How can I do this?


Solution

  • The following strategies to obtain statistics are available, listed in descending order of efficiency:

    1. full scrape via scrape interface - used to be common, less so today on large trackers due to the traffic it causes
    2. full scrape via custom export URLs - you'll have to ask the tracker admins; sometimes these are documented on their websites
    3. UDP multi-scrape
    4. HTTP multi-scrape via /scrape?info_hash=A&info_hash=B&info_hash=C - some trackers support it, some don't.
    5. HTTP single-scrape
    6. DHT scrape
    7. joining the swarm and measuring via PEX
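
To make the HTTP multi-scrape option more concrete, here is a minimal sketch in Python. It assumes the tracker follows the usual convention where the scrape URL is derived by replacing "announce" with "scrape" in the last path segment, and that it answers with a bencoded dictionary containing a "files" key. The function names (scrape_url, bdecode, parse_scrape) and the tracker host are made up for illustration, and the decoder does only as much as a scrape response needs.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def scrape_url(announce_url, info_hashes):
    """Build a multi-scrape URL from an announce URL and raw 20-byte info hashes."""
    parts = urlsplit(announce_url)
    head, _, last = parts.path.rpartition("/")
    if not last.startswith("announce"):
        # Assumption: trackers that break this convention cannot be scraped this way.
        raise ValueError("announce URL does not follow the scrape convention")
    path = head + "/scrape" + last[len("announce"):]
    # Each hash is sent as a percent-encoded binary string, one parameter per torrent.
    query = "&".join("info_hash=" + quote(h, safe="") for h in info_hashes)
    if parts.query:
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c in (b"l", b"d"):  # list or dict: read items until the closing "e"
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        if c == b"l":
            return items, i + 1
        return dict(zip(items[0::2], items[1::2])), i + 1
    colon = data.index(b":", i)  # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def parse_scrape(body):
    """Map hex info hash -> seeders/leechers/downloads from a scrape response."""
    decoded, _ = bdecode(body)
    stats = {}
    for info_hash, entry in decoded.get(b"files", {}).items():
        stats[info_hash.hex()] = {
            "seeders": entry.get(b"complete", 0),
            "leechers": entry.get(b"incomplete", 0),
            "downloads": entry.get(b"downloaded", 0),
        }
    return stats
```

A background job could group torrents by tracker, call scrape_url with a batch of hashes, fetch the result with urllib.request.urlopen(url, timeout=10), and store the output of parse_scrape in the database, so that the index and torrent pages only ever read cached numbers. How many hashes a tracker accepts per request varies, so the batch size is something to tune per tracker.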