
How to capture BitTorrent info-hash IDs on a network using tcpdump or another open-source tool?


I am working on a project where we need to collect the BitTorrent info-hash IDs seen on our small ISP network. Using port mirroring, we can pass all WAN traffic to a server and run tcpdump or another tool there to find the info hashes downloaded by BitTorrent clients. For example:

tcpflow -p -c -i eth1 tcp | grep -oE '(GET) .* HTTP/1.[01].*'

This command produces output like this:

GET /announce?info_hash=N%a1%94%17%2c%11%aa%90%9c%0a%1a0%9d%b2%cfy%08A%03%16&peer_id=-BT7950-%f1%a2%d8%8fO%d7%f9%bc%f1%28%15%26&port=19211&uploaded=55918592&downloaded=0&left=0&corrupt=0&key=21594C0B&numwant=200&compact=1&no_peer_id=1 HTTP/1.1

Now we need to capture only the info hash and store it in a log file or a MySQL database.

Can you please tell me which tool can do this?


Solution

  • Depending on how rigorous you want to be, you'll have to decode the following protocol layers:

    1. TCP: reassemble the packets of each flow. You're already doing that with tcpflow; tshark, Wireshark's CLI, could do that too.
    2. HTTP: extract the request target from the GET request line. A simple regex would do the job here.
    3. URI: extract the query string from the request target.
    4. application/x-www-form-urlencoded: extract the info_hash key-value pair and decode its percent-encoding.

    For the last two steps I would look for tools or libraries in your programming language of choice to handle them.
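
Steps 2 to 4 could be sketched in Python roughly as follows. This is only a minimal illustration, not a complete tool: the function name `extract_info_hash` and the shortened `SAMPLE` line (cut down from the output in the question) are made up for the example, and it assumes a BitTorrent v1 info hash, which is 20 raw bytes. The query string is split by hand and decoded with `urllib.parse.unquote_to_bytes`, because the value is binary data and decoding it as text could corrupt it.

```python
import re
from urllib.parse import unquote_to_bytes

# Step 2: pull the request target out of a "GET <target> HTTP/1.x" request line.
REQUEST_LINE = re.compile(r"GET (\S+) HTTP/1\.[01]")


def extract_info_hash(line):
    """Return the info_hash found in one request line as 40 hex characters, or None."""
    match = REQUEST_LINE.search(line)
    if match is None:
        return None
    # Step 3: everything after the first "?" is the query string.
    _, _, query = match.group(1).partition("?")
    # Step 4: split the key=value pairs and percent-decode info_hash to raw bytes.
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "info_hash":
            raw = unquote_to_bytes(value)
            # Assumption: only 20-byte (SHA-1) hashes are accepted.
            if len(raw) == 20:
                return raw.hex()
    return None


# Shortened version of the request line shown in the question.
SAMPLE = (
    "GET /announce?info_hash=N%a1%94%17%2c%11%aa%90%9c%0a%1a0%9d%b2%cfy%08A%03%16"
    "&peer_id=-BT7950-%f1%a2%d8%8fO%d7%f9%bc%f1%28%15%26&port=19211 HTTP/1.1"
)

print(extract_info_hash(SAMPLE))
# prints 4ea194172c11aa909c0a1a309db2cf7908410316
```

To use it on live traffic, the same function could be called on each line read from `sys.stdin` with the tcpflow command from the question piped in. The hex strings it returns can then be appended to a log file or inserted into a database table.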