i am working on a project where we need to collect the bitorrent infohash id running in our small ISP network. using port mirroring we can pass the all wan traffic to a server and run tcpdump tools or any other tool to find the infohash id download by bitorrent client. for example
tcpflow -p -c -i eth1 tcp | grep -oE '(GET) .* HTTP/1.[01].*'
this code is showing result like this
GET /announce?info_hash=N%a1%94%17%2c%11%aa%90%9c%0a%1a0%9d%b2%cfy%08A%03%16&peer_id=-BT7950-%f1%a2%d8%8fO%d7%f9%bc%f1%28%15%26&port=19211&uploaded=55918592&downloaded=0&left=0&corrupt=0&key=21594C0B&numwant=200&compact=1&no_peer_id=1 HTTP/1.1
now we need to capture only infohash and store it to a log or mysql database
can you please tell me which tool can do thing like this
Depending on how rigorous you want to be you'll have to decode the following protocol layers:
info_hash
key value pair extraction and handling of percent-encodingFor the last two steps I would look for tools or libraries in your programming language of choice to handle them.