I am trying to take a packet capture and work backwards to determine which objects are associated with each page request. For example, if a packet capture contains two different web pages' worth of requests, I want to be able to determine for each object (TCP stream) which root page it is associated with. Is there an easy way to do this?
I know there are tools that will isolate the TCP streams and extract the data within them; however, I am not looking to replicate the web page. I simply want to associate each stream with the original page that requested it.
What you are trying to do is reconstruct the "call graph" of a browsing session. For a simple analysis, you can inspect just the HTTP headers. Bro makes this process very convenient. If site A loads site B, A typically shows up in the Referer header of B.
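To illustrate the Referer approach, here is a minimal Python sketch. It assumes you have already extracted one record per request in capture order (for example from Bro's HTTP log) as a tuple of stream identifier, requested URL, and Referer value. The function name `assign_root_pages`, the tuple layout, and the example URLs are all made up for illustration, and it assumes one request per stream.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (stream_id, url, referer or None)
Request = Tuple[str, str, Optional[str]]


def assign_root_pages(requests: Iterable[Request]) -> Dict[str, str]:
    """Map each stream to the root page it belongs to by following Referer chains."""
    root_of_url: Dict[str, str] = {}     # URL -> root page it was last attributed to
    root_of_stream: Dict[str, str] = {}  # stream id -> root page

    for stream_id, url, referer in requests:
        if referer is not None and referer in root_of_url:
            # The referring URL was seen earlier, so inherit its root page.
            root = root_of_url[referer]
        else:
            # No Referer, or a Referer outside the capture: treat as a new root.
            root = url
        root_of_url[url] = root
        root_of_stream[stream_id] = root

    return root_of_stream


# Example with two interleaved page loads (made-up data).
capture = [
    ("s1", "http://a.example/", None),
    ("s2", "http://a.example/style.css", "http://a.example/"),
    ("s3", "http://b.example/", None),
    ("s4", "http://cdn.example/lib.js", "http://b.example/"),
    ("s5", "http://cdn.example/font.woff", "http://a.example/style.css"),
]
for stream, root in assign_root_pages(capture).items():
    print(stream, "->", root)
```

Note that a page reached by clicking a link also carries the previous page in its Referer, so this sketch would attribute it to the earlier root. Telling navigations apart from embedded objects needs an extra heuristic, such as content type or timing.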
However, if you aim for completeness, this task becomes a daunting challenge: you need to parse the HTTP body payload and even JavaScript to determine all the URLs that are being created at runtime in the client, e.g., via AJAX, iframes, and friends.
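As a rough idea of what the body-parsing step involves, the following Python sketch pulls statically referenced URLs out of an HTML body using only the standard library. The names `ResourceCollector` and `extract_resource_urls` and the choice of tags are assumptions for illustration. It does not execute JavaScript, so URLs built at runtime are missed, which is exactly the hard part.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect URLs of embedded objects referenced statically in the markup."""

    # Tag -> attribute holding the URL (an illustrative, incomplete selection).
    URL_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_resource_urls(html, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls


page = (
    '<html><head><link href="/s.css"><script src="app.js"></script></head>'
    '<body><img src="http://cdn.example/i.png"><a href="/other">x</a></body></html>'
)
print(extract_resource_urls(page, "http://a.example/page/"))
```

The extracted URLs can then be matched against the requests seen in the capture to attribute streams whose Referer header is missing or stripped.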