I've been playing around with scapy and want to read through and analyse every hex byte. So far I've been using scapy simply because I don't know another way currently. Before just writing tools myself to go through the pcap files I was wondering if there was an easy way to do it. Here's what I've done so far.
packets = rdpcap('file.pcap')
tcpPackets = []
for packet in packets:
if packet.haslayer(TCP):
tcpPackets.append(packet)
When I run type(tcpPackets[0])
the type I get is:
<class 'scapy.layers.l2.Ether'>
Then when I try to covert the Ether object into a string it gives me a mix of hex and ascii (as noted by the random parenthesis and brackets).
str(tcpPackets[0])
"b'$\\xa2\\xe1\\xe6\\xee\\x9b(\\xcf\\xe9!\\x14\\x8f\\x08\\x00E\\x00\\x00[:\\xc6@\\x00@\\x06\\x0f\\xb9\\n\\x00\\x01\\x04\\xc6)\\x1e\\xf1\\xc0\\xaf\\x07[\\xc1\\xe1\\xff0y<\\x11\\xe3\\x80\\x18 1(\\xb8\\x00\\x00\\x01\\x01\\x08\\n8!\\xd1\\x888\\xac\\xc2\\x9c\\x10%\\x00\\x06MQIsdp\\x03\\x02\\x00\\x05\\x00\\x17paho/34AAE54A75D839566E'"
I have also tried using hexdump but I can't find a way to parse through it.
I can't find the proper dupe now, but this is just a miss-use/miss-understanding of str()
. The original data is in a bytes format, for instance x = b'moo'
.
When str()
retrieves your bytes string, it will do so by calling the __str__
function of the bytes
class/object. That will return a representation of itself. The representation will keep b
at the beginning because it's believed to distinguish and make it easier for humans to understand that it's a bytes object, as well as avoid encoding issues I guess (alltho that's speculations).
Same as if you tried accessing tcpPackets[0]
from a terminal, it would call __repr__
and show you something like <class 'scapy.layers.l2.Ether'>
most likely.
As an example code you can experiment with, try this out:
class YourEther(bytes):
def __str__(self):
return '<Made Up Representation>'
print(YourEther())
Obviously scapy's returns another representation, not just a static string that says "made up representation". But you probably get the idea.
So in the case of <class 'scapy.layers.l2.Ether'>
it's __repr__
or __str__
function probably returns b'$\\xa2\\.......
instead of just it's default class representation (some correction here might be in place tho as I don't remember/know all the technical namification of the behaviors).
As a workaround, this might fix your issue:
hexlify(str(tcpPackets[0]))
All tho you probably have to account for the prepended b'
as well as trailing '
and remove those accordingly. (Note: "
won't be added in the beginning or end, those are just a second representation in your console when printing. They're not actually there in terms of data)
Scapy is probably more intended to use tcpPackets[0].dst
rather than grabing the raw data. But I've got very little experience with Scapy, but it's an abstraction layer for a reason and it's probably hiding the raw data or it's in the core docs some where which I can't find right now.
More info on the __str__
description: Does python `str()` function call `__str__()` function of a class?
Last note, and that is if you actually want to access the raw data, it seams like you can access it with the Raw class: Raw load found, how to access?