Trying to parse packets in Python and struggling to build a proper script


I have a huge (like 600MB) file of packet captures in the following format:

[S->C][02][0x5A97BE]
[0047F32B] 95 DE 5E 52 4A F3 80 F5 47 18 97 70 10 EE 5B E5  ..^RJ...G..p..[.
           7C E8 F5 B2 2F 1F 3A 6B A1 8F 6C 73 65 A6 42 27  |.../.:k..lse.B'

My goal is to reduce this entire thing so it looks like this:

[S->C][02][0x5A97BE]
95DE5E524AF380F54718977010EE5BE57CE8F5B22F1F3A6BA18F6C7365A64227

I looked through some string.split() and regex tutorials, but none of them quite got me there. Basically, I am trying to write an if statement that checks whether the line contains [S->C] or has a ">" in the fourth character spot. If so, skip the line (leave it unchanged). Otherwise, I want to remove everything to the left of the first space and everything to the right of the double space (there's a double space between the hex and the ASCII display).

I have followed like 3 tutorials now, modified them, and just can't get it to parse correctly. Any help would be amazing. I know it isn't too hard but for some reason it's eluding me.


Solution

  • Here's one way. Call myfun(input_file), passing the path to the capture file.

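    import re

    def myfun(in_file):
        # Capture the 47-character hex column that follows the 11-character
        # offset column ("[0047F32B] " or 11 spaces on continuation lines).
        body = re.compile(r'^.{11}(.{47})')
        hex_open = False  # True while a packet's hex line is unterminated.
        with open(in_file) as f:
            for line in f:
                if line[3:4] == '>':
                    # Header line: terminate the previous packet's hex first.
                    if hex_open:
                        print()
                        hex_open = False
                    print(line, end='')
                else:
                    m = body.match(line)
                    if m:
                        # Print the hex bytes with the spaces removed.
                        print(m.group(1).replace(' ', ''), end='')
                        hex_open = True
        # Terminate the last packet's hex line.
        if hex_open:
            print()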
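
For comparison, here is a rough sketch of the approach described in the question, using plain string slicing and split() instead of a regex. It assumes every body line has an 11-character offset column (so slicing is used rather than "left of the first space", which would fail on continuation lines that start with spaces) and that the first double space on the line is the one before the ASCII display. The name compact_capture is made up for illustration.

```python
import io


def compact_capture(lines):
    """Yield header lines unchanged and each packet's hex bytes as one line."""
    hex_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line[3:4] == ">":
            # Header such as "[S->C][02][0x5A97BE]": flush the previous packet.
            if hex_parts:
                yield "".join(hex_parts)
                hex_parts = []
            yield line
        elif line.strip():
            # Assumption: the offset column is 11 characters wide, and the
            # first double space separates the hex bytes from the ASCII dump.
            hex_column = line[11:].split("  ", 1)[0]
            hex_parts.append(hex_column.replace(" ", ""))
    # Flush the final packet, if any.
    if hex_parts:
        yield "".join(hex_parts)


if __name__ == "__main__":
    sample = io.StringIO(
        "[S->C][02][0x5A97BE]\n"
        "[0047F32B] 95 DE 5E 52 4A F3 80 F5 47 18 97 70 10 EE 5B E5  ..^RJ...G..p..[.\n"
        "           7C E8 F5 B2 2F 1F 3A 6B A1 8F 6C 73 65 A6 42 27  |.../.:k..lse.B'\n"
    )
    for out_line in compact_capture(sample):
        print(out_line)
```

Because it is a generator over lines, the same function can be fed an open file object, so a 600MB capture is processed one line at a time rather than loaded into memory.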