c# network-programming reverse-engineering data-analysis

Identifying repeating sequences of data in byte array


Given a sample of hexadecimal data, I would like to identify UNKNOWN sequences of bytes that are repeated throughout the sample (that is, I am not searching for a known string or value). I am attempting to reverse engineer a network protocol, and I am working on determining the data structures within the packets. As an example of what I'm trying to do (albeit on a smaller scale):

(af:b6:ea:3d:83:02:00:00):{21:03:00:00}:[b3:49:96:23:01]

{21:03:00:00}:(af:b6:ea:3d:83:02:00:00):01:42:00:00:00:00:01:57

And

(38:64:88:6e:83:02:00:00):{26:03:00:00}:[b3:49:96:23:01]

{26:03:00:00}:(38:64:88:6e:83:02:00:00):01:42:00:00:00:00:00:01

Obviously, these are easy to spot by eye, but patterns that start hundreds of bytes into the data are not. I'm not expecting a magic bullet, just a nudge in the right direction or, even better, a premade tool.

I currently need this for a C# project, but I am open to any and all tools.


Solution

  • If you have no idea what you are looking for, you could get an idea of the layout of the data by performing a negative entropy analysis on a reasonably large sample of conversations to see the length of the records and sub-records.

    If the data is structured with repeated sequences of roughly the same length and content type, you should see clusters of values with nearly the same negative entropy around the length of the records and sub-records.

    For example, if you put a basic file with a lot of similar data through such an analysis, you should see values around the average record length with comparable negentropies (e.g. if you use a CSV file with an average line length of 117 bytes, you might see 115, 116, 117 & 119 with the highest negentropy), and values around the most common field lengths with the same negentropy.

    You might also do a byte occurrence scan to see which byte values are likely separators.

    There is a free hex editor with source code available that does this for you (hexplorer, in the Crypto/Find Pattern menu). You may have to change the default font through Options to actually see something in the UI.
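
If you would rather script the two scans than use a GUI, here is a minimal Python sketch of one way to read the suggestions above. It is not hexplorer's algorithm. The function names are made up for illustration, and "negentropy" is assumed here to mean 8 minus the average Shannon entropy (in bits) of the bytes found at each offset when the data is cut into rows of a candidate record length.

```python
import math
from collections import Counter


def byte_occurrences(data: bytes) -> list[tuple[int, int]]:
    """Return (byte value, count) pairs, most frequent first.

    Very frequent values are candidates for separators or padding.
    """
    return Counter(data).most_common()


def column_entropy(column: bytes) -> float:
    """Shannon entropy in bits (0..8) of the byte values in one column."""
    total = len(column)
    counts = Counter(column)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def negentropy_by_length(data: bytes, max_len: int) -> dict[int, float]:
    """Score each candidate record length from 1 to max_len.

    Assumption: the score is 8 minus the mean per-column entropy, where a
    column is every byte at the same offset modulo the candidate length.
    A high score means the columns are predictable, which hints that the
    candidate length lines up with the real record layout.
    """
    if not data:
        raise ValueError("data must not be empty")
    scores = {}
    for length in range(1, max_len + 1):
        # Slice out one column per offset; drop empty ones when the
        # candidate length exceeds the amount of data available.
        columns = [data[offset::length] for offset in range(length)]
        columns = [c for c in columns if c]
        mean_entropy = sum(column_entropy(c) for c in columns) / len(columns)
        scores[length] = 8.0 - mean_entropy
    return scores


if __name__ == "__main__":
    # Hypothetical sample: 64 four-byte records, one varying byte
    # followed by the constant bytes 03 00 00.
    sample = b"".join(bytes([(i * 37) % 256, 0x03, 0x00, 0x00]) for i in range(64))
    print(byte_occurrences(sample)[:3])
    for length, score in sorted(negentropy_by_length(sample, 7).items()):
        print(length, round(score, 2))
```

Two caveats with this sketch: multiples of the true record length tend to score at least as well as the length itself, and columns with few samples look artificially predictable, so keep `max_len` small relative to the sample size and look for clusters of high scores rather than a single maximum.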