hex static-analysis yara

Why do several different hexadecimal values all appear as the dot (".") symbol?


While tuning the YARA rules that I run on VirusTotal, I noticed that the symbol . does not always stand for the same hexadecimal value. When I tried to exclude the text string .sample., which was generating false positives, the exclusion had no effect: the . in my rule was converted from text to 2E, whereas in the string actually contained in the false positives the . stood for 00.

I assume that when the files are matched, the text is converted to hex, the hex string is then matched against a hexdump of the file, and the whole hexdump is converted back to text for the VT preview.

Then I noticed that even more hexadecimal values were shown as . in VirusTotal's text preview, for example 0A, 99, and 09 (screenshot).


I tried viewing the text representation of these hex values with an online converter (http://www.unit-conversion.info/texttools/hexadecimal/), and some of them were displayed as � or as a blank symbol (not a space character, which is 20, but just an empty space).

So my questions are: why do different values seem to map to the same symbol? And what do the "blank spaces" represent in a file's hexdump?


Solution

  • The 0A characters are line feed characters, as can be seen in the ASCII table, while the 2E characters are actual periods.

    As per this answer on the same issue:

    These are whitespace characters, and if included literally would mess up the ASCII table. That's why they (as well as the unprintable control characters below 32, and any binary values above 127, which aren't defined by ASCII and would need another character set to be interpreted correctly) are represented by .

    Essentially, the '.' character is a catch-all for things which can't be shown properly in the table.
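The catch-all behaviour can be sketched in a few lines of Python. This is only an illustration of the usual hexdump convention, under the assumption that VirusTotal's preview follows it. The helper name `ascii_column` is made up for this example, and the 00-delimited byte string is based on the value the question reports.

```python
def ascii_column(data: bytes) -> str:
    """Render bytes the way the text column of a typical hexdump does."""
    # Printable ASCII runs from 0x20 (space) to 0x7E (~);
    # every other byte value is replaced by '.'.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


real_dots = b".sample."            # 2E 73 61 6D 70 6C 65 2E
null_delimited = b"\x00sample\x00"  # 00 73 61 6D 70 6C 65 00

# Both byte strings look identical in the text column...
print(ascii_column(real_dots))
print(ascii_column(null_delimited))

# ...but the underlying bytes differ, so a match on one misses the other.
print(real_dots == null_delimited)

# The values from the question all collapse to the same symbol.
print(ascii_column(bytes([0x0A, 0x99, 0x09, 0x00, 0x2E])))
```

This is why a text string written as .sample. (with 2E) cannot exclude data that merely displays as .sample. in the preview: the match is done on the bytes, not on their rendering.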

    As for the online converter, it appears to generate characters up to 7F, which is the end of ASCII's 7-bit range (128 code points); beyond that ASCII is not defined and the translator outputs a � symbol. Even within 00 to 7F the translator has issues with a few hex values, including the line feed character 0A.
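How the converter works internally is not known. The sketch below assumes it decodes each byte as ASCII and falls back to a UTF-8 decode with replacement, which is one common way a � ends up on screen. The function name `decode_byte` is hypothetical.

```python
def decode_byte(value: int) -> str:
    """Decode a single byte value, falling back to U+FFFD when it is not ASCII."""
    raw = bytes([value])
    try:
        # 00 through 7F are valid ASCII, including the control characters.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # 80 through FF are outside ASCII; a lone byte in that range is also
        # invalid UTF-8, so it is replaced with U+FFFD (the � symbol).
        return raw.decode("utf-8", errors="replace")


for value in (0x41, 0x0A, 0x7F, 0x99):
    print(f"{value:02X} -> {decode_byte(value)!r}")
```

Printing with `!r` shows the control characters as escape sequences instead of letting them disturb the output, which is the same problem the dot convention avoids in a hexdump.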

    The ASCII table linked earlier hints at a few characters which the translator might have trouble with, such as the DEL character (7F), the bell (07), and ENQ (05).

    I would expect the blank spaces to be whitespace characters; this should be possible to verify in the ASCII table.
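The same check can be done programmatically instead of by reading the table. A minimal sketch, using Python's `string.whitespace` constant (space, tab, line feed, carriage return, vertical tab, form feed) as the definition of ASCII whitespace; the helper name `is_ascii_whitespace` is made up for this example.

```python
import string


def is_ascii_whitespace(value: int) -> bool:
    """Return True if the byte value is one of the six ASCII whitespace characters."""
    return chr(value) in string.whitespace


for value in (0x09, 0x0A, 0x20, 0x00, 0x2E, 0x99):
    print(f"{value:02X}: whitespace={is_ascii_whitespace(value)}")
```

Under this definition 09 and 0A from the question are whitespace, while 00 and 99 are not, so a blank rendering of those two would have to come from something else, such as the converter emitting a character the browser cannot display.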