I'm using the eicar.com file and playing around with reverse engineering tools. I'd like to be able to disassemble and reassemble this file. I get close but there are still a few problems that I cannot figure out.
This is the original eicar.com ASCII file:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
Using udcli (udcli -noff -nohex eicar.com > stage1.asm), I end up with this x86 assembly:
pop eax
xor eax, 0x2550214f
inc eax
inc ecx
push eax
pop ebx
xor al, 0x5c
push eax
pop edx
pop eax
xor eax, 0x5e502834
sub [edi], esi
inc ebx
inc ebx
sub [edi], esi
jge 0x40
inc ebp
dec ecx
inc ebx
inc ecx
push edx
sub eax, 0x4e415453
inc esp
inc ecx
push edx
inc esp
sub eax, 0x49544e41
push esi
dec ecx
push edx
push ebp
push ebx
sub eax, 0x54534554
sub eax, 0x454c4946
and [eax+ecx*2], esp
sub ecx, [eax+0x2a]
Finally, putting it back together with nasm using this command, nasm stage1.asm -o stage2, I end up with:
fXf5O!P%f@fAfPf[4\fPfZfXf54(P^fg)7fCfCfg)7^O<8d>^R^@fEfIfCfAfRf- STANfDfAfRfDf-ANTIfVfIfRfUfSf-TESTf-FILEfg!$Hfg+H*
In this case I'm starting with an ASCII file and end up with a bin file that holds a lot of extra garbage.
What am I missing here? How do I end up with the original ASCII string and have the proper file type?
EDIT: @Ross Ridge noted that I was disassembling a 16-bit file as a 32-bit one. Fixing that has successfully cleaned up the string, but the file type is still incorrectly output as binary.
First fix: udcli -16 -noff -nohex eicar.com > stage1.asm to obtain the proper output string.
Results in X5O!P%@AP[4\PZX54(P^)7CC)7^O<8d>"^@EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
Still a little garbage data not present in the original but very close.
In general, you can't reassemble the output of a disassembler back into exactly the same binary file as the original. There is often more than one way to assemble a given assembly instruction into machine code. It's also not very helpful for your ultimate goal of understanding the code you're doing this with. Even if you do get something that you can assemble back into the original code, it's extremely unlikely you'll get something you can modify and assemble into code that still works.
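The conditional jump in this file is a concrete case of one instruction with two encodings. The following is a minimal Python sketch that uses only bytes quoted in the question: the original file encodes the jump at offset 0x1a as `}$` (7D 24, the short form with an 8-bit displacement), while the reassembled file contains 0F 8D 22 00 (the near form with a 16-bit displacement). The helper name `jump_target` is made up for this illustration.

```python
def jump_target(offset: int, encoding: bytes) -> int:
    """Decode a 16-bit-mode JGE located at `offset` and return its target."""
    if encoding[:1] == b"\x7d":            # short form: 7D rel8
        rel = int.from_bytes(encoding[1:2], "little", signed=True)
        length = 2
    elif encoding[:2] == b"\x0f\x8d":      # near form: 0F 8D rel16
        rel = int.from_bytes(encoding[2:4], "little", signed=True)
        length = 4
    else:
        raise ValueError("not a JGE encoding")
    # The displacement is relative to the end of the jump instruction.
    return (offset + length + rel) & 0xFFFF


JUMP_OFFSET = 0x1A  # 26 bytes of code precede the jump in eicar.com

original = b"}$"                   # 7D 24, taken from the original file
reassembled = b"\x0f\x8d\x22\x00"  # displayed as ^O<8d>"^@ in the nasm output

print(hex(jump_target(JUMP_OFFSET, original)))
print(hex(jump_target(JUMP_OFFSET, reassembled)))
```

Both calls decode to the same target, 0x40, so the text `jge 0x40` alone does not record which encoding the original file used.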
To illustrate this, I've provided my own "disassembly" of the eicar.com file, one that allows it to be modified to a limited extent. You can modify the string it prints, so long as the message isn't too long and doesn't contain any dollar sign ($) characters. You should be able to modify the string while still keeping the output consisting only of printable ASCII characters, assuming you only put printable ASCII characters in the string.
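As a rough aid, the stated limits on the message can be checked before assembling. This is only a sketch: the function name `check_message` is hypothetical, and the length bound is taken from the listing's own `%if skip - msg > 0x7e` check, counting the `$` terminator and the padding to 0x21 bytes. It checks the input string only and does not prove that the assembled output is printable.

```python
import string

# Printable ASCII as assumed here: letters, digits, punctuation and space.
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def check_message(msg: str) -> list[str]:
    """Return a list of problems with a candidate message (empty if none)."""
    problems = []
    if "$" in msg:
        problems.append("contains '$'")
    # The listing appends one '$' and pads the data up to 0x21 bytes.
    data_len = max(len(msg) + 1, 0x21)
    if data_len > 0x7E:
        problems.append("msg too long")
    if any(ch not in PRINTABLE for ch in msg):
        problems.append("contains non-printable characters")
    return problems


print(check_message("EICAR-STANDARD-ANTIVIRUS-TEST-FILE!"))
print(check_message("price: $5"))
```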
BITS 16
ORG 0x100
ascii_shift EQU 0x097b
start:
pop ax
xor ax, 0x2000 | (skip - start + 0x100) | 0x000f
push ax
and ax, 0x4000 | (skip - start + 0x100)
push ax
pop bx
xor al, (msg - start) ^ (skip - start)
push ax
pop dx
pop ax
xor ax, (0x2000 | (skip - start + 0x100) | 0x000f) ^ ascii_shift
push ax
pop si
sub [bx], si
inc bx
inc bx
sub [bx], si
jnl skip
msg:
DB 'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!'
DB '$'
%if ($ - msg) < 0x21
TIMES 0x21 - ($ - msg) DB '$'
%endif
skip:
DW 0x21cd + ascii_shift
DW 0x20cd + ascii_shift
%if skip - msg > 0x7e
%error 'msg too long'
%endif
I won't explain how the code works, but I'll give you one hint: MS-DOS pushes a 16-bit 0 value on the stack at the start of execution of a .COM format executable.
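Without going into how the program runs, the symbolic operands in the listing can be checked against the original file. The Python sketch below is not an assembler: the opcode bytes are copied from the original eicar.com, and only the operands are computed from the same expressions the listing uses. The names `word`, `rebuilt`, and the uppercase constants are made up for this check, and it assumes the assembler picks the same short encodings as the original.

```python
ORIGINAL = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

START = 0x00                      # file offset of the `start` label
MSG = 0x1C                        # 28 bytes of code precede `msg`
MESSAGE = b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!" + b"$"
SKIP = MSG + max(len(MESSAGE), 0x21)   # data is padded to at least 0x21 bytes
ASCII_SHIFT = 0x097B


def word(value: int) -> bytes:
    """Encode a 16-bit value in little-endian byte order."""
    return (value & 0xFFFF).to_bytes(2, "little")


# Operands, written exactly as in the listing.
xor1 = 0x2000 | (SKIP - START + 0x100) | 0x000F
and1 = 0x4000 | (SKIP - START + 0x100)
xor_al = (MSG - START) ^ (SKIP - START)
xor2 = xor1 ^ ASCII_SHIFT

rebuilt = (
    b"X"                          # pop ax
    + b"5" + word(xor1)           # xor ax, imm16
    + b"P"                        # push ax
    + b"%" + word(and1)           # and ax, imm16
    + b"P["                       # push ax / pop bx
    + b"4" + bytes([xor_al])      # xor al, imm8
    + b"PZX"                      # push ax / pop dx / pop ax
    + b"5" + word(xor2)           # xor ax, imm16
    + b"P^"                       # push ax / pop si
    + b")7CC)7"                   # sub [bx], si / inc bx / inc bx / sub [bx], si
    + b"}" + bytes([SKIP - MSG])  # jnl skip (short form)
    + MESSAGE
    + word(0x21CD + ASCII_SHIFT)
    + word(0x20CD + ASCII_SHIFT)
)

print(hex(xor1), hex(and1), hex(xor_al), hex(xor2))
print(rebuilt == ORIGINAL)
```

With the unmodified message, the computed operands are 0x214f, 0x4140, 0x5c and 0x2834, and the rebuilt bytes match the original file.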