This question is for Intel x86 assembly experts. Thanks in advance for your effort!
I am analysing a binary file, which is a Mach-O 64-bit x86 executable. I am currently using 64-bit macOS. The disassembly comes from objdump.
When I was learning assembly, I could see variable names like "$xxx", string values in ASCII, and callee names such as "call _printf".
But in this disassembly I can find none of these:
no main function:
Disassembly of section __TEXT,__text:
__text:
100000c90: 55 pushq %rbp
100000c91: 48 89 e5 movq %rsp, %rbp
100000c94: 48 83 ec 10 subq $16, %rsp
100000c98: 48 8d 3d bf 02 00 00 leaq 703(%rip), %rdi
100000c9f: b0 00 movb $0, %al
100000ca1: e8 68 02 00 00 callq 616
100000ca6: 89 45 fc movl %eax, -4(%rbp)
100000ca9: 48 83 c4 10 addq $16, %rsp
100000cad: 5d popq %rbp
100000cae: c3 retq
100000caf: 90 nop
100000cb0: 55 pushq %rbp
...
The above is code that will be executed, but I have no idea where execution starts.
Also, I am new to AT&T syntax. Could you tell me what these instructions mean:
0000000100000c90 pushq %rbp
0000000100000c98 leaq 0x2bf(%rip), %rdi ## literal pool for: "xxxx\n"
...
0000000100000cd0 callq 0x100000c90
Is it a loop? I am not sure, but it seems to be. And why do they use the %rip and %rdi registers? In Intel x86 I know that EIP is the instruction pointer, but I don't understand its meaning here.
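On the %rip question: in 64-bit code, operands such as 703(%rip) and the number printed after callq are signed 32-bit displacements measured from the address of the next instruction. A minimal Python sketch, using only addresses and bytes taken from the listings in this post, resolves them to absolute addresses. The helper name rel32_target is made up for illustration, and it assumes the displacement is the last four bytes of the instruction, which holds for these three encodings but not for every x86 instruction.

```python
import struct


def rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Resolve a RIP-relative operand to an absolute address.

    Assumes the signed 32-bit displacement occupies the last four bytes
    of the instruction (true for e8 rel32 and for the leaq shown here).
    """
    disp = struct.unpack("<i", insn_bytes[-4:])[0]  # little-endian, signed
    next_insn = insn_addr + len(insn_bytes)         # %rip points past this insn
    return next_insn + disp


# Bytes and addresses copied from the objdump output in the question.
call_616 = rel32_target(0x100000CA1, bytes.fromhex("e868020000"))
call_m69 = rel32_target(0x100000CD0, bytes.fromhex("e8bbffffff"))
lea_703 = rel32_target(0x100000C98, bytes.fromhex("488d3dbf020000"))

print(hex(call_616), hex(call_m69), hex(lea_703))
```

If the arithmetic holds, callq 616 goes to 0x100000f0e, which is one of the jmp entries in __TEXT,__stubs; callq -69 goes back to 0x100000c90, so it is an ordinary call to the first function rather than a loop; and the leaq loads 0x100000f5e, presumably an address inside __TEXT,__cstring, into %rdi.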
call integer: No matter which calling convention is used, I have never seen a code pattern like "call 616":
"100000cd0: e8 bb ff ff ff callq -69 <__mh_execute_header+C90>"
After ret: In Intel x86, ret pops the return address and returns control flow to the caller, so it should mark the end of an independent function. However, right after it we can see code like
100000cae: c3 retq
100000caf: 90 nop
/* new function call */
100000cb0: 55 pushq %rbp
...
It is ridiculous!
ASCII string lost: I had already viewed the binary in hexadecimal and recognised some ASCII strings before disassembling it.
However, no ASCII strings appear in the disassembly!
Total architecture review:
Disassembly of section __TEXT,__text:
__text:
from address 100000c90 to 100000ef6, 145 lines
Disassembly of section __TEXT,__stubs:
__stubs:
from address 100000efc to 100000f14, 5 lines of asm code:
100000efc: ff 25 16 01 00 00 jmp qword ptr [rip + 278]
100000f02: ff 25 18 01 00 00 jmp qword ptr [rip + 280]
100000f08: ff 25 1a 01 00 00 jmp qword ptr [rip + 282]
100000f0e: ff 25 1c 01 00 00 jmp qword ptr [rip + 284]
100000f14: ff 25 1e 01 00 00 jmp qword ptr [rip + 286]
Disassembly of section __TEXT,__stub_helper:
__stub_helper:
...
Disassembly of section __TEXT,__cstring:
__cstring:
...
Disassembly of section __TEXT,__unwind_info:
__unwind_info:
...
Disassembly of section __DATA,__nl_symbol_ptr:
__nl_symbol_ptr:
...
Disassembly of section __DATA,__got:
__got:
...
Disassembly of section __DATA,__la_symbol_ptr:
__la_symbol_ptr:
...
Disassembly of section __DATA,__data:
__data:
...
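Each __stubs entry shown above is a 6-byte indirect jmp (ff 25 plus a 32-bit displacement) through a pointer slot addressed relative to %rip. A rough sketch, with the addresses and displacements copied from the __stubs listing and all names chosen only for illustration, computes which slot each stub reads:

```python
# (stub address, displacement) pairs copied from the __stubs listing.
STUBS = [
    (0x100000EFC, 278),
    (0x100000F02, 280),
    (0x100000F08, 282),
    (0x100000F0E, 284),
    (0x100000F14, 286),
]

JMP_LEN = 6  # ff 25 + 4-byte displacement, as in the byte dump


def stub_slot(stub_addr: int, disp: int) -> int:
    """Address of the pointer that `jmp qword ptr [rip + disp]` loads."""
    return stub_addr + JMP_LEN + disp


slots = [stub_slot(addr, disp) for addr, disp in STUBS]
for (addr, _), slot in zip(STUBS, slots):
    print(f"stub {addr:#x} jumps through pointer at {slot:#x}")
```

The five slots come out as consecutive 8-byte entries from 0x100001018 to 0x100001038. They most likely sit in __DATA,__la_symbol_ptr, but that is an assumption, since the section addresses are not shown here. Each slot is filled in by the dynamic linker with the address of an imported function, which is why no callee name appears at the call site when symbols are missing.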
Since it might be a virus, I cannot execute it. How should I analyse it?
I have already identified where the output is produced, and if I fully understand the data flow in this programme, I might be able to figure out possible solutions.
I would appreciate it if someone could give me a detailed explanation. Thank you!
I installed macOS in VirtualBox and, after setting the permissions with chmod, executed the programme, but nothing special happened except for two lines of output. The result is hidden in the binary file.
A main is not required if you are not using C. The binary header contains the entry point address. As for call 616, it's just that you don't have (all) symbols. It's somewhat strange that objdump didn't calculate the address for you, but it should be 0x100000ca6+616.
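To illustrate the point about the header holding the entry point, here is a minimal sketch that walks the load commands of a thin, little-endian 64-bit Mach-O image and returns the entry address from LC_MAIN. The function name entry_point is made up for this example. It assumes the __TEXT segment starts at file offset 0, so that the entry address is the __TEXT vmaddr plus entryoff, and it does not handle fat binaries or older executables that use LC_UNIXTHREAD instead of LC_MAIN.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF    # 64-bit Mach-O magic, read as little-endian
LC_SEGMENT_64 = 0x19
LC_MAIN = 0x80000028        # 0x28 | LC_REQ_DYLD


def entry_point(data: bytes):
    """Return the virtual address of the entry point, or None if not found."""
    magic, _cpu, _sub, _ftype, ncmds, _sz, _flags, _rsv = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")

    text_vmaddr = None
    entryoff = None
    off = 32  # size of mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, off)
        if cmd == LC_SEGMENT_64:
            segname = data[off + 8:off + 24].rstrip(b"\0")
            if segname == b"__TEXT":
                text_vmaddr = struct.unpack_from("<Q", data, off + 24)[0]
        elif cmd == LC_MAIN:
            # entry_point_command: cmd, cmdsize, entryoff, stacksize
            entryoff = struct.unpack_from("<Q", data, off + 8)[0]
        off += cmdsize

    if text_vmaddr is None or entryoff is None:
        return None
    # Assumption: __TEXT maps file offset 0, so vmaddr + entryoff is the address.
    return text_vmaddr + entryoff
```

It could be called as entry_point(open(path, "rb").read()), where path is the binary under analysis; the returned address can then be looked up in the objdump listing to see which function plays the role of main.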