I asked this question few days ago.
I wanted to get the stack allocation size (after the function creation). The answer suggests to do:
if((INS_Opcode(ins) == XED_ICLASS_ADD || INS_Opcode(ins) == XED_ICLASS_SUB) && REG(INS_OperandReg(ins, 0)) == REG_STACK_PTR && INS_OperandIsImmediate(ins, 1)
Which in theory is correct and does make sense. But, it doesn't work in practice (correct me if I'm wrong here). It works perfectly fine if I remove REG(INS_OperandReg(ins, 0)) == REG_STACK_PTR
check. Why? Because pin doesn't detect the REG_STACK_PTR
register when REG(INS_OperandReg(ins, 0))
is used to detect it. rather, it detects ah
(which I believe is RAX), when I do check against add rsp, 0xffffffffffffff80
instruction (so, every time it gives: register: ah
), as can be seen in my output below:
in
register: rbp
40051e push rbp
register: *invalid*
value: -128
40051f mov rbp, rsp
register: ah
400522 add rsp, 0xffffffffffffff80
register: *invalid*
400526 mov dword ptr [rbp-0x28], 0x7
register: *invalid*
40052d mov dword ptr [rbp-0x64], 0x9
register: eax
400534 mov eax, 0x0
register: *invalid*
400539 call 0x4004e6
register: rbp
4004e6 push rbp
register: *invalid*
value: 64
4004e7 mov rbp, rsp
register: ah
4004ea sub rsp, 0x40
register: *invalid*
4004ee mov dword ptr [rbp-0xc], 0x4
register: rax
4004f5 lea rax, ptr [rbp-0xc]
register: *invalid*
4004f9 mov qword ptr [rbp-0x8], rax
register: rax
4004fd mov rax, qword ptr [rbp-0x8]
register: eax
400501 mov eax, dword ptr [rax]
register: *invalid*
400503 mov esi, eax
register: edi
400505 mov edi, 0x4005d0
register: eax
40050a mov eax, 0x0
register: rdi
40050f call 0x4003f0
register: rdi
4003f0 jmp qword ptr [rip+0x200c22]
register: *invalid*
4003f6 push 0x0
register: *invalid*
4003fb jmp 0x4003e0
register: *invalid*
4003e0 push qword ptr [rip+0x200c22]
register: rdi
4003e6 jmp qword ptr [rip+0x200c24]
4
register: *invalid*
400514 mov dword ptr [rbp-0x3c], 0x3
40051b nop
register: *invalid*
40051c leave
register: *invalid*
40051d ret
register: eax
40053e mov eax, 0x0
register: *invalid*
400543 leave
out
Well, interestingly it does this for every occurrences of rsp
(i.e. it detects ah
instead of rsp
). Also, it always prints the instruction 400522 add rsp, 0xffffffffffffff80
, including rsp
(So, why it doesn't print ah
here?)
If ah
represents rsp
in some way, then I can always detect ah
using: REG(INS_OperandReg(ins, 0)) == REG_AH
. But, I want to understand what is going on here.
My code:
#include <iostream>
#include <fstream>
#include "pin.H"
#include <unordered_map>
// key to open the main Routine
static uint32_t key = 0;
// Ins object mapping
class Insr
{
private:
// Disassembled instruction
string insDis;
INS ins;
public:
Insr(string insDis, INS ins) { this->insDis = insDis; this->ins = ins;}
string get_insDis() { return insDis;}
INS get_ins() { return ins;}
};
// Stack for the Insr structure
static std::unordered_map<ADDRINT, Insr*> insstack;
// This function is called before every instruction is executed
VOID protect(uint64_t addr)
{
if (addr > 0x700000000000)
return;
if (!key)
return;
// Initialize the diassembled instruction
string insdis = insstack[addr]->get_insDis();
INS ins = insstack[addr]->get_ins();
if (INS_OperandCount(ins) > 0)
{
if (REG(INS_OperandReg(ins, 0)) == REG_AH)
std::cout << "register: " << REG_StringShort(REG(INS_OperandReg(ins, 0))) << '\n';
}
if((INS_Opcode(ins) == XED_ICLASS_ADD || INS_Opcode(ins) == XED_ICLASS_SUB) &&
INS_OperandIsImmediate(ins, 1))
{
int value = INS_OperandImmediate(ins, 1);
std::cout << "value: " << dec<<value << '\n';
}
std::cout << hex <<addr << "\t" << insdis << std::endl;
}
// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID *v)
{
if (INS_Address(ins) > 0x700000000000)
return;
insstack.insert(std::make_pair(INS_Address(ins), new Insr(string(INS_Disassemble(ins)),
ins)));
// if (REG_valid_for_iarg_reg_value(INS_MemoryIndexReg(ins)))
// std::cout << "true" << '\n';
// Insert a call to docount before every instruction, no arguments are passed
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)protect, IARG_ADDRINT, INS_Address(ins),
IARG_END);
}
// Lock Routine
void mutex_lock()
{
key = 0;
std::cout<<"out\n";
}
void mutex_unlock()
{
key = 1;
std::cout<<"in\n";
}
void Routine(RTN rtn, VOID *V)
{
if (RTN_Name(rtn) == "main")
{
RTN_Open(rtn);
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)mutex_unlock, IARG_END);
RTN_InsertCall(rtn, IPOINT_AFTER, (AFUNPTR)mutex_lock, IARG_END);
RTN_Close(rtn);
}
}
INT32 Usage()
{
cerr << "This tool counts the number of dynamic instructions executed" << endl;
cerr << endl << KNOB_BASE::StringKnobSummary() << endl;
return -1;
}
int main(int argc, char * argv[])
{
// Initialize the symbol table
PIN_InitSymbols();
// Initialize pin
if (PIN_Init(argc, argv)) return Usage();
PIN_SetSyntaxIntel();
// Routine instrumentation
RTN_AddInstrumentFunction(Routine, 0);
// Register Instruction to be called to instrument instructions
INS_AddInstrumentFunction(Instruction, 0);
// Start the program, never returns
PIN_StartProgram();
return 0;
}
I have few questions regarding that.
How can I understand such a behavior? And how can I detect rsp if I want to? Lastly, how does the instruction prints rsp
, but REG(INS_OperandReg(ins, 0)) == REG_STACK_PTR
can not detect it?
The INS
objects are only valid inside instrumentation routines, such as your Instruction
routine. The INS
type is nothing but a 32-bit integer that identifies an instruction. The Pin runtime internally maintains a table that maps these 32-bit integers to specific static instructions. It creates such a table whenever it's about to call an instrumentation routine. When the instrumentation routine returns, there is no guarantee that any of these identifiers map to the same static instructions and they may not even be valid. So when you save a copy of an INS
object in the following line of code:
insstack.insert(std::make_pair(INS_Address(ins), new Insr(string(INS_Disassemble(ins)),
ins)));
that copy is only useful in the same instance of the Instruction
routine. The next time the Instruction
routine is called (or any other instrumentation routine), an instruction identifier might be reused for other instructions.
If you really want to pass an instruction to an analysis routine, you have two options: