Tags: iphone, objective-c, reverse-engineering, ida, cydia-substrate

How can I get a SEL (@selector()) from an object file (Mach-O)? How is a SEL stored in Mach-O?


From the objc runtime sources we can see that SEL is defined as typedef struct objc_selector *SEL;

I have disassembled my dylib with idaq, and I found a call to the _MSHookMessageEx function, which is linked from libsubstrate.dylib.

_MSHookMessageEx has the following signature:

void MSHookMessageEx(Class class, SEL selector, IMP replacement, IMP *result);

So we can assume that the source code passed something like @selector(someMethod:) as the second parameter.

In the data section of the object file I can see all the CFStrings used in the source code.


But there aren't any selector strings there, so @selector() is evidently not compiled into a static CFString.

I would like to find the string representations of the selector and class that are passed to the _MSHookMessageEx function.


Thank you!

Update:

I found that, in IDA's view of the method, there are some strings just before the method calls.


I guess these are the selectors passed to the functions. Am I right?


Solution

  • Selector names are stored in the __objc_methname section of the __TEXT segment:

    :; otool -v -s __TEXT __objc_methname /System/Library/Frameworks/AppKit.framework/AppKit | head
    /System/Library/Frameworks/AppKit.framework/AppKit:
    Contents of (__TEXT,__objc_methname) section
    0x000000000097cbd8  count
    0x000000000097cbde  countByEnumeratingWithState:objects:count:
    0x000000000097cc09  alloc
    0x000000000097cc0f  initWithObjects:count:
    0x000000000097cc26  release
    0x000000000097cc2e  autorelease
    0x000000000097cc3a  copy
    0x000000000097cc3f  timeIntervalSinceNow
    

    Pointers to selectors are stored in the __objc_selrefs section of the __DATA segment:

    :; otool -v -s __DATA __objc_selrefs /System/Library/Frameworks/AppKit.framework/AppKit | head
    /System/Library/Frameworks/AppKit.framework/AppKit:
    Contents of (__DATA,__objc_selrefs) section
    0x0000000000d77d80  __TEXT:__objc_methname:initWithObjects:count:
    0x0000000000d77d88  __TEXT:__objc_methname:copy
    0x0000000000d77d90  __TEXT:__objc_methname:timeIntervalSinceNow
    0x0000000000d77d98  __TEXT:__objc_methname:sharedAppleEventManager
    0x0000000000d77da0  __TEXT:__objc_methname:_prepareForDispatch
    0x0000000000d77da8  __TEXT:__objc_methname:_setLaunchTaskMaskBits:
    0x0000000000d77db0  __TEXT:__objc_methname:_disableSuddenTermination
    0x0000000000d77db8  __TEXT:__objc_methname:_appleEventActivationInProgress
    

    A SEL in source code is actually (currently) a pointer to the C string name of the selector. So if you write this:

    SEL s = @selector(initWithObjects:count:);
    

    Then s is effectively a char const *, and it points to the string initWithObjects:count:. Until recently, you could print the selector name by doing this:

    NSLog(@"selector is %s", (char *)s);
    

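    However, Apple changed the compiler (as of Xcode 4.6, I believe) to disallow casting a SEL to a char *, which suggests they may change the selector implementation in the future. The supported way to get the name is sel_getName(s), or NSStringFromSelector(s) if you want an NSString.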

    Anyway, the tricky part is that the machine code loads the pointer from the __objc_selrefs section using PC-relative addressing. The PC is the “program counter”, which is the address of the currently-executing instruction. On x86 architectures it's usually called IP (instruction pointer) or EIP (extended IP).

    That's what's going on in the relevant instructions of your disassembly:

    1444    LDR R1, =(off_2038 - 0x145C)
            ...
    1454    LDR R1, [PC,R1]
    

    The pointer to the selector is loaded from the word at address 0x2038. But the constant 0x2038 doesn't actually appear in the machine code; your disassembler has helpfully computed it for you by analyzing the data flow of the program. The constant that the first LDR actually loads (from a literal pool near the code) is 0xBDC, because 0xBDC + 0x145C = 0x2038.

    You might wonder why it's using 0x145C when the second LDR instruction is at address 0x1454. When an ARM processor computes an address using PC-relative addressing, the value it reads for PC is the address of the currently executing instruction plus 8 in ARM state, or plus 4 in Thumb state. Since 0x1454 + 8 = 0x145C, this code is running in ARM state. This is documented in ARM's architecture reference manual (and probably other places).