
Objective-C ISA pointer on disk vs when object being instantiated


The Objective-C runtime ISA pointer is defined as such:

union isa_t {
    isa_t() { }
    isa_t(uintptr_t value) : bits(value) { }

    uintptr_t bits;

private:
    // Accessing the class requires custom ptrauth operations, so
    // force clients to go through setClass/getClass by making this
    // private.
    Class cls;

public:
#if defined(ISA_BITFIELD)
    struct {
        ISA_BITFIELD;  // defined in isa.h
    };

    bool isDeallocating() {
        return extra_rc == 0 && has_sidetable_rc == 0;
    }
    void setDeallocating() {
        extra_rc = 0;
        has_sidetable_rc = 0;
    }
#endif

    void setClass(Class cls, objc_object *obj);
    Class getClass(bool authenticated);
    Class getDecodedClass(bool authenticated);
};

The layout of the bit fields is given by the ISA_BITFIELD definitions in isa.h.

When I read a Mach-O file from disk, go to the __objc_classlist section, and follow an entry to an objc_class, which is defined as such:


struct objc_class : objc_object {
  objc_class(const objc_class&) = delete;
  objc_class(objc_class&&) = delete;
  void operator=(const objc_class&) = delete;
  void operator=(objc_class&&) = delete;
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
    ...

and objc_object is defined as such:

struct objc_object {
private:
    isa_t isa;

public:
    ...

This means I should be able to interpret the first 8 bytes of an objc_class as the bits field of an isa_t. But when I do this and decode the bits, I get random, incorrect information. If I instead interpret the first 8 bytes as a pointer, it leads me to another objc_class on disk, which is usually the metaclass of the class. So what is the purpose of the isa_t union and its bits field in the Objective-C runtime? Is it only correct to interpret the value as an isa_t with bit fields once an object has been instantiated, while on disk it is just a pointer to the metaclass definition?

EDIT:
This is how I read the objc_class struct from the file, using Python:

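from __future__ import annotations  # lets ObjcClass refer to itself in annotations

import ctypes
import struct
from dataclasses import dataclass

ISA_MASK = 0x0000000ffffffff8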

@dataclass
class Isa():
    bits: ctypes.c_size_t
    _cls: ctypes.c_size_t

    def __init__(self, fp, addr):
        fp.seek(addr)
        self.bits = struct.unpack("<Q", fp.read(8))[0]
        self._cls = self.bits

    def nonpointer(self):
        return self.bits & 1
    
    def has_assoc(self):
        return (self.bits >> 1) & 1
    
    def has_cxx_dtor(self):
        return (self.bits >> 2) & 1
    
    def shiftcls(self):
        # shiftcls is 33 bits wide (bits 3..35); magic starts at bit 36
        return (self.bits >> 3) & 0x1ffffffff
    
    def magic(self):
        return (self.bits >> 36) & 0x3f
    
    def weakly_referenced(self):
        return (self.bits >> 42) & 1
    
    def unused(self):
        return (self.bits >> 43) & 1

    def has_sidetable_rc(self):
        return (self.bits >> 44) & 1

    def extra_rc(self):
        return (self.bits >> 45) & 0x7ffff

    def get_class(self):
        clsbits = self.bits
        clsbits &= ISA_MASK
        return clsbits


@dataclass
class ObjcObject:
    isa: Isa
    _addr: ctypes.c_size_t

    def __init__(self, fp, addr, isa_class, external_block_addr):
        self.isa = None
        self._addr = addr

        fp.seek(addr)

        isa_addr = struct.unpack("<Q", fp.read(8))[0]
        if isa_addr != 0 and isa_addr < external_block_addr:
            # Isa.__init__ only takes (fp, addr)
            self.isa = Isa(fp, isa_addr)


@dataclass
class ObjcClass(ObjcObject):
    super_class: ObjcClass
    cache: Cache
    class_ro: ClassRo

    def __init__(self, fp, addr, external_block_addr):
        super().__init__(fp, addr, ObjcClass, external_block_addr)
        ...
        ...

For example, I have a class, let's call it A. After processing the chained fixups, I find its symbol _OBJC_CLASS_$_A at address 0x0025eed0, with the objc_class defined at that address.

The first 8 bytes of the structure are the isa, as established by the runtime sources. Following it as a pointer, rather than treating it as the isa_t union, I get to another objc_class struct for the symbol _OBJC_METACLASS_$_A, which is the metaclass of this class.

Now, if instead of treating the first 8 bytes of the objc_class struct as a pointer to the metaclass I interpret them as the bits of the isa_t union, as in the code above, the results don't match. For example, has_cxx_dtor returns False, which is incorrect because I can clearly find this method in the method_list_t structure of the class_ro. So the decoded bits don't match what I parse, and the isa_t union seems unrelated to the actual class data on disk.

Note that I derived the bit extraction from isa.h, assuming an ARM64 Mach-O without pointer authentication and not built for the simulator.


Solution

  • After digging a bit through the runtime, it appears that non-pointer isas are a runtime-only concept, and that all on-disk isas will always be regular pointers.
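From the parsing side, one way to see this is the nonpointer flag itself: bit 0 of the isa says whether the remaining bits are a packed bit field at all. Class pointers are 8-byte aligned, so an on-disk isa (once fixups are resolved) has bit 0 clear and should be taken as a plain pointer. Below is a minimal Python sketch of that check, assuming the arm64, non-ptrauth layout used in the question; decode_isa is a hypothetical helper, not part of any library.

```python
ISA_MASK = 0x0000000FFFFFFFF8  # arm64, no ptrauth, same value as in the question


def decode_isa(bits):
    """Interpret an 8-byte isa value, honoring the nonpointer flag (bit 0)."""
    if bits & 1 == 0:
        # Raw isa: the whole value is a class pointer, no flag bits exist.
        return {"nonpointer": False, "cls": bits}
    # Packed isa: only meaningful for live objects at runtime.
    return {
        "nonpointer": True,
        "cls": bits & ISA_MASK,
        "has_assoc": bool((bits >> 1) & 1),
        "has_cxx_dtor": bool((bits >> 2) & 1),
        "extra_rc": (bits >> 45) & 0x7FFFF,
    }
```

With this, a value read from the file never reaches the bit-field branch, which is why decoding flags such as has_cxx_dtor from it gives meaningless results.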

    The loading process of Obj-C classes in an object file:

    1. dyld calls _objc_map_images (objc-internal.h/objc-runtime-new.mm), passing in the object headers to read and load classes from
    2. _objc_map_images does a bit of setup before calling map_images (objc-private.h/objc-runtime-new.mm)
    3. map_images takes the runtime lock, then calls map_images_nolock (objc-private.h/objc-os.mm)
    4. map_images_nolock iterates over the mach headers, searching for Obj-C info and performing some validation. It passes all of the headers which contain Obj-C classes to _read_images (objc-private.h/objc-runtime-new.mm)
    5. _read_images is where we actually get to the interesting parts. It first sets up support for non-pointer isas as relevant for the runtime target, and sets up some tables for storing class information. After reading and fixing up selectors, it starts reading class info (OBJC_RUNTIME_DISCOVER_CLASSES_START())
      • For each header, it iterates over the raw classlist stored in the header, receiving direct pointers to each of the classes in the image
      • For each class read this way, it calls readClass (objc-runtime-new.mm), which resolves mangled class names, Swift classes, and more — but at the end of the day, the read classref_t (raw pointer to dyld class) is either cast to Class (the class object), or replaced by an allocated Class instance
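The classlist iteration above has a simple on-disk counterpart: the section is an array of pointer-sized entries, one per class. A rough Python sketch, assuming a 64-bit little-endian image and that you have already located the section's file offset and size from the load commands (read_classlist and its parameters are hypothetical names). In images that use chained fixups, each raw entry still has to be decoded into a real address before it can be followed.

```python
import struct


def read_classlist(fp, offset, size):
    """Return the raw 64-bit entries of a class list section."""
    fp.seek(offset)
    data = fp.read(size)
    count = len(data) // 8  # one 8-byte entry per class
    return list(struct.unpack("<%dQ" % count, data[: count * 8]))
```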

    So, where do non-pointer isas come into play? Only when setting objects' class at runtime:

    1. When you create an object via either objc_constructInstance or class_createInstance (runtime.h), or set an object's class via object_setClass, the object has either objc_object::initInstanceIsa or objc_object::initIsa (objc-object.h) called on it (and initInstanceIsa just calls through to initIsa anyway)
    2. objc_object::initIsa has two implementations (one for SUPPORT_NONPOINTER_ISA and the other for non-supported), but both call down to isa_t::setClass (objc-private.h/objc-object.h)
    3. isa_t::setClass also has two implementations — when SUPPORT_NONPOINTER_ISA is true, the implementation sets the appropriate bits in the isa value itself, setting shiftcls as necessary; when SUPPORT_NONPOINTER_ISA is false, it just sets the class directly

    (Or in reverse, if you prefer: isa_t::setClass is only called from objc_object::initIsa/objc_object::changeIsa, which themselves are only called from objc_constructInstance/class_createInstance/object_setClass.)
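To make the runtime-only packing concrete, here is a small Python model of what the non-pointer path roughly does when an object is created. It is only a sketch: the constants are the arm64 (non-ptrauth, non-simulator) values as I read them in isa.h, init_isa is a hypothetical name, and the reference-count bits are left out.

```python
ISA_MASK = 0x0000000FFFFFFFF8
ISA_MAGIC_VALUE = 0x000001A000000001  # magic bits plus nonpointer = 1 (assumed from isa.h)
SHIFTCLS_MASK = 0x1FFFFFFFF           # shiftcls is 33 bits wide


def init_isa(cls_ptr, has_cxx_dtor):
    """Model of packing a class pointer and flags into a non-pointer isa."""
    bits = ISA_MAGIC_VALUE
    if has_cxx_dtor:
        bits |= 1 << 2  # has_cxx_dtor lives at bit 2
    # The class pointer is stored shifted right by 3, at bits 3..35.
    bits |= ((cls_ptr >> 3) & SHIFTCLS_MASK) << 3
    return bits
```

Masking the result with ISA_MASK gives back the class pointer, which is the same operation as get_class in the question's code. Nothing equivalent happens when the linker writes a class to disk, so the flag bits are simply absent there.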

    So, when you read these object files on disk, you will only ever encounter pointer isas for objects and classes; the bits inside isas are set exclusively at runtime. If there are details you're hoping to read from those bits, you'll need to construct that info yourself from the surrounding Mach-O data.
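As one example of reconstructing such info, the C++ destructor flag can likely be taken from the flags word at the start of class_ro_t instead of from the isa. The sketch below assumes that flags is the first 32-bit field of class_ro_t and that the RO_* values match objc-runtime-new.h; both are worth checking against the objc4 version you target, and ro_flags is a hypothetical helper.

```python
import struct

# Assumed values from objc-runtime-new.h; verify against your objc4 sources.
RO_META = 1 << 0
RO_ROOT = 1 << 1
RO_HAS_CXX_STRUCTORS = 1 << 2


def ro_flags(class_ro_bytes):
    """Decode a few flags from the first 4 bytes of a class_ro_t blob."""
    (flags,) = struct.unpack_from("<I", class_ro_bytes, 0)
    return {
        "is_meta": bool(flags & RO_META),
        "is_root": bool(flags & RO_ROOT),
        "has_cxx_structors": bool(flags & RO_HAS_CXX_STRUCTORS),
    }
```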