
Android Dex file format issue with ULEB128 and class_def_item


I'm exploring the Android dex file format and have run into something that doesn't look right but also isn't mentioned in the documentation. I've got a sample app that I was tracing through just to pull out all the fields, etc.

I started tracing through the format and everything was fine until I got to the class_data_item. I was looking for the direct_methods declaration, but the size fields at the beginning don't seem to make any sense.

I've confirmed that I started at the proper class_def_item (i.e. the name and everything else is correct), but when I follow its class_data_off field (offset 0x18 within the class_def_item) to the class_data_item, the bytes I get are as follows (apologies for formatting):

0D 00 0B 00 22 19 01 19 01 19 01 19 01 19 01 19 01 19 01 19 01 19 01 19 01 19 01 19 01 1A 94 03 88 80 04 DC E1 08 01 82 80 04 B4

The spec says that the first four fields should be the sizes for the fields and methods, but no matter how I interpret these bytes, the values are invalid. 0xb000d, 0x19011922, etc. are far too large to make any sense in context. I'm in the data section of the dex file.
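For reference, here is a small Python sketch of the navigation described above, assuming the layout in the dex-format document: class_def_item is eight fixed-width little-endian uint fields, so class_data_off is the seventh one, at offset 0x18. Reading the start of class_data_item the same way, as fixed-width uint32 values, reproduces exactly the 0xb000d and 0x19011922 from the question, which suggests the offset is fine and the decoding is the problem. The helper name and the fake class_def_item values are hypothetical, made up purely for illustration.

```python
import struct

# class_def_item: eight fixed-width little-endian uint32 fields (32 bytes total).
CLASS_DEF_FIELDS = (
    "class_idx", "access_flags", "superclass_idx", "interfaces_off",
    "source_file_idx", "annotations_off", "class_data_off", "static_values_off",
)


def parse_class_def_item(data: bytes, offset: int) -> dict:
    """Hypothetical helper: unpack one class_def_item starting at `offset`."""
    values = struct.unpack_from("<8I", data, offset)
    return dict(zip(CLASS_DEF_FIELDS, values))


# Made-up class_def_item, only to show where class_data_off sits (index 6 -> 6 * 4 = 0x18).
fake = struct.pack("<8I", 0x10, 0x1, 0x20, 0, 0x30, 0, 0x1234, 0)
print(hex(parse_class_def_item(fake, 0)["class_data_off"]))  # 0x1234

# First bytes of the class_data_item from the question, misread as two uint32 values.
snippet = bytes.fromhex("0d000b00221901190119")
first, second = struct.unpack_from("<2I", snippet, 0)
print(hex(first), hex(second))  # 0xb000d 0x19011922
```

The fixed-width reading is correct for class_def_item but not for class_data_item, whose fields are all uleb128.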

I've also seen an instance (with strings in the string_id section) where an item labeled leb128 was only a single byte. I know leb128 is supposed to be variable length, but no matter how I interpret it, it doesn't seem to make sense: it either points somewhere that isn't an encoded_method or into gibberish. Also, the repeating 01 19 values don't look right.

The example class/methods I'm tracing are from an Android component, android/support/v4/accessibilityservice/AccessibilityServiceInfoCompat, but I don't think that should make a difference, correct?

I'm not sure what else could be going wrong here.


Solution

  • Here is an annotated dump of the binary snippet you supplied, assuming that it is a class_data_item:

    000000: 0d                 |static_fields_size = 13
    000001: 00                 |instance_fields_size = 0
    000002: 0b                 |direct_methods_size = 11
    000003: 00                 |virtual_methods_size = 0
                               |static_fields:
                               |  static_field[0]
    000004: 22                 |    field_idx_diff = 34
    000005: 19                 |    access_flags = 0x19
                               |  static_field[1]
    000006: 01                 |    field_idx_diff = 1
    000007: 19                 |    access_flags = 0x19
                               |  static_field[2]
    000008: 01                 |    field_idx_diff = 1
    000009: 19                 |    access_flags = 0x19
                               |  static_field[3]
    00000a: 01                 |    field_idx_diff = 1
    00000b: 19                 |    access_flags = 0x19
                               |  static_field[4]
    00000c: 01                 |    field_idx_diff = 1
    00000d: 19                 |    access_flags = 0x19
                               |  static_field[5]
    00000e: 01                 |    field_idx_diff = 1
    00000f: 19                 |    access_flags = 0x19
                               |  static_field[6]
    000010: 01                 |    field_idx_diff = 1
    000011: 19                 |    access_flags = 0x19
                               |  static_field[7]
    000012: 01                 |    field_idx_diff = 1
    000013: 19                 |    access_flags = 0x19
                               |  static_field[8]
    000014: 01                 |    field_idx_diff = 1
    000015: 19                 |    access_flags = 0x19
                               |  static_field[9]
    000016: 01                 |    field_idx_diff = 1
    000017: 19                 |    access_flags = 0x19
                               |  static_field[10]
    000018: 01                 |    field_idx_diff = 1
    000019: 19                 |    access_flags = 0x19
                               |  static_field[11]
    00001a: 01                 |    field_idx_diff = 1
    00001b: 19                 |    access_flags = 0x19
                               |  static_field[12]
    00001c: 01                 |    field_idx_diff = 1
    00001d: 1a                 |    access_flags = 0x1a
                               |direct_methods:
                               |  direct_method[0]
    00001e: 9403               |    method_idx_diff = 404
    000020: 8880 04            |    access_flags = 0x10008: static|constructor
    000023: dce1 08            |    code_off = code_item[0x230dc]
                               |  direct_method[1]
    000026: 01                 |    method_idx_diff = 1
    000027: 8280 04            |    access_flags = 0x10002: private|constructor
    00002a: b4                 |
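The dump above can be reproduced with a short Python sketch. This is a minimal, hypothetical parser written against the class_data_item layout in the dex-format document (four uleb128 sizes, then encoded_field entries of two uleb128 values and encoded_method entries of three); it is not baksmali's code. The last byte of the snippet, 0xb4, has its high bit set, so direct_method[1]'s code_off continues past the end of what was posted; the sketch stops there and marks the result as truncated.

```python
def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one uleb128 starting at `pos`; return (value, position after it)."""
    value, shift = 0, 0
    while True:
        if pos >= len(data):
            raise EOFError("uleb128 runs past the end of the data")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:           # high bit clear: last byte of this value
            return value, pos
        shift += 7


def parse_class_data_item(data: bytes) -> dict:
    """Hypothetical parser for one class_data_item; stops early on truncated input."""
    pos = 0
    item = {"truncated": False}
    kinds = ("static_fields", "instance_fields", "direct_methods", "virtual_methods")
    for kind in kinds:
        item[kind + "_size"], pos = read_uleb128(data, pos)
        item[kind] = []
    try:
        for kind in kinds:
            is_method = kind.endswith("methods")
            index = 0  # each list restarts its diff chain from 0
            for _ in range(item[kind + "_size"]):
                diff, pos = read_uleb128(data, pos)
                flags, pos = read_uleb128(data, pos)
                index += diff
                if is_method:
                    code_off, pos = read_uleb128(data, pos)
                    item[kind].append((index, flags, code_off))
                else:
                    item[kind].append((index, flags))
    except EOFError:
        item["truncated"] = True  # a partially read entry is dropped
    return item


# The bytes from the question: 4 sizes, 13 encoded_fields, then the start of the methods.
snippet = bytes.fromhex("0d000b002219" + "0119" * 11 + "011a" + "9403888004dce10801828004b4")
parsed = parse_class_data_item(snippet)
print(parsed["static_fields"][0], parsed["static_fields"][-1])  # (34, 25) (46, 26)
print([hex(v) for v in parsed["direct_methods"][0]])  # ['0x194', '0x10008', '0x230dc']
print(parsed["truncated"])  # True
```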
    

    I think the part you're getting hung up on is the encoding of a uleb128, which is described in detail in the dex-format document. In particular, note that the most significant bit is set in every byte of a uleb128 except the last, and that each byte contributes its low 7 bits to the value, least-significant group first.

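    For example, look at the first byte of the binary data you provided, 0x0d. You know this is the beginning of a uleb128. You also know that it is the last byte of that uleb128 because its high bit isn't set, so static_fields_size is simply 13. By contrast, the first byte of direct_method[0], 0x94, does have its high bit set, so method_idx_diff continues into the next byte: (0x94 & 0x7f) + (0x03 << 7) = 20 + 384 = 404.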
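To see where each value starts and ends, the high-bit rule can be applied mechanically. The hypothetical helpers below split a byte string into complete uleb128 encodings and decode each one; run on the method bytes of the snippet, they find the same boundaries as the annotated dump, with the trailing 0xb4 left over as an incomplete value.

```python
def split_uleb128(data: bytes) -> tuple[list[bytes], bytes]:
    """Split `data` into complete uleb128 encodings; return (chunks, leftover bytes)."""
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if (byte & 0x80) == 0:  # high bit clear: this byte ends the current value
            chunks.append(data[start:i + 1])
            start = i + 1
    return chunks, data[start:]  # leftover is an incomplete value, if any


def decode_uleb128(chunk: bytes) -> int:
    """Value of one complete uleb128: 7 payload bits per byte, lowest group first."""
    value = 0
    for n, byte in enumerate(chunk):
        value |= (byte & 0x7F) << (7 * n)
    return value


# The four size fields: each is a single byte because none has its high bit set.
print([decode_uleb128(c) for c in split_uleb128(bytes.fromhex("0d000b00"))[0]])  # [13, 0, 11, 0]

# The method bytes starting at offset 0x1e of the snippet from the question.
tail = bytes.fromhex("9403888004dce10801828004b4")
chunks, leftover = split_uleb128(tail)
print([c.hex() for c in chunks])                 # ['9403', '888004', 'dce108', '01', '828004']
print([hex(decode_uleb128(c)) for c in chunks])  # ['0x194', '0x10008', '0x230dc', '0x1', '0x10002']
print(leftover.hex())                            # b4
```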

    The repeating 01 19 values make perfect sense. As you can see in the annotated dump, 0x19 is the access flags for each field (public|static|final), and 0x01 is the difference from the previous field's index (the first entry's value, 0x22, is the index itself). So the field id for the first field is 0x22, for the second is 0x22+1=0x23, for the third is 0x23+1=0x24, and so on.
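The index arithmetic and the flag bits can be checked the same way. This sketch assumes the access_flags bit values from the dex-format document (only a subset is listed, and `flag_names` is a hypothetical helper) and uses `itertools.accumulate` to turn the diffs into absolute indices into field_ids.

```python
from itertools import accumulate

# A subset of the access_flags bits listed in the dex-format document.
ACCESS_FLAGS = {
    0x1: "public", 0x2: "private", 0x4: "protected",
    0x8: "static", 0x10: "final", 0x10000: "constructor",
}


def flag_names(flags: int) -> list[str]:
    """Hypothetical helper: names of the known bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(ACCESS_FLAGS.items()) if flags & bit]


# field_idx_diff values from the dump: 0x22 directly, then +1 for each later field.
diffs = [0x22] + [1] * 12
field_ids = list(accumulate(diffs))  # running sum gives absolute field_ids indices
print([hex(i) for i in field_ids[:3]], hex(field_ids[-1]))  # ['0x22', '0x23', '0x24'] 0x2e
print(flag_names(0x19))     # ['public', 'static', 'final']
print(flag_names(0x1a))     # ['private', 'static', 'final']
print(flag_names(0x10008))  # ['static', 'constructor']
```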

    I recommend baksmali's annotated dump functionality if you want clarification about a certain structure in a dex file. You can use it with the -D option. For example,

    baksmali -D blah.dump blah.dex
    

    You'll get the sort of annotated dump as I gave above, but for the whole dex file.