
What Does the "Arguments" Field in the Functions Tab of IDA Represent?


I want to find specific functions in IDA Pro efficiently by using the "Arguments" value in the Functions tab, but I cannot work out what that value means. Consider the following functions, obtained from the IDA decompiler:

__int64 __fastcall sub_49FB60(__int64 a1)
{
  sub_484690(a1 + 208, 0LL);
  sub_484690(a1 + 112, 0LL);
  return sub_484690(a1 + 16, 0LL);
}

_QWORD *__fastcall sub_4A0050(_QWORD *a1, __int64 a2)
{
  sub_4A3CB0(a1);
  *a1 = off_979AD8;
  a1[13] = a2;
  return a1;
}

__int64 __fastcall sub_4D3D20(__int64 a1, __int64 a2, unsigned __int8 a3)
{
  __int64 v5; // rcx

  *(_QWORD *)a1 = a2;
  v5 = *(_QWORD *)(*(int *)(*(_QWORD *)a2 + 4LL) + a2 + 72);
  if ( v5 )
    (*(void (__fastcall **)(__int64))(*(_QWORD *)v5 + 8LL))(v5);
  *(_BYTE *)(a1 + 8) = sub_4D3BC0(*(_QWORD *)a1, a3);
  return a1;
}

All of these functions have an "Arguments" value of 00000010. Initially, I thought it might represent the sum of argument sizes, but the calculations do not match. What does the "Arguments" field actually represent?

[Image: Functions tab]

Additionally, what does the "Arguments" value represent for the following function?

__int64 __fastcall sub_52AFD0(unsigned int **a1, unsigned int a2, __int64 a3)
{
    unsigned int v3;
    unsigned int v4;
    unsigned int v6;
    unsigned __int64 i;

    v6 = -1;
    for ( i = a2 / 8ui64; i; --i )
    {
      v3 = **a1 ^ v6;
      ++*a1;
      v4 = **a1;
      v6 = *(_DWORD *)(a3 + 4i64 * (v4 >> 24));
    }
    return v6;
}

Solution

  • The value in the "Arguments" column seems to be the size of the stack area that holds the function's arguments, which sits just above the function's own stack frame (at higher stack addresses). It therefore does not represent the total size of all function arguments, only of those passed on the stack. Thanks to Andrey Turkin for pointing this out in the comments below.

    For example, disassembling /bin/awk on my x86-64 Linux system with IDA Free 8.4, I see the following function:

    __int64 __fastcall sub_47EC80(
            __int64 a1,
            unsigned int *a2,
            int a3,
            unsigned int a4,
            unsigned int a5,
            int a6,
            unsigned int a7)
    {
        // ...
    }
    

    Its "Arguments" column displays 00000004. If I look at the disassembly I can see the following:

    .text:000000000047EC80 ; =============== S U B R O U T I N E =======================================
    .text:000000000047EC80
    .text:000000000047EC80
    .text:000000000047EC80 ; __int64 __fastcall sub_47EC80(__int64, unsigned int *, int, unsigned int, unsigned int, int, unsigned int)
    .text:000000000047EC80 sub_47EC80      proc near               ; CODE XREF: sub_47F350+2F↓p
    .text:000000000047EC80                                         ; sub_47F500+6C5↓p
    .text:000000000047EC80
    .text:000000000047EC80 var_D0          = dword ptr -0D0h
       ... a bunch more var_XX here ...
    .text:000000000047EC80 var_54          = dword ptr -54h
    .text:000000000047EC80 ptr             = qword ptr -50h
    .text:000000000047EC80 var_40          = qword ptr -40h
    .text:000000000047EC80 arg_0           = dword ptr  8 
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ arg passed through stack
    .text:000000000047EC80
    .text:000000000047EC80 ; __unwind {
    .text:000000000047EC80                 push    r15
    .text:000000000047EC82                 mov     r15, rdi
    .text:000000000047EC85                 push    r14
    .text:000000000047EC87                 push    r13
    .text:000000000047EC89                 push    r12
       ...
    

    Notice that the only variable located above the stack pointer at function entry (at a positive offset of 8, just past the return address) is arg_0. It corresponds to the a7 argument in the function signature. Under the System V AMD64 calling convention used here, the first 6 integer arguments are passed in the registers RDI, RSI, RDX, RCX, R8 and R9, and the 7th is passed on the stack.

    In fact, if I take a look at the xrefs (press X on the function name) I can see this call:

    .text:000000000047FBAD                 mov     [rsp+0E8h+var_E0], r10
    .text:000000000047FBB2                 sub     rsp, 8
    .text:000000000047FBB6                 mov     edx, [r15+4]
    .text:000000000047FBBA                 mov     r9d, r13d
    .text:000000000047FBBD                 push    9
    .text:000000000047FBBF                 mov     r8d, ebp
    .text:000000000047FBC2                 mov     rdi, r14
    .text:000000000047FBC5                 call    sub_47EC80
    

    As you can see, the 7th argument is passed with push 9. Even though push 9 pushes an 8-byte value on the stack, the function then only seems to access the value using 4-byte (dword) dereference operations (or in any case treats it as a 4-byte value). That seems to be why the "Arguments" column shows 00000004.
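
The difference between the 8-byte stack slot and the 4-byte access can be modelled in a few lines of C. This is only an illustrative sketch: `push_qword` and `load_dword` are hypothetical helper names, and the byte array merely stands in for the little-endian x86-64 stack slot that holds arg_0.

```c
#include <stdint.h>

/* Models `push 9`: the caller always fills a full 8-byte stack slot.
   Bytes are stored in little-endian order, as on x86-64. */
void push_qword(uint8_t slot[8], uint64_t value)
{
    for (int i = 0; i < 8; ++i)
        slot[i] = (uint8_t)(value >> (8 * i));
}

/* Models a dword access to arg_0: only the lowest 4 bytes of the
   slot are read, so the upper 4 bytes never matter to the callee. */
uint32_t load_dword(const uint8_t slot[8])
{
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= (uint32_t)slot[i] << (8 * i);
    return value;
}
```

After `push_qword(slot, 9)`, `load_dword(slot)` yields 9 even though the slot is twice as wide as the value the callee reads, which is consistent with IDA reporting a 4-byte argument.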

    Similarly, for this other function:

    __int64 __fastcall sub_44DC90(
            __int64 a1,
            const char *a2,
            int a3,
            __int64 a4,
            __int64 a5,
            void (__fastcall *a6)(_QWORD, _QWORD, _QWORD, _QWORD),
            __int64 a7,
            __int64 a8,
            char a9)
    {
        // ...
    }
    

    I can see that the "Arguments" column shows 00000011 (17) because __int64 a7, __int64 a8 and char a9 are passed on the stack (8 + 8 + 1 = 17 bytes):

    ...
    mov     rdi, 7FFFFFFFFFFFFFFFh
    mov     [rsp+60h+var_48], rax
    push    0
    push    r12
    push    rbp
    call    r15
    ...
    
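The two values above (4 and 17) are consistent with a simple rule: every stack argument except the last takes a full 8-byte slot, and the last one counts only the bytes it actually occupies. The sketch below encodes that guess. `estimate_arguments_column` is a hypothetical helper, it assumes the System V AMD64 convention with integer and pointer arguments only, and it has only been checked against the two examples above; it is not a description of IDA's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INT_ARG_REGS 6  /* RDI, RSI, RDX, RCX, R8, R9 */
#define STACK_SLOT_SIZE  8  /* each stack argument starts a new 8-byte slot */

/* Hypothetical helper: estimate the "Arguments" value from the sizes
   (in bytes) of a function's arguments, given in declaration order. */
size_t estimate_arguments_column(const size_t *arg_sizes, size_t count)
{
    assert(arg_sizes != NULL || count == 0);

    if (count <= NUM_INT_ARG_REGS)
        return 0;  /* assumption: everything is passed in registers */

    size_t stack_args = count - NUM_INT_ARG_REGS;

    /* Full slots for all stack arguments but the last, plus the real
       size of the last one. */
    return (stack_args - 1) * STACK_SLOT_SIZE + arg_sizes[count - 1];
}
```

With the argument sizes of sub_47EC80, {8, 8, 4, 4, 4, 4, 4}, the helper returns 4, and with those of sub_44DC90, {8, 8, 4, 8, 8, 8, 8, 8, 1}, it returns 17 (0x11), matching what the column shows.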

    Looking into IDA Documentation

    IDA has pretty nice built-in documentation. If you place the cursor in the Functions window and press the F1 key, the documentation pops up in a new window (this is on my IDA Free 8.4).

    Looking at the "Functions window" page in the documentation, the fields are:

    • function name
    • segment that contains the function
    • offset of the function within the segment
    • function length in bytes
    • size (in bytes) of local variables + saved registers
    • size (in bytes) of arguments passed to the function

    The same documentation page is also available online.

    From the doc, it seems like the "Arguments" column should represent the "size of the arguments passed to the function". The Hex-Rays YouTube channel also has a video about the Functions window explaining things, and it also says that the "Arguments" column should represent the "size of arguments passed to the function".

    These descriptions seem too generic and don't accurately describe the meaning of the column. It seems like the documentation page should be updated with clearer descriptions.

    The same goes for the 2nd and 3rd columns (Segment and Start). The "Start" column definitely does not show the offset within the segment named in the "Segment" column (which shows .text, and that is actually a section rather than an ELF segment); it shows the absolute virtual address. As Ben Voigt suggests below, this may be because the word "segment" in the description of the "Segment" column refers to binary (e.g. ELF) segments/sections, whereas in the description of the "Start" column it may refer to architecture-defined segments, which exist when the CPU uses a segmented memory model (e.g. 16-bit x86 real mode). Confusing.
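
To make the distinction concrete, here is a minimal sketch of how an offset within a segment would differ from the absolute address that the "Start" column actually shows. The helper name `offset_within_segment` and the .text start address used in the example are hypothetical; the real start of .text in this binary is not given above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of an absolute virtual address within a
   segment (or section) that begins at segment_start. */
uint64_t offset_within_segment(uint64_t address, uint64_t segment_start)
{
    assert(address >= segment_start);
    return address - segment_start;
}
```

If .text started at the made-up address 0x403000, `offset_within_segment(0x47EC80, 0x403000)` would give 0x7BC80, whereas the "Start" column for sub_47EC80 shows the absolute address 47EC80.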