How to go from assembler instruction to C code

I have an assignment where, among other things, I need to look in an .asm file to find a certain instruction and "reverse engineer" (find out) what part of the C code causes it to be executed on an assembler level. (Example below the text)

What would be the fastest (easiest) way to do this? Or, to put it better: which other commands, instructions, or labels around it in the .asm file should I pay attention to, so that they guide me to the right C code?

I have next to zero experience with assembler code and it is tough to figure out what exact lines of C code cause a particular instruction to happen.

The architecture, if that makes any difference, is TriCore.

Example: I managed to figure out which C code causes an insert in the asm file by following where the variables are used:

    movh.a  a15,#@his(InsertStruct)
    ld.bu   d15,[a15]@los(InsertStruct)
    or  d15,#1
    st.b    [a15]@los(InsertStruct),d15
    ld.bu   d15,[a15]@los(InsertStruct)
    insert  d15,d15,#0,#0,#1
    st.b    [a15]@los(InsertStruct),d15
    mov d15,#-1

that led me to the following C code:

    InsertStruct.SomeMember = 0x1u;

    InsertStruct.SomeMember = 0x0u;


  • The architecture is TriCore (if that makes any difference).

    Of course. Assembler code is always architecture specific.

    ... what part of the C code causes it to be executed on an assembler level.

    When using a highly optimizing compiler you have almost no chance:

    The Tasking compiler for TriCore for example sometimes even generates one fragment of assembly code (stored only once in memory!) for two different lines of C code in two different C files!

    However the code in your example is not optimized (unless the structure you named InsertStruct is volatile).
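To illustrate why the listing looks unoptimized: an optimizer would normally merge the two assignments into a single store of 0, whereas the listing contains two complete load/modify/store sequences on the same byte. The sketch below models those two sequences in plain C. It rests on assumptions the question does not state: that `SomeMember` is a 1-bit bit-field at bit 0 of the first byte of `InsertStruct`, and that `insert_bits` (a hypothetical helper, not a real API) behaves like the TriCore `insert` instruction with a constant source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of TriCore "insert d, a, #value, #pos, #width":
   replace `width` bits of `a`, starting at bit `pos`, with `value`. */
static uint32_t insert_bits(uint32_t a, uint32_t value, unsigned pos, unsigned width)
{
    assert(width >= 1u && pos + width <= 32u);
    uint32_t field = (width < 32u) ? (((uint32_t)1 << width) - 1u) : 0xFFFFFFFFu;
    uint32_t mask = field << pos;
    return (a & ~mask) | ((value << pos) & mask);
}

/* Model of the first sequence: ld.bu / or d15,#1 / st.b
   (assumed to come from "InsertStruct.SomeMember = 0x1u;") */
static void set_member(uint8_t *byte)
{
    uint32_t d15 = *byte;                /* ld.bu d15,[a15]@los(InsertStruct) */
    d15 |= 1u;                           /* or    d15,#1                      */
    *byte = (uint8_t)d15;                /* st.b  [a15]@los(InsertStruct),d15 */
}

/* Model of the second sequence: ld.bu / insert d15,d15,#0,#0,#1 / st.b
   (assumed to come from "InsertStruct.SomeMember = 0x0u;") */
static void clear_member(uint8_t *byte)
{
    uint32_t d15 = *byte;                /* ld.bu                     */
    d15 = insert_bits(d15, 0u, 0u, 1u);  /* insert d15,d15,#0,#0,#1   */
    *byte = (uint8_t)d15;                /* st.b                      */
}
```

Calling `set_member` and then `clear_member` on the same byte changes only bit 0 and leaves the other seven bits untouched, which is what you would expect from bit-field assignments.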

    In this case you could compile your code with debugging information switched on and extract that information: for an ELF file you can use tools like addr2line (free software, part of GNU Binutils) to check which line of C code corresponds to the instruction at a given address.

    (Note: The addr2line tool is architecture independent as long as both architectures have the same word width (32-bit) and the same endianness, and both use the ELF file format; you could use an addr2line built for ARM to get the information from a TriCore file.)

    If I really have to understand a fragment of assembler code, I typically do the following:

    I start a text editor and paste in the assembler code:

    movh.a  a15,#@his(InsertStruct)
    ld.bu   d15,[a15]@los(InsertStruct)
    or      d15,#1
    st.b    [a15]@los(InsertStruct),d15

    Then I replace each instruction by the pseudo-code equivalent:

    a15 =  (((unsigned)&InsertStruct)>>16)<<16;
    d15 =  *(unsigned char *)(a15 + (((unsigned)&InsertStruct)&0xFFFF));
    d15 |= 1;
    *(unsigned char *)(a15 + (((unsigned)&InsertStruct)&0xFFFF)) = d15;

    In the next step I try to simplify this code:

    a15 =  ((unsigned)&InsertStruct) & 0xFFFF0000;


    d15 = *(unsigned char *)((((unsigned)&InsertStruct) & 0xFFFF0000) + (((unsigned)&InsertStruct)&0xFFFF));


    d15 = *(unsigned char *)((unsigned)&InsertStruct);


    d15 = *(unsigned char *)&InsertStruct;
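One detail this simplification glosses over: as far as I know from the TASKING assembler documentation, `@his` is not a plain `>> 16`. The 16-bit offset produced by `@los` is sign-extended when the load or store uses it, so `@his` is adjusted by 0x8000 to compensate. The combined address is the same either way, which the sketch below checks. The helper names `his`, `los`, `effective_address` and `check_split` are made up for illustration, and the exact formulas are my assumption about the assembler's behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of @his(x): upper half-word, adjusted for signed addition. */
static uint32_t his(uint32_t x)
{
    return ((x + 0x8000u) >> 16) & 0xFFFFu;
}

/* Assumed model of @los(x): lower half-word, sign-extended as an offset. */
static int32_t los(uint32_t x)
{
    uint32_t low = x & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}

/* movh.a a15,#@his(x) followed by an access at [a15]@los(x). */
static uint32_t effective_address(uint32_t x)
{
    uint32_t a15 = his(x) << 16;       /* movh.a: constant goes to the upper half */
    return a15 + (uint32_t)los(x);     /* base register plus signed 16-bit offset */
}

/* The split must recombine to the original address. */
static void check_split(uint32_t x)
{
    assert(effective_address(x) == x);
}
```

For an address such as 0xD0008123 the low half has its top bit set, so `his` yields 0xD001 rather than 0xD000 and the negative offset brings the sum back to the original address.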

    In the last step I try to replace jump instructions:

    d15 = 0;
    if(d14 == d13) goto L123;
    d15 = 1;
    L123:

    ... becomes:

    d15 = 0;
    if(d14 != d13) d15 = 1;

    ... and finally (maybe):

    d15 = (d14 != d13);
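To convince yourself that such a rewrite did not change the meaning, the three stages can be written as C functions and compared. This is only a sketch: the registers become plain `int` parameters, and the function names are invented here.

```c
#include <assert.h>

/* Stage 1: literal translation of the conditional jump. */
static int stage_goto(int d14, int d13)
{
    int d15 = 0;
    if (d14 == d13) goto L123;
    d15 = 1;
L123:
    return d15;
}

/* Stage 2: the jump replaced by a structured if. */
static int stage_if(int d14, int d13)
{
    int d15 = 0;
    if (d14 != d13) d15 = 1;
    return d15;
}

/* Stage 3: the whole fragment collapsed into one expression. */
static int stage_expr(int d14, int d13)
{
    return d14 != d13;
}

/* All three stages must agree for any pair of inputs. */
static void check_stages(int d14, int d13)
{
    assert(stage_goto(d14, d13) == stage_if(d14, d13));
    assert(stage_if(d14, d13) == stage_expr(d14, d13));
}
```

Running `check_stages` on a few equal and unequal pairs is a cheap sanity check before you trust the simplified version.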

    In the end you have C code in the text editor.

    Unfortunately this takes a lot of time, but I don't know of any faster method.