assembly tree x86-64 reverse-engineering stack-memory

Using nodes in assembly x86

I'm trying to understand what does this code do:

.section .text
.section .data
A:   .long 3
     .quad B
     .quad C
     .quad D
     .quad 0
B:   .long 10
     .quad E
     .quad F
     .quad 0
C:   .long 22
     .quad 0
D:   .long 17
     .quad G
     .quad 0
E:   .long 85
     .quad 0
F:   .long 2
     .quad 0
G:   .long 8
     .quad 0
.global _start
_start:
    mov $17, %esi
    mov $A, %rdi
    call func
    movq $1, %rdi
    sub %rax, %rdi
    movq $60, %rax
    syscall

func:
    push %rbp
    movq %rsp, %rbp
    cmpl %esi, (%rdi)
    jne next
    mov $1, %rax
    jmp end

next:
    mov $0, %r10

test:
    cmpq $0, 4(%rdi, %r10, 8)
    je fail
    push %rdi
    push %r10
    mov 4(%rdi, %r10, 8), %rdi
    call func
    pop %r10
    pop %rdi
    cmpq $0, %rax
    jne finish
    inc %r10
    jmp test

finish:
mov $1, %rax
jmp end

fail:
     mov $0, %rax
end:
     leave
     ret

And as I understand it tries to determine if "17" appears in the data section. Given that the value of %rsp is some hexadecimal X at the beginning, I want to know what is the range of values for %rsp is (using X), but it looks like it depends on the test section. Do we go from A to B, then E, F, back to C, then finally D? What will be the value at the end? I'm trying to "translate" it to C and so far I have:

int func(Node* root, int X){
    if(root->data == null)
        return 0;
    Node *son == root->sons;
    while(son != null){
        if(son->data == 17)
            return 1;
        son += 1;
    }
    return 0;
}

Does it look correct?

Solution

Keep working on the C version, since it is really helpful to know what you want the program to do before taking an algorithm to assembly code.

As Jester points out, your C-language version will never search E, F, or G, because it only goes one level deep past A, which enumerates B, C, D -- so those will get searched but E, F, G won't.

In your attempt, you have written a line:

    if(root->data == null)
        return 0;

This is a type mismatch as data is an int, the int that you're searching for, and as an int it cannot take on the value null, so probably should just skip this condition test.

You have declared Node *son, and are incrementing as in son++. This is also a type mismatch because root->sons is an inline array, and its elements are node pointers, so the array is effectively a pointer to pointer to sons.

With your code, that son++ will increment by sizeof(Node), which is simply wrong as these are pointers densely packed rather than being Nodes.

You want that to be a Node **sons = root->sons; — this way the sons++ will increment by sizeof(Node*) instead, which is what you want.

Also because you've used Node*son = root->sons, there is a missing dereference to fetch the elements that are those child pointers, which you can see addressed when using Node **sons = root->sons; — we can do Node *son = *sons; to accomplish that dereference.

A recursive algorithm is helpful and appropriate here given that the tree being searched is a recursive data structure, so also quite natural.

#define null 0

typedef struct Node {
    int data;
    struct Node* children[1];
} Node;

int func(Node* root, int X){
    if (root->data == X) return 1;

    // this variable holds state to continue search
    //  when the recursive call returns not found
    Node **children = root->children; 

    while (1) {
        Node *child = *children;
        if (child == null) break;

        // check not just this child, but the whole branch starting from this child, using recursion
        int answer = func(child, X);

        if (answer) return 1; // found in that branch, so we're done

        children++;           // not found in that branch, so try next child from here
    }
    return 0; // not found in this branch
}

The above algorithm checks "root" to see if it is a match, and if not, goes on to run the function recursively on each child, which do this same check and cover all the children including children's children.

However, an iterative solution can work efficiently, though will need to employ a temporary data structure like stack or list for the duration of the algorithm -- this to record where to "backtrack" when a given branch doesn't find a match.

int func(Node* root, int X) {

    // this temporary variable should be heap allocated and expandable
    // but that is beyond the scope of this answer.  
    // (Using C++ would allow generic list, but also beyond scope)

    Node *searchList[100];  // used as stack

    searchList[0] = root; // push root to prime this stack
    int ix = 1;

    while (ix != 0) {  // while stack not empty

        ix--;
        Node *node = searchList[ix];  // pop

        if (node->data == X) return 1;

        Node **children = node->children;
        while (1) {
            Node *child = *children;
            if (child == null) break;

            searchList[ix] = child; // push child
            ix++;

            children++;
        }
    }
    // exhausted the search list
    return 0;    
}

This version uses a temporary array as a stack to record nodes that need to be checked. Then it traverse the stack until empty, looking at nodes coming off the stack to see if they match, while also placing children on the stack. Thus, it will also search children's children.

(Some programmers might do the ->data == X check before pushing the node onto the list, though that would require the check in both places that push.)

(In another variant, we could push Node **'s instead of Node*'s.)

(It is more complicated in C due to C's poor handling of generic data types like Lists, and the lack of automatic expansion we would expect from a language like C++ or C#).