Search code examples
assemblynasmbinary-search

Seg faults in Binary search in NASM assembly


Hi I'm struggling to find out why my binary search implementation is seg faulting (I'm new to NASM assembly)

Sorry I know its not much of a MVP but I cant think of an appropriate way to make one in assembly.

%define n [ebp+8]
%define list [ebp+12]
%define low [ebp+16]
%define high [ebp+20] ; Parameters loading from the stack
binary_search:
  push ebp
  mov ebp, esp

  mov ebx, n
  mov edi, list
  mov ecx, low
  mov edx, high

  cmp ecx, edx ; if (low > high)
  jg .FIRST

  mov eax, edx ; Next few lines for mid = low + (high - low)/2
  sub eax, ecx
  sar eax, 1 ; Will this give an appropriate index? (i.e is it floor division?)
  add eax, ecx
  lea esi, [edi+eax*4] ;Getting list[mid]
  cmp ebx, [esi]; if (n == list[mid])
  je .END
  jl .SECOND
  jg .THIRD

  jmp .END

.FIRST:
  mov eax, -1 ; return -1
  jmp .END

.SECOND:
  mov edx, eax ; return middle - 1
  dec edx
  jmp .CONTINUE

.THIRD:
  mov ecx, eax ; low = mid - 1
  dec ecx
  jmp .CONTINUE

.CONTINUE:
  push edx
  push ecx
  push edi
  push esi
  push ebx
  call binary_search ; recursive call, causing the segfault.
  pop ebx
  pop esi
  pop edi
  pop ecx
  pop edx
  jmp .END

.END:
  mov esp, ebp
  pop ebp
  ret

After commenting out different sections, I have determined that it is definitely something to do with my recursive call to binary_search that is causing the seg fault. (Found Inside .CONTINUE) What am I messing up insdie of the binary_search body that doesnt agree with multiple recursive calls?

The binary search algorithm:

binary_search(n, list, low, high)

    if (low > high)
        return -1

    mid = low + (high - low) / 2

    if (n == list[mid])
       return middle;
    if (n < list[mid])
       high = mid - 1
    else
        low = mid + 1

    return binary_search(n, list, low, high)



I know its a long shot, thanks :)

Edit: its 32-bit mode


Solution

  • This part defines your function's protocol:

    %define n [ebp+8]
    %define list [ebp+12]
    %define low [ebp+16]
    %define high [ebp+20] ; Parameters loading from the stack
    binary_search:
      push ebp
      mov ebp, esp
    

    The dword [ebp] will hold the old ebp value, dword [ebp+4] the old eip to return to, and then your parameters follow. These are n, list, low, and high, in this order. As the stack grows downwards you need to push to the stack in the order so that the highest addressed parameter (which happens to be "high") is pushed first, then the second highest (low), and so on (list, n, eip, ebp).

    The next part determines which registers you use to hold the variables initialised from these stack frame parameters:

      mov ebx, n
      mov edi, list
      mov ecx, low
      mov edx, high
    

    (Either ecx or edx is modified before the recursive call. But they still act as the "low" and "high" variables in that code.)

    Now to check your recursive call site. Essentially you want to pass all of the registers you're using to the function again. ebx = n, edi = list, ecx = low, edx = high. This is what you do:

    .CONTINUE:
      push edx
      push ecx
      push edi
      push esi
      push ebx
      call binary_search ; recursive call, causing the segfault.
      pop ebx
      pop esi
      pop edi
      pop ecx
      pop edx
    

    If you match up the pushed variables with the stack frame parameters expected by your function's protocol, you get:

      push edx    ; (unused)
      push ecx    ; high
      push edi    ; low
      push esi    ; list
      push ebx    ; n
      call binary_search
    

    The variable represented by esi is only used internally by your function. It does not need to be passed to subsequent calls. What's more, it messes with your function's protocol. edx, ecx, and edi get shoved one stack slot higher each than they should be to pass these variables along to your recursive call. The solution is to drop push esi and pop esi here.


    There are a few more potential problems with your code:

    • As I mentioned in the comments you should use inc ecx not dec ecx.

    • What is your program's calling convention used to call this function? You seem to be using a lot of registers and are only preserving ebp and esp.

    • If your calling convention allows to alter the stack frame parameter contents without limits, you could replace the literal call that you use to recurse, by a "tail call" with the parameters changed to those you currently pass to the call. And re-using the entire stack frame. It would look very similar to a loop at that point though. Perhaps you do want an actual (literally) recursive implementation. The tail-call-optimised or iterative approaches need O(1) of stack space though, as opposed to your O(log2 n).