Tags: assembly, parameters, arm, parameter-passing, cpu-registers

Why doesn't the program work if I load directly into r0, when it works if I LDR into r2 and then LDR into r0?


I have this program that just returns the value that I pass on the command line.

This works:

.global main

main:
        ldr     r2, [r1,#4]    // get the argv[1] and put it in r2
        ldr     r0, [r2]       // put it in r0 from r2
        sub     r0, r0, #48    // from ascii value to actual decimal value
        bx      lr

What is not clear to me is why it doesn't work if I use r0 instead of r2. This, for example, doesn't work:

.global main

main:
        ldr     r0, [r1,#4]        // put the value immediately to r0
        sub     r0, r0, #48        // ascii to actual value
        bx      lr

If I run the program with the value 7:

./program 7
echo $?

In the first case I got the actual value (7), but in the second I got (3)...


Solution

  • You are trying to do return(argv[1][0]-0x30), which is a bug as a general string conversion but works for a single character. Instead you are doing:

        ldr     r2, [r1,#4]    // address of argv[1]
        ldr     r0, [r2]       // read first four characters in argv[1]
                               // argv[1][0..3]
        sub     r0, r0, #48    // convert the first one to decimal
                               // leaving the other three unmodified
        bx      lr
    

    This one is return( (*((unsigned int *)(&argv[1][0]))) - 0x30), which is a bug (as mentioned more than once in the prior question; assuming I got all of my syntax right banging out this answer): it typecasts the char pointer to the first character into a word pointer to the first four characters and reads that. But

        ldr     r0, [r1,#4]    // address of argv[1]
        sub     r0, r0, #48    // modify the address of argv[1]
        bx      lr
    

    is return( ((unsigned int)(argv[1])) - 0x30), an even bigger bug: it converts the pointer to the string to a word and then subtracts from that address (assuming I banged out the right syntax here as well).

    You are modifying the address, not any of the string data, in this second case.

    You need to cover both levels of indirection, not just one. And a string is an array of bytes, not an array of words.
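
    As a rough illustration of the three cases above, here is a minimal C sketch. The function names (first_digit, word_minus_48, pointer_minus_48) are made up for this example and are not from the question; memcpy and uintptr_t stand in for the typecasts so that the sketch stays well-defined on a host machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Intended behaviour for a single digit: follow both pointers, then
   read ONE byte. Roughly: ldr r2,[r1,#4] ; ldrb r0,[r2] ; sub r0,r0,#48 */
int first_digit(char **argv)
{
    assert(argv != NULL && argv[1] != NULL);
    return argv[1][0] - '0';
}

/* The "working" version: reads four bytes starting at argv[1][0].
   memcpy keeps the possibly unaligned read well-defined in C.
   The caller must guarantee at least 4 readable bytes at argv[1]. */
uint32_t word_minus_48(char **argv)
{
    uint32_t word;
    assert(argv != NULL && argv[1] != NULL);
    memcpy(&word, argv[1], sizeof word);
    return word - 48u;
}

/* The failing version: only one level of indirection, so the
   subtraction is applied to the ADDRESS of the string. */
uintptr_t pointer_minus_48(char **argv)
{
    assert(argv != NULL);
    return (uintptr_t)argv[1] - 48u;
}
```

    Only first_digit touches a single character; the other two return values that depend on neighbouring bytes or on where the string happens to live in memory.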

    try

    ./program 77
    

    Instead of 77, r0 will hold 14087 (0x3707) or some number like that with your supposedly working version.
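
    The arithmetic behind that number can be checked with a small C sketch. It assumes a little-endian core, and it assumes the shell only sees the low 8 bits of the exit status (true for the usual Linux setup); the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value a little-endian `ldr` would load from four consecutive bytes
   (bytes[0] is at the lowest address). */
uint32_t load_word_le(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* r0 after `ldr r0,[r2]` followed by `sub r0,r0,#48`. */
uint32_t r0_after_word_version(const uint8_t *bytes)
{
    return load_word_le(bytes) - 48u;
}

/* Assumption: only the low 8 bits of the exit status reach `$?`. */
unsigned exit_status_seen_by_shell(uint32_t r0)
{
    return r0 & 0xFFu;
}
```

    With the bytes '7', '7', 0, 0 this gives 0x3737 - 0x30 = 0x3707 = 14087. The low byte is still 7, which would explain why echo $? can look right even though r0 is not.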

    All of this was covered in the prior question. Do you understand what a two dimensional array means? char argv[][]?

    ./program 77
    

    argv itself points to an array of pointers

    argv[0]
    argv[1]
    argv[2]
    

    and then each of those points to a string

    argv[0][0]='.'
    argv[0][1]='/'
    argv[0][2]='p'
    argv[0][3]='r'
    argv[0][4]='o'
    ...
    argv[0][n]=0
    
    argv[1][0]='7'
    argv[1][1]='7'
    argv[1][2]=0
    
    r0 is argc
    r1 is argv
    

    so r1 contains the address to the ARRAY OF POINTERS

    ldr r3,[r1,#0] //pointer to argv[0] string
    ldr r4,[r1,#4] //pointer to argv[1] string
    ldr r5,[r1,#8] //pointer to argv[2] string
    ...
    

    You cannot skip that step: if you want to access the string, you have to start at the beginning of the string.

    Now once you have done the above THEN you can do this:

    ldrb r0,[r4,#0] // argv[1][0] = '7'
    ldrb r1,[r4,#1] // argv[1][1] = '7'
    ldrb r2,[r4,#2] // argv[1][2] = 0
    

    if you instead

    ldr r0,[r4,#0] 
    

    that is all of argv[1][0] through argv[1][3] in one shot, assuming you don't get an alignment fault, because there is no reason why argv[1] has to point to a word-aligned address.

    So that would put 0xZZ003737 in r0, where ZZ is an unknown/non-deterministic byte that is outside the argv[1] string; it could be argv[2][0], for example. You have experienced some dumb luck if you are doing

    ./program 7
    

    and getting 0x00000037 by using the wrong instruction and wrong approach (for the nth time read and understand Frant's answer to the other question).

    If you were to have this

    char mystring[]="1234567";
    

    would you use

    mystring[0]-=0x30;
    

    To convert that from a string (0x31,0x32,0x33,...0x37,0x00) to a value 1234567 (0x12d687)? Certainly not, that would not work at all. You would need to use atoi, atol, strtol, etc. (read Frant's answer) or roll your own.

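    rb=0;                         // running value
    for(ra=0;mystring[ra];ra++)   // stop at the 0x00 terminator
    {
        rb*=10;
        rb+=mystring[ra]-0x30;    // '-' rather than '-=' so the string is not modified
    }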
    

    This assumes we know ahead of time that the user is passing in a decimal number in the string (a bad assumption, and yet another bug in doing something like this).
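
    If you go the library route instead, one possible sketch uses strtol and rejects input that is not entirely a decimal number. The wrapper name parse_decimal is made up for this example; strtol, errno and ERANGE are standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the value in *out if s is entirely a decimal
   integer that fits in a long; returns 0 otherwise.
   Note: strtol itself accepts leading whitespace and an optional sign. */
int parse_decimal(const char *s, long *out)
{
    char *end;
    long value;

    assert(s != NULL && out != NULL);
    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)        return 0;   /* no digits were consumed */
    if (*end != '\0')    return 0;   /* trailing characters after the number */
    if (errno == ERANGE) return 0;   /* value does not fit in a long */
    *out = value;
    return 1;
}
```

    Called with "1234567" this yields 1234567 (0x12d687), while something like "7x" or an empty string is reported as an error instead of silently producing a number.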

    doing this:

    mystring[0]-=0x30;
    

    modifies only one item, which does nothing to convert the string to a number.

    to demonstrate all of this further, the operating system loader will fill in argv[][] for you in some memory you have access to.

    So for example

    ./so 123
    

    I am going to make up addresses for demonstration purposes

    [address] data
    [0x00001000] 0x00001008  pointer to argv[0]
    [0x00001004] 0x0000100D  pointer to argv[1]
    [0x00001008] 0x2E '.'
    [0x00001009] 0x2F '/'
    [0x0000100A] 0x73 's'
    [0x0000100B] 0x6F 'o'
    [0x0000100C] 0x00 string termination
    [0x0000100D] 0x31 '1'
    [0x0000100E] 0x32 '2'
    [0x0000100F] 0x33 '3' 
    [0x00001010] 0x00 string termination
    

    So in this case r1 would be set to 0x00001000 before main is called.

    So

    ldr r2,[r1,#4]    // read 0x1004, r2 = 0x100D
    ldrb r0,[r2]      // read 0x100D, r0 = 0x31
    sub r0,r0,#0x30   // r0 = 1 (note: which is not equal to 123)
    

    If you

    ldr r2,[r1,#4]    // read 0x1004, r2 = 0x100D
    ldr r0,[r2]       // read 0x100D, r0 = 0x00333231
    sub r0,r0,#0x30   // r0 = 0x00333201 (note: which is not equal to 123 = 0x7B)
    

    Plus, that is an unaligned access (0x100D is not a multiple of 4), which is an alignment fault if alignment checking is enabled.

    If you

    ldr r2,[r1,#4]    // read 0x1004, r2 = 0x100D
    sub r0,r2,#0x30   // r0 = 0xFDD
    

    And that is clearly wrong; the result has no meaning whatsoever. It hoses a pointer to a string with a bad string-conversion approach.
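
    The three walk-throughs above can be replayed on a host machine with a small simulation. This is only a sketch: it reuses the made-up addresses from the table, assumes a little-endian core that tolerates unaligned word reads, and the names mem, MEM_BASE, load_byte and load_word are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE 0x1000u   /* made-up address of the first byte below */

/* Memory image for "./so 123" using the made-up addresses.
   Pointers are stored little-endian, 4 bytes each. */
static const uint8_t mem[] = {
    0x08, 0x10, 0x00, 0x00,     /* 0x1000: pointer to argv[0] = 0x1008 */
    0x0D, 0x10, 0x00, 0x00,     /* 0x1004: pointer to argv[1] = 0x100D */
    '.', '/', 's', 'o', 0x00,   /* 0x1008: "./so" */
    '1', '2', '3', 0x00         /* 0x100D: "123"  */
};

/* Models ldrb: read one byte at a simulated address. */
uint32_t load_byte(uint32_t addr)
{
    assert(addr >= MEM_BASE && addr - MEM_BASE < sizeof mem);
    return mem[addr - MEM_BASE];
}

/* Models ldr: read four bytes, lowest address in the low bits. */
uint32_t load_word(uint32_t addr)
{
    return load_byte(addr)
         | (load_byte(addr + 1) << 8)
         | (load_byte(addr + 2) << 16)
         | (load_byte(addr + 3) << 24);
}
```

    Starting from r1 = 0x1000, load_word(r1 + 4) gives 0x100D; from there load_byte minus 0x30 gives 1, load_word minus 0x30 gives 0x00333201, and subtracting 0x30 from the pointer itself gives 0xFDD, matching the three cases above.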

    Note:

    ldr     r0, [r2]  // read word from address in r2 and put in r0
    

    is not equal to

    mov     r0, r2    // copy contents of r2 into r0
    

    For at least the ARM tools and gas assembly languages, the [brackets] indicate a level of indirection: [r2] means the thing at the address contained in r2, whereas r2 alone means the contents of r2.

    They are two completely different instructions. You should have the ARM documentation for the instruction set, the architectural reference manual for one of the architectures; start with ARMv5 if you don't know which. Don't bother with ARM's Programmers' Reference Manuals; they create more questions than answers. The technical reference manual and architectural reference manual for the core in question are what you should always have BEFORE you start doing any work like this.

    ARM does pretty well with their pseudo code, especially in the older ARM ARM compared to the newer one, which has more features and so more detail to cover.

    Since some of us saw your prior/original question with the original content before modification and you are already calling C functions from main: then read Frant's answer with what you know now and just call another C function.