I'm having some problems writing a gets() function in C for one of my classes. I already have a getchar() function, but it's written in assembly and I'm calling it from C using extern.
The problem is that when I run the code and enter a string, it doesn't print the complete string, only some of the characters.
This is the code I have at the moment:
C code:
extern char getchar(void);
extern void putchar(char data);
void gets(char *str);
void puts(char *str);
void new_line();
char string[20];
int main(){
    while(1){
        gets(string);
        new_line();
        puts(string);
    }
    return 0;
}
void new_line(){
    putchar(0xD);
    putchar(0xA);
}
void gets(char *str){
    unsigned char i = 0;
    while((*str = getchar()) != 0xD){
        str[i] = getchar();
        i++;
    }
}
void puts(char *str){
    while(*str){
        putchar(*str++);
    }
}
and my ASM code just in case:
.MODEL tiny
.CODE
public _putchar
public _getchar
_putchar PROC
    push bp
    mov bp, sp
    mov dl, [bp + 4]
    mov ah, 2
    int 21h
    pop bp
    ret
_putchar ENDP
_getchar PROC
    push bp
    mov bp, sp
    mov ah, 1
    int 21h
    mov [bp + 4], al
    pop bp
    ret
_getchar ENDP
END
I'm running the code on an Arduino Mega using MTTTY, with an 8086 interpreter that our teacher provided.
Is there any way I can fix the gets() function so that it prints the input string properly?
For example, if I enter "hello world" it only prints "l ol".
Your C gets implementation is broken, regardless of the asm getchar implementation. You could debug it on a normal C implementation with a normal debugger on your desktop.

You call getchar() twice, and only save every 2nd result. The first result is assigned to str[0] and checked for '\r'.
// your version, with comments
void gets_original_buggy(char *str){
    unsigned char i = 0;                // this is an index; it should be an `int` or `size_t`
    while((*str = getchar()) != 0xD){   // overwrite the first byte of the string with an input
        str[i] = getchar();             // get ANOTHER new input and save it to the end.
        i++;
    }
    // str[i] = 0;                      // missing zero terminator.
}
Here's how I'd write it:
#include <stddef.h>
//#include <stdio.h>
extern unsigned char getchar(void);

// returns length.
// negative means EOF.  TODO: implement an EOF check if your getchar() supports it.
// FIXME: take a max-length arg to make it possible to prevent buffer overflows.
ptrdiff_t gets(char *str) {
    char *start = str;        // optional
    char tmp;                 // read chars into a local, and check before assigning anything to *str
    while( (tmp = getchar()) != '\r') {
        // TODO: also check for EOF
        *str++ = tmp;         // classic pointer post-increment idiom
    }
    *str = 0;                 // terminate the C string.
    return str - start;       // optional, return the length
}
It's always useful to return the string length instead of throwing it away in a function that knows it, and this only costs the compiler a couple extra instructions. The pointer increment simplifies the addressing mode, saving code-size.
(compiles nicely with gcc and clang for 32-bit x86 on Godbolt, should be pretty similar for x86-16.)
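For example, a caller could use the returned length directly instead of re-scanning the string; a tiny sketch (the buffer size here is arbitrary, and it assumes the gets() above plus your existing puts()):

char buf[40];                  // arbitrary size for this sketch
ptrdiff_t len = gets(buf);     // length of the line, not counting the 0 terminator
puts(buf);
// `len` is now available for bounds checks, or for an output routine that
// takes an explicit length instead of scanning for the 0 terminator.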
You might also/instead check for '\n', depending on your getchar implementation and whether it normalizes line endings or not. And remember that stopping after reading a '\r' will leave a '\n' unread if you have DOS "\r\n" line endings.
In ISO C, getchar() should give you only '\n' line endings for files opened in text mode, but you've made getchar just a wrapper around the DOS int 21h / AH=1 function (READ CHARACTER FROM STANDARD INPUT, WITH ECHO). So that's what sets the behaviour of your implementation.
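If your getchar ends up delivering '\n' (or you want to accept either byte as end of line), the loop condition in the version above could simply be widened; this is just a sketch, and it deliberately doesn't try to consume a '\n' that might follow a '\r', since a blocking getchar() would then hang waiting for more input on lines that end with a bare '\r':

while ((tmp = getchar()) != '\r' && tmp != '\n') {   // stop on either line-ending byte
    *str++ = tmp;
}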
; in _getchar:
    mov [bp + 4], al     ; clobber memory you don't own.
That will clobber memory above the return address. char getchar(void) doesn't take any args, so your function doesn't "own" that memory. Your compiler should expect a return value in AL. (And if you thought that was returning by reference: no, you're just overwriting the pointer arg. Except the caller isn't even passing one.)
If you want your getchar to be able to return EOF distinct from a 0xFF byte, declare it as returning int, and zero AH after making the system call. (So you can return a 16-bit -1 in AX, or a zero-extended unsigned char in AX, i.e. the value in AL.)
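On the C side, that might look something like the sketch below. The function name is just a placeholder, and it assumes you've changed the asm getchar to zero AH and to return -1 in AX for whatever end-of-input condition you decide to detect:

#include <stddef.h>
extern int getchar(void);                  // 0..255 in AX, or -1 for EOF

ptrdiff_t gets_eof(char *str) {            // placeholder name for this sketch
    char *start = str;
    int c;                                 // int, not char, so -1 can't collide with the byte 0xFF
    while ((c = getchar()) != '\r' && c != -1) {
        *str++ = (char)c;
    }
    *str = 0;
    if (c == -1 && str == start)
        return -1;                         // EOF before anything was read
    return str - start;
}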
BTW, there's a reason gets() is deprecated, and actually removed in ISO C11: it's impossible to prevent buffer overflows when reading unknown-length input. Your function should take a size limit as a 2nd arg.
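A bounded version might look something like this sketch (the name and conventions are placeholders; it keeps reading until the end of the line so leftover characters don't bleed into the next call, and it assumes size >= 1):

#include <stddef.h>
extern unsigned char getchar(void);

ptrdiff_t gets_n(char *str, size_t size) {   // placeholder name; `size` is the full buffer size
    char *start = str;
    char *end = str + size - 1;              // last byte reserved for the 0 terminator
    char tmp;
    while ((tmp = getchar()) != '\r') {
        if (str < end)                       // silently drop characters that don't fit
            *str++ = tmp;
    }
    *str = 0;
    return str - start;                      // number of characters actually stored
}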
Programming an Arduino's AVR or ARM CPU directly would probably be easier to learn, and more useful, than using DOS system calls on an emulated 8086. If you're going to do that, there's no point in doing it on real hardware vs. a simulator.
Learning x86 as your first assembly language is ok if you don't mess around with segmentation, and you don't try to write a bootloader (there's lots of arcane legacy stuff with the A20 gate, and switching from real mode to protected mode). DOS system calls are totally obsolete, except for maintaining legacy codebases. Learning exactly how the different AH=?? / int 21h system calls work is about as useful as COBOL. The BIOS int 10h and other families are slightly more useful if you're making a legacy boot sector (instead of EFI), but you don't need to do that to learn asm.

If you learn asm in user-space under Linux, Windows, Mac, *BSD, or whatever, it becomes easy to understand / learn other ways of communicating with the outside world later, if you ever need to, and to learn how kernels work.
Linux system calls have a similar ABI (eax = call number, invoked with int 0x80, sysenter, or syscall), but Linux system calls are more or less POSIX system calls, which are useful to know about for real-world low-level programming. The complexities of POSIX TTY line-buffered input with sys_read are different from the complexities of DOS character-reading functions and line-ending nonsense, but arguably more useful to learn.