Inputting multi-radix multi-digit signed numbers with DOS

My 8086 assembly program draws a nice diamond of white smiling faces that have a particular attribute specifying the foreground and background colors. I have hardcoded the ASCII character code and the color attribute, but I would like the user to input these values from the keyboard. I searched the DOS API but could not find a single function that allows me to input a number. What can I do?
It would be nice to be able to input the attribute as a hexadecimal number!
Neither "how to accept input with a large (multi-digit) number" nor "How to input from a user multi digit number in assembly?" seems to address the issue in full.

    ORG  256

    mov  ax, 0003h       ; BIOS.SetVideoMode 80x25 text
    int  10h
    mov  dx, 0127h       ; DH = 1 (Row), DL = 39 (Column)
    mov  cx, 2           ; Replication count
    mov  bh, 0           ; Display page
    mov  bl, [Attribute]

a:  mov  ah, 02h         ; BIOS.SetCursorPosition
    int  10h
    mov  al, [Character]
    mov  ah, 09h         ; BIOS.WriteCharacterAndAttribute
    int  10h
    add  dx, 00FEh       ; SIMD, DH += 1 (Row), DL -= 2 (Column)
    add  cx, 4           ; Row of brick gets longer
    cmp  cx, 2+(4*11)
    jb   a

b:  mov  ah, 02h         ; BIOS.SetCursorPosition
    int  10h
    mov  al, [Character]
    mov  ah, 09h         ; BIOS.WriteCharacterAndAttribute
    int  10h
    add  dx, 0102h       ; SIMD, DH += 1 (Row), DL += 2 (Column)
    sub  cx, 4           ; Row of brick gets shorter
    jnb  b

    mov  ax, 4C00h       ; DOS.TerminateWithExitcode
    int  21h

Character db 1           ; WhiteSmilingFace
Attribute db 2Fh         ; BrightWhiteOnGreen

Solution

DOS has several input functions but all deal with characters exclusively.

If the number involved is small, like say 1 or 2 digits, many (new) programmers use the DOS.GetCharacter function 01h resulting in code like this:

    ; 1-digit number
    mov  ah, 01h        ; DOS.GetCharacter
    int  21h            ; -> AL=["0","9"]
    sub  al, "0"        ; -> AL=[0,9]

    ; 2-digit number
    mov  ah, 01h        ; DOS.GetCharacter
    int  21h            ; -> AL=["0","9"] (tens)
    mov  bl, al
    mov  ah, 01h        ; DOS.GetCharacter
    int  21h            ; -> AL=["0","9"] (ones)
    mov  ah, bl
    sub  ax, "00"       ; SIMD -> AH=[0,9] (tens), AL=[0,9] (ones)
    aad                 ; AL = AH * 10 + AL -> AL=[0,99]

This is the most basic way of inputting small numbers, but it falls short in many ways. As an example, consider what would happen to your program if the user made a mistake and accidentally pressed a key for which DOS returns an extended ASCII character (a zero followed by a scancode).
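
A minimal sketch of how the 1-digit case could guard against that, assuming it is acceptable to silently ignore every key that is not a decimal digit. The routine name GetDigit is my own, and DOS function 07h is used here only to fetch the pending scancode without echoing it:

    ; IN () OUT (al) MOD (ah)
    GetDigit:
    .a: mov  ah, 01h        ; DOS.GetCharacter (with echo)
        int  21h            ; -> AL
        test al, al         ; Zero announces an extended key
        jnz  .b
        mov  ah, 07h        ; DOS.DirectInput (no echo) removes the scancode
        int  21h
        jmp  .a             ; Try again
    .b: sub  al, "0"        ; -> AL=[0,9] ?
        cmp  al, 9
        ja   .a             ; Not a digit, try again
        ret

Note that a rejected character stays visible on the screen because function 01h has already echoed it.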

Then think about the mess you would get if the above method were used to input numbers that have 3, 4, or 5 digits! Inputting a multi-digit number is best done using the DOS.BufferedInput function 0Ah. This function already gives your program a better chance at surviving since it allows keyboard users to correct their mistakes. To allow for an input of at most 5 decimal digits, the buffer that you submit to DOS could be defined with buf db 6, 0, 6 dup 0. "How buffered input works" has the details. Once the string of characters that represent the number has been entered, the text must get converted into a numeric value. The following code shows this:

snippet 1a

    mov  dx, buf
    mov  ah, 0Ah        ; DOS.BufferedInput
    int  21h
    xor  ax, ax         ; Result = 0
    mov  si, buf+1
    xor  cx, cx
    mov  cl, [si]       ; -> CX is number of characters entered
    jcxz .z             ; Return zero for an 'empty' input
    ; Decimal
.a: inc  si             ; Next character
    mov  dx, 10
    mul  dx             ; Result = Result * 10
    mov  dl, [si]       ; -> DX = ["0","9"] (NewDigit)
    sub  dl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
    add  ax, dx         ; Result = Result + NewDigit
    loop .a
.z:
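
For reference, a sketch of the buffer that snippet 1a reads from, written out one field per line. The annotations and the sample keystrokes are mine:

    buf db 6                ; [buf+0] Capacity, counts the terminating carriage return
        db 0                ; [buf+1] Length filled in by DOS, excludes the carriage return
        db 6 dup 0          ; [buf+2] The characters, terminated by 13

    ; After the user types 1234 and presses Enter, the 8 bytes hold
    ; 6, 4, "1", "2", "3", "4", 13, 0

This is why the loop starts from buf+1 and pre-increments SI: the length byte sits right in front of the first character.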

Sometimes you will want to input numbers in the hexadecimal, octal, or binary formats, in which case you could use the following calculation loops:

snippet 2a

    ; Hexadecimal
.a: inc  si             ; Next character
    shl  ax, 1          ; Result = Result * 16
    shl  ax, 1
    shl  ax, 1
    shl  ax, 1
    mov  dl, [si]       ; -> DL = {["0","9"],["A","F"]} (NewDigit)
    cmp  dl, "9"
    jbe  .b
    sub  dl, 7
.b: sub  dl, 48
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Octal
.a: inc  si             ; Next character
    shl  ax, 1          ; Result = Result * 8
    shl  ax, 1
    shl  ax, 1
    mov  dl, [si]       ; -> DL = ["0","7"] (NewDigit)
    sub  dl, 48
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Binary
.a: inc  si             ; Next character
    cmp  byte [si], "1" ; -> CF=1 for "0", CF=0 for "1"
    cmc                 ; -> CF=0 for "0", CF=1 for "1"
    rcl  ax, 1          ; Result = Result * 2 + NewDigit
    loop .a

Even with the editing facilities that the DOS.BufferedInput function 0Ah offers, it is not OK to just trust the user at the keyboard to supply your program with the correct data. It is you that has to validate the input, and if you find that something is amiss, there are a number of ways to deal with it. You could exit the program with (or without) an error message, you could have the user redo the input, you could choose to deliver some special value like the '8000h integer indefinite' that the FPU uses, or you could return a saturated result. The important thing is that you deal with the situation.
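
A minimal sketch of the 'redo the input' strategy, assuming a number that has to fit a byte, such as the color attribute from the question. The message text, the 3-digit buffer, and handing the value back as the exit code are choices of mine for the sake of a complete .COM program:

        ORG  256

    again:
        mov  dx, msg
        mov  ah, 09h        ; DOS.PrintString
        int  21h
        mov  dx, buf
        mov  ah, 0Ah        ; DOS.BufferedInput
        int  21h
        xor  ax, ax         ; Result = 0
        mov  si, buf+1
        xor  cx, cx
        mov  cl, [si]       ; -> CX is number of characters entered [0,3]
        jcxz again          ; Redo an 'empty' input
    .a: inc  si             ; Next character
        xor  bx, bx
        mov  bl, [si]
        sub  bl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
        cmp  bl, 9
        ja   again          ; Redo if not a digit
        mov  dx, 10
        mul  dx             ; Result = Result * 10 (at most 990)
        add  ax, bx         ; Result = Result + NewDigit (at most 999)
        loop .a
        cmp  ax, 255
        ja   again          ; Redo if it does not fit a byte
        mov  ah, 4Ch        ; DOS.TerminateWithExitcode, AL is the number
        int  21h

    msg db 13, 10, 'Number [0,255] : $'
    buf db 4, 0, 4 dup 0

Because the buffer admits at most 3 digits, the value can never exceed 999 and the multiplication needs no overflow test here.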

Building a better number input routine

To improve on the code that we have so far, we could

write the code such that the user can freely choose the number base that they want to use. All it will take is allowing the input to contain an additional numeric affix. I have always preferred the one character suffixes that Intel uses, so 'h' for hexadecimal, 'o' for octal, 'b' for binary, and 'd' or none for decimal.
add a further suffix in order to shorten long numbers that are multiples of 1000 ('K' for Kilo) or 1024 ('KB' for KiloByte), e.g. 60K is 60000 and 6KB is 6144.
allow the user to use the so-called 'thousands separator' to make long runs of digits easier to read and write. A nice thing about it is that it need not separate at the thousands at all! We can apply this to any of the number bases. FASM uses the apostrophe ' for this.
allow the user to use any case for the suffixes and the hexadecimal digits A through F, making the text case-insensitive.
allow the user to have leading whitespace in their input. Sounds silly? Well, not so much if you have your inputs stored in a history of some kind, and later recall that list. You would appreciate the nice right alignment that you could get.
allow the user to have trailing whitespace in their input. Ask yourself whether you'd hate the program to disapprove of an input like 123 or even 20 years.
allow the user to prefix the number with a minus sign -, so that they can start working with negative numbers in their code.
extend the range of numbers that we can process. Instead of storing the result in the 16-bit AX register, we will store it in the 32-bit EAX register. If the code is to run on the 8086 CPU, then we would store it in the 32-bit DX:AX register pair!

but we must

verify that the input is composed of valid characters so as to not spend effort processing garbage
detect numeric overflow so as to not deliver bogus results to the program
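
To illustrate the DX:AX remark and the overflow requirement together, here is a minimal sketch of how the step Result = Result * 10 + NewDigit could look on an 8086. The routine name MulAdd10 and its register interface are my own assumptions:

    ; IN (dx:ax,bx) OUT (dx:ax,CF) MOD (cx,di)
    MulAdd10:
        mov  cx, 10
        mov  di, ax         ; Park the low word
        mov  ax, dx
        mul  cx             ; DX:AX = HighWord * 10
        jc   .o             ; This product must fit in 16 bits
        xchg ax, di         ; DI = HighWord * 10, AX = LowWord
        mul  cx             ; DX:AX = LowWord * 10
        add  dx, di         ; Combine the partial products
        jc   .o
        add  ax, bx         ; Result = Result + NewDigit [0,9]
        adc  dx, 0          ; CF=1 if the addition overflowed
    .o: ret

The caller would load BX with the validated digit and test the carry flag after each call. When CF=1, the contents of DX:AX are meaningless and should be replaced by whatever error value the program has settled on.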

Applying validation and overflow detection turns snippet 1a into

snippet 1b

    mov  dx, buf
    mov  ah, 0Ah        ; DOS.BufferedInput
    int  21h
    xor  ax, ax         ; Result = 0
    mov  si, buf+1
    xor  cx, cx
    mov  cl, [si]       ; -> CX is number of characters entered
    jcxz .z             ; Return zero for an 'empty' input
    ; Decimal
.a: inc  si             ; Next character
    xor  bx, bx
    mov  bl, [si]       ; -> BX = ["0","9"] (NewDigit) ?
    sub  bl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
    cmp  bl, 9
    ja   .z             ; Stop if not a digit
    mov  dx, 10
    mul  dx             ; Result = Result * 10
    jc   .o
    add  ax, bx         ; Result = Result + NewDigit
    jc   .o
    loop .a
    jmp  .z
.o: mov  ax, 65535      ; Saturated result is MAXUINT
.z:

For the hexadecimal, octal, or binary formats, substitute the following loops:

snippet 2b

    ; Hexadecimal
.a: inc  si             ; Next character
    mov  dl, [si]       ; -> DL = {["0","9"],["A","F"]} (NewDigit) ?
    sub  dl, 48
    cmp  dl, 9
    jbe  .b             ; ["0","9"] -> [0,9]
    sub  dl, 7          ; ["A","F"] -> [10,15]
    cmp  dl, 10
    jb   .z             ; Stop if not a digit (":" through "@")
.b: cmp  dl, 15
    ja   .z             ; Stop if not a digit
    rol  ax, 1          ; Result = Result * 16
    rol  ax, 1
    rol  ax, 1
    rol  ax, 1
    test al, 15
    jnz  .o
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Octal
.a: inc  si             ; Next character
    mov  dl, [si]       ; -> DL = ["0","7"] (NewDigit) ?
    sub  dl, 48
    cmp  dl, 7
    ja   .z             ; Stop if not a digit
    rol  ax, 1          ; Result = Result * 8
    rol  ax, 1
    rol  ax, 1
    test al, 7
    jnz  .o
    or   al, dl         ; Result = Result + NewDigit
    loop .a

    ; Binary
.a: inc  si             ; Next character
    mov  dl, [si]       ; -> DL = ["0","1"] (NewDigit) ?
    sub  dl, 48
    cmp  dl, 1
    ja   .z             ; Stop if not a digit
    shl  ax, 1          ; Result = Result * 2
    jc   .o
    or   al, dl         ; Result = Result + NewDigit
    loop .a
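
To tie this back to the question, here is a sketch of a dedicated routine for the attribute byte that still runs on the 8086. The name InputHexByte, the 2-digit buffer, and the upper-casing of lowercase letters are my own additions. Because at most 2 hexadecimal digits can be typed, no overflow check is needed:

    ; IN () OUT (al) MOD (ah,cx,dx,si)
    InputHexByte:
        mov  dx, buf
        mov  ah, 0Ah        ; DOS.BufferedInput
        int  21h
        xor  ax, ax         ; Result = 0
        mov  si, buf+1
        xor  cx, cx
        mov  cl, [si]       ; -> CX is number of characters entered [0,2]
        jcxz .z             ; Return zero for an 'empty' input
    .a: inc  si             ; Next character
        mov  dl, [si]
        cmp  dl, "a"
        jb   .b
        and  dl, 0DFh       ; UCase so that "a" through "f" are accepted too
    .b: sub  dl, 48
        cmp  dl, 9
        jbe  .c             ; ["0","9"] -> [0,9]
        sub  dl, 7          ; ["A","F"] -> [10,15]
        cmp  dl, 10
        jb   .z             ; Stop on ":" through "@"
        cmp  dl, 15
        ja   .z             ; Stop on anything past "F"
    .c: shl  al, 1          ; Result = Result * 16
        shl  al, 1
        shl  al, 1
        shl  al, 1
        or   al, dl         ; Result = Result + NewDigit
        loop .a
    .z: ret

    buf db 3, 0, 3 dup 0

A call InputHexByte followed by mov [Attribute], al would then replace the hardcoded 2Fh. Like the loops above, the routine stops at the first invalid character and returns what it has gathered so far.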

The de luxe version of inputting a number applies everything that was mentioned above. It is important to note that the following program will not run on an 8086 CPU because it uses 32-bit registers and instructions introduced with later processors. (For sure a nice exercise to rewrite for 8086!) The program runs in a DOS window, and of course also in the true real address mode of an x86 CPU.
The InputEAX routine sets the carry flag if the input turns out to be syntactically wrong (EAX=0), or the input leads to a value that exceeds the 32-bit range [-(4GB-1),+(4GB-1)] (EAX=80000000h).
This inputting code does not pretend to be gospel! If you don't need a certain feature, then just remove it. And if for your particular use case some feature is missing, then just add it. Leave a comment if this happens...

        ORG     256

again:  mov     dx, msg1
        mov     ah, 09h                 ; DOS.PrintString
        int     21h
        call    InputEAX                ; -> EAX CF
        ; ignoring the CF for the purpose of the demo
        push    ax                      ; (1)
        mov     dx, msg2
        mov     ah, 09h                 ; DOS.PrintString
        int     21h
        pop     ax                      ; (1)
        call    PrintEAX
        cmp     eax, 27                 ; Arbitrarily chosen, but 27 == ESC
        jne     again

exit:   mov     ax, 4C00h               ; DOS.TerminateWithReturnCode
        int     21h
; --------------------------------------
msg1    db      13, 10, 'Input a number : $'
msg2    db      10, 'The number is $'
; --------------------------------------
; IN (eax) OUT ()
PrintEAX:
        pushad
        test    eax, eax
        jns     .a
        push    ax                      ; (1)
        mov     dl, "-"
        mov     ah, 02h                 ; DOS.PrintCharacter
        int     21h
        pop     ax                      ; (1)
        neg     eax
.a:     mov     ebx, 10
        push    bx                      ; (2a) Sentinel
.b:     xor     edx, edx
        div     ebx
        push    dx                      ; (2b) Remainder
        test    eax, eax
        jnz     .b
        pop     dx                      ; (2)
.c:     add     dl, "0"
        mov     ah, 02h                 ; DOS.PrintCharacter
        int     21h
        pop     dx
        cmp     dx, bx
        jb      .c
        popad
        ret
; --------------------------------------
; IN () OUT (eax,CF)
InputEAX:
        xor     eax, eax                ; In case of CF=1 on exit
        pushad
        sub     sp, 44+44               ; 2 local buffers
        mov     bp, sp
        push    44                      ; Buffer header 44, 0
        mov     dx, sp
        mov     ah, 0Ah                 ; DOS.BufferedInput
        int     21h
        mov     si, bp                  ; Where the string of characters begins

; Leading whitespace
.a:     lodsb
        call    IsWhitespace            ; -> ZF
        je      .a
        dec     si

; Unary
        mov     al, [si]
        push    ax                      ; Possible UNARY at [bp-4]
        cmp     al, "+"
        je      .b
        cmp     al, "-"
        jne     .c
.b:     inc     si

; Digits followed by base-suffix, in turn for Hex, Oct, Bin, and Dec
.c:     mov     cx, 16+256*'H'
        call    GetDigits               ; -> SI DI CF (AX)
        jnc     .d
        mov     cx, 8+256*'O'
        call    GetDigits               ; -> SI DI CF (AX)
        jnc     .d
        mov     cx, 2+256*'B'
        call    GetDigits               ; -> SI DI CF (AX)
        jnc     .d
        mov     cx, 10+256*'D'
        call    GetDigits               ; -> SI DI CF (AX)
        jc      .NOK
.d:     call    LodsUCasedChar          ; -> AL SI

; [option] K, M, G, KB, MB, GB order-suffixes
        mov     ebx, 1                  ; Multiplier
        mov     ch, 3                   ; ORDER
        cmp     al, "G"                 ; Giga
        je      .e
        mov     ch, 2                   ; ORDER
        cmp     al, "M"                 ; Mega
        je      .e
        mov     ch, 1                   ; ORDER
        cmp     al, "K"                 ; Kilo
        jne     .f
.e:     mov     bx, 1000                ; Multiplier
        call    LodsUCasedChar          ; -> AL SI
        cmp     al, "B"
        jne     .f
        mov     bx, 1024                ; Multiplier
        lodsb

; Trailing whitespace or end-of-input
.f:     call    IsWhitespace            ; -> ZF
        je      .OK
        cmp     al, 13                  ; Terminating carriage return
        je      .OK

; Failed to extract any series of digits, or excess characters in string
.NOK:   stc
        jmp     .END

; Building the integer in EAX
.OK:    mov     byte [bp+44+44+31], 80h ; pushad.EAX = 80000000h (Integer
        xor     si, si                  ;       indefinite in case of overflow)
        xor     eax, eax                ; Result
.g:     movzx   edx, cl                 ; CL is RADIX {16,8,2,10}
        mul     edx
        jc      .END
        movzx   edx, byte [bp+44+si]    ; NewDigit [0,15]
        add     eax, edx
        jc      .END
        inc     si
        cmp     si, di                  ; DI is NumberOfDigits
        jb      .g

; [option] Applying the multipliers repeatedly
.h:     mul     ebx                     ; EBX={1,1000,1024}
        jc      .END
        dec     ch                      ; CH is ORDER [1,3]
        jnz     .h

; Negating as required
        cmp     byte [bp-4], "-"        ; UNARY
        jne     .CLC
        neg     eax                     ; Valid range [-(4GB-1),+(4GB-1)]
.CLC:   clc

; Returning the result
        mov     [bp+44+44+28], eax      ; pushad.EAX
.END:   lea     sp, [bp+44+44]
        popad
        ret
; --------------------------------------
; IN (al) OUT (ZF)
IsWhitespace:
        cmp     al, " "
        je      .a
        cmp     al, 9                   ; Tab
.a:     ret
; --------------------------------------
; IN (si) OUT (al,si)
LodsUCasedChar:
        lodsb
        cmp     al, "a"
        jb      .a
        cmp     al, "z"
        ja      .a
        and     al, 1101'1111b          ; UCase
.a:     ret
; --------------------------------------
; IN (cx,si) OUT (si,di,CF) MOD (ax)
GetDigits:
        push    si                      ; (1)
        xor     di, di                  ; NumberOfDigits
.a:     call    LodsUCasedChar          ; -> AL SI
        cmp     al, "'"                 ; 'Thousands' separator (apostrophe)
        je      .a
        mov     ah, al
        cmp     al, "0"
        jb      .c
        cmp     al, "9"
        jbe     .b
        cmp     al, "A"
        jb      .c
        cmp     al, "F"
        ja      .c
        sub     al, 7
.b:     sub     al, 48                  ; -> AL=[0,15]
        cmp     al, cl                  ; CL is RADIX {16,8,2,10}
        jnb     .c
        mov     [bp+44+di], al
        inc     di
        jmp     .a

.c:     test    di, di                  ; Any digits found ?
        jz      .NOK
        cmp     ah, ch                  ; CH is BASE-SUFFIX {HOBD}
        je      .OK
        cmp     ch, "D"                 ; Decimals need not be suffixed
        jne     .NOK
        dec     si
.OK:    ;;clc
        pop     ax                      ; (1a) This throws away `push si`
        ret                             ; CF=0
.NOK:   stc
        pop     si                      ; (1b)
        ret                             ; CF=1
; --------------------------------------
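
As a usage sketch, and assuming a 386 or better is available, the program from the question could obtain its two values through InputEAX as follows. The helper InputByte and the two prompt strings are hypothetical names of mine. The fragment relies on DOS function 09h leaving DX untouched and on InputEAX preserving all registers other than EAX:

    ; Place this ahead of the BIOS.SetVideoMode call
        mov  dx, msgC
        call InputByte          ; -> AL
        mov  [Character], al
        mov  dx, msgA
        call InputByte          ; -> AL
        mov  [Attribute], al

    ; IN (dx) OUT (al) MOD (eax)
    InputByte:
    .a: mov  ah, 09h            ; DOS.PrintString
        int  21h
        call InputEAX           ; -> EAX CF
        jc   .a                 ; Redo on a syntax error or an overflow
        cmp  eax, 255
        ja   .a                 ; Redo if it does not fit a byte (negatives too)
        ret

    msgC db 13, 10, 'Character code : $'
    msgA db 13, 10, 'Color attribute : $'

The user could then answer the second prompt with 2Fh, 47, or 0010'1111b, all of which produce the same attribute.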

A word on segment registers

The ORG 256 directive on top tells you that this program is a .COM program for DOS where the segment registers are all set equal to each other. If you were to use the InputEAX routine in an .EXE program that you write, you would have to temporarily set the DS segment register equal to SS because the local buffers have been placed on the stack and normally SS will be different from DS.

; IN () OUT (eax,CF)
InputEAX:
        push    ds
        push    ss                      ; DS = SS
        pop     ds
        xor     eax, eax                ; In case of CF=1 on exit
        pushad

        ...

        popad
        pop     ds
        ret