assembly integer numbers dos x86-16

Inputting multi-radix multi-digit signed numbers with DOS


My 8086 assembly program draws a nice diamond of white smiling faces that have a particular attribute specifying the foreground and background colors. I have hardcoded the ASCII character code and the color attribute, but I would like the user to input these values from the keyboard. I searched the DOS API but could not find a single function that allows me to input a number. What can I do?
It would be nice to be able to input the attribute as a hexadecimal number!
Neither "How to accept input with a large (multi-digit) number" nor "How to input from a user multi digit number in assembly?" seems to address the issue in full.

    ORG  256

    mov  ax, 0003h       ; BIOS.SetVideoMode 80x25 text
    int  10h
    mov  dx, 0127h       ; DH = 1 (Row), DL = 39 (Column)
    mov  cx, 2           ; Replication count
    mov  bh, 0           ; Display page
    mov  bl, [Attribute]

a:  mov  ah, 02h         ; BIOS.SetCursorPosition
    int  10h
    mov  al, [Character]
    mov  ah, 09h         ; BIOS.WriteCharacterAndAttribute
    int  10h
    add  dx, 00FEh       ; SIMD, DH += 1 (Row), DL -= 2 (Column)
    add  cx, 4           ; Row of brick gets longer
    cmp  cx, 2+(4*11)
    jb   a

b:  mov  ah, 02h         ; BIOS.SetCursorPosition
    int  10h
    mov  al, [Character]
    mov  ah, 09h         ; BIOS.WriteCharacterAndAttribute
    int  10h
    add  dx, 0102h       ; SIMD, DH += 1 (Row), DL += 2 (Column)
    sub  cx, 4           ; Row of brick gets shorter
    jnb  b

    mov  ax, 4C00h       ; DOS.TerminateWithExitcode
    int  21h

Character db 1           ; WhiteSmilingFace
Attribute db 2Fh         ; BrightWhiteOnGreen

Solution

  • DOS has several input functions but all deal with characters exclusively.

    If the number involved is small, like say 1 or 2 digits, many (new) programmers use the DOS.GetCharacter function 01h resulting in code like this:

        ; 1-digit number
        mov  ah, 01h        ; DOS.GetCharacter
        int  21h            ; -> AL=["0","9"]
        sub  al, "0"        ; -> AL=[0,9]
    
        ; 2-digit number
        mov  ah, 01h        ; DOS.GetCharacter
        int  21h            ; -> AL=["0","9"] (tens)
        mov  bl, al
        mov  ah, 01h        ; DOS.GetCharacter
        int  21h            ; -> AL=["0","9"] (ones)
        mov  ah, bl
        sub  ax, "00"       ; SIMD -> AH=[0,9] (tens), AL=[0,9] (ones)
        aad                 ; AL = AH * 10 + AL -> AL=[0,99]
    

    This is the most basic way of inputting small numbers, but it falls short in many ways. As an example, consider what would happen to your program if the user made a mistake and accidentally pressed a key for which DOS returns an extended ASCII character (a zero followed by a scancode).
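
    One possible way to harden the 1-digit case, shown here only as a sketch: swallow the second byte of an extended key and keep asking until a digit arrives. It assumes DOS function 08h (character input without echo) for fetching the scancode so that it does not show up on screen. A rejected ordinary character has already been echoed by function 01h, so this is not a polished solution.

    .a: mov  ah, 01h        ; DOS.GetCharacter (with echo)
        int  21h            ; -> AL
        test al, al         ; Zero announces an extended key
        jnz  .b
        mov  ah, 08h        ; DOS function 08h, input without echo
        int  21h            ; -> AL is the scancode, discard it
        jmp  .a
    .b: sub  al, "0"        ; -> AL=[0,9] for a valid digit
        cmp  al, 9
        ja   .a             ; Not a digit, ask again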

    Then think about the mess you would get if the above method were used to input numbers that have 3, 4, or 5 digits! Inputting a multi-digit number is best done using the DOS.BufferedInput function 0Ah. This function already gives your program a better chance at surviving since it allows keyboard users to correct their mistakes. To allow for an input of at most 5 decimal digits, the buffer that you submit to DOS could be defined with buf db 6, 0, 6 dup 0: room for 5 characters plus the terminating carriage return. The Q&A "How buffered input works" has the details. Once the string of characters that represent the number has been entered, the text must be converted into a numeric value. The following code shows this:

    snippet 1a

        mov  dx, buf
        mov  ah, 0Ah        ; DOS.BufferedInput
        int  21h
        xor  ax, ax         ; Result = 0
        mov  si, buf+1
        xor  cx, cx
        mov  cl, [si]       ; -> CX is number of characters entered
        jcxz .z             ; Return zero for an 'empty' input
        ; Decimal
    .a: inc  si             ; Next character
        mov  dx, 10
        mul  dx             ; Result = Result * 10
        mov  dl, [si]       ; -> DX = ["0","9"] (NewDigit)
        sub  dl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
        add  ax, dx         ; Result = Result + NewDigit
        loop .a
    .z:
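
    As an illustration of what DOS leaves in the buffer, assume the user typed 123 followed by Enter. The first byte still holds the capacity, the second byte holds the number of characters not counting the carriage return, and the text itself ends with a carriage return (13):

        buf db 6, 0, 6 dup 0            ; Before the call
        ;   db 6, 3, "123", 13, 0, 0    ; Contents after the call

    That is why snippet 1a loads the count from buf+1 and finds the first digit at buf+2.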
    

    Sometimes you will want to input numbers in the hexadecimal, octal, or binary formats, in which case you could use the following calculation loops (the hexadecimal one expects uppercase A through F):

    snippet 2a

        ; Hexadecimal
    .a: inc  si             ; Next character
        shl  ax, 1          ; Result = Result * 16
        shl  ax, 1
        shl  ax, 1
        shl  ax, 1
        mov  dl, [si]       ; -> DL = {["0","9"],["A","F"]} (NewDigit)
        cmp  dl, "9"
        jbe  .b
        sub  dl, 7
    .b: sub  dl, 48
        or   al, dl         ; Result = Result + NewDigit
        loop .a
    
        ; Octal
    .a: inc  si             ; Next character
        shl  ax, 1          ; Result = Result * 8
        shl  ax, 1
        shl  ax, 1
        mov  dl, [si]       ; -> DL = ["0","7"] (NewDigit)
        sub  dl, 48
        or   al, dl         ; Result = Result + NewDigit
        loop .a
    
        ; Binary
    .a: inc  si             ; Next character
        cmp  byte [si], "1" ; -> CF=1 for "0", CF=0 for "1"
        cmc                 ; -> CF=0 for "0", CF=1 for "1"
        rcl  ax, 1          ; Result = Result * 2 + NewDigit
        loop .a
    

    Even with the editing facilities that the DOS.BufferedInput function 0Ah offers, it is not OK to just trust the user at the keyboard to supply your program with correct data. It is you who has to validate the input, and if you find that something is amiss, there are a number of ways to deal with it. You could exit the program with (or without) an error message, you could have the user redo the input, you could choose to deliver some special value like the '8000h integer indefinite' that the FPU uses, or you could return a saturated result. The important thing is that you deal with the situation.

    Building a better number input routine

    To improve on the code that we have so far, we could

    • write the code such that the user can freely choose the number base that they want to use. All it will take is allowing the input to contain an additional numeric affix. I have always preferred the one character suffixes that Intel uses, so 'h' for hexadecimal, 'o' for octal, 'b' for binary, and 'd' or none for decimal.

    • add a further suffix in order to shorten long numbers that are multiples of 1000 ('K' for Kilo) or 1024 ('KB' for KiloByte), e.g. 60K is 60000 and 6KB is 6144

    • allow the user to use the so-called 'thousands separator', which makes long runs of digits easier to read and write. A nice thing about it is that it need not separate at the thousands at all! We can apply this to any of the number bases. FASM uses the apostrophe ' for this.

    • allow the user to use any case for the suffixes and the hexadecimal digits A through F, making the text case-insensitive.

    • allow the user to have leading whitespace in their input. Sounds silly? Well, not so much if you have your inputs stored in a history of some kind, and later recall that list. You would appreciate the nice right alignment that you could get.

    • allow the user to have trailing whitespace in their input. Ask yourself whether you'd hate the program to disapprove of an input like 123 or even 20 years.

    • allow the user to prefix the number with a minus sign -, so that they can start working with negative numbers in their code.

    • extend the range of numbers that we can process. Instead of storing the result in the 16-bit AX register, we will store it in the 32-bit EAX register. If the code is to run on the 8086 cpu, then we would store in the 32-bit DX:AX register pair!

    but we must

    • verify that the input is composed of valid characters so as to not spend effort processing garbage

    • detect numeric overflow so as to not deliver bogus results to the program

    Applying validation and overflow detection turns snippet 1a into

    snippet 1b

        mov  dx, buf
        mov  ah, 0Ah        ; DOS.BufferedInput
        int  21h
        xor  ax, ax         ; Result = 0
        mov  si, buf+1
        xor  cx, cx
        mov  cl, [si]       ; -> CX is number of characters entered
        jcxz .z             ; Return zero for an 'empty' input
        ; Decimal
    .a: inc  si             ; Next character
        xor  bx, bx
        mov  bl, [si]       ; -> BX = ["0","9"] (NewDigit) ?
        sub  bl, 48         ; Convert NewDigit from ["0","9"] to [0,9]
        cmp  bl, 9
        ja   .z             ; Stop if not a digit
        mov  dx, 10
        mul  dx             ; Result = Result * 10
        jc   .o
        add  ax, bx         ; Result = Result + NewDigit
        jc   .o
        loop .a
        jmp  .z
    .o: mov  ax, 65535      ; Saturated result is MAXUINT
    .z:
    

    For the hexadecimal, octal, or binary formats, substitute the following loops in snippet 1b:

    snippet 2b

        ; Hexadecimal
    .a: inc  si             ; Next character
        mov  dl, [si]       ; -> DL = {["0","9"],["A","F"]} (NewDigit) ?
        cmp  dl, "9"
        jbe  .b
        cmp  dl, "A"
        jb   .z             ; Stop at ":" through "@", these are not digits
        sub  dl, 7
    .b: sub  dl, 48
        cmp  dl, 15
        ja   .z             ; Stop if not a digit
        rol  ax, 1          ; Result = Result * 16
        rol  ax, 1
        rol  ax, 1
        rol  ax, 1
        test al, 15
        jnz  .o
        or   al, dl         ; Result = Result + NewDigit
        loop .a
    
        ; Octal
    .a: inc  si             ; Next character
        mov  dl, [si]       ; -> DL = ["0","7"] (NewDigit) ?
        sub  dl, 48
        cmp  dl, 7
        ja   .z             ; Stop if not a digit
        rol  ax, 1          ; Result = Result * 8
        rol  ax, 1
        rol  ax, 1
        test al, 7
        jnz  .o
        or   al, dl         ; Result = Result + NewDigit
        loop .a
    
        ; Binary
    .a: inc  si             ; Next character
        mov  dl, [si]       ; -> DL = ["0","1"] (NewDigit) ?
        sub  dl, 48
        cmp  dl, 1
        ja   .z             ; Stop if not a digit
        shl  ax, 1          ; Result = Result * 2
        jc   .o
        or   al, dl         ; Result = Result + NewDigit
        loop .a
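
    Snippets 1b and 2b stop at 16 bits. As a rough sketch of how the decimal step could be widened to the 32-bit DX:AX register pair on a plain 8086, the routine below multiplies DX:AX by 10, adds the new digit, and reports overflow through the carry flag. The name MulAdd10 and its register interface are made up for this illustration.

    ; IN (dx:ax,bx) OUT (dx:ax,CF) MOD (cx,di)
    MulAdd10:
            mov     cx, 10
            mov     di, ax                  ; DI = low word of Result
            mov     ax, dx                  ; AX = high word of Result
            mul     cx                      ; DX:AX = high word * 10
            jc      .z                      ; Does not fit in 16 bits -> overflow
            xchg    ax, di                  ; AX = low word, DI = high word * 10
            mul     cx                      ; DX:AX = low word * 10
            add     dx, di                  ; Combine both partial products
            jc      .z
            add     ax, bx                  ; Result = Result + NewDigit [0,9]
            adc     dx, 0                   ; CF=1 on overflow
    .z:     ret

    Because the routine clobbers CX, a digit loop that ends in loop .a would have to save CX around the call (push cx, call MulAdd10, pop cx) and then test the carry flag, which pop leaves untouched.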
    

    The de luxe version of inputting a number applies everything that was mentioned above. It is important to note that the following program will not run on an 8086 CPU because it uses 32-bit registers and instructions introduced with later processors. (For sure a nice exercise to rewrite for 8086!) The program runs in a DOS window, and of course also in the true real address mode of an x86 CPU.
    The InputEAX routine sets the carry flag if the input turns out to be syntactically wrong (EAX=0), or the input leads to a value that exceeds the 32-bit range [-(4GB-1),+(4GB-1)] (EAX=80000000h).
    This inputting code does not pretend to be gospel! If you don't need a certain feature, then just remove it. And if for your particular use case some feature is missing, then just add it. Leave a comment if this happens...

            ORG     256
    
    again:  mov     dx, msg1
            mov     ah, 09h                 ; DOS.PrintString
            int     21h
            call    InputEAX                ; -> EAX CF
            ; ignoring the CF for the purpose of the demo
            push    ax                      ; (1)
            mov     dx, msg2
            mov     ah, 09h                 ; DOS.PrintString
            int     21h
            pop     ax                      ; (1)
            call    PrintEAX
            cmp     eax, 27                 ; Arbitrarily chosen, but 27 == ESC
            jne     again
    
    exit:   mov     ax, 4C00h               ; DOS.TerminateWithReturnCode
            int     21h
    ; --------------------------------------
    msg1    db      13, 10, 'Input a number : $'
    msg2    db      10, 'The number is $'
    ; --------------------------------------
    ; IN (eax) OUT ()
    PrintEAX:
            pushad
            test    eax, eax
            jns     .a
            push    ax                      ; (1)
            mov     dl, "-"
            mov     ah, 02h                 ; DOS.PrintCharacter
            int     21h
            pop     ax                      ; (1)
            neg     eax
    .a:     mov     ebx, 10
            push    bx                      ; (2a) Sentinel
    .b:     xor     edx, edx
            div     ebx
            push    dx                      ; (2b) Remainder
            test    eax, eax
            jnz     .b
            pop     dx                      ; (2)
    .c:     add     dl, "0"
            mov     ah, 02h                 ; DOS.PrintCharacter
            int     21h
            pop     dx
            cmp     dx, bx
            jb      .c
            popad
            ret
    ; --------------------------------------
    ; IN () OUT (eax,CF)
    InputEAX:
            xor     eax, eax                ; In case of CF=1 on exit
            pushad
            sub     sp, 44+44               ; 2 local buffers
            mov     bp, sp
            push    44                      ; Buffer header 44, 0
            mov     dx, sp
            mov     ah, 0Ah                 ; DOS.BufferedInput
            int     21h
            mov     si, bp                  ; Where the string of characters begins
    
    ; Leading whitespace
    .a:     lodsb
            call    IsWhitespace            ; -> ZF
            je      .a
            dec     si
    
    ; Unary
            mov     al, [si]
            push    ax                      ; Possible UNARY at [bp-4]
            cmp     al, "+"
            je      .b
            cmp     al, "-"
            jne     .c
    .b:     inc     si
    
    ; Digits followed by base-suffix, in turn for Hex, Oct, Bin, and Dec
    .c:     mov     cx, 16+256*'H'
            call    GetDigits               ; -> SI DI CF (AX)
            jnc     .d
            mov     cx, 8+256*'O'
            call    GetDigits               ; -> SI DI CF (AX)
            jnc     .d
            mov     cx, 2+256*'B'
            call    GetDigits               ; -> SI DI CF (AX)
            jnc     .d
            mov     cx, 10+256*'D'
            call    GetDigits               ; -> SI DI CF (AX)
            jc      .NOK
    .d:     call    LodsUCasedChar          ; -> AL SI
    
    ; [option] K, M, G, KB, MB, GB order-suffixes
            mov     ebx, 1                  ; Multiplier
            mov     ch, 3                   ; ORDER
            cmp     al, "G"                 ; Giga
            je      .e
            mov     ch, 2                   ; ORDER
            cmp     al, "M"                 ; Mega
            je      .e
            mov     ch, 1                   ; ORDER
            cmp     al, "K"                 ; Kilo
            jne     .f
    .e:     mov     bx, 1000                ; Multiplier
            call    LodsUCasedChar          ; -> AL SI
            cmp     al, "B"
            jne     .f
            mov     bx, 1024                ; Multiplier
            lodsb
    
    ; Trailing whitespace or end-of-input
    .f:     call    IsWhitespace            ; -> ZF
            je      .OK
            cmp     al, 13                  ; Terminating carriage return
            je      .OK
    
    ; Failed to extract any series of digits, or excess characters in string
    .NOK:   stc
            jmp     .END
    
    ; Building the integer in EAX
    .OK:    mov     byte [bp+44+44+31], 80h ; pushad.EAX = 80000000h (Integer
            xor     si, si                  ;       indefinite in case of overflow)
            xor     eax, eax                ; Result
    .g:     movzx   edx, cl                 ; CL is RADIX {16,8,2,10}
            mul     edx
            jc      .END
            movzx   edx, byte [bp+44+si]    ; NewDigit [0,15]
            add     eax, edx
            jc      .END
            inc     si
            cmp     si, di                  ; DI is NumberOfDigits
            jb      .g
    
    ; [option] Applying the multipliers repeatedly
    .h:     mul     ebx                     ; EBX={1,1000,1024}
            jc      .END
            dec     ch                      ; CH is ORDER [1,3]
            jnz     .h
    
    ; Negating as required
            cmp     byte [bp-4], "-"        ; UNARY
            jne     .CLC
            neg     eax                     ; Valid range [-(4GB-1),+(4GB-1)]
    .CLC:   clc
    
    ; Returning the result
            mov     [bp+44+44+28], eax      ; pushad.EAX
    .END:   lea     sp, [bp+44+44]
            popad
            ret
    ; --------------------------------------
    ; IN (al) OUT (ZF)
    IsWhitespace:
            cmp     al, " "
            je      .a
            cmp     al, 9                   ; Tab
    .a:     ret
    ; --------------------------------------
    ; IN (si) OUT (al,si)
    LodsUCasedChar:
            lodsb
            cmp     al, "a"
            jb      .a
            cmp     al, "z"
            ja      .a
            and     al, 1101'1111b          ; UCase
    .a:     ret
    ; --------------------------------------
    ; IN (cx,si) OUT (si,di,CF) MOD (ax)
    GetDigits:
            push    si                      ; (1)
            xor     di, di                  ; NumberOfDigits
    .a:     call    LodsUCasedChar          ; -> AL SI
            cmp     al, "'"                 ; 'Thousands' separator (apostrophe)
            je      .a
            mov     ah, al
            cmp     al, "0"
            jb      .c
            cmp     al, "9"
            jbe     .b
            cmp     al, "A"
            jb      .c
            cmp     al, "F"
            ja      .c
            sub     al, 7
    .b:     sub     al, 48                  ; -> AL=[0,15]
            cmp     al, cl                  ; CL is RADIX {16,8,2,10}
            jnb     .c
            mov     [bp+44+di], al
            inc     di
            jmp     .a
    
    .c:     test    di, di                  ; Any digits found ?
            jz      .NOK
            cmp     ah, ch                  ; CH is BASE-SUFFIX {HOBD}
            je      .OK
            cmp     ch, "D"                 ; Decimals need not be suffixed
            jne     .NOK
            dec     si
    .OK:    ;;clc
            pop     ax                      ; (1a) This throws away `push si`
            ret                             ; CF=0
    .NOK:   stc
            pop     si                      ; (1b)
            ret                             ; CF=1
    ; --------------------------------------
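
    To tie this back to the diamond program from the question, the fragment below is a minimal sketch of how InputEAX could fill in Character and Attribute before the video mode is set. It assumes the routines above are appended to that program (which then needs a 386 or later), and the prompts msgChar and msgAttr as well as the labels c and d are hypothetical additions. An invalid or out-of-range input simply keeps the hardcoded default.

            mov     dx, msgChar
            mov     ah, 09h                 ; DOS.PrintString
            int     21h
            call    InputEAX                ; -> EAX CF
            jc      c                       ; Invalid, keep the default
            cmp     eax, 255
            ja      c                       ; Too big for a byte, keep the default
            mov     [Character], al
    c:      mov     dx, msgAttr
            mov     ah, 09h                 ; DOS.PrintString
            int     21h
            call    InputEAX                ; -> EAX CF
            jc      d
            cmp     eax, 255
            ja      d
            mov     [Attribute], al
    d:      mov     ax, 0003h               ; BIOS.SetVideoMode 80x25 text

    msgChar db      13, 10, 'Character code (e.g. 1) : $'
    msgAttr db      13, 10, 'Attribute (e.g. 2Fh) : $'

    Typing 2Fh at the second prompt reproduces the BrightWhiteOnGreen attribute that was hardcoded before.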
    

    A word on segment registers

    The ORG 256 directive on top tells you that this program is a .COM program for DOS where the segment registers are all set equal to each other. If you were to use the InputEAX routine in an .EXE program that you write, you would have to temporarily set the DS segment register equal to SS because the local buffers have been placed on the stack and normally SS will be different from DS.

    ; IN () OUT (eax,CF)
    InputEAX:
            push    ds
            push    ss                      ; DS = SS
            pop     ds
            xor     eax, eax                ; In case of CF=1 on exit
            pushad
    
            ...
    
            popad
            pop     ds
            ret