assembly, x86, masm, irvine32

How to remove all punctuation and spaces in a string?


I have input like this:

This is, ,,, *&% a ::; demo + String. +Need to**@!/// format:::::!!! this.

Output Required:

ThisisademoStringNeedtoformatthis

I have to do this without using str_trim.

Edit: I am writing an encryption program. I have to remove all punctuation from the string and turn all lower case letters to uppercase before I encrypt it.

I added the code. I need to remove the spaces and any punctuation before I convert the string to upper case. So far I haven't found anything in my book that could help with this except str_trim, which we aren't allowed to use.

INCLUDE Irvine32.inc

.data
source  byte  "This is the source string",0

.code
main proc


mov  esi,0              ; index register
mov  ecx,SIZEOF source  ; loop counter
L1:
mov  al,source[esi]     ; get a character from source
and  source[esi], 11011111b     ; convert lower case to upper case
inc  esi                ; move to next character
loop L1                 ; repeat for entire string

mov edx, OFFSET source
call WriteString

exit
main endp
end main

Solution

  • You are already converting lowercase to uppercase, so I will give you a hand with removing the punctuation. The following code uses my suggestion: copy the uppercase letters to an auxiliary string and skip every other character. I used the EMU8086 assembler:

    .stack 100h
    .data
    source  db  "STRING, WITH. PUNCTUATION : AND * SPACES!$"
    aux     db  "                                          "
    .code
      mov  ax, @data
      mov  ds, ax
    
    ;REMOVE EVERYTHING BUT UPPERCASE LETTERS.
    
      mov  si, offset source   ; POINT TO STRING.
      mov  di, offset aux      ; POINT TO AUXILIARY.
    L1:
      mov  al, [ si ]          ; get character from source
    ;CHECK IF END STRING ($).
      cmp  al, '$'
      je   finale
    ;CHECK IF CHAR IS UPPERCASE LETTER.
      cmp  al, 65
      jb   is_not_a_letter    ; CHAR IS LOWER THAN 'A'.
      cmp  al, 90
      ja   is_not_a_letter    ; CHAR IS HIGHER THAN 'Z'.
    ;COPY LETTER TO AUX STRING.
      mov  [ di ], al
      inc  di                ; POSITION FOR NEXT CHARACTER.
    is_not_a_letter:
      inc  si                ; move to next character
      jmp  L1
    
    finale:
      mov  [ di ], al        ; '$', NECESSARY TO PRINT.
    
    ;PRINT STRING.  
      mov  dx, OFFSET aux
      mov  ah, 9
      int  21h
    
    ;END PROGRAM.
      mov  ax, 4c00h
      int  21h              
    

    I ended the strings with '$' because I print them with int 21h, function 9 (AH = 9), which expects a '$'-terminated string.

    As you can see, I'm not using CX or the LOOP instruction. Instead, I repeat until '$' is found. You can do the same with a 0 terminator, which is what WriteString in Irvine32 expects.
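
    The same idea can be sketched for MASM with Irvine32, using a 0 terminator and folding the lowercase-to-uppercase step into the copy. This is only a sketch under a few assumptions: the buffer name cleaned and the labels next_char, check_upper, keep_char and done are made up for illustration, and the text is assumed to be plain ASCII. Note that applying AND 11011111b to every byte, as the loop in the question does, turns a space (20h) into 0, so WriteString would stop at the first space. Here the mask is applied only to bytes already known to be lowercase letters.

    ```asm
    INCLUDE Irvine32.inc

    .data
    source  BYTE  "This is, ,,, *&% a ::; demo + String. +Need to**@!/// format:::::!!! this.",0
    cleaned BYTE  SIZEOF source DUP(0)   ; hypothetical output buffer, same size as source

    .code
    main PROC
        mov  esi, OFFSET source      ; read pointer
        mov  edi, OFFSET cleaned     ; write pointer
    next_char:
        mov  al, [esi]               ; get a character from source
        inc  esi                     ; advance the read pointer
        cmp  al, 0                   ; null terminator ends the scan
        je   done
        cmp  al, 'a'
        jb   check_upper             ; below 'a': may still be 'A'..'Z'
        cmp  al, 'z'
        ja   next_char               ; above 'z': not a letter, skip it
        and  al, 11011111b           ; lowercase letter: clear bit 5 to get uppercase
        jmp  keep_char
    check_upper:
        cmp  al, 'A'
        jb   next_char               ; below 'A': space, digit or punctuation
        cmp  al, 'Z'
        ja   next_char               ; between 'Z' and 'a': punctuation
    keep_char:
        mov  [edi], al               ; copy the letter to the output buffer
        inc  edi                     ; advance the write pointer
        jmp  next_char
    done:
        mov  BYTE PTR [edi], 0       ; terminate the cleaned string
        mov  edx, OFFSET cleaned
        call WriteString
        call Crlf

        exit
    main ENDP
    END main
    ```

    With the input from the question this should print THISISADEMOSTRINGNEEDTOFORMATTHIS. Because the write pointer never gets ahead of the read pointer, you could also point EDI at source and compact the string in place instead of using a second buffer.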