Search code examples
assemblyx86-16tasm

Counting the occurrences of a specific char in a string


I'm trying to make a hybrid program (C++ and ASM) which does this task, but my ASM module is not working as intented. The text and a character are loaded in C++ part of the program. I'm not even sure where am I making mistakes here.

EDIT: I was using Borland C++ and TASM compiler (DosBOX). The program either showed wrong number of occurences (result was the same despite the actual number of occurences, but was changing with seemingly unrelated changes I made to the program), or spewed out weird characters (like down-arrow symbol or a smiley face) in place where a number should be - this was caused by setting incorrect variable type in C++.

The actual problem was most definitely caused by changing values of some registers, which apparently have to remain the same before and after the procedure (as David Wohlferd and many others have pointed to me - thank you all). I looked into the files given to us by our tutors, and according to them those registers are (for C/C++): DS, SS, SP, BP, SI, DI, and flag register if Direction Flag is modified in the procedure.

Here's the corrected code which works: https://pastebin.com/JPxMxzmK

.MODEL          SMALL, C
.STACK          400h
.DATA
.CODE
public          CountChar

CountChar   PROC
                push            bp
                mov         bp, sp
                xor         bx, bx
                mov         si, [bp+4]
                mov         ah, [bp+6]

                Check:
                mov         dl, [si]
                cmp         dl, 0
                je          EndOfP
                cmp         dl, ah
                je          Increasing
                inc         si
                jmp         Check

                Increasing:
                inc         bx
                inc         si
                jmp         Check

                EndOfP:
                mov         ax, bx
                pop         bp
                ret

CountChar   ENDP

END

Solution

  • You haven't actually said what's going wrong. That makes it hard to be sure what the 'answer' might be. But I'm going to take a crack at it (hey, I need the karma).

    Reading your code, there doesn't seem to be anything actually "wrong" with the code (although there are a few things I would do differently). However, there are some rules that assembler must follow if it is going to interact with C. One of the most important is that if you change certain registers, you are responsible for putting them back the way you found them. Your code is breaking this rule.

    As a newbie, this might seem a bit confusing to you. After all, it's not like your C code uses registers, right? Except that your C code does use registers. In fact, that's basically the entire purpose of a C compiler: Turning C code (which doesn't use registers) into assembler code (which does).

    If we could see the assembler code being generated for the code that calls CountChar, we'd see 2 push statements (putting the parameters on the stack), followed by a call CountChar. But the calling code is (probably) using some of the other registers (like si) to hold other values. Your CountChar routine must not stomp on those values, or strange things will happen when CountChar exits.

    You might ask: Why doesn't the calling routine save the values of ALL the registers before it calls your code? It could. But saving all the registers (and restoring them all) every time you call a function would really slow things down. And it's entirely possible that the routine you are calling doesn't even use all the registers, which would make it a waste of time for no benefit.

    Instead, the people who decide these things went for a compromise: When calling a function, the caller will assume that certain registers are unchanged when the function returns. Exactly which registers can change a little bit depending on how the function is defined. You may not have bumped into it yet, but there are several sets of rules that code routinely use (cdecl, stdcall, pascal, fastcall, etc).

    As Raymond says, for 16bit code, cdecl says that bp, si and di (as well as DS, but we won't go there) must be preserved by the callee. When you write C code, this is all done for you. But when you write assembler, you must know (and follow) these rules.

    That doesn't mean you can't use these registers. Just that if you do, you must save the old value (for example with push si) and restore it (for example with pop si) before your function exits. Of course doing a push/pop isn't free, so you'd probably want to use all the other registers first before using one of the ones that have to be saved/restored.

    Since this sounds like a homework assignment, I'm not going to post the re-worked code (I don't have the environment to run it anyway), but I'm going to give you a few suggestions to consider:

    1. Instead of using the si (which must be preserved), use cx (which doesn't).
    2. Instead of using bx to hold the count (and then moving the value into ax), just use ax to hold the count in the first place. You can use bx to hold the character you want to search for.
    3. When testing to see if a register is zero, it's (slightly) faster to use test dl, dl than to use cmp dl, 0.
    4. Looking at this chunk of code:

      cmp         dl, ah
      je          Increasing
      inc         si
      jmp         Check
      
      Increasing:
      inc         bx
      inc         si
      jmp         Check
      

      What would happen if you moved the inc si up before the cmp instruction? Then you wouldn't have to have it in 2 places:

      inc         si
      cmp         dl, ah
      je          Increasing
      jmp         Check
      
      Increasing:
      inc         bx
      jmp         Check
      

      But look at what has happened. Now we have 2 jump instructions right next to each other. Doesn't that seem a bit unnecessary? What if instead of jumping to Increasing on je, you were to jump to Check if jne? Now your code looks like this:

      inc         si
      cmp         dl, ah
      jne         Check
      
      inc         bx
      jmp         Check
      
    5. No review of assembler code would be complete without saying something about comments. This is a small bit of code, and it's just an exercise. But you should still get in the habit:

      inc         si      ; Position to next byte
      cmp         dl, ah  ; Is this the byte we are counting?
      jne         Check
      
      inc         bx      ; Found one
      jmp         Check
      

      Even trivial comments like this make the code way easier to follow.

      When you come back to this code months (or years) from now, or when someone else has to pick up your code and try to understand it (like at least 3 people did with your code today), it will make life easier. Even (especially?) if the code is wrong, putting the comment shows your intent/expectation.

    That's the best answer I can make with the info you have provided.