Floating-point numbers and their effect on 8-bit microcontroller memory

I am currently working on a project that involves bare-metal programming on an STM8 microcontroller using the SDCC compiler on Linux. The chip has very little memory, so I'm trying to keep things really lean. I have gotten by with 8-bit and 16-bit variables and things have gone well. But recently I ran into a problem where I really needed a float variable. So I wrote a function that takes a 16-bit value, converts it to a float, does the math I need, and returns an 8-bit number. This caused my final compiled code on the MCU to go from 1198 bytes to 3462 bytes. Now, I understand that using floating point is memory intensive and that many functions may need to be called to handle floating-point numbers, but it seems crazy to increase the size of the program by that much. I would like some help understanding why this is and what exactly happened.

Specs: MCU stm8151f2 Compiler: SDCC with --opt-code-size option

int roundNo(uint16_t bit_input) 
{ 
    float num = (((float)bit_input) - ADC_MIN)/124.0;
    return num < 0 ? num - 0.5 : num + 0.5; 
}

Solution

To determine why the code is so large on your particular tool chain, you would need to look at the generated assembly code, and see what FP support calls it makes, then look at the map file to determine the size of each of those functions.

As an example, on Godbolt for AVR using GCC 5.4.0 with -Os (Godbolt does not support STM8 or SDCC, so this is for comparison as an 8-bit architecture), your code generates 6364 bytes compared to 4081 bytes for an empty function. So the additional code required for the function body is 2283 bytes. Allowing for the fact that you are using both a different compiler and a different architecture, this is not that different from your results (3462 - 1198 = 2264 bytes). See in the generated code (below) the rcalls to subroutines such as __divsf3; these are where the bulk of the code will be, and I suspect FP division is by far the largest contributor.

roundNo(unsigned int):
        push r12
        push r13
        push r14
        push r15
        mov r22,r24
        mov r23,r25
        ldi r24,0
        ldi r25,0
        rcall __floatunsisf
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,lo8(69)
        rcall __subsf3
        ldi r18,0
        ldi r19,0
        ldi r20,lo8(-8)
        ldi r21,lo8(66)
        rcall __divsf3
        mov r12,r22
        mov r13,r23
        mov r14,r24
        mov r15,r25
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,0
        rcall __ltsf2
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,lo8(63)
        sbrs r24,7
        rjmp .L6
        mov r25,r15
        mov r24,r14
        mov r23,r13
        mov r22,r12
        rcall __subsf3
        rjmp .L7
.L6:
        mov r25,r15
        mov r24,r14
        mov r23,r13
        mov r22,r12
        rcall __addsf3
.L7:
        rcall __fixsfsi
        mov r24,r22
        mov r25,r23
        pop r15
        pop r14
        pop r13
        pop r12
        ret
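
The ldi immediates in the listing above are the bytes of the single-precision constants from your function. As a rough host-side sketch (not target code), they can be decoded as follows. It assumes the avr-gcc convention of passing the second float argument in r18..r21 with r18 as the least significant byte, and a 32-bit IEEE 754 float on the host. The helper name float_from_regs is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rebuild a float from the four bytes loaded into
   r18 (least significant) .. r21 (most significant).
   Assumes the host float is a 32-bit IEEE 754 single. */
float float_from_regs(uint8_t r18, uint8_t r19, uint8_t r20, uint8_t r21)
{
    uint32_t bits = ((uint32_t)r21 << 24) | ((uint32_t)r20 << 16)
                  | ((uint32_t)r19 << 8)  |  (uint32_t)r18;
    float f;
    memcpy(&f, &bits, sizeof f);  /* copy the bit pattern without aliasing issues */
    return f;
}
```

With this, float_from_regs(0, 0, 0, 69) gives 2048.0, which suggests ADC_MIN is 2048; float_from_regs(0, 0, 0xF8, 66) gives 124.0 (lo8(-8) is 0xF8); and float_from_regs(0, 0, 0, 63) gives 0.5.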

You need to perform the same analysis on the code generated by your tool chain to answer your question. No doubt SDCC is capable of generating an assembly listing and a map file which will allow you to determine exactly what code and FP support is being generated and linked.

Ultimately, though, your use of FP in this case is entirely unnecessary; integer arithmetic gives the same rounded result:

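int roundNo(uint16_t bit_input)
{
  /* Offset from ADC_MIN; may be negative */
  int s = (bit_input - ADC_MIN) ;
  /* Bias by half the divisor (124 / 2) so the truncating division rounds to nearest */
  s += s < 0 ? -62 : 62 ;
  return s / 124 ;
}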

On Godbolt this comes to 2283 bytes compared to an empty function. Still somewhat large, but the issue there is most likely that the AVR lacks a DIV instruction, so the compiler calls __divmodhi4. STM8 has a DIV for a 16-bit dividend and an 8-bit divisor, so it will likely be significantly smaller (and faster) on your target.
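
To check that the integer version rounds the same way as the float version, both can be compared on a PC over every possible input. This is only a sketch under assumptions: ADC_MIN is taken to be 2048 (the constant visible in the AVR listing), the ADC is assumed to be 12-bit (0..4095), and the names round_float, round_int and count_mismatches are made up for the comparison.

```c
#include <stdint.h>

#define ADC_MIN 2048   /* assumption: inferred from the AVR listing */
#define ADC_MAX 4095   /* assumption: 12-bit ADC */

/* Float version from the question, written with float constants. */
int round_float(uint16_t bit_input)
{
    float num = ((float)bit_input - ADC_MIN) / 124.0f;
    return (int)(num < 0 ? num - 0.5f : num + 0.5f);
}

/* Integer version from the answer. */
int round_int(uint16_t bit_input)
{
    int s = bit_input - ADC_MIN;
    s += s < 0 ? -62 : 62;
    return s / 124;
}

/* Count the inputs for which the two versions disagree. */
unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint16_t x = 0; x <= ADC_MAX; x++) {
        if (round_float(x) != round_int(x))
            mismatches++;
    }
    return mismatches;
}
```

Under these assumptions count_mismatches() returns 0, and the results span -17 to 17, so they fit comfortably in 8 bits. With a 16-bit int on the target, the intermediate values stay within about ±2110, so there is no overflow for 12-bit inputs.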