Search code examples
c++bit-manipulationbitbit-shift

How do I efficiently left-shift by N bits using single-bit shifts?


Some CPUs like the MSP430 don't have multi-bit shifts, but only single-bit shifts or rotate instructions. This makes me curious about how programmers in the "olden days" implemented multi-bit shifts, when all they could do is shift one bit at a time.

I am aware of a "dumb" way of doing it, which is this:

#include <cstdint>

uint64_t lshift(uint64_t x, uint64_t shift) {
    for (uint64_t i = 0; i < shift; ++i) {
        x <<= 1;
    }
}

Is there any way of doing it which does not have O(n) complexity? Or is there at least an implementation that makes it possible if I know the shift at compile time, which is often the case with bit-shifts?

My intuition is that x << (1 << (1 << 1)) is the same as x << 4, so maybe one could reduce it down to O(log n) by combining shifts like that.

Edit

My intuition was wrong, but other operations could produce a similar effect. x << 1 is equivalent to x += x so x += x, x += x, x += x is equivalent to x << 4. Multiplication with powers of two would also work.


Note: C++ is only being used for the sake of convenience here, I know that there will always be a left-shift operator. I just don't want to think about this in MP430 assembly.


Solution

  • TL;DR: Indeed shifts by multiple steps would generally be done by multiple shifts as you can imagine. But some tricks can be used to avoid shifting too many times. For example most common algorithms in embedded systems require only shifts by 1, or can be redesign to shift by 1 in each loop iteration. In case a bigger shift is required then some special bitwise instructions in the ISA can be used for optimization

    Programming for an embedded system requires deeper understanding about that architecture in order to achieve good performance and RAM/ROM usage. For example variables are chosen so that they fit in the machine word. No one would use uint64_t like that on a 16 or 8-bit MCU unless absolutely necessary. Operations on multi-word variables (include bit shifts) require much more instructions so they won't usually be inlined. In general for shifts that are a multiple of a word size are done similarly to shifting array elements so they'll be very fast


    Shifting by an arbitrary variable requires a barrel shifter which takes far more die space than a simple shift register, therefore most embedded architectures can only shift one bit at a time. Most of them also include some kind of swap halfword instruction to overcome the limitation and allow fast enough large-distance shifting. For example 8-bit microcontrollers like 8051, PIC or AVR has a swap-nibble instruction. See:

    The MSP430 is a 16-bit MCU so it has a SWPB instruction to swap bytes which can be used similarly for fast shifting by 8. Here are some examples generated by Clang (with comments by me, notice how the shift by 8 and shifts larger than 8 are done):

    shift_left_15(unsigned short):                     ; @shift_left_15(unsigned short)
            mov.b   r12, r12
            swpb    r12      ; swap bytes then shift left 7 times
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            ret
    shift_left_12(unsigned short):                     ; @shift_left_12(unsigned short)
            mov.b   r12, r12
            swpb    r12      ; swap bytes then shift left 4 times
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            ret
    shift_left_10(unsigned short):                     ; @shift_left_10(unsigned short)
            mov.b   r12, r12
            swpb    r12      ; swap bytes then shift left 2 times
            add     r12, r12
            add     r12, r12
            ret
    shift_left_9(unsigned short):                      ; @shift_left_9(unsigned short)
            mov.b   r12, r12
            swpb    r12
            add     r12, r12
            ret
    shift_left_8(unsigned short):                      ; @shift_left_8(unsigned short)
            mov.b   r12, r12
            swpb    r12      ; just swap bytes
            ret
    shift_left_7(unsigned short):                      ; @shift_left_7(unsigned short)
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            add     r12, r12
            ret
    shift_left_3(unsigned short):                      ; @shift_left_3(unsigned short)
            add     r12, r12
            add     r12, r12
            add     r12, r12
            ret
    

    You can open the Godbolt link above for the full output

    If you're on MSP430X then it has the ability to shift from 1 to 4 bit position which greatly simplifies the shifting procedure

    shift_left_15(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #15 { rlax.w      R12
            POPM.W  #1, r4
            RET
    shift_left_12(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #12 { rlax.w      R12
            POPM.W  #1, r4
            RET
    shift_left_10(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #10 { rlax.w      R12
            POPM.W  #1, r4
            RET
    shift_left_9(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #9 { rlax.w       R12
            POPM.W  #1, r4
            RET
    shift_left_8(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #8 { rlax.w       R12
            POPM.W  #1, r4
            RET
    shift_left_7(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #7 { rlax.w       R12
            POPM.W  #1, r4
            RET
    shift_left_3(unsigned short):
            PUSHM.W #1, R4
            MOV.W   R1, R4
            rpt     #3 { rlax.w       R12
            POPM.W  #1, r4
            RET
    

    Right shift can be done in the same way but replace add with rrc/rra and rlax with rrax. See demo on Godbolt