Multiplication Using Addition and Bit Shifts

Discussion in 'Tutorials' started by Devon, Oct 1, 2023.

  1. Devon

    Devon A̸ ̴S̴ ̵C̵ ̷E̶ ̸N̸ ̴D̶ ̵E̶ ̸D̶ Member

    Joined:
    Aug 26, 2013
    Messages:
    1,377
    Location:
    your mom
    The MULU and MULS instructions are known to be quite slow. They require at least 38 cycles to execute, with the worst case being 70 cycles. If you are multiplying with rather basic and small numbers, this isn't ideal. So, with this guide, I can show you an alternative way to perform multiplication, by just using addition and bit shifts.

    The best way I can visualize this is basically how you can take the number 1 and expand it in a way that matches the multiplier. The easiest way I can show this is in binary.

    Let's say we want to multiply by 10 ($0A). In binary that is 1010. So, how can we take the number 1 and expand it into that? Lemme show you...

    First, let's start off with our number 1, and let's bit shift it to the left 1 time.
    Code:
    0001 -> 0010
    If you look closely, you can see that one of the bits in $0A now matches up.

    0010
    1010

    So, how can we get the other bit set? Well, what we can do is save our current iteration for later, and keep going. Let's now bit shift to the left twice.
    Code:
    0010 -> 1000
    Saved: 0010
    Well, now the other bit is set, but the first bit we had is now clear, so what gives?

    1000
    1010

    This is where the magic happens. Remember the 0010 we saved? Well, let's add it back!
    Code:
    1000 -> 1010
    And hey look, it's our multiplier! This method basically consists of a series of bit shifts and saving and re-adding to get our product, thought of with the example of multiplying 1 by the multiplier. As you go on, you basically save an iteration whenever you match a bit in your multiplier, and then you add it back on later.

    Let's do another example. Let's say we want to multiply by 13 ($0D). In binary that is 1101.

    Well, from the get go, with our number 1, we can see that the lowest bit matches, so we'll put that to the side.

    0001
    1101

    Now let's shift it to the left 2 times.
    Code:
    0001 -> 0100
    Saved: 0001
    We now get our next match.

    0100
    1101

    What we can do in this instance is add this iteration to our previously saved iteration. So that makes our saved iteration 0101. Now let's do 1 more bit shift to the left.
    Code:
    0100 -> 1000
    Saved: 0101
    We get our final match.

    1000
    1101

    Now let's add our saved iteration (0101) back in, and voila, it's 1101, our multiplier.

    We can implement the first example in m68k assembly like so, assuming we want to multiply as a 16-bit value, with d0 as our multiplicand.
    Code:
        add.w   d0,d0         ; 4 ; First bit shift
        move.w  d0,d1         ; 4 ; Save iteration
        add.w   d0,d0         ; 4 ; 2 more bit shifts
        add.w   d0,d0         ; 4
        add.w   d1,d0         ; 4 ; Add saved iteration
                              ; 20 cycles total
    
    And the second.
    Code:
        move.w  d0,d1         ; 4 ; Save iteration
        add.w   d0,d0         ; 4 ; 2 bit shifts
        add.w   d0,d0         ; 4
        add.w   d0,d1         ; 4 ; Add to saved iteration
        add.w   d0,d0         ; 4 ; 1 more bit shift
        add.w   d1,d0         ; 4 ; Add saved iteration
                              ; 24 cycles total
    And as we can also see, these 2 processes are faster than the best case scenario for either MULU or MULS.

    Disclaimers

    It should be noted that this should be mainly reserved for more basic multiplications. The more bit shifts and additions you need to do, the slower it gets, and the more instructions you need to store. If you are multiplying with a rather large and complex number, then this method would be worse than just using MULU or MULS. Use this method wisely! This is not a catch-all solution for slow multiplication. If using MULU or MULS is faster, then use them.

    If you are dealing with negative numbers, then this can work, but be aware of any sign extensions you might need to deal with, depending on the sizes you work with. It's also much much faster to multiply with the absolute value of the multiplier and then negate the result than it is to attempt to do a series of shifts and additions to match the negative multiplier.

    I also do 2 additions for 2 consecutive bit shifts, because doing 2 separate additions on bytes and words is faster than a single bit shift instruction (8 cycles vs 10). However, if you are doing more than 2 shifts or are working with longwords, then using a bit shift instruction is the faster option (using bit shift instructions increases by 2 cycles per shift, vs. at least 4 for each addition).
     
    Last edited: Oct 1, 2023
    Hame, ProjectFM, Arandomguy and 2 others like this.