Overhauling the CalcAngle routine

Discussion in 'Tutorials Archive' started by Nat The Porcupine, Aug 2, 2017.

  1. Nat The Porcupine

    Nat The Porcupine Point & Click Funny Man Member

    Joined:
    Jun 23, 2017
    Messages:
    71
    Location:
    Pennsyltucky
    Warning: This code actually has a pretty serious flaw that I didn't initially encounter in my testing. Until I can find a way to fix it, I'm going to have to recommend that you NOT follow this tutorial. I apologize for the inconvenience...

    Before we begin, I need to give a huge shout out to Johan Forslöf for his brilliant 6502 8-bit atan2 routine. It was a huge help and I can safely say that I would not be able to make this tutorial without it. Now that we've got that out of the way, lets get on with the tutorial!

    Preamble
    So, the way Sonic 1 handles angle calculations is incredibly CPU hungry, taking up to 402 cycles to calculate just one! And before any of you ask, no, the method used by Sonic 3&K is still terrible, clocking in (no pun intended) at a maximum of 382 cycles. This may not seem like much of a problem at first, but there are quite a few situations where the pure gluttony of this routine can really start to take its toll, and frankly, I'm getting sick and tired of it! So let's fix it...

    Step 1: Identifying the problem
    Below is the original CalcAngle routine from Sonic 1. See if you can name one of the problems here:

    Code:
    ; ---------------------------------------------------------------------------
    ; Subroutine calculate an angle
    
    ; input:
    ;    d1 = x-axis distance
    ;    d2 = y-axis distance
    
    ; output:
    ;    d0 = angle
    ; ---------------------------------------------------------------------------
    
    ; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||
    
    
    CalcAngle:
            movem.l    d3-d4,-(sp)
            moveq    #0,d3
            moveq    #0,d4
            move.w    d1,d3
            move.w    d2,d4
            or.w    d3,d4
            beq.s    CalcAngle_Zero
            move.w    d2,d4
            tst.w    d3
            bpl.w    loc_2CC2
            neg.w    d3
    
    loc_2CC2:
            tst.w    d4
            bpl.w    loc_2CCA
            neg.w    d4
    
    loc_2CCA:
            cmp.w    d3,d4
            bcc.w    loc_2CDC
            lsl.l    #8,d4
            divu.w    d3,d4
            moveq    #0,d0
            move.b    Angle_Data(pc,d4.w),d0
            bra.s    loc_2CE6
    ; ===========================================================================
    
    loc_2CDC:
            lsl.l    #8,d3
            divu.w    d4,d3
            moveq    #$40,d0
            sub.b    Angle_Data(pc,d3.w),d0
    
    loc_2CE6:
            tst.w    d1
            bpl.w    loc_2CF2
            neg.w    d0
            addi.w    #$80,d0
    
    loc_2CF2:
            tst.w    d2
            bpl.w    loc_2CFE
            neg.w    d0
            addi.w    #$100,d0
    
    loc_2CFE:
            movem.l    (sp)+,d3-d4
            rts
    ; ===========================================================================
    
    CalcAngle_Zero:
            move.w    #$40,d0
            movem.l    (sp)+,d3-d4
            rts
    ; End of function CalcAngle
    
    ; ===========================================================================
    
    Angle_Data:    incbin    "misc\angles.bin"
    
    ; ===========================================================================
    Let's start with the more obvious problem; inappropriate branch sizes. For whatever reason, there are word-sized branch sizes in this routine when they should really be shorts. At first I thought the original programmer was just playing it safe, but then I noticed that the largest branch in the entire routine was a short, so in reality, they probably just mixed the two sizes up.

    So, why exactly is this an issue? You see, a lot of 68k reference manuals will tell you that a branch will always take 10 cycles regardless of the size. However this is only true if a branch is taken, and because we have conditional branching, that branch won't always be taken. If a word branch isn't taken, it takes 12 cycles for the program to move on, but if the branch skipped is a short, it will only take 8 cycles to get back on track. This exactly what S2, SCD, and S3&K did to speed up the routine a little, but as I said earlier, that's still not good enough.

    In short (more puns?), make sure you change all instances of "bpl.w" and "bcc.w" to "bpl.s" and "bcc.s"; it's a good start if nothing else.

    Step 2: Identifying the actual problem
    The real issue with the original code is its reliance on the "divu.w" instruction. For the unenlightened, divu is a built in 68k instruction that performs unsigned division. This instruction is PAINFULLY slow, taking about 140 cycles to finish on average. In addition, there's effectively a 10% difference in execution time thanks to the algorithm being used, so it may actually take up to 154 cycles if you feed it numbers that aren't favored by the algorithm. But we need to use division in order to get the y/x ratio, so what do we do? The answer is to use a lookup table of logarithms to get the y/x ratio instead. So let's get started...

    Step 3: Setting things up
    So first thing's first, your going to want to delete all the code from just below label loc_2CCA up to and including label loc_2CFE. Go ahead and delete the Angle_Data section too. Your code should now look like this:

    Code:
    ; ---------------------------------------------------------------------------
    ; Subroutine calculate an angle
    
    ; input:
    ;    d1 = x-axis distance
    ;    d2 = y-axis distance
    
    ; output:
    ;    d0 = angle
    ; ---------------------------------------------------------------------------
    
    ; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||
    
    
    CalcAngle:
            movem.l    d3-d4,-(sp)
            moveq    #0,d3
            moveq    #0,d4
            move.w    d1,d3
            move.w    d2,d4
            or.w    d3,d4
            beq.s    CalcAngle_Zero            ; special case when both x and y are zero
            move.w    d2,d4
            tst.w    d3
            bpl.s    loc_2CC2
            neg.w    d3
    
    loc_2CC2:
            tst.w    d4
            bpl.s    loc_2CCA
            neg.w    d4
    
    loc_2CCA:
      
            movem.l    (sp)+,d3-d4
            rts
    ; ===========================================================================
    
    CalcAngle_Zero:
            move.w    #$40,d0
            movem.l    (sp)+,d3-d4
            rts
    ; End of function CalcAngle
    
    ; ===========================================================================

    Next, go ahead and move the CalcAngle_Zero section to the very top of your code (I'll explain later). Go ahead and turn that "move.w" to "moveq"; it'll save some space and a few cycles. Your code should now look like this:

    Code:
    ; ===========================================================================
    
    CalcAngle_Zero:
            moveq    #$40,d0                        ; this used to be move.w, but moveq is faster
            movem.l    (sp)+,d3-d4
            rts
    ; ===========================================================================
    
    
    ; ---------------------------------------------------------------------------
    ; Subroutine calculate an angle
    
    ; input:
    ;    d1 = x-axis distance
    ;    d2 = y-axis distance
    
    ; output:
    ;    d0 = angle
    ; ---------------------------------------------------------------------------
    
    ; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||
    
    
    CalcAngle:
            movem.l    d3-d4,-(sp)
            moveq    #0,d3
            moveq    #0,d4
            move.w    d1,d3
            move.w    d2,d4
            or.w    d3,d4
            beq.s    CalcAngle_Zero            ; special case when both x and y are zero
            move.w    d2,d4
            tst.w    d3
            bpl.s    loc_2CC2
            neg.w    d3
    
    loc_2CC2:
            tst.w    d4
            bpl.s    loc_2CCA
            neg.w    d4
    
    loc_2CCA:
     
            movem.l    (sp)+,d3-d4
            rts
    ; End of function CalcAngle
    
    ; ===========================================================================
    Step 4: Fresh Code
    Alright, are you ready to add some new code? Good! Lets start by making the following additions:

    Code:
            ...
            move.w    d2,d4
            moveq    #0,d2                      ; ADD THIS
      
            tst.w    d3
            bpl.s    loc_2CC2
            neg.w    d3
            bset.l    #2,d2                        ; ADD THIS
    
    loc_2CC2:
            tst.w    d4
            bpl.s    loc_2CCA
            neg.w    d4
            bset.l    #1,d2                        ; ADD THIS

    The d2 register is used here as a bitfield of flags that will be used to adjust the final angle value to the appropriate half-quadrant, or more accurately, the correct "octant". Next, we should add the lookup tables for logarithms, angles, and octant adjustments. Copy and paste the following to the bottom of the routine:


    Code:
    ; ===========================================================================
    
    ; This table is used to place the angle in the appropriate octant (half-quadrant)
    Octant_Adjust:
            dc.b %00111111        ; x+,y+,|x|>|y|
            dc.b %00000000        ; x+,y+,|x|<|y|
            dc.b %11000000        ; x+,y-,|x|>|y|
            dc.b %11111111        ; x+,y-,|x|<|y|
            dc.b %01000000        ; x-,y+,|x|>|y|
            dc.b %01111111        ; x-,y+,|x|<|y|
            dc.b %10111111        ; x-,y-,|x|>|y|
            dc.b %10000000        ; x-,y-,|x|<|y|
    
    ; ===========================================================================
    
    ; Table of approximated values calculated by the following formula: atan(2^(x/32))*128/pi
    Atan_Tab:
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$02,$02,$02
            dc.b $02,$02,$02,$02,$02,$02,$02,$02
            dc.b $02,$02,$02,$02,$02,$02,$02,$02
            dc.b $03,$03,$03,$03,$03,$03,$03,$03
            dc.b $03,$03,$03,$03,$03,$04,$04,$04
            dc.b $04,$04,$04,$04,$04,$04,$04,$04
            dc.b $05,$05,$05,$05,$05,$05,$05,$05
            dc.b $06,$06,$06,$06,$06,$06,$06,$06
            dc.b $07,$07,$07,$07,$07,$07,$08,$08
            dc.b $08,$08,$08,$08,$09,$09,$09,$09
            dc.b $09,$0a,$0a,$0a,$0a,$0b,$0b,$0b
            dc.b $0b,$0c,$0c,$0c,$0c,$0d,$0d,$0d
            dc.b $0d,$0e,$0e,$0e,$0e,$0f,$0f,$0f
            dc.b $10,$10,$10,$11,$11,$11,$12,$12
            dc.b $12,$13,$13,$13,$14,$14,$15,$15
            dc.b $15,$16,$16,$17,$17,$17,$18,$18
            dc.b $19,$19,$19,$1a,$1a,$1b,$1b,$1c
            dc.b $1c,$1c,$1d,$1d,$1e,$1e,$1f,$1f
    
    ; ===========================================================================
    
    ; Table of values calculated by this equation: log2(x)*32
    Log2_Tab:
            dc.b $00,$00,$20,$32,$40,$4a,$52,$59
            dc.b $60,$65,$6a,$6e,$72,$76,$79,$7d
            dc.b $80,$82,$85,$87,$8a,$8c,$8e,$90
            dc.b $92,$94,$96,$98,$99,$9b,$9d,$9e
            dc.b $a0,$a1,$a2,$a4,$a5,$a6,$a7,$a9
            dc.b $aa,$ab,$ac,$ad,$ae,$af,$b0,$b1
            dc.b $b2,$b3,$b4,$b5,$b6,$b7,$b8,$b9
            dc.b $b9,$ba,$bb,$bc,$bd,$bd,$be,$bf
            dc.b $c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4
            dc.b $c5,$c6,$c6,$c7,$c7,$c8,$c9,$c9
            dc.b $ca,$ca,$cb,$cc,$cc,$cd,$cd,$ce
            dc.b $ce,$cf,$cf,$d0,$d0,$d1,$d1,$d2
            dc.b $d2,$d3,$d3,$d4,$d4,$d5,$d5,$d5
            dc.b $d6,$d6,$d7,$d7,$d8,$d8,$d9,$d9
            dc.b $d9,$da,$da,$db,$db,$db,$dc,$dc
            dc.b $dd,$dd,$dd,$de,$de,$de,$df,$df
            dc.b $df,$e0,$e0,$e1,$e1,$e1,$e2,$e2
            dc.b $e2,$e3,$e3,$e3,$e4,$e4,$e4,$e5
            dc.b $e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7
            dc.b $e7,$e8,$e8,$e8,$e9,$e9,$e9,$ea
            dc.b $ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec
            dc.b $ec,$ec,$ed,$ed,$ed,$ed,$ee,$ee
            dc.b $ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0
            dc.b $f0,$f1,$f1,$f1,$f1,$f1,$f2,$f2
            dc.b $f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4
            dc.b $f4,$f4,$f5,$f5,$f5,$f5,$f5,$f6
            dc.b $f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7
            dc.b $f8,$f8,$f8,$f8,$f9,$f9,$f9,$f9
            dc.b $f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb
            dc.b $fb,$fb,$fb,$fc,$fc,$fc,$fc,$fc
            dc.b $fd,$fd,$fd,$fd,$fd,$fd,$fe,$fe
            dc.b $fe,$fe,$fe,$ff,$ff,$ff,$ff,$ff
    
    ; ===========================================================================
    Lastly, go ahead and change all of the code below loc_2CCA to read like this:

    Code:
    loc_2CCA:
            moveq    #0,d1
            move.b    Log2_Tab(pc,d3.w),d1        ; logarithmic division is used to...
            sub.b    Log2_Tab(pc,d4.w),d1        ; ...find the y/x ratio instead of divu
            bcc.s    loc_2CDC
            neg.w    d1                            ; negate the value of d1
            bset.l    #0,d2                        ; sets the 'y is greater than x' flag
      
    loc_2CDC:
            moveq    #0,d0
            move.b    Atan_Tab(pc,d1.w),d0        ; get the angle value
            moveq    #0,d1
            move.b    Octant_Adjust(pc,d2.w),d1    ; get the octant adjustment value
            eor.b    d1,d0                        ; place the angle in the appropriate octant
            movem.l    (sp)+,d3-d4
            rts

    And that should be it! Just save, assemble, and... uh oh...

    Code:
    > > >sonic.asm: error: addressing mode not allowed on 68000
    > > > move.b    Log2_Tab(pc,d3.w),d1
    
    > > >sonic.asm: error: addressing mode not allowed here
    > > > move.b    Log2_Tab(pc,d3.w),d1
    
    > > >sonic.asm: error: addressing mode not allowed on 68000
    > > > sub.b    Log2_Tab(pc,d4.w),d1
    
    > > >sonic.asm: error: addressing mode not allowed here
    > > > sub.b    Log2_Tab(pc,d4.w),d1

    Ok, I lied, we're not actually done yet. I had you do this intentionally in order to teach you about the limitations of accessing lookup tables in this way. If your going to access lookup table like this, the starting address of a lookup table must be within the range of 127 bytes ahead or 128 bytes back from that point in the ROM. Basically, if you can't "bra.s" to the lookup table from the same point in ROM, then "move.b ROM_Address(pc,dn.w),dn" isn't going to work either. To counter this, we move the Log table closer to the lookup commands and branch past the table once we're done. Remember when I told you to move the CalcAngle_Zero section to the top of the routine? This is why; we wouldn't be able to preserve the short branch size with the Log table in the way, so that block was relocated to keep our routine nice and speedy. Anyway, after making that change, it'll look something like this:


    Code:
    loc_2CCA:
            moveq    #0,d1
            move.b    Log2_Tab(pc,d3.w),d1        ; logarithmic division is used to...
            sub.b    Log2_Tab(pc,d4.w),d1        ; ...find the y/x ratio instead of divu
            bra.w    loc_2CD0                    ; branch past the table and continue on
    ; ===========================================================================
    
    ; Table of values calculated by this equation: log2(x)*32
    Log2_Tab:
            dc.b $00,$00,$20,$32,$40,$4a,$52,$59
            dc.b $60,$65,$6a,$6e,$72,$76,$79,$7d
            dc.b $80,$82,$85,$87,$8a,$8c,$8e,$90
            dc.b $92,$94,$96,$98,$99,$9b,$9d,$9e
            dc.b $a0,$a1,$a2,$a4,$a5,$a6,$a7,$a9
            dc.b $aa,$ab,$ac,$ad,$ae,$af,$b0,$b1
            dc.b $b2,$b3,$b4,$b5,$b6,$b7,$b8,$b9
            dc.b $b9,$ba,$bb,$bc,$bd,$bd,$be,$bf
            dc.b $c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4
            dc.b $c5,$c6,$c6,$c7,$c7,$c8,$c9,$c9
            dc.b $ca,$ca,$cb,$cc,$cc,$cd,$cd,$ce
            dc.b $ce,$cf,$cf,$d0,$d0,$d1,$d1,$d2
            dc.b $d2,$d3,$d3,$d4,$d4,$d5,$d5,$d5
            dc.b $d6,$d6,$d7,$d7,$d8,$d8,$d9,$d9
            dc.b $d9,$da,$da,$db,$db,$db,$dc,$dc
            dc.b $dd,$dd,$dd,$de,$de,$de,$df,$df
            dc.b $df,$e0,$e0,$e1,$e1,$e1,$e2,$e2
            dc.b $e2,$e3,$e3,$e3,$e4,$e4,$e4,$e5
            dc.b $e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7
            dc.b $e7,$e8,$e8,$e8,$e9,$e9,$e9,$ea
            dc.b $ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec
            dc.b $ec,$ec,$ed,$ed,$ed,$ed,$ee,$ee
            dc.b $ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0
            dc.b $f0,$f1,$f1,$f1,$f1,$f1,$f2,$f2
            dc.b $f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4
            dc.b $f4,$f4,$f5,$f5,$f5,$f5,$f5,$f6
            dc.b $f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7
            dc.b $f8,$f8,$f8,$f8,$f9,$f9,$f9,$f9
            dc.b $f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb
            dc.b $fb,$fb,$fb,$fc,$fc,$fc,$fc,$fc
            dc.b $fd,$fd,$fd,$fd,$fd,$fd,$fe,$fe
            dc.b $fe,$fe,$fe,$ff,$ff,$ff,$ff,$ff
    
    ; ===========================================================================
    
    loc_2CD0:
            bcc.s    loc_2CDC
            neg.w    d1                            ; negate the value of d1
            bset.l    #0,d2                        ; sets the 'y is greater than x' flag
      
    loc_2CDC:
            moveq    #0,d0
            move.b    Atan_Tab(pc,d1.w),d0        ; get the angle value
            moveq    #0,d1
            move.b    Octant_Adjust(pc,d2.w),d1    ; get the octant adjustment value
            eor.b    d1,d0                        ; place the angle in the appropriate octant
            movem.l    (sp)+,d3-d4
            rts
    ; End of function CalcAngle

    Now it should assemble without error and work flawlessly!

    Step 5 - Conclusion
    That's about it! Now you have a highly efficient CalcAngle routine. This routine has cycle count maxing out at 238 cycles, making it twice as fast as the old routine in most situations. For reference, this is what the final routine code looks like (with added comments and a bit of a snarky header):

    Code:
    ; ===========================================================================
    
    ; This was moved up here in order to keep the short branch size intact
    CalcAngle_Zero:
            moveq    #$40,d0                        ; this used to be move.w, but moveq is faster
            movem.l    (sp)+,d3-d4
            rts
    ; ===========================================================================
    
    ; ---------------------------------------------------------------------------
    ; Subroutine calculate an angle
    
    ; input:
    ;    d1 = x-axis distance
    ;    d2 = y-axis distance
    
    ; output:
    ;    d0 = angle
    
    ; Routine optimized by Nat the Porcupine
    
    ; This new method takes a max of 238 cycles to calculate an angle from...
    ; ...x/y-axis velocity values (excluding the cycles associated with jsr & rts)
    ; For comparison, the Sonic 1 method may take up to 402 cycles to calculate an angle!
    ; And before you ask, no, the method found in S3&K is still terrible at up to 382 cycles.
    
    ; TL;DR: This method, on average, executes about twice as fast as the original.
    
    ; This was inspired by an arctangent2 routine made by Johan Forslöf (doynax)...
    ; ...for the MOS 6502 CPU. His original code can be found at the link below
    ; Link: http://codebase64.org/doku.php?id=base:8bit_atan2_8-bit_angle
    ; ---------------------------------------------------------------------------
    
    ; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||
    
    ; This is still the beginning of the routine
    CalcAngle:
            movem.l    d3-d4,-(sp)
            moveq    #0,d3
            moveq    #0,d4
            move.w    d1,d3
            move.w    d2,d4
            or.w    d3,d4
            beq.s    CalcAngle_Zero                ; special case when both x and y are zero
            move.w    d2,d4
            moveq    #0,d2                        ; d2 will now be used for octant adjustment flags
      
            tst.w    d3
            bpl.s    loc_2CC2
            neg.w    d3                            ; absolute value of d3
            bset.l    #2,d2                        ; sets the 'x-distance is negative' flag
    
    loc_2CC2:
            tst.w    d4
            bpl.s    loc_2CCA
            neg.w    d4                            ; absolute value of d4
            bset.l    #1,d2                        ; sets the 'y-distance is negative' flag
      
    loc_2CCA:
            moveq    #0,d1
            move.b    Log2_Tab(pc,d3.w),d1        ; logarithmic division is used to...
            sub.b    Log2_Tab(pc,d4.w),d1        ; ...find the y/x ratio instead of divu
            bra.w    loc_2CD0                    ; branch past the table and continue on
    ; ===========================================================================
    
    ; Table of values calculated by this equation: log2(x)*32
    Log2_Tab:
            dc.b $00,$00,$20,$32,$40,$4a,$52,$59
            dc.b $60,$65,$6a,$6e,$72,$76,$79,$7d
            dc.b $80,$82,$85,$87,$8a,$8c,$8e,$90
            dc.b $92,$94,$96,$98,$99,$9b,$9d,$9e
            dc.b $a0,$a1,$a2,$a4,$a5,$a6,$a7,$a9
            dc.b $aa,$ab,$ac,$ad,$ae,$af,$b0,$b1
            dc.b $b2,$b3,$b4,$b5,$b6,$b7,$b8,$b9
            dc.b $b9,$ba,$bb,$bc,$bd,$bd,$be,$bf
            dc.b $c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4
            dc.b $c5,$c6,$c6,$c7,$c7,$c8,$c9,$c9
            dc.b $ca,$ca,$cb,$cc,$cc,$cd,$cd,$ce
            dc.b $ce,$cf,$cf,$d0,$d0,$d1,$d1,$d2
            dc.b $d2,$d3,$d3,$d4,$d4,$d5,$d5,$d5
            dc.b $d6,$d6,$d7,$d7,$d8,$d8,$d9,$d9
            dc.b $d9,$da,$da,$db,$db,$db,$dc,$dc
            dc.b $dd,$dd,$dd,$de,$de,$de,$df,$df
            dc.b $df,$e0,$e0,$e1,$e1,$e1,$e2,$e2
            dc.b $e2,$e3,$e3,$e3,$e4,$e4,$e4,$e5
            dc.b $e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7
            dc.b $e7,$e8,$e8,$e8,$e9,$e9,$e9,$ea
            dc.b $ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec
            dc.b $ec,$ec,$ed,$ed,$ed,$ed,$ee,$ee
            dc.b $ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0
            dc.b $f0,$f1,$f1,$f1,$f1,$f1,$f2,$f2
            dc.b $f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4
            dc.b $f4,$f4,$f5,$f5,$f5,$f5,$f5,$f6
            dc.b $f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7
            dc.b $f8,$f8,$f8,$f8,$f9,$f9,$f9,$f9
            dc.b $f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb
            dc.b $fb,$fb,$fb,$fc,$fc,$fc,$fc,$fc
            dc.b $fd,$fd,$fd,$fd,$fd,$fd,$fe,$fe
            dc.b $fe,$fe,$fe,$ff,$ff,$ff,$ff,$ff
    
    ; ===========================================================================
    
    loc_2CD0:
            bcc.s    loc_2CDC
            neg.w    d1                            ; negate the value of d1
            bset.l    #0,d2                        ; sets the 'y is greater than x' flag
      
    loc_2CDC:
            moveq    #0,d0
            move.b    Atan_Tab(pc,d1.w),d0        ; get the angle value
            moveq    #0,d1
            move.b    Octant_Adjust(pc,d2.w),d1    ; get the octant adjustment value
            eor.b    d1,d0                        ; place the angle in the appropriate octant
            movem.l    (sp)+,d3-d4
            rts
    ; End of function CalcAngle
    
    ; ===========================================================================
    
    ; This table is used to place the angle in the appropriate octant (half-quadrant)
    Octant_Adjust:
            dc.b %00111111        ; x+,y+,|x|>|y|
            dc.b %00000000        ; x+,y+,|x|<|y|
            dc.b %11000000        ; x+,y-,|x|>|y|
            dc.b %11111111        ; x+,y-,|x|<|y|
            dc.b %01000000        ; x-,y+,|x|>|y|
            dc.b %01111111        ; x-,y+,|x|<|y|
            dc.b %10111111        ; x-,y-,|x|>|y|
            dc.b %10000000        ; x-,y-,|x|<|y|
    
    ; ===========================================================================
    
    ; Table of approximated values calculated by the following formula: atan(2^(x/32))*128/pi
    ; This table differs from the one that Sonic 1/2/3&K uses, and is not interchangeable
    Atan_Tab:
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$00,$00,$00
            dc.b $00,$00,$00,$00,$00,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$01,$01,$01
            dc.b $01,$01,$01,$01,$01,$02,$02,$02
            dc.b $02,$02,$02,$02,$02,$02,$02,$02
            dc.b $02,$02,$02,$02,$02,$02,$02,$02
            dc.b $03,$03,$03,$03,$03,$03,$03,$03
            dc.b $03,$03,$03,$03,$03,$04,$04,$04
            dc.b $04,$04,$04,$04,$04,$04,$04,$04
            dc.b $05,$05,$05,$05,$05,$05,$05,$05
            dc.b $06,$06,$06,$06,$06,$06,$06,$06
            dc.b $07,$07,$07,$07,$07,$07,$08,$08
            dc.b $08,$08,$08,$08,$09,$09,$09,$09
            dc.b $09,$0a,$0a,$0a,$0a,$0b,$0b,$0b
            dc.b $0b,$0c,$0c,$0c,$0c,$0d,$0d,$0d
            dc.b $0d,$0e,$0e,$0e,$0e,$0f,$0f,$0f
            dc.b $10,$10,$10,$11,$11,$11,$12,$12
            dc.b $12,$13,$13,$13,$14,$14,$15,$15
            dc.b $15,$16,$16,$17,$17,$17,$18,$18
            dc.b $19,$19,$19,$1a,$1a,$1b,$1b,$1c
            dc.b $1c,$1c,$1d,$1d,$1e,$1e,$1f,$1f
    
    ; ===========================================================================

    As a bonus, you can also change the main code block of your CalcSine routine to this to save 4 extra cycles; not much, but it's something quick and simple, so why not?:


    Code:
    CalcSine:
        add.w    d0,d0                    ; multiply by 2
        andi.w    #$1FE,d0                ; PCAFR
        move.w    d0,d1
        move.w    Sine_Data(pc,d0.w),d0    ; sin
        addi.w    #$80,d1
        move.w    Sine_Data(pc,d1.w),d1    ; cos
        rts
    ; End of function CalcSine

    Well, I hope you found this useful. As always, feedback is appreciated. If you encounter any bugs, please let me know. Enjoy!
     
    Last edited: Aug 2, 2017
    Ashuro, Devon, TheStoneBanana and 3 others like this.
  2. Devon

    Devon I'm a loser, baby, so why don't you kill me? Member

    Joined:
    Aug 26, 2013
    Messages:
    1,376
    Location:
    your mom
    d3 and d4 only have their lowest byte negated when they are negative, yet they are later on referenced as a word for the table indices, so the upper byte will still remain untouched and can allow for undesirable results.
     
    TheStoneBanana likes this.
  3. TheStoneBanana

    TheStoneBanana banana Member

    Joined:
    Nov 27, 2013
    Messages:
    602
    Location:
    The Milky Way Galaxy
    EDIT: Nevermind, I was corrected elsewhere. Disregard this portion :p

    This is some excellent work here! It shows that a lot of time, effort, and research has gone into this, and it's great to see honestly. Good work!
     
  4. Devon

    Devon I'm a loser, baby, so why don't you kill me? Member

    Joined:
    Aug 26, 2013
    Messages:
    1,376
    Location:
    your mom
    Bit manipulation instructions only use the longword extension for registers.

    EDIT: TSB edited his post. That was targeted towards him.

    Anyways, one small speed boost you could do is remove:
    Code:
            moveq    #0,d3
            moveq    #0,d4
    Since, it's actually no longer necessary for that.
     
    Last edited: Aug 2, 2017
  5. LuigiXHero

    LuigiXHero Well-Known Member Member

    Joined:
    Mar 22, 2014
    Messages:
    280
    So I tried putting it into my hack and while I was running though the level I thought it was going smoothly. That is until I to roll down a quarter pipe which I do all the time, it completely stopped me in my tracks and sonic got stuck until I jumped. I can't show off my hack so I put together a test hack to show off this bug.



    Rom

    So yeah this isn't suppose to happen. I tried changing the neg.b back to neg.w after I found this bug and nothing changed so it wasn't that.

    It seems to only happen when facing away, and maybe rolling too? As shown in the video.
     
  6. AURORA☆FIELDS

    AURORA☆FIELDS so uh yes Exiled

    Joined:
    Oct 7, 2011
    Messages:
    759
    Devon, FохConED and Pineapple Arse like this.
  7. Devon

    Devon I'm a loser, baby, so why don't you kill me? Member

    Joined:
    Aug 26, 2013
    Messages:
    1,376
    Location:
    your mom
    Okay, the main issue I'm noticing is that in the Sonic engine, the X and Y distances used for CalcAngle can be greater than $FF, and thus be invalid indices for the log2 table.
     
    Last edited: Aug 2, 2017
  8. Nat The Porcupine

    Nat The Porcupine Point & Click Funny Man Member

    Joined:
    Jun 23, 2017
    Messages:
    71
    Location:
    Pennsyltucky
    Well this is a fine pickle. I'll try to fix this when I have the time. Maybe there's a better way to do this than tables of logarithmic division. Anyway, I added a warning banner at the top to prevent people from using this accidentally (in all honesty, the topic should probably be locked).

    Just out of curiosity, how far do you think I would have to expand the table Novedicus? Depending on the answer, it may just be better to make a quick division routine instead.
     
  9. AURORA☆FIELDS

    AURORA☆FIELDS so uh yes Exiled

    Joined:
    Oct 7, 2011
    Messages:
    759
    If you are going with a division routine you might as well just use divu/s because that will be faster at that point anyway.
     
  10. Devon

    Devon I'm a loser, baby, so why don't you kill me? Member

    Joined:
    Aug 26, 2013
    Messages:
    1,376
    Location:
    your mom
    To answer your question, the log2 table would need $8000 word values (since it only handles positive values). You would also need to extend the arctan table to like $200 entries, since there will be some new values in the log2 table beyond $FF.
     
    Last edited: Aug 2, 2017
    ProjectFM likes this.
  11. Nat The Porcupine

    Nat The Porcupine Point & Click Funny Man Member

    Joined:
    Jun 23, 2017
    Messages:
    71
    Location:
    Pennsyltucky
    Jesus Christ, that's over 65Kb of data! I think that I'll just reinstate the divu instructions; it's not worth wasting that much ROM space for a savings of 140 cycles. That being said, the half quadrant adjustments help avoid some branching, so that'll stay. In fact, it'll be the ONLY thing from the "Fresh Code" section that stays!

    I've certainly made a fool of myself with this, that's for sure.
     
  12. Devon

    Devon I'm a loser, baby, so why don't you kill me? Member

    Joined:
    Aug 26, 2013
    Messages:
    1,376
    Location:
    your mom
    Yeah, you should be cautious when porting code from a processor that has different bus widths than the processor you're porting to, and be aware of how it may be handled by other code.