Overhauling the CalcAngle routine

Naoto · Aug 2, 2017

Warning: This code actually has a pretty serious flaw that I didn't initially encounter in my testing. Until I can find a way to fix it, I'm going to have to recommend that you NOT follow this tutorial. I apologize for the inconvenience...

Before we begin, I need to give a huge shout out to Johan Forslöf for his brilliant 6502 8-bit atan2 routine. It was a huge help and I can safely say that I would not be able to make this tutorial without it. Now that we've got that out of the way, lets get on with the tutorial!

Preamble
So, the way Sonic 1 handles angle calculations is incredibly CPU hungry, taking up to 402 cycles to calculate just one! And before any of you ask, no, the method used by Sonic 3&K is still terrible, clocking in (no pun intended) at a maximum of 382 cycles. This may not seem like much of a problem at first, but there are quite a few situations where the pure gluttony of this routine can really start to take its toll, and frankly, I'm getting sick and tired of it! So let's fix it...

Step 1: Identifying the problem
Below is the original CalcAngle routine from Sonic 1. See if you can name one of the problems here:

Code:

; ---------------------------------------------------------------------------
; Subroutine calculate an angle

; input:
;    d1 = x-axis distance
;    d2 = y-axis distance

; output:
;    d0 = angle
; ---------------------------------------------------------------------------

; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||


CalcAngle:
        movem.l    d3-d4,-(sp)
        moveq    #0,d3
        moveq    #0,d4
        move.w    d1,d3
        move.w    d2,d4
        or.w    d3,d4
        beq.s    CalcAngle_Zero
        move.w    d2,d4
        tst.w    d3
        bpl.w    loc_2CC2
        neg.w    d3

loc_2CC2:
        tst.w    d4
        bpl.w    loc_2CCA
        neg.w    d4

loc_2CCA:
        cmp.w    d3,d4
        bcc.w    loc_2CDC
        lsl.l    #8,d4
        divu.w    d3,d4
        moveq    #0,d0
        move.b    Angle_Data(pc,d4.w),d0
        bra.s    loc_2CE6
; ===========================================================================

loc_2CDC:
        lsl.l    #8,d3
        divu.w    d4,d3
        moveq    #$40,d0
        sub.b    Angle_Data(pc,d3.w),d0

loc_2CE6:
        tst.w    d1
        bpl.w    loc_2CF2
        neg.w    d0
        addi.w    #$80,d0

loc_2CF2:
        tst.w    d2
        bpl.w    loc_2CFE
        neg.w    d0
        addi.w    #$100,d0

loc_2CFE:
        movem.l    (sp)+,d3-d4
        rts
; ===========================================================================

CalcAngle_Zero:
        move.w    #$40,d0
        movem.l    (sp)+,d3-d4
        rts
; End of function CalcAngle

; ===========================================================================

Angle_Data:    incbin    "misc\angles.bin"

; ===========================================================================

Let's start with the more obvious problem; inappropriate branch sizes. For whatever reason, there are word-sized branch sizes in this routine when they should really be shorts. At first I thought the original programmer was just playing it safe, but then I noticed that the largest branch in the entire routine was a short, so in reality, they probably just mixed the two sizes up.

So, why exactly is this an issue? You see, a lot of 68k reference manuals will tell you that a branch will always take 10 cycles regardless of the size. However this is only true if a branch is taken, and because we have conditional branching, that branch won't always be taken. If a word branch isn't taken, it takes 12 cycles for the program to move on, but if the branch skipped is a short, it will only take 8 cycles to get back on track. This exactly what S2, SCD, and S3&K did to speed up the routine a little, but as I said earlier, that's still not good enough.

In short (more puns?), make sure you change all instances of "bpl.w" and "bcc.w" to "bpl.s" and "bcc.s"; it's a good start if nothing else.

Step 2: Identifying the actual problem
The real issue with the original code is its reliance on the "divu.w" instruction. For the unenlightened, divu is a built in 68k instruction that performs unsigned division. This instruction is PAINFULLY slow, taking about 140 cycles to finish on average. In addition, there's effectively a 10% difference in execution time thanks to the algorithm being used, so it may actually take up to 154 cycles if you feed it numbers that aren't favored by the algorithm. But we need to use division in order to get the y/x ratio, so what do we do? The answer is to use a lookup table of logarithms to get the y/x ratio instead. So let's get started...

Step 3: Setting things up
So first thing's first, your going to want to delete all the code from just below label loc_2CCA up to and including label loc_2CFE. Go ahead and delete the Angle_Data section too. Your code should now look like this:

Code:

; ---------------------------------------------------------------------------
; Subroutine calculate an angle

; input:
;    d1 = x-axis distance
;    d2 = y-axis distance

; output:
;    d0 = angle
; ---------------------------------------------------------------------------

; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||


CalcAngle:
        movem.l    d3-d4,-(sp)
        moveq    #0,d3
        moveq    #0,d4
        move.w    d1,d3
        move.w    d2,d4
        or.w    d3,d4
        beq.s    CalcAngle_Zero            ; special case when both x and y are zero
        move.w    d2,d4
        tst.w    d3
        bpl.s    loc_2CC2
        neg.w    d3

loc_2CC2:
        tst.w    d4
        bpl.s    loc_2CCA
        neg.w    d4

loc_2CCA:
  
        movem.l    (sp)+,d3-d4
        rts
; ===========================================================================

CalcAngle_Zero:
        move.w    #$40,d0
        movem.l    (sp)+,d3-d4
        rts
; End of function CalcAngle

; ===========================================================================

Next, go ahead and move the CalcAngle_Zero section to the very top of your code (I'll explain later). Go ahead and turn that "move.w" to "moveq"; it'll save some space and a few cycles. Your code should now look like this:

Code:

; ===========================================================================

CalcAngle_Zero:
        moveq    #$40,d0                        ; this used to be move.w, but moveq is faster
        movem.l    (sp)+,d3-d4
        rts
; ===========================================================================


; ---------------------------------------------------------------------------
; Subroutine calculate an angle

; input:
;    d1 = x-axis distance
;    d2 = y-axis distance

; output:
;    d0 = angle
; ---------------------------------------------------------------------------

; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||


CalcAngle:
        movem.l    d3-d4,-(sp)
        moveq    #0,d3
        moveq    #0,d4
        move.w    d1,d3
        move.w    d2,d4
        or.w    d3,d4
        beq.s    CalcAngle_Zero            ; special case when both x and y are zero
        move.w    d2,d4
        tst.w    d3
        bpl.s    loc_2CC2
        neg.w    d3

loc_2CC2:
        tst.w    d4
        bpl.s    loc_2CCA
        neg.w    d4

loc_2CCA:
 
        movem.l    (sp)+,d3-d4
        rts
; End of function CalcAngle

; ===========================================================================

Step 4: Fresh Code
Alright, are you ready to add some new code? Good! Lets start by making the following additions:

Code:

        ...
        move.w    d2,d4
        moveq    #0,d2                      ; ADD THIS
  
        tst.w    d3
        bpl.s    loc_2CC2
        neg.w    d3
        bset.l    #2,d2                        ; ADD THIS

loc_2CC2:
        tst.w    d4
        bpl.s    loc_2CCA
        neg.w    d4
        bset.l    #1,d2                        ; ADD THIS

The d2 register is used here as a bitfield of flags that will be used to adjust the final angle value to the appropriate half-quadrant, or more accurately, the correct "octant". Next, we should add the lookup tables for logarithms, angles, and octant adjustments. Copy and paste the following to the bottom of the routine:

Code:

; ===========================================================================

; This table is used to place the angle in the appropriate octant (half-quadrant)
Octant_Adjust:
        dc.b %00111111        ; x+,y+,|x|>|y|
        dc.b %00000000        ; x+,y+,|x|<|y|
        dc.b %11000000        ; x+,y-,|x|>|y|
        dc.b %11111111        ; x+,y-,|x|<|y|
        dc.b %01000000        ; x-,y+,|x|>|y|
        dc.b %01111111        ; x-,y+,|x|<|y|
        dc.b %10111111        ; x-,y-,|x|>|y|
        dc.b %10000000        ; x-,y-,|x|<|y|

; ===========================================================================

; Table of approximated values calculated by the following formula: atan(2^(x/32))*128/pi
Atan_Tab:
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$02,$02,$02
        dc.b $02,$02,$02,$02,$02,$02,$02,$02
        dc.b $02,$02,$02,$02,$02,$02,$02,$02
        dc.b $03,$03,$03,$03,$03,$03,$03,$03
        dc.b $03,$03,$03,$03,$03,$04,$04,$04
        dc.b $04,$04,$04,$04,$04,$04,$04,$04
        dc.b $05,$05,$05,$05,$05,$05,$05,$05
        dc.b $06,$06,$06,$06,$06,$06,$06,$06
        dc.b $07,$07,$07,$07,$07,$07,$08,$08
        dc.b $08,$08,$08,$08,$09,$09,$09,$09
        dc.b $09,$0a,$0a,$0a,$0a,$0b,$0b,$0b
        dc.b $0b,$0c,$0c,$0c,$0c,$0d,$0d,$0d
        dc.b $0d,$0e,$0e,$0e,$0e,$0f,$0f,$0f
        dc.b $10,$10,$10,$11,$11,$11,$12,$12
        dc.b $12,$13,$13,$13,$14,$14,$15,$15
        dc.b $15,$16,$16,$17,$17,$17,$18,$18
        dc.b $19,$19,$19,$1a,$1a,$1b,$1b,$1c
        dc.b $1c,$1c,$1d,$1d,$1e,$1e,$1f,$1f

; ===========================================================================

; Table of values calculated by this equation: log2(x)*32
Log2_Tab:
        dc.b $00,$00,$20,$32,$40,$4a,$52,$59
        dc.b $60,$65,$6a,$6e,$72,$76,$79,$7d
        dc.b $80,$82,$85,$87,$8a,$8c,$8e,$90
        dc.b $92,$94,$96,$98,$99,$9b,$9d,$9e
        dc.b $a0,$a1,$a2,$a4,$a5,$a6,$a7,$a9
        dc.b $aa,$ab,$ac,$ad,$ae,$af,$b0,$b1
        dc.b $b2,$b3,$b4,$b5,$b6,$b7,$b8,$b9
        dc.b $b9,$ba,$bb,$bc,$bd,$bd,$be,$bf
        dc.b $c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4
        dc.b $c5,$c6,$c6,$c7,$c7,$c8,$c9,$c9
        dc.b $ca,$ca,$cb,$cc,$cc,$cd,$cd,$ce
        dc.b $ce,$cf,$cf,$d0,$d0,$d1,$d1,$d2
        dc.b $d2,$d3,$d3,$d4,$d4,$d5,$d5,$d5
        dc.b $d6,$d6,$d7,$d7,$d8,$d8,$d9,$d9
        dc.b $d9,$da,$da,$db,$db,$db,$dc,$dc
        dc.b $dd,$dd,$dd,$de,$de,$de,$df,$df
        dc.b $df,$e0,$e0,$e1,$e1,$e1,$e2,$e2
        dc.b $e2,$e3,$e3,$e3,$e4,$e4,$e4,$e5
        dc.b $e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7
        dc.b $e7,$e8,$e8,$e8,$e9,$e9,$e9,$ea
        dc.b $ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec
        dc.b $ec,$ec,$ed,$ed,$ed,$ed,$ee,$ee
        dc.b $ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0
        dc.b $f0,$f1,$f1,$f1,$f1,$f1,$f2,$f2
        dc.b $f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4
        dc.b $f4,$f4,$f5,$f5,$f5,$f5,$f5,$f6
        dc.b $f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7
        dc.b $f8,$f8,$f8,$f8,$f9,$f9,$f9,$f9
        dc.b $f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb
        dc.b $fb,$fb,$fb,$fc,$fc,$fc,$fc,$fc
        dc.b $fd,$fd,$fd,$fd,$fd,$fd,$fe,$fe
        dc.b $fe,$fe,$fe,$ff,$ff,$ff,$ff,$ff

; ===========================================================================

Lastly, go ahead and change all of the code below loc_2CCA to read like this:

Code:

loc_2CCA:
        moveq    #0,d1
        move.b    Log2_Tab(pc,d3.w),d1        ; logarithmic division is used to...
        sub.b    Log2_Tab(pc,d4.w),d1        ; ...find the y/x ratio instead of divu
        bcc.s    loc_2CDC
        neg.w    d1                            ; negate the value of d1
        bset.l    #0,d2                        ; sets the 'y is greater than x' flag
  
loc_2CDC:
        moveq    #0,d0
        move.b    Atan_Tab(pc,d1.w),d0        ; get the angle value
        moveq    #0,d1
        move.b    Octant_Adjust(pc,d2.w),d1    ; get the octant adjustment value
        eor.b    d1,d0                        ; place the angle in the appropriate octant
        movem.l    (sp)+,d3-d4
        rts

And that should be it! Just save, assemble, and... uh oh...

Code:

> > >sonic.asm: error: addressing mode not allowed on 68000
> > > move.b    Log2_Tab(pc,d3.w),d1

> > >sonic.asm: error: addressing mode not allowed here
> > > move.b    Log2_Tab(pc,d3.w),d1

> > >sonic.asm: error: addressing mode not allowed on 68000
> > > sub.b    Log2_Tab(pc,d4.w),d1

> > >sonic.asm: error: addressing mode not allowed here
> > > sub.b    Log2_Tab(pc,d4.w),d1

Ok, I lied, we're not actually done yet. I had you do this intentionally in order to teach you about the limitations of accessing lookup tables in this way. If your going to access lookup table like this, the starting address of a lookup table must be within the range of 127 bytes ahead or 128 bytes back from that point in the ROM. Basically, if you can't "bra.s" to the lookup table from the same point in ROM, then "move.b ROM_Address(pc,dn.w),dn" isn't going to work either. To counter this, we move the Log table closer to the lookup commands and branch past the table once we're done. Remember when I told you to move the CalcAngle_Zero section to the top of the routine? This is why; we wouldn't be able to preserve the short branch size with the Log table in the way, so that block was relocated to keep our routine nice and speedy. Anyway, after making that change, it'll look something like this:

Code:

loc_2CCA:
        moveq    #0,d1
        move.b    Log2_Tab(pc,d3.w),d1        ; logarithmic division is used to...
        sub.b    Log2_Tab(pc,d4.w),d1        ; ...find the y/x ratio instead of divu
        bra.w    loc_2CD0                    ; branch past the table and continue on
; ===========================================================================

; Table of values calculated by this equation: log2(x)*32
Log2_Tab:
        dc.b $00,$00,$20,$32,$40,$4a,$52,$59
        dc.b $60,$65,$6a,$6e,$72,$76,$79,$7d
        dc.b $80,$82,$85,$87,$8a,$8c,$8e,$90
        dc.b $92,$94,$96,$98,$99,$9b,$9d,$9e
        dc.b $a0,$a1,$a2,$a4,$a5,$a6,$a7,$a9
        dc.b $aa,$ab,$ac,$ad,$ae,$af,$b0,$b1
        dc.b $b2,$b3,$b4,$b5,$b6,$b7,$b8,$b9
        dc.b $b9,$ba,$bb,$bc,$bd,$bd,$be,$bf
        dc.b $c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4
        dc.b $c5,$c6,$c6,$c7,$c7,$c8,$c9,$c9
        dc.b $ca,$ca,$cb,$cc,$cc,$cd,$cd,$ce
        dc.b $ce,$cf,$cf,$d0,$d0,$d1,$d1,$d2
        dc.b $d2,$d3,$d3,$d4,$d4,$d5,$d5,$d5
        dc.b $d6,$d6,$d7,$d7,$d8,$d8,$d9,$d9
        dc.b $d9,$da,$da,$db,$db,$db,$dc,$dc
        dc.b $dd,$dd,$dd,$de,$de,$de,$df,$df
        dc.b $df,$e0,$e0,$e1,$e1,$e1,$e2,$e2
        dc.b $e2,$e3,$e3,$e3,$e4,$e4,$e4,$e5
        dc.b $e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7
        dc.b $e7,$e8,$e8,$e8,$e9,$e9,$e9,$ea
        dc.b $ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec
        dc.b $ec,$ec,$ed,$ed,$ed,$ed,$ee,$ee
        dc.b $ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0
        dc.b $f0,$f1,$f1,$f1,$f1,$f1,$f2,$f2
        dc.b $f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4
        dc.b $f4,$f4,$f5,$f5,$f5,$f5,$f5,$f6
        dc.b $f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7
        dc.b $f8,$f8,$f8,$f8,$f9,$f9,$f9,$f9
        dc.b $f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb
        dc.b $fb,$fb,$fb,$fc,$fc,$fc,$fc,$fc
        dc.b $fd,$fd,$fd,$fd,$fd,$fd,$fe,$fe
        dc.b $fe,$fe,$fe,$ff,$ff,$ff,$ff,$ff

; ===========================================================================

loc_2CD0:
        bcc.s    loc_2CDC
        neg.w    d1                            ; negate the value of d1
        bset.l    #0,d2                        ; sets the 'y is greater than x' flag
  
loc_2CDC:
        moveq    #0,d0
        move.b    Atan_Tab(pc,d1.w),d0        ; get the angle value
        moveq    #0,d1
        move.b    Octant_Adjust(pc,d2.w),d1    ; get the octant adjustment value
        eor.b    d1,d0                        ; place the angle in the appropriate octant
        movem.l    (sp)+,d3-d4
        rts
; End of function CalcAngle

Now it should assemble without error and work flawlessly!

Step 5 - Conclusion
That's about it! Now you have a highly efficient CalcAngle routine. This routine has cycle count maxing out at 238 cycles, making it twice as fast as the old routine in most situations. For reference, this is what the final routine code looks like (with added comments and a bit of a snarky header):

Code:

; ===========================================================================

; This was moved up here in order to keep the short branch size intact
CalcAngle_Zero:
        moveq    #$40,d0                        ; this used to be move.w, but moveq is faster
        movem.l    (sp)+,d3-d4
        rts
; ===========================================================================

; ---------------------------------------------------------------------------
; Subroutine calculate an angle

; input:
;    d1 = x-axis distance
;    d2 = y-axis distance

; output:
;    d0 = angle

; Routine optimized by Nat the Porcupine

; This new method takes a max of 238 cycles to calculate an angle from...
; ...x/y-axis velocity values (excluding the cycles associated with jsr & rts)
; For comparison, the Sonic 1 method may take up to 402 cycles to calculate an angle!
; And before you ask, no, the method found in S3&K is still terrible at up to 382 cycles.

; TL;DR: This method, on average, executes about twice as fast as the original.

; This was inspired by an arctangent2 routine made by Johan Forslöf (doynax)...
; ...for the MOS 6502 CPU. His original code can be found at the link below
; Link: http://codebase64.org/doku.php?id=base:8bit_atan2_8-bit_angle
; ---------------------------------------------------------------------------

; ||||||||||||||| S U B    R O U T    I N E |||||||||||||||||||||||||||||||||||||||

; This is still the beginning of the routine
CalcAngle:
        movem.l    d3-d4,-(sp)
        moveq    #0,d3
        moveq    #0,d4
        move.w    d1,d3
        move.w    d2,d4
        or.w    d3,d4
        beq.s    CalcAngle_Zero                ; special case when both x and y are zero
        move.w    d2,d4
        moveq    #0,d2                        ; d2 will now be used for octant adjustment flags
  
        tst.w    d3
        bpl.s    loc_2CC2
        neg.w    d3                            ; absolute value of d3
        bset.l    #2,d2                        ; sets the 'x-distance is negative' flag

loc_2CC2:
        tst.w    d4
        bpl.s    loc_2CCA
        neg.w    d4                            ; absolute value of d4
        bset.l    #1,d2                        ; sets the 'y-distance is negative' flag
  
loc_2CCA:
        moveq    #0,d1
        move.b    Log2_Tab(pc,d3.w),d1        ; logarithmic division is used to...
        sub.b    Log2_Tab(pc,d4.w),d1        ; ...find the y/x ratio instead of divu
        bra.w    loc_2CD0                    ; branch past the table and continue on
; ===========================================================================

; Table of values calculated by this equation: log2(x)*32
Log2_Tab:
        dc.b $00,$00,$20,$32,$40,$4a,$52,$59
        dc.b $60,$65,$6a,$6e,$72,$76,$79,$7d
        dc.b $80,$82,$85,$87,$8a,$8c,$8e,$90
        dc.b $92,$94,$96,$98,$99,$9b,$9d,$9e
        dc.b $a0,$a1,$a2,$a4,$a5,$a6,$a7,$a9
        dc.b $aa,$ab,$ac,$ad,$ae,$af,$b0,$b1
        dc.b $b2,$b3,$b4,$b5,$b6,$b7,$b8,$b9
        dc.b $b9,$ba,$bb,$bc,$bd,$bd,$be,$bf
        dc.b $c0,$c0,$c1,$c2,$c2,$c3,$c4,$c4
        dc.b $c5,$c6,$c6,$c7,$c7,$c8,$c9,$c9
        dc.b $ca,$ca,$cb,$cc,$cc,$cd,$cd,$ce
        dc.b $ce,$cf,$cf,$d0,$d0,$d1,$d1,$d2
        dc.b $d2,$d3,$d3,$d4,$d4,$d5,$d5,$d5
        dc.b $d6,$d6,$d7,$d7,$d8,$d8,$d9,$d9
        dc.b $d9,$da,$da,$db,$db,$db,$dc,$dc
        dc.b $dd,$dd,$dd,$de,$de,$de,$df,$df
        dc.b $df,$e0,$e0,$e1,$e1,$e1,$e2,$e2
        dc.b $e2,$e3,$e3,$e3,$e4,$e4,$e4,$e5
        dc.b $e5,$e5,$e6,$e6,$e6,$e7,$e7,$e7
        dc.b $e7,$e8,$e8,$e8,$e9,$e9,$e9,$ea
        dc.b $ea,$ea,$ea,$eb,$eb,$eb,$ec,$ec
        dc.b $ec,$ec,$ed,$ed,$ed,$ed,$ee,$ee
        dc.b $ee,$ee,$ef,$ef,$ef,$ef,$f0,$f0
        dc.b $f0,$f1,$f1,$f1,$f1,$f1,$f2,$f2
        dc.b $f2,$f2,$f3,$f3,$f3,$f3,$f4,$f4
        dc.b $f4,$f4,$f5,$f5,$f5,$f5,$f5,$f6
        dc.b $f6,$f6,$f6,$f7,$f7,$f7,$f7,$f7
        dc.b $f8,$f8,$f8,$f8,$f9,$f9,$f9,$f9
        dc.b $f9,$fa,$fa,$fa,$fa,$fa,$fb,$fb
        dc.b $fb,$fb,$fb,$fc,$fc,$fc,$fc,$fc
        dc.b $fd,$fd,$fd,$fd,$fd,$fd,$fe,$fe
        dc.b $fe,$fe,$fe,$ff,$ff,$ff,$ff,$ff

; ===========================================================================

loc_2CD0:
        bcc.s    loc_2CDC
        neg.w    d1                            ; negate the value of d1
        bset.l    #0,d2                        ; sets the 'y is greater than x' flag
  
loc_2CDC:
        moveq    #0,d0
        move.b    Atan_Tab(pc,d1.w),d0        ; get the angle value
        moveq    #0,d1
        move.b    Octant_Adjust(pc,d2.w),d1    ; get the octant adjustment value
        eor.b    d1,d0                        ; place the angle in the appropriate octant
        movem.l    (sp)+,d3-d4
        rts
; End of function CalcAngle

; ===========================================================================

; This table is used to place the angle in the appropriate octant (half-quadrant)
Octant_Adjust:
        dc.b %00111111        ; x+,y+,|x|>|y|
        dc.b %00000000        ; x+,y+,|x|<|y|
        dc.b %11000000        ; x+,y-,|x|>|y|
        dc.b %11111111        ; x+,y-,|x|<|y|
        dc.b %01000000        ; x-,y+,|x|>|y|
        dc.b %01111111        ; x-,y+,|x|<|y|
        dc.b %10111111        ; x-,y-,|x|>|y|
        dc.b %10000000        ; x-,y-,|x|<|y|

; ===========================================================================

; Table of approximated values calculated by the following formula: atan(2^(x/32))*128/pi
; This table differs from the one that Sonic 1/2/3&K uses, and is not interchangeable
Atan_Tab:
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$00,$00,$00
        dc.b $00,$00,$00,$00,$00,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$01,$01,$01
        dc.b $01,$01,$01,$01,$01,$02,$02,$02
        dc.b $02,$02,$02,$02,$02,$02,$02,$02
        dc.b $02,$02,$02,$02,$02,$02,$02,$02
        dc.b $03,$03,$03,$03,$03,$03,$03,$03
        dc.b $03,$03,$03,$03,$03,$04,$04,$04
        dc.b $04,$04,$04,$04,$04,$04,$04,$04
        dc.b $05,$05,$05,$05,$05,$05,$05,$05
        dc.b $06,$06,$06,$06,$06,$06,$06,$06
        dc.b $07,$07,$07,$07,$07,$07,$08,$08
        dc.b $08,$08,$08,$08,$09,$09,$09,$09
        dc.b $09,$0a,$0a,$0a,$0a,$0b,$0b,$0b
        dc.b $0b,$0c,$0c,$0c,$0c,$0d,$0d,$0d
        dc.b $0d,$0e,$0e,$0e,$0e,$0f,$0f,$0f
        dc.b $10,$10,$10,$11,$11,$11,$12,$12
        dc.b $12,$13,$13,$13,$14,$14,$15,$15
        dc.b $15,$16,$16,$17,$17,$17,$18,$18
        dc.b $19,$19,$19,$1a,$1a,$1b,$1b,$1c
        dc.b $1c,$1c,$1d,$1d,$1e,$1e,$1f,$1f

; ===========================================================================

As a bonus, you can also change the main code block of your CalcSine routine to this to save 4 extra cycles; not much, but it's something quick and simple, so why not?:

Code:

CalcSine:
    add.w    d0,d0                    ; multiply by 2
    andi.w    #$1FE,d0                ; PCAFR
    move.w    d0,d1
    move.w    Sine_Data(pc,d0.w),d0    ; sin
    addi.w    #$80,d1
    move.w    Sine_Data(pc,d1.w),d1    ; cos
    rts
; End of function CalcSine

Well, I hope you found this useful. As always, feedback is appreciated. If you encounter any bugs, please let me know. Enjoy!

Devon · Aug 2, 2017

d3 and d4 only have their lowest byte negated when they are negative, yet they are later on referenced as a word for the table indices, so the upper byte will still remain untouched and can allow for undesirable results.

TheStoneBanana · Aug 2, 2017

EDIT: Nevermind, I was corrected elsewhere. Disregard this portion

This is some excellent work here! It shows that a lot of time, effort, and research has gone into this, and it's great to see honestly. Good work!

Devon · Aug 2, 2017

Bit manipulation instructions only use the longword extension for registers.

EDIT: TSB edited his post. That was targeted towards him.

Anyways, one small speed boost you could do is remove:
Code:
        moveq    #0,d3
        moveq    #0,d4
Since, it's actually no longer necessary for that.

LuigiXHero · Aug 2, 2017

So I tried putting it into my hack and while I was running though the level I thought it was going smoothly. That is until I to roll down a quarter pipe which I do all the time, it completely stopped me in my tracks and sonic got stuck until I jumped. I can't show off my hack so I put together a test hack to show off this bug.

Rom

So yeah this isn't suppose to happen. I tried changing the neg.b back to neg.w after I found this bug and nothing changed so it wasn't that.

It seems to only happen when facing away, and maybe rolling too? As shown in the video.

AURORA☆FIELDS · Aug 2, 2017

I am not gonna comment on the atan optimization because apparently thats broken, so I am just going to instead give you my optimized super 68k magic version of CalcSine. It is still 8 cycles faster than yours, and total of 12(!) cycles faster the original routine.

Devon · Aug 2, 2017

Okay, the main issue I'm noticing is that in the Sonic engine, the X and Y distances used for CalcAngle can be greater than $FF, and thus be invalid indices for the log2 table.

Naoto · Aug 2, 2017

LuigiXHero said: ↑

So I tried putting it into my hack and while I was running though the level I thought it was going smoothly. That is until I to roll down a quarter pipe which I do all the time, it completely stopped me in my tracks and sonic got stuck until I jumped. I can't show off my hack so I put together a test hack to show off this bug.

Rom

So yeah this isn't suppose to happen. I tried changing the neg.b back to neg.w after I found this bug and nothing changed so it wasn't that.

It seems to only happen when facing away, and maybe rolling too? As shown in the video.
Click to expand...

Novedicus said: ↑

Okay, the main issue I'm noticing is that in the Sonic engine, the X and Y distances used for CalcAngle can be greater than $FF, and thus be invalid indices for the log2 table.
Click to expand...

Well this is a fine pickle. I'll try to fix this when I have the time. Maybe there's a better way to do this than tables of logarithmic division. Anyway, I added a warning banner at the top to prevent people from using this accidentally (in all honesty, the topic should probably be locked).

Just out of curiosity, how far do you think I would have to expand the table Novedicus? Depending on the answer, it may just be better to make a quick division routine instead.

AURORA☆FIELDS · Aug 2, 2017

If you are going with a division routine you might as well just use divu/s because that will be faster at that point anyway.

Devon · Aug 2, 2017

To answer your question, the log2 table would need $8000 word values (since it only handles positive values). You would also need to extend the arctan table to like $200 entries, since there will be some new values in the log2 table beyond $FF.

Naoto · Aug 3, 2017

Novedicus said: ↑

To answer your question, the log2 table would need $8000 word values (since it only handles positive values). You would also need to extend the arctan table to like $200 entries, since there will be some new values in the log2 table beyond $FF.
Click to expand...

Jesus Christ, that's over 65Kb of data! I think that I'll just reinstate the divu instructions; it's not worth wasting that much ROM space for a savings of 140 cycles. That being said, the half quadrant adjustments help avoid some branching, so that'll stay. In fact, it'll be the ONLY thing from the "Fresh Code" section that stays!

I've certainly made a fool of myself with this, that's for sure.

Devon · Aug 3, 2017

Yeah, you should be cautious when porting code from a processor that has different bus widths than the processor you're porting to, and be aware of how it may be handled by other code.

Log in or Sign up

Overhauling the CalcAngle routine

Naoto Shadow Admin Member

Devon Do not contact me. Member

TheStoneBanana banana Member

Devon Do not contact me. Member

LuigiXHero Well-Known Member Member

AURORA☆FIELDS so uh yes Exiled

Devon Do not contact me. Member

Naoto Shadow Admin Member

AURORA☆FIELDS so uh yes Exiled

Devon Do not contact me. Member

Naoto Shadow Admin Member

Devon Do not contact me. Member

Log in or Sign up

Overhauling the CalcAngle routine

Naoto Shadow Admin Member

Devon Do not contact me. Member

TheStoneBanana banana Member

Devon Do not contact me. Member

LuigiXHero Well-Known Member Member

AURORA☆FIELDS so uh yes Exiled

Devon Do not contact me. Member

Naoto Shadow Admin Member

AURORA☆FIELDS so uh yes Exiled

Devon Do not contact me. Member

Naoto Shadow Admin Member

Devon Do not contact me. Member

Useful Searches