Optimized CalcAngle (atan2 function)

Devon · Dec 7, 2023

Some years ago, there was an attempt to optimize this function by using logarithmic division instead. Unfortunately, there were some glaring issues with it that prevented it from working correctly, notably the fact that it failed to take into account the fact that the function that was used as the basis for that optimization only supported 8-bit input values, whereas Sonic 1/2/3K supported 16-bit input values.

Recently, I decided to take my own crack at it, and came up with this:

Code:

; -------------------------------------------------------------------------
; 2-argument arctangent (angle between (0,0) and (x,y))
; Based on http://codebase64.org/doku.php?id=base:8bit_atan2_8-bit_angle
; -------------------------------------------------------------------------
; PARAMETERS:
;       d1.w - X value
;       d2.w - Y value
; RETURNS:
;       d0.b - 2-argument arctangent value (angle between (0,0) and (x,y))
; -------------------------------------------------------------------------

CalcAngle:
        moveq   #0,d0                           ; Default to bottom right quadrant
        tst.w   d1                              ; Is the X value negative?
        beq.s   CalcAngle_XZero                 ; If the X value is zero, branch
        bpl.s   CalcAngle_CheckY                ; If not, branch
        neg.w   d1                              ; If so, get the absolute value
        moveq   #4,d0                           ; Shift to left quadrant
 
CalcAngle_CheckY:
        tst.w   d2                              ; Is the Y value negative?
        beq.s   CalcAngle_YZero                 ; If the Y value is zero, branch
        bpl.s   CalcAngle_CheckOctet            ; If not, branch
        neg.w   d2                              ; If so, get the absolute value
        addq.b  #2,d0                           ; Shift to top quadrant

CalcAngle_CheckOctet:
        cmp.w   d2,d1                           ; Are we horizontally closer to the center?
        bcc.s   CalcAngle_Divide                ; If not, branch
        exg.l   d1,d2                           ; If so, divide Y from X instead
        addq.b  #1,d0                           ; Use octant that's horizontally closer to the center
 
CalcAngle_Divide:
        move.w  d1,-(sp)                        ; Shrink X and Y down into bytes
        moveq   #0,d3
        move.b  (sp)+,d3
        move.b  WordShiftTable(pc,d3.w),d3
        lsr.w   d3,d1
        lsr.w   d3,d2

        lea     Log2Table(pc),a2                ; Perform logarithmic division
        move.b  (a2,d2.w),d2
        sub.b   (a2,d1.w),d2
        bne.s   CalcAngle_GetAtan2Val
        move.w  #$FF,d2                         ; Edge case where X and Y values are too close for the division to handle

CalcAngle_GetAtan2Val:
        lea     Atan2Table(pc),a2               ; Get atan2 value
        move.b  (a2,d2.w),d2
        move.b  OctantAdjust(pc,d0.w),d0
        eor.b   d2,d0
        rts

; -------------------------------------------------------------------------

CalcAngle_YZero:
        tst.b   d0                              ; Was the X value negated?
        beq.s   CalcAngle_End                   ; If not, branch (d0 is already 0, so no need to set it again on branch)
        moveq   #$FFFFFF80,d0                   ; 180 degrees

CalcAngle_End:
        rts

CalcAngle_XZero:
        tst.w   d2                              ; Is the Y value negative?
        bmi.s   CalcAngle_XZeroYNeg             ; If so, branch
        moveq   #$40,d0                         ; 90 degrees
        rts

CalcAngle_XZeroYNeg:
        moveq   #$FFFFFFC0,d0                   ; 270 degrees
        rts
 
; -------------------------------------------------------------------------

OctantAdjust:
        dc.b    %00000000                       ; +X, +Y, |X|>|Y|
        dc.b    %00111111                       ; +X, +Y, |X|<|Y|
        dc.b    %11111111                       ; +X, -Y, |X|>|Y|
        dc.b    %11000000                       ; +X, -Y, |X|<|Y|
        dc.b    %01111111                       ; -X, +Y, |X|>|Y|
        dc.b    %01000000                       ; -X, +Y, |X|<|Y|
        dc.b    %10000000                       ; -X, -Y, |X|>|Y|
        dc.b    %10111111                       ; -X, -Y, |X|<|Y|

WordShiftTable:
        dc.b    $00, $01, $02, $02, $03, $03, $03, $03
        dc.b    $04, $04, $04, $04, $04, $04, $04, $04
        dc.b    $05, $05, $05, $05, $05, $05, $05, $05
        dc.b    $05, $05, $05, $05, $05, $05, $05, $05
        dc.b    $06, $06, $06, $06, $06, $06, $06, $06
        dc.b    $06, $06, $06, $06, $06, $06, $06, $06
        dc.b    $06, $06, $06, $06, $06, $06, $06, $06
        dc.b    $06, $06, $06, $06, $06, $06, $06, $06
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07
        dc.b    $07, $07, $07, $07, $07, $07, $07, $07

Log2Table:
        dc.b    $00, $00, $1F, $32, $3F, $49, $52, $59
        dc.b    $5F, $64, $69, $6E, $72, $75, $79, $7C
        dc.b    $7F, $82, $84, $87, $89, $8C, $8E, $90
        dc.b    $92, $94, $95, $97, $99, $9A, $9C, $9E
        dc.b    $9F, $A0, $A2, $A3, $A4, $A6, $A7, $A8
        dc.b    $A9, $AA, $AC, $AD, $AE, $AF, $B0, $B1
        dc.b    $B2, $B3, $B4, $B5, $B5, $B6, $B7, $B8
        dc.b    $B9, $BA, $BA, $BB, $BC, $BD, $BE, $BE
        dc.b    $BF, $C0, $C0, $C1, $C2, $C2, $C3, $C4
        dc.b    $C4, $C5, $C6, $C6, $C7, $C8, $C8, $C9
        dc.b    $C9, $CA, $CA, $CB, $CC, $CC, $CD, $CD
        dc.b    $CE, $CE, $CF, $CF, $D0, $D0, $D1, $D1
        dc.b    $D2, $D2, $D3, $D3, $D4, $D4, $D5, $D5
        dc.b    $D5, $D6, $D6, $D7, $D7, $D8, $D8, $D8
        dc.b    $D9, $D9, $DA, $DA, $DA, $DB, $DB, $DC
        dc.b    $DC, $DC, $DD, $DD, $DE, $DE, $DE, $DF
        dc.b    $DF, $DF, $E0, $E0, $E0, $E1, $E1, $E1
        dc.b    $E2, $E2, $E2, $E3, $E3, $E3, $E4, $E4
        dc.b    $E4, $E5, $E5, $E5, $E6, $E6, $E6, $E7
        dc.b    $E7, $E7, $E8, $E8, $E8, $E8, $E9, $E9
        dc.b    $E9, $EA, $EA, $EA, $EA, $EB, $EB, $EB
        dc.b    $EC, $EC, $EC, $EC, $ED, $ED, $ED, $ED
        dc.b    $EE, $EE, $EE, $EE, $EF, $EF, $EF, $F0
        dc.b    $F0, $F0, $F0, $F1, $F1, $F1, $F1, $F1
        dc.b    $F2, $F2, $F2, $F2, $F3, $F3, $F3, $F3
        dc.b    $F4, $F4, $F4, $F4, $F5, $F5, $F5, $F5
        dc.b    $F5, $F6, $F6, $F6, $F6, $F7, $F7, $F7
        dc.b    $F7, $F7, $F8, $F8, $F8, $F8, $F8, $F9
        dc.b    $F9, $F9, $F9, $F9, $FA, $FA, $FA, $FA
        dc.b    $FA, $FB, $FB, $FB, $FB, $FB, $FC, $FC
        dc.b    $FC, $FC, $FC, $FD, $FD, $FD, $FD, $FD
        dc.b    $FE, $FE, $FE, $FE, $FE, $FE, $FF, $FF

Atan2Table:
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $02, $02, $02, $02, $02, $02, $02
        dc.b    $02, $02, $02, $02, $02, $02, $02, $02
        dc.b    $02, $02, $02, $02, $02, $02, $02, $02
        dc.b    $03, $03, $03, $03, $03, $03, $03, $03
        dc.b    $03, $03, $03, $03, $03, $03, $03, $03
        dc.b    $04, $04, $04, $04, $04, $04, $04, $04
        dc.b    $04, $04, $04, $05, $05, $05, $05, $05
        dc.b    $05, $05, $05, $05, $05, $06, $06, $06
        dc.b    $06, $06, $06, $06, $06, $07, $07, $07
        dc.b    $07, $07, $07, $08, $08, $08, $08, $08
        dc.b    $08, $09, $09, $09, $09, $09, $09, $0A
        dc.b    $0A, $0A, $0A, $0B, $0B, $0B, $0B, $0B
        dc.b    $0C, $0C, $0C, $0C, $0D, $0D, $0D, $0D
        dc.b    $0E, $0E, $0E, $0F, $0F, $0F, $0F, $10
        dc.b    $10, $10, $11, $11, $11, $12, $12, $12
        dc.b    $13, $13, $13, $14, $14, $14, $15, $15
        dc.b    $16, $16, $16, $17, $17, $17, $18, $18
        dc.b    $19, $19, $1A, $1A, $1A, $1B, $1B, $1C
        dc.b    $1C, $1C, $1D, $1D, $1E, $1E, $1F, $1F

What I did to get around the input argument limitation was basically just to get the largest of the 2 values (which is needed anyways to do the atan2 calculation properly), and then determine how many bit shifts it would take to get it down into a byte, and then apply that number of bit shifts to both the X and Y (hence, the introduction of WordShiftTable). I'm also using my own generated log2 and atan2 tables. It also takes into account some edge cases regarding the edges of octants. Yes, it's also not save d3 or a2 on the stack. In Sonic 1, it's not really necessary, I found. I dunno about Sonic 2 or 3K, but that shouldn't be hard to get around anyways.

I would have posted this as a reply to the original thread, but seeing as that's now in the archives... yeah. If there's any further optimizations that can be made, or if there's any issues that I somehow missed, I'd like to hear it. Currently, the worst case scenario for this function is like 246 cycles, and the best case scenario (excluding if both X and Y are 0) in the original is like around the 272 cycles mark.

Devon · Sep 15, 2022

Quick update: accidentally used a wrong register in a place. That's been fixed now.

Spicy Bread SSR · Sep 18, 2022

At the start of CalcAngle_YZero, you could do a tst bne instead of a cmpi beq, since d0 can only be 0 or 4 at that time. It saves around 4 cycles, for 0 degrees and 180 degrees. It's not much, but any reason to make someone's optimized code even faster is fine in my books

Devon · Sep 18, 2022

That is indeed true. Updated.

Devon · Dec 7, 2023

I have created an alternate version that is smaller and faster, but it's only designed for cases where you know that both the X and Y values are between -$FF and $FF, since it skips out on performing bit shifts to make the values compatible with the tables. The tables are the same as in the other, but are included here for completeness.

Code:

; ------------------------------------------------------------------------------
; 2-argument arctangent (angle between (0,0) and (x,y)) (8-bit)
; Based on http://codebase64.org/doku.php?id=base:8bit_atan2_8-bit_angle
; ------------------------------------------------------------------------------
; PARAMETERS:
;       d1.w - X value (must be between -$FF and $FF)
;       d2.w - Y value (must be between -$FF and $FF)
; RETURNS:
;       d0.b - 2-argument arctangent value (angle between (0,0) and (x,y))
; ------------------------------------------------------------------------------

CalcAngle8:
        moveq   #0,d0                                           ; Default to bottom right quadrant
        tst.w   d1                                              ; Is the X value negative?
        beq.s   CalcAngle8_XZero                                ; If the X value is zero, branch
        bpl.s   CalcAngle8_CheckY                               ; If not, branch
        neg.w   d1                                              ; If so, get the absolute value
        moveq   #4,d0                                           ; Shift to left quadrant
 
CalcAngle8_CheckY:
        tst.w   d2                                              ; Is the Y value negative?
        beq.s   CalcAngle8_YZero                                ; If the Y value is zero, branch
        bpl.s   CalcAngle8_Divide                               ; If not, branch
        neg.w   d2                                              ; If so, get the absolute value
        addq.b  #2,d0                                           ; Shift to top quadrant

CalcAngle8_Divide:
        lea     Log2Table(pc),a2                                ; Perform logarithmic division
        move.b  (a2,d2.w),d2
        sub.b   (a2,d1.w),d2
        bcs.s   CalcAngle8_GetAtan2Val
        neg.b   d2                                              ; Use octant that's horizontally closer to the center
        addq.b  #1,d0

CalcAngle8_GetAtan2Val:
        move.b  Atan2Table(pc,d2.w),d2                          ; Get atan2 value
        move.b  OctantAdjust(pc,d0.w),d0
        eor.b   d2,d0
      
CalcAngle8_End:
        rts

; ------------------------------------------------------------------------------

CalcAngle8_YZero:
        tst.b   d0                                              ; Was the X value negated?
        beq.s   CalcAngle8_End                                  ; If not, branch (d0 is already 0, so no need to set it again on branch)
        moveq   #$FFFFFF80,d0                                   ; 180 degrees
        rts

CalcAngle8_XZero:
        tst.w   d2                                              ; Is the Y value negative?
        bmi.s   CalcAngle8_XZeroYNeg                            ; If so, branch
        moveq   #$40,d0                                         ; 90 degrees
        rts

CalcAngle8_XZeroYNeg:
        moveq   #$FFFFFFC0,d0                                   ; 270 degrees
        rts

; ------------------------------------------------------------------------------

OctantAdjust:
        dc.b    %00000000                                       ; +X, +Y, |X|>|Y|
        dc.b    %00111111                                       ; +X, +Y, |X|<|Y|
        dc.b    %11111111                                       ; +X, -Y, |X|>|Y|
        dc.b    %11000000                                       ; +X, -Y, |X|<|Y|
        dc.b    %01111111                                       ; -X, +Y, |X|>|Y|
        dc.b    %01000000                                       ; -X, +Y, |X|<|Y|
        dc.b    %10000000                                       ; -X, -Y, |X|>|Y|
        dc.b    %10111111                                       ; -X, -Y, |X|<|Y|

Atan2Table:
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $00, $00
        dc.b    $00, $00, $00, $00, $00, $00, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $01, $01, $01, $01, $01, $01, $01
        dc.b    $01, $02, $02, $02, $02, $02, $02, $02
        dc.b    $02, $02, $02, $02, $02, $02, $02, $02
        dc.b    $02, $02, $02, $02, $02, $02, $02, $02
        dc.b    $03, $03, $03, $03, $03, $03, $03, $03
        dc.b    $03, $03, $03, $03, $03, $03, $03, $03
        dc.b    $04, $04, $04, $04, $04, $04, $04, $04
        dc.b    $04, $04, $04, $05, $05, $05, $05, $05
        dc.b    $05, $05, $05, $05, $05, $06, $06, $06
        dc.b    $06, $06, $06, $06, $06, $07, $07, $07
        dc.b    $07, $07, $07, $08, $08, $08, $08, $08
        dc.b    $08, $09, $09, $09, $09, $09, $09, $0A
        dc.b    $0A, $0A, $0A, $0B, $0B, $0B, $0B, $0B
        dc.b    $0C, $0C, $0C, $0C, $0D, $0D, $0D, $0D
        dc.b    $0E, $0E, $0E, $0F, $0F, $0F, $0F, $10
        dc.b    $10, $10, $11, $11, $11, $12, $12, $12
        dc.b    $13, $13, $13, $14, $14, $14, $15, $15
        dc.b    $16, $16, $16, $17, $17, $17, $18, $18
        dc.b    $19, $19, $1A, $1A, $1A, $1B, $1B, $1C
        dc.b    $1C, $1C, $1D, $1D, $1E, $1E, $1F, $1F

Log2Table:
        dc.b    $00, $00, $1F, $32, $3F, $49, $52, $59
        dc.b    $5F, $64, $69, $6E, $72, $75, $79, $7C
        dc.b    $7F, $82, $84, $87, $89, $8C, $8E, $90
        dc.b    $92, $94, $95, $97, $99, $9A, $9C, $9E
        dc.b    $9F, $A0, $A2, $A3, $A4, $A6, $A7, $A8
        dc.b    $A9, $AA, $AC, $AD, $AE, $AF, $B0, $B1
        dc.b    $B2, $B3, $B4, $B5, $B5, $B6, $B7, $B8
        dc.b    $B9, $BA, $BA, $BB, $BC, $BD, $BE, $BE
        dc.b    $BF, $C0, $C0, $C1, $C2, $C2, $C3, $C4
        dc.b    $C4, $C5, $C6, $C6, $C7, $C8, $C8, $C9
        dc.b    $C9, $CA, $CA, $CB, $CC, $CC, $CD, $CD
        dc.b    $CE, $CE, $CF, $CF, $D0, $D0, $D1, $D1
        dc.b    $D2, $D2, $D3, $D3, $D4, $D4, $D5, $D5
        dc.b    $D5, $D6, $D6, $D7, $D7, $D8, $D8, $D8
        dc.b    $D9, $D9, $DA, $DA, $DA, $DB, $DB, $DC
        dc.b    $DC, $DC, $DD, $DD, $DE, $DE, $DE, $DF
        dc.b    $DF, $DF, $E0, $E0, $E0, $E1, $E1, $E1
        dc.b    $E2, $E2, $E2, $E3, $E3, $E3, $E4, $E4
        dc.b    $E4, $E5, $E5, $E5, $E6, $E6, $E6, $E7
        dc.b    $E7, $E7, $E8, $E8, $E8, $E8, $E9, $E9
        dc.b    $E9, $EA, $EA, $EA, $EA, $EB, $EB, $EB
        dc.b    $EC, $EC, $EC, $EC, $ED, $ED, $ED, $ED
        dc.b    $EE, $EE, $EE, $EE, $EF, $EF, $EF, $F0
        dc.b    $F0, $F0, $F0, $F1, $F1, $F1, $F1, $F1
        dc.b    $F2, $F2, $F2, $F2, $F3, $F3, $F3, $F3
        dc.b    $F4, $F4, $F4, $F4, $F5, $F5, $F5, $F5
        dc.b    $F5, $F6, $F6, $F6, $F6, $F7, $F7, $F7
        dc.b    $F7, $F7, $F8, $F8, $F8, $F8, $F8, $F9
        dc.b    $F9, $F9, $F9, $F9, $FA, $FA, $FA, $FA
        dc.b    $FA, $FB, $FB, $FB, $FB, $FB, $FC, $FC
        dc.b    $FC, $FC, $FC, $FD, $FD, $FD, $FD, $FD
        dc.b    $FE, $FE, $FE, $FE, $FE, $FE, $FF, $FF

; ------------------------------------------------------------------------------

You could get away with using this variant when managing Sonic's collision in the air, since the usage for CalcAngle there is to detect the general direction Sonic is moving, which doesn't really need precision, since it only picks from the 4 cardinal directions. You'd just use the integer parts of the X and Y speeds (their high bytes) for the parameters instead of the full 16-bit values.

I also updated the other one to use neg instead of not for getting the absolute value of the X and Y parameters.

Log in or Sign up

Optimized CalcAngle (atan2 function)

Devon Do not contact me. Member

Devon Do not contact me. Member

Spicy Bread SSR You can call me Mal if you like Member

Devon Do not contact me. Member

Devon Do not contact me. Member

Log in or Sign up

Optimized CalcAngle (atan2 function)

Devon Do not contact me. Member

Devon Do not contact me. Member

Spicy Bread SSR You can call me Mal if you like Member

Devon Do not contact me. Member

Devon Do not contact me. Member

Useful Searches