Speed up ring loss processing...

Discussion in 'Approved' started by SpirituInsanum, May 6, 2011.

Thread Status:
Not open for further replies.
  1. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    I was trying to get rid of some of the most annoying slowdowns in the game when I got this idea. Since it's extremely versatile, I thought I could share it without risking seeing the exact same thing in every hack. So the condition for using this is essentially that you make your own pattern (and possibly give credit, but I don't care much about that). I won't give the data, to avoid copy/paste, but the process is detailed and simple.


    Actually, it's extremely simple: it only requires finding a good equation, using your favorite spreadsheet software and modifying a few details in the code.


    The idea is to use pre-computed data stored in a simple array rather than redo calculations that always lead to the exact same result. This saves hundreds of processor cycles. It is also a simple example of how to use an array.

    THE CODE




    Here's the original lost ring code (from the svn), it's Obj37 in Hivebrain's disassembly:




    RingLoss: ; XREF: Obj_Index


    moveq #0,d0


    move.b obRoutine(a0),d0


    move.w RLoss_Index(pc,d0.w),d1


    jmp RLoss_Index(pc,d1.w)


    ; ===========================================================================


    RLoss_Index: dc.w RLoss_Count-RLoss_Index


    dc.w RLoss_Bounce-RLoss_Index


    dc.w RLoss_Collect-RLoss_Index


    dc.w RLoss_Sparkle-RLoss_Index


    dc.w RLoss_Delete-RLoss_Index


    ; ===========================================================================


    RLoss_Count: ; Routine 0


    movea.l a0,a1


    moveq #0,d5


    move.w (v_rings).w,d5 ; check number of rings you have


    moveq #32,d0


    cmp.w d0,d5 ; do you have 32 or more?


    bcs.s @belowmax ; if not, branch


    move.w d0,d5 ; if yes, set d5 to 32


    @belowmax:


    subq.w #1,d5


    move.w #$288,d4 ; << THIS LINE (1)


    bra.s @makerings


    ; ===========================================================================


    @loop:


    bsr.w FindFreeObj


    bne.w @resetcounter


    @makerings:


    move.b #id_RingLoss,0(a1) ; load bouncing ring object


    addq.b #2,obRoutine(a1)


    move.b #8,obHeight(a1)


    move.b #8,obWidth(a1)


    move.w obX(a0),obX(a1)


    move.w obY(a0),obY(a1)


    move.l #Map_Ring,obMap(a1)


    move.w #$27B2,obGfx(a1)


    move.b #4,obRender(a1)


    move.b #3,obPriority(a1)


    move.b #$47,obColType(a1)


    move.b #8,obActWid(a1)


    move.b #-1,(v_ani3_time).w ; this is "move.b #-1,($FFFFFFA6).w" in Hivebrain's disassembly


    tst.w d4 ; << THIS LINE (2)


    bmi.s @loc_9D62 ; << THIS LINE (2)


    move.w d4,d0 ; << THIS LINE (2)


    bsr.w CalcSine ; << THIS LINE (2)


    move.w d4,d2 ; << THIS LINE (2)


    lsr.w #8,d2 ; << THIS LINE (2)


    asl.w d2,d0 ; << THIS LINE (2)


    asl.w d2,d1 ; << THIS LINE (2)


    move.w d0,d2 ; << THIS LINE (2)


    move.w d1,d3 ; << THIS LINE (2)


    addi.b #$10,d4 ; << THIS LINE (2)


    bcc.s @loc_9D62 ; << THIS LINE (2)


    subi.w #$80,d4 ; << THIS LINE (2)


    bcc.s @loc_9D62 ; << THIS LINE (2)


    move.w #$288,d4 ; << THIS LINE (2)


    @loc_9D62: ; << THIS LINE (2)


    move.w d2,obVelX(a1) ; << THIS LINE (2)


    move.w d3,obVelY(a1) ; << THIS LINE (2)


    neg.w d2 ; << THIS LINE (2)


    neg.w d4 ; << THIS LINE (2)


    dbf d5,@loop ; repeat for number of rings (max 31)


    @resetcounter:


    move.w #0,(v_rings).w ; reset number of rings to zero


    move.b #$80,(f_ringcount).w ; update ring counter


    move.b #0,(v_lifecount).w


    sfx sfx_RingLoss ; play ring loss sound


    RLoss_Bounce: ; Routine 2


    move.b (v_ani3_frame).w,obFrame(a0)


    bsr.w SpeedToPos


    addi.w #$18,obVelY(a0)


    bmi.s @chkdel


    move.b (v_vbla_byte).w,d0


    add.b d7,d0


    andi.b #3,d0


    bne.s @chkdel


    jsr ObjFloorDist


    tst.w d1


    bpl.s @chkdel


    add.w d1,obY(a0)


    move.w obVelY(a0),d0


    asr.w #2,d0


    sub.w d0,obVelY(a0)


    neg.w obVelY(a0)


    @chkdel:


    tst.b (v_ani3_time).w


    beq.s RLoss_Delete


    move.w (v_limitbtm2).w,d0


    addi.w #$E0,d0


    cmp.w obY(a0),d0 ; has object moved below level boundary?


    bcs.s RLoss_Delete ; if yes, branch


    bra.w DisplaySprite


    ; ===========================================================================


    RLoss_Collect: ; Routine 4


    addq.b #2,obRoutine(a0)


    move.b #0,obColType(a0)


    move.b #1,obPriority(a0)


    bsr.w CollectRing


    RLoss_Sparkle: ; Routine 6


    lea (Ani_Ring).l,a1


    bsr.w AnimateSprite


    bra.w DisplaySprite


    ; ===========================================================================


    RLoss_Delete: ; Routine 8


    bra.w DeleteObject



    The lines with "<< THIS LINE" as a comment are the ones we're going to change.


    Let's do this first:


    Replace the line marked "(1)" with:



    lea SpillingRingData,a3 ; load the address of the array in a3



    You can choose another name, of course.


    And replace all the lines marked "(2)" with this code:




    move.w (a3)+,$12(a1) ; move the data contained in the array to the y velocity and increment the address in a3


    move.w (a3)+,$10(a1) ; move the data contained in the array to the x velocity and increment the address in a3



    That's it for the asm part. Short, isn't it?

    THE DATA




    Open the spreadsheet software of your choice.


    - In a first column, put 0 to "n" (32 in the original game).


    - In a second column, you have to set the absolute speed of the rings. The maximum absolute speed should be approximately -1000 to -1200 (decimal, not hex). It's negative because of Sonic's coordinate system, where Y increases downward.


    You can use either all the same values, or different ones (eg: -1200, -800, -1150, -775...)


    - In the third column, you have to put the angle equation:


    The parameters should be something like this:


    a=pi/2+b/t*n+c


    Where: "a" is the angle,


    "b" is the total angle covered by your rings (a full circle is 2*pi),


    "t" is the maximum number of rings that will appear,


    "n" is the number of the current ring,


    "c" is a bias, 0 if you want your first ring to move vertically, another value if you want it to go left/right.


    - In the fourth column, you'll put the vertical speed:


    v=sin(a)*s


    Where: "v" is the vertical speed


    "a" is the angle (third column)


    "s" is the absolute speed of the ring (second column)


    - In the fifth column, you'll put the horizontal speed:


    h=cos(a)*s


    Where: "h" is the horizontal speed


    "a" is the angle (third column)


    "s" is the absolute speed of the ring (second column)


    Now, set up the 4th and 5th columns to show 0 decimals. Copy their contents into your text editor, and arrange things so they look like this:




    SpillingRingData:


    dc.w -1100, 0


    dc.w -847, -67


    dc.w -847, 67


    dc.w -1086, -174


    ; (...)


    ; (...)


    ; (...)


    even



    You don't have to turn the decimal values into hexadecimal values, the assembler will do it for you.
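If you would rather script the table than build it in a spreadsheet, the same columns can be sketched in Python. This is only a rough equivalent under a few assumptions: the function names, the alternating speed list and the default values are made up for illustration, and Python's rounding may differ by one unit from a spreadsheet's "0 decimals" display on exact halves.

```python
import math

def ring_table(count, speeds, total_angle=2 * math.pi, bias=0.0):
    """Return one (vertical, horizontal) speed pair per ring."""
    rows = []
    for n in range(count):
        s = speeds[n % len(speeds)]                        # column 2: absolute speed (negative)
        a = math.pi / 2 + total_angle / count * n + bias   # column 3: a = pi/2 + b/t*n + c
        v = round(math.sin(a) * s)                         # column 4: v = sin(a)*s
        h = round(math.cos(a) * s)                         # column 5: h = cos(a)*s
        rows.append((v, h))
    return rows

def format_table(label, rows):
    """Format the pairs as dc.w lines, vertical speed first."""
    lines = [f"{label}:"]
    lines += [f"\tdc.w\t{v}, {h}" for v, h in rows]
    lines.append("\teven")
    return "\n".join(lines)

# Hypothetical pattern: 32 rings over a full circle, alternating two speeds.
table = ring_table(32, [-1100, -850])
print(format_table("SpillingRingData", table))
```

The vertical speed is written first because the replacement code reads the y velocity from the array before the x velocity.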


    Make sure you have as many lines as the maximum amount of rings you can lose.


    If you want your rings to go in both directions, make lines for only half of the rings, duplicate all lines (except the purely vertical ones) and negate the horizontal speed.
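The mirroring step can also be sketched in a few lines of Python. The helper name is hypothetical, and placing each mirrored line right after its original is an assumption modelled on the sample data above; any other ordering works as long as every line ends up in the array.

```python
def mirror_rows(half_rows):
    """Follow each (vertical, horizontal) row with its left/right mirror."""
    rows = []
    for v, h in half_rows:
        rows.append((v, h))
        if h != 0:                  # purely vertical rings have no mirror image
            rows.append((v, -h))
    return rows

# Three hand-picked rows, taken from the sample array above.
half = [(-1100, 0), (-847, -67), (-1086, -174)]
mirrored = mirror_rows(half)
for v, h in mirrored:
    print(f"\tdc.w\t{v}, {h}")
```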


    You'll only have to change the absolute speeds and the total angle to design tons of great patterns, and with a few more changes you can even have several patterns in the same game.


    Put your new array wherever you want in your code (for example before or after the object).


    Enjoy!
     
    Last edited by a moderator: Apr 29, 2012
  2. MarkeyJester

    MarkeyJester ! % # @ Member

    Joined:
    Jun 27, 2009
    Messages:
    2,668
    Hah, you know, the closest I got to touching the ring spawn code was to reduce the number of rings being spawned, this looks far more fun and entertaining, fabulous work =)
     
  3. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    Have fun :(


    Making pretty shapes with equations was the only reason why I liked maths at school :p


    My current version is more than conventional though.


    Btw, the ring creation is only part of the reason why the game slows down that much, calculating the position of so many rings slows things down as well (among other things I guess).


    If one wants to make it a little faster, the "rloss_bounce" part can be modified. Very few things can be optimized, as far as I know, but the whole "speedtopos" thing can be shortened.


    In order to improve the speed a little, use longwords instead of words in the array (but multiply the absolute speed by 256), and process the whole part about position directly in the "rloss_bounce" routine with those longwords rather than using speedtopos.


    Longwords won't fit in $10(An) and $12(An), so the object's scratch ram has to be used to store the data instead (for example $32(An) and $36(An))


    This removes at least the "ext", "asl" and branches. On the other hand you have to process longwords for the bounce, but that's it. I didn't count precisely, but it saves roughly 60-70 processing cycles per ring, that's around 2000 cycles saved per frame for 32 rings.
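To see why multiplying by 256 gives the same motion, here is a small Python model of the two updates. It assumes positions are stored as 32-bit values with 16 fractional bits and that the original subroutine sign-extends the word and shifts it left by 8 before adding; the function names are invented for the sketch.

```python
MASK32 = 0xFFFFFFFF

def step_word_velocity(pos, vel):
    """Word velocity: widen, shift left by 8, then add (models ext.l + asl.l #8 + add.l)."""
    return (pos + (vel << 8)) & MASK32

def to_long(vel):
    """Pre-multiplied longword as it would be stored in the array."""
    return (vel * 256) & MASK32

def step_long_velocity(pos, vel_long):
    """Longword velocity: a single 32-bit add per frame."""
    return (pos + vel_long) & MASK32

pos = 0x01000000                     # hypothetical starting position
for vel in (-1100, -847, 67, 0):
    assert step_word_velocity(pos, vel) == step_long_velocity(pos, to_long(vel))
print(hex(to_long(-1100)))           # value that would go in the array as a dc.l
```

Both paths land on the same position every frame, so the shift can be done once, when the table is built.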


    It's a minor optimization compared to the previous one, but it can make the difference if you really want to save cycles.
     
    Last edited by a moderator: May 7, 2011
  4. Crash

    Crash Well-Known Member Member

    Joined:
    Jul 15, 2010
    Messages:
    300
    Location:
    Australia
    This is a LOT faster than calculating the angles on the fly. In fact, it almost feels "wrong" to not have at least a little bit of slowdown when losing the max amount of rings :(
     
  5. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    Yeah, I think we managed to convince ourselves it was some kind of deliberate slow motion effect, even though we all knew it was just too much for the hardware. ^^'
     
  6. theocas

    theocas #! Member

    Joined:
    Apr 10, 2010
    Messages:
    375
    I got all the ASM changes into my code, but the spreadsheet part is seriously screwing me up. How would one go about setting all the equations up to work? Either I'm just plain stupid... or I don't know what.


    I've got this set up so far:


    [Image: screenshot of the spreadsheet]


    The computed values are supposed to show up under the equations, right?


    Basically, I guess what I'm saying is: Anyone that knows anything more than me about spreadsheet programs, please provide us with an example spreadsheet.
     
    Last edited by a moderator: Jun 18, 2011
  7. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    In the "a=" part, the "a" represents the cell itself, so you don't need the "a", only the "=" (e.g. =cos(blablabla)). Also, "pi" won't work as is in some spreadsheet editors; you may have to use something like "pi()", which you can usually find in the formula panel.


    In your spreadsheet, the content of cell C3 should be " =pi()/2+$B$1/$A$1*A3+$C$1 ". In A1, put your total number of rings (33 according to your screenshot); in B1, put the total angle covered by the rings (=2*pi() for a full circle); and in C1, put a value representing the angle of the first ring (0 will be vertical up, pi will be vertical down, pi/2 will be left, etc.)


    In D3, put " =sin(C3)*B3 ", and I think you can figure out E3.


    Once you're done, select a cell with the equation, click on the small black square in the corner and drag down to copy the formula to the other cells. Addresses with a $ won't change and addresses without will be incremented automatically.
     
  8. theocas

    theocas #! Member

    Joined:
    Apr 10, 2010
    Messages:
    375
    Thanks a lot, I've managed to figure it out for myself in Numbers. I've set it up so that instead of a fixed velocity, it uses the RANDBETWEEN macro, which I think is also available in Excel. The code works like a charm, btw!
     
  9. hitaxas

    hitaxas I can has super speed? Member

    Joined:
    Aug 13, 2007
    Messages:
    152
    I don't think I quite understand how to set this up. I have done everything I read in the OP plus the response to theocas, and still this doesn't look right.


    [Image: screenshot of the spreadsheet]
     
  10. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    The values seem right, except for those div/0 (though almost all the rings will go up/right; check the angle formula).


    You have to find the option to show 0 decimals in cells. In Excel and Works, I think it's in the cell format. I don't know about the others. This removes everything after the decimal point, which makes it easier to copy the values.
     
    Last edited by a moderator: Jun 19, 2011
  11. hitaxas

    hitaxas I can has super speed? Member

    Joined:
    Aug 13, 2007
    Messages:
    152
    I sort of have it figured out. I was going to go for something more like the original, but so far I have ended up with a triangle shaped ring loss spray. I understand it enough to tweak it by hand, which I will get around to at some point.
     
  12. MarkeyJester

    MarkeyJester ! % # @ Member

    Joined:
    Jun 27, 2009
    Messages:
    2,668
    Sorry for the bump, but I felt this was worth posting:



    Code:
    ; ===========================================================================
    
    ; ---------------------------------------------------------------------------
    
    ; Ring Spawn Array
    
    ; ---------------------------------------------------------------------------
    
    
    
    SpillRingData:	dc.w	$FF3C,$FC14, $00C4,$FC14, $FDC8,$FCB0, $0238,$FCB0
    
    		dc.w	$FCB0,$FDC8, $0350,$FDC8, $FC14,$FF3C, $03EC,$FF3C
    
    		dc.w	$FC14,$00C4, $03EC,$00C4, $FCB0,$0238, $0350,$0238
    
    		dc.w	$FDC8,$0350, $0238,$0350, $FF3C,$03EC, $00C4,$03EC
    
    		dc.w	$FF9E,$FE0A, $0062,$FE0A, $FEE4,$FE58, $011C,$FE58
    
    		dc.w	$FE58,$FEE4, $01A8,$FEE4, $FE0A,$FF9E, $01F6,$FF9E
    
    		dc.w	$FE0A,$0062, $01F6,$0062, $FE58,$011C, $01A8,$011C
    
    		dc.w	$FEE4,$01A8, $011C,$01A8, $FF9E,$01F6, $0062,$01F6
    
    		even
    
    
    
    ; ===========================================================================

    These are the actual velocities that the ROM calculates for the 32 rings in the old code, now in an array ready for use with your more optimised code.
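To compare these hex words with the decimal values a spreadsheet produces, they can be decoded as signed 16-bit numbers. A small Python sketch follows; the helper name is made up, and only the first row of the array is decoded. Which word of each pair is horizontal and which is vertical depends on the order your own code reads them in.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement signed value."""
    return word - 0x10000 if word & 0x8000 else word

# First dc.w row of the array above.
first_row = [0xFF3C, 0xFC14, 0x00C4, 0xFC14, 0xFDC8, 0xFCB0, 0x0238, 0xFCB0]
pairs = [(to_signed16(a), to_signed16(b))
         for a, b in zip(first_row[0::2], first_row[1::2])]
print(pairs)   # [(-196, -1004), (196, -1004), (-568, -848), (568, -848)]
```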
     
  13. redhotsonic

    redhotsonic Also known as RHS Retired Staff

    Joined:
    Aug 10, 2007
    Messages:
    2,967
    Location:
    England
    I found another way to speed up the ring loss process even more. I thought I'd post it here as it's related; I hope SpirituInsanum doesn't mind. I just ported the S3K priority manager into my hack, and you can make the scattered rings object do something extremely similar. Okay, so this only changes one object, but because of the number of rings you can lose, it actually makes a BIG difference (as usual, especially underwater).


    Go to "loc_120BA:" (part of the scattered rings object) and delete this line:



    move.b #3,priority(a1)



    When the rings are being created, the code won't have to move 3 to their priority anymore, saving an instruction per ring created and slightly speeding things up (you probably won't notice any difference, but this isn't what I am trying to accomplish here).


    Now you're probably thinking "WTF are you thinking you crazy boy? The rings aren't going to be displayed now!" Well, in a way, you're wrong. It will still be displayed, but with a priority of 0. Now, we do not want that, we still want it to be 3, and we're about to fix it, and this will be what speeds it up dramatically.


    Go to "loc_121B8" and you should see a command:



    bra.w DisplaySprite



    We are going to replace this with new code, similar to how S3K does it. In S2, before it can display the sprite, DisplaySprite converts the object's priority into a word, and then displays it. It takes about 3 lines of calculations to convert it into a word. When lots of objects use this code every single frame (Sonic and Tails constantly, for example), it can be a slow process.


    Now, imagine you just got hurt and lost 32 rings. Each one of those 32 rings branches to DisplaySprite, does the calculations, then displays the sprite; every single frame! All 32 of them! This slows things down quite a bit! Now, you can't just turn the scattered rings' priority into a word, otherwise it will overwrite the scattered rings' "width_pixel". So, what do we do? Well, we can just copy part of DisplaySprite's code and insert it into the scattered rings object's code.


    Now, normally, after a ring has jumped to DisplaySprite to convert its priority into a word, it becomes $180. Look at this:


    Priority = 3 (byte)



    DisplaySprite:
    lea (Sprite_Table_Input).w,a1


    move.w priority(a0),d0 ; To be deleted


    lsr.w #1,d0 ; To be deleted


    andi.w #$380,d0 ; To be deleted


    adda.w d0,a1 ; To be changed


    cmpi.w #$7E,(a1)


    bcc.s return_16510


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    return_16510:


    rts



    Now, Priority = $180 (word)


    So, we're going to copy this code, then edit it a bit, then move it to the scattered rings object. So copy it, then move it to where the branch was in the scattered rings object. So you have something looking like this:


    WAS:



    loc_121B8:

    tst.b (Ring_spill_anim_counter).w


    beq.s BranchTo5_DeleteObject


    move.w (Camera_Max_Y_pos_now).w,d0


    addi.w #$E0,d0


    cmp.w y_pos(a0),d0


    bcs.s BranchTo5_DeleteObject


    bra.w DisplaySprite



    And change that to this:



    loc_121B8:

    tst.b (Ring_spill_anim_counter).w


    beq.s BranchTo5_DeleteObject


    move.w (Camera_Max_Y_pos_now).w,d0


    addi.w #$E0,d0


    cmp.w y_pos(a0),d0


    bcs.s BranchTo5_DeleteObject


    lea (Sprite_Table_Input).w,a1


    move.w priority(a0),d0 ; To be deleted


    lsr.w #1,d0 ; To be deleted


    andi.w #$380,d0 ; To be deleted


    adda.w d0,a1 ; To be changed


    cmpi.w #$7E,(a1)


    bcc.s +


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    +


    rts



    "To be changed" means that line is going to be edited and the "To be deleted", well, guess what we will do with that? =P


    Change



    adda.w d0,a1 ; To be changed



    To this:



    adda.w #$180,a1



    As we now know 3 would equal $180, we can just add $180 straight to a1. No more calculations every single frame, we've already done it!
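The $180 figure can be checked by replaying the three deleted instructions by hand. Here is a minimal Python model of them; it assumes the word read picks up the priority byte as the high byte (the 68000 is big-endian) with whatever byte follows it as the low byte, and the function name is invented for the sketch.

```python
def sprite_table_offset(priority, next_byte=0):
    """Model of the three instructions that turn a priority byte into a table offset."""
    word = (priority << 8) | next_byte   # move.w priority(a0),d0
    word >>= 1                           # lsr.w  #1,d0
    return word & 0x380                  # andi.w #$380,d0

print(hex(sprite_table_offset(3)))   # 0x180
print(hex(sprite_table_offset(1)))   # 0x80
```

The mask throws away every bit that came from the following byte, so the result only depends on the priority: 3 always gives $180 and 1 always gives $80.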


    You should have something like this now:



    loc_121B8:

    tst.b (Ring_spill_anim_counter).w


    beq.s BranchTo5_DeleteObject


    move.w (Camera_Max_Y_pos_now).w,d0


    addi.w #$E0,d0


    cmp.w y_pos(a0),d0


    bcs.s BranchTo5_DeleteObject


    lea (Sprite_Table_Input).w,a1


    adda.w #$180,a1


    cmpi.w #$7E,(a1)


    bcc.s +


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    +


    rts



    So, what have we done? Well, we saved a branch for a start, since the code is already there, so that's a plus! Also, we're adding $180 straight to a1, rather than doing those 3 lines of calculations! What a time saver! All done. That was easy, eh? Now your scattered rings will be even quicker, hopefully not slowing down anything!


    Enjoy.


    EDIT


    This is optional; you do not have to do this if you do not want to.


    If you look below the code you just edited:



    Obj_37_sub_4:
    addq.b #2,routine(a0)


    move.b #0,collision_flags(a0)


    move.b #1,priority(a0)


    bsr.w sub_11FC2


    Obj_37_sub_6:


    lea (byte_1237A).l,a1


    bsr.w AnimateSprite


    bra.w DisplaySprite


    ; ===========================================================================


    BranchTo5_DeleteObject


    bra.w DeleteObject



    You can see an instruction moving 1 to priority, and the sprite being displayed again. This is for when you collect the rings (when the rings turn into those sparkly effects). You can do something extremely similar here if you like. It will make quite a bit of difference if you collect a lot of scattered rings at the same time; otherwise, it won't do much. But it won't be $180 again! When 1 has been through DisplaySprite's calculations, it will equal $80 instead!


    So, at "Obj_37_sub_4:", delete this line/command:



    move.b #1,priority(a0)



    Then, at "Obj_37_sub_6:", replace:



    bra.w DisplaySprite



    with this:



    lea (Sprite_Table_Input).w,a1
    adda.w #$80,a1


    cmpi.w #$7E,(a1)


    bcc.s +


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    +


    rts



    So you have something like this:



    Obj_37_sub_4:
    addq.b #2,routine(a0)


    move.b #0,collision_flags(a0)


    bsr.w sub_11FC2


    Obj_37_sub_6:


    lea (byte_1237A).l,a1


    bsr.w AnimateSprite


    lea (Sprite_Table_Input).w,a1


    adda.w #$80,a1


    cmpi.w #$7E,(a1)


    bcc.s +


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    +


    rts



    Done, you've made it slightly faster again!


    As usual, here is a S2 ROM with this guide applied. It also has this guide and this guide applied.


    Cheers,


    redhotsonic


    EDIT: I realised I accidentally left a branch in the guide, which I have now fixed.


    Accidentally, I put this:



    loc_121B8:

    tst.b (Ring_spill_anim_counter).w


    beq.s BranchTo5_DeleteObject


    move.w (Camera_Max_Y_pos_now).w,d0


    addi.w #$E0,d0


    cmp.w y_pos(a0),d0


    bcs.s BranchTo5_DeleteObject


    bra.w DisplaySprite ; I accidentally left this here


    lea (Sprite_Table_Input).w,a1


    adda.w #$180,a1


    cmpi.w #$7E,(a1)


    bcc.s +


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    +


    rts



    SO MAKE SURE YOUR ASM FILE ACTUALLY SAYS THIS:



    loc_121B8:

    tst.b (Ring_spill_anim_counter).w


    beq.s BranchTo5_DeleteObject


    move.w (Camera_Max_Y_pos_now).w,d0


    addi.w #$E0,d0


    cmp.w y_pos(a0),d0


    bcs.s BranchTo5_DeleteObject


    lea (Sprite_Table_Input).w,a1


    adda.w #$180,a1


    cmpi.w #$7E,(a1)


    bcc.s +


    addq.w #2,(a1)


    adda.w (a1),a1


    move.w a0,(a1)


    +


    rts
     
    Last edited by a moderator: May 5, 2012
  14. MarkeyJester

    MarkeyJester ! % # @ Member

    Joined:
    Jun 27, 2009
    Messages:
    2,668
    How about:



    Code:
       	 move.w	(a3)+,$12(a1) ; move the data contained in the array to the y velocity and increment the address in a3
    
    		move.w	(a3)+,$10(a1) ; move the data contained in the array to the x velocity and increment the address in a3


    to:





    Code:
       	 move.l	(a3)+,$10(a1)

    And swapping the X and Y addresses over in the array?
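Why the order has to be swapped can be illustrated with a small byte-level model in Python. It assumes the x velocity sits at $10 and the y velocity at $12, as in the comments of the original two-instruction version, and that longwords are stored big-endian; the names and the fake RAM buffer are made up for the sketch.

```python
import struct

def pack_velocity_pair(x_vel, y_vel):
    """One big-endian longword: X in the high word, Y in the low word."""
    return struct.pack(">hh", x_vel, y_vel)

ram = bytearray(0x40)                              # stand-in for one object's RAM
ram[0x10:0x14] = pack_velocity_pair(-196, -1004)   # models move.l (a3)+,$10(a1)

x_vel, = struct.unpack_from(">h", ram, 0x10)       # word at $10
y_vel, = struct.unpack_from(">h", ram, 0x12)       # word at $12
print(x_vel, y_vel)
```

A single longword write fills $10 first and $12 second, so each array entry has to list the horizontal speed before the vertical one.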


    I would like to point out that what you have there, rhs, is a common case of "speed vs size", and you have to ask yourself whether the extra bytes the routine takes up are really worth saving the 10-14 cycles of a branch. All I'm saying is, be careful not to go mad with the whole method of repeating routines; there can be major disadvantages to it.
     
  15. redhotsonic

    redhotsonic Also known as RHS Retired Staff

    Joined:
    Aug 10, 2007
    Messages:
    2,967
    Location:
    England

    Um, SpirituInsanum put that, not me, or are you talking about two different things?


    If the "speed vs size" you're talking about is about my priority guide and not this longword command, I see what you mean. But I do not do this everywhere. I've just done this for the scattered rings objects. And it has not caused any bra.s/beq.s to go to bra.w/beq.w


    Yes, the size will grow, but only very slightly (and I doubt it's actually noticeable). If you start doing it everywhere, then it will start to cause problems. But for the scattered rings object, which is a big cause of slowdown, this won't cause any problems (hopefully).


    Also, apart from the size growing, what are the disadvantages?
     
    Last edited by a moderator: May 5, 2012
  16. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    I thought about it back then, but thought it would be clearer and easier to customize this way.


    As an example of customization, I'm personally using 2 long words instead, so it wouldn't work.


    Also, it is not impossible that someone would reorganize the object status table and in that case it can be safer to keep both speeds separated.


    But otherwise it's indeed a perfectly fine improvement.


    (and congrats on your 1000th post)
     
    Last edited by a moderator: May 5, 2012
  17. MarkeyJester

    MarkeyJester ! % # @ Member

    Joined:
    Jun 27, 2009
    Messages:
    2,668
    I know, you were mentioning methods of speeding it up, I just wanted to join in =P

    Yeah, that's what I was referring to.

    Well, there is the disadvantage that by having more data in the ROM, you effectively shift more of the important data further towards the end of the ROM, which causes certain short branches to need changing to word-sized branches or jump instructions. There's also the issue of not being able to use word sign extension on ROM addresses that are not within the first $8000 bytes of the ROM. Again, they're not major by any means; I just wanted to be sure that you (and anyone else reading) know what you're dealing with and understand the consequences.

    Oh no, by all means, I have no doubt that you knew right from the beginning. I was not trying to test you or whatever (I'm not that dense of a person). As I said, rhs was looking into further ways of improving processing speed, and I wanted to contribute for the sake of contributing.


    Moving on though, could we look into possible ways of speeding up the individual rings that are being spawned? It seems like we're speeding up the processing of only one single frame for virtually no reason (considering that the spawning itself doesn't have an impact on how we notice the game lag). I figure that if we have the ability (as a scene) to work out fundamental methods of improving system speed/size, we could redirect our methods to the bigger issue of lag, and perhaps almost eradicate it permanently.
     
  18. SpirituInsanum

    SpirituInsanum Well-Known Member Member

    Joined:
    Feb 11, 2010
    Messages:
    642
    I know, just telling (explaining the choice in case some would ask why I don't modify the original post).


    The impact of that single frame is kind of huge though, but there's still room for improvement.


    As I was telling RHS earlier, there's indeed at least one more way to slightly improve the performance, and it's precisely related to the longwords I'm using.


    To make it short, in my version, the values in the array are those you would find after the calculations done by the "speedtopos" subroutine, i.e. after its two asl.l #8. It isn't much either though: for 30 rings, removing those two asl saves 1440 processor cycles; add the ext.l (4 cycles each) and the branches (18 for the bsr and 16 for the rts, iirc). That's a somewhat negligible gain; the total is around 2700 cycles for 30 rings per frame, if I'm not mistaken (not sure I got that cycle count thing right).
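The cycle figures quoted above can be re-added with a quick Python tally. The per-instruction costs are the ones given in the post, plus the usual 68000 timing of 8 + 2n cycles for a long shift by n; treat the whole thing as a back-of-the-envelope check rather than a measured profile.

```python
RINGS = 30

ASL_L_8 = 8 + 2 * 8        # asl.l #8,Dn: 8 + 2n cycles with n = 8
EXT_L = 4                  # ext.l, as quoted
BSR, RTS = 18, 16          # call and return, as quoted

asl_total = 2 * ASL_L_8 * RINGS       # two shifts per ring (X and Y)
ext_total = 2 * EXT_L * RINGS         # two sign extensions per ring
call_total = (BSR + RTS) * RINGS      # one call/return pair per ring
total = asl_total + ext_total + call_total

print(asl_total, ext_total, call_total, total)
```

The shifts alone account for 1440 cycles and the full tally comes to 2700 per frame, which agrees with the estimate in the post.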


    There can still be slowdowns though, and the drawback is that you have to use long words instead, so other addresses than obvelx and obvely are necessary.


    Also, about RHS's optimization, if I'm not mistaken (I'm tired) you can directly do lea (Sprite_Table_Input+$180).w,a1 , the adda following it being unnecessary since we already know the result of the addition.
     
    Last edited by a moderator: May 5, 2012
  19. redhotsonic

    redhotsonic Also known as RHS Retired Staff

    Joined:
    Aug 10, 2007
    Messages:
    2,967
    Location:
    England
    Okay, thanks for the tip =)

    That dense lol

    I tried that before, and my hack froze for some reason. That's why I did it the way I did it =P
     