So let's cut right to the chase. Here's how (as I understand it) the Sonic games handle displaying sprites:

There's a big table in RAM of sprites to be processed. This table is $400 bytes in total. Every $80 bytes is a separate "layer", with the first layer ($000-$07F) being in front of everything and the last layer ($380-$3FF) being behind everything. Every object in the game has a "priority" SST, which tells the game which "layer" it should go to. The DisplaySprite subroutine is what handles sending sprites to the table.

Sonic 1 and 2 have the "priority" SST as a byte, and to transform it into a "layer address", they do this in DisplaySprite:

Code:
	lea	(Sprite_Table_Input).w,a1
	move.w	priority(a0),d0
	lsr.w	#1,d0
	andi.w	#$380,d0
	adda.w	d0,a1

Sonic 3 and Sonic & Knuckles, on the other hand, have the "priority" SST as a word, and in DisplaySprite they simply do this:

Code:
	lea	(Sprite_Table_Input).w,a1
	adda.w	priority(a0),a1

So obviously, S3K's way of doing it is more optimized than the S1/S2 method. And there is a guide on how to port the S3K method to S2 (thanks redhotsonic). However, the guide isn't very... intuitive, freeing up an SST shared with (almost) all objects is somewhat difficult, and you might want to use that SST for something else anyway. So, why not employ another way to speed up this subroutine? Here's my suggestion:

Code:
DisplaySprite:
	moveq	#0,d0
	move.b	priority(a0),d0
	andi.b	#7,d0	; safety measure. this shouldn't be needed, so remove this if you want a bit more speed
	add.w	d0,d0
	movea.w	Priority2InputAddrTable(pc,d0.w),a1
	; (snip)
; ---------------------------------------------------------------------------
Priority2InputAddrTable:
	dc.w	Sprite_Table_Input
	dc.w	Sprite_Table_Input+$80
	dc.w	Sprite_Table_Input+$100
	dc.w	Sprite_Table_Input+$180
	dc.w	Sprite_Table_Input+$200
	dc.w	Sprite_Table_Input+$280
	dc.w	Sprite_Table_Input+$300
	dc.w	Sprite_Table_Input+$380

Basically, instead of loading the sprite table input address into a1 and then calculating (except in S3K) and adding an offset to it, my version simply cuts out the middleman and uses the priority as an index into a table containing layer addresses. That makes it (possibly) faster, at the expense of a very tiny amount of ROM.

...yeah, in case you haven't noticed, I'm not actually sure if this is any faster. Can someone let me know, please?

Also, why does this forum not have syntax highlighting for 68k ASM? Can XenForo just not do that?
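To make the comparison concrete, here is a worked trace of both routes for an object whose priority byte is 3. It's only a sketch: it assumes, as the move.w in the Sonic 1/2 code implies, that priority is the high byte of the word being read and that the byte after it holds unrelated data (shown as xx below).

```asm
; Sonic 1/2 route, priority byte = 3 (low byte xx is whatever follows it in the SST)
	lea	(Sprite_Table_Input).w,a1	; a1 = start of the first layer
	move.w	priority(a0),d0			; d0 = $03xx
	lsr.w	#1,d0				; d0 = $0180 + ($xx >> 1); bits 0-6 are junk
	andi.w	#$380,d0			; d0 = $0180 = 3 * $80, junk bits masked off
	adda.w	d0,a1				; a1 = Sprite_Table_Input+$180

; Table route, same object
	moveq	#0,d0				; clear d0 so the upper byte of the index is 0
	move.b	priority(a0),d0			; d0 = 3
	add.w	d0,d0				; d0 = 6, byte offset of the fourth dc.w entry
	movea.w	Priority2InputAddrTable(pc,d0.w),a1	; a1 = Sprite_Table_Input+$180 (sign-extended)
```

Both routes end with a1 pointing at the same layer. The table route quietly relies on two things: movea.w sign-extends the 16-bit table entry, which is fine as long as Sprite_Table_Input lives where (Sprite_Table_Input).w can already reach it, and the d8(pc,d0.w) addressing mode only has a signed 8-bit displacement, so the table has to sit close to the instruction that reads it.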
I doubt it's much faster than S3K. What I do personally is use the S3K method, but store the layer address directly in priority(a0). This shortens the code to just loading from priority(a0) into a1, saving 16-20 cycles (I forget exactly). It's a really easy change for S3K hacks and gives a nice speed boost.
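A minimal sketch of what that change might look like. The object setup line is hypothetical (the layer chosen is arbitrary), and it assumes priority is a word-sized SST as in S3K and that the assembler accepts a RAM address as a 16-bit immediate:

```asm
; Hypothetical object setup: write the layer's address instead of an offset.
	move.w	#Sprite_Table_Input+$200,priority(a0)	; fifth layer (offset $200)

DisplaySprite:
	movea.w	priority(a0),a1		; a1 = layer address, sign-extended; no lea/adda needed
	; (rest of DisplaySprite unchanged)
```

The catch, presumably, is that every place that writes priority has to store an address rather than an offset, and anything that reads priority as an offset would need the same treatment.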
Hey, I never implied that! I'll make a mental note of your optimization in case I ever want to go insane... er, hack S3K.
For the record, here are the cycle timings for each method:

Original Sonic 1/2 method:

Code:
	lea	(sprite_table_input).w,a1	; 4 bytes 8(2/0)
	move.w	priority(a0),d0	; 4 bytes 12(3/0)
	lsr.w	#1,d0	; 2 bytes 6(1/0) + 2n(0/0) where n is shift or rotate count (8 cycles here)
	andi.w	#$380,d0	; 4 bytes 8(2/0)
	adda.w	d0,a1	; 2 bytes 8(1/0)

The new table lookup:

Code:
	moveq	#0,d0	; 2 bytes 4(1/0)
	move.b	priority(a0),d0	; 4 bytes 12(3/0)
	andi.b	#7,d0	; 4 bytes 8(2/0)
	add.w	d0,d0	; 2 bytes 4(1/0)
	movea.w	Priority2InputAddrTable(pc,d0.w),a1	; 4 bytes 14(3/0)

The S3K method:

Code:
	lea	(sprite_table_input).w,a1	; 4 bytes 8(2/0)
	adda.w	priority(a0),a1	; 4 bytes 16(3/0)

My method (address stored in priority):

Code:
	movea.w	priority(a0),a1	; 4 bytes 12(3/0)

So, 44 cycles for the original (8 + 12 + 8 + 8 + 8), and 34 for the new version (4 + 12 + 4 + 14), or 42 if you include the and instruction! Indeed, it is faster, and actually I am surprised it is that much faster. Pretty neat optimization trick.

For the S3K one and my method, the difference is 24 vs 12, or exactly half the cycles needed!
...wow. I was only expecting it to be like 1 or 2 cycles faster. Shows you how much I know about working with ASM, I guess.