Adding difficulty modes

Discussion in 'Discussion and Q&A Archive' started by InfiniteWave, Apr 27, 2013.

Thread Status:
Not open for further replies.
  1. nineko

    nineko I am the Holy Cat Member

    Joined:
    Mar 24, 2008
    Messages:
    1,902
    Location:
    italy
    You can do everything with flags. As for ROM size, layout files and object files don't take up much space (e.g. in Sonic 1 all the layout files sum up to 5772 bytes, and all the object files sum up to 25290 bytes). Most of the space is used by the art; if you use the same art for the different versions of the levels then you're not going to take up too much space. You're not likely to get anywhere near 4 megabytes anyway, and even if you do, mappers exist.
     
  2. JoenickROS

    JoenickROS ROS (bug fixing in progress) Member

    Joined:
    Feb 5, 2012
    Messages:
    929
    Oh, so I wouldn't have to do what he did?
     
  3. nineko

    nineko I am the Holy Cat Member

    Joined:
    Mar 24, 2008
    Messages:
    1,902
    Location:
    italy
    If FLAG Then DO SOMETHING Else DO SOMETHING ELSE


    If SONIC Then LOAD SONIC LAYOUT Else LOAD KNUCKLES LAYOUT


    If EASY MODE Then LOAD EASY MODE Else LOAD HARD MODE


    etc.


    You check for flags, and you do different things. Of course, if you want to load different layouts you have to duplicate the layout files and tweak the loading code accordingly. I do this, in a very inefficient way, in my hack, where I give 3 palette choices (original, mine, and Stephen's): I triplicated the palette files and made the palette loading code check for the palette flag:

    Code:
    PalLoad1:
    		cmpi.b	#$1,($FFFFC615).w
    		beq.s	PalLoad1Orig
    		cmpi.b	#$2,($FFFFC615).w
    		beq.s	PalLoad1Stephen
    		lea	(PalPointers).l,a1	; NEKO PALETTES
    		bra.s	PalLoad1Common
    PalLoad1Stephen:
    		lea	(PalPointers3).l,a1	; STEPHEN PALETTES
    		bra.s	PalLoad1Common
    PalLoad1Orig:
    		lea	(PalPointers2).l,a1	; ORIGINAL PALETTES
    PalLoad1Common:
    		lsl.w	#3,d0
    		adda.w	d0,a1
    		movea.l	(a1)+,a2
    		movea.w	(a1)+,a3
    		adda.w	#$80,a3
    		move.w	(a1)+,d7
    The basic concept is always the same. Set flag, cmpi.b, beq.b/bne.b :)
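
    Applied to the difficulty question, the same pattern might look roughly like the sketch below. Everything in it is made up for illustration: Difficulty_Flag and its address, LayoutChoose, and the Layout_Easy/Layout_Hard labels are assumptions, not names from any disassembly, and you'd still have to duplicate the layout files and hook this into the real layout loading code.

    Code:
    ; Hypothetical sketch: every name and the RAM address below are assumptions.
    Difficulty_Flag:	equ	$FFFFFF90	; placeholder; pick a RAM byte your hack doesn't use
    						; 0 = easy, anything else = hard

    ; In your (hypothetical) options screen, when hard mode is picked:
    		move.b	#1,(Difficulty_Flag).w	; set flag

    ; Called from wherever your hack loads its layouts:
    LayoutChoose:
    		lea	(Layout_Easy).l,a1	; default to the easy layouts
    		tst.b	(Difficulty_Flag).w	; is hard mode selected?
    		beq.s	LayoutChooseDone	; if not, keep the easy pointer
    		lea	(Layout_Hard).l,a1	; otherwise use the hard layouts
    LayoutChooseDone:
    		rts				; a1 = address of the layouts to load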
     
    Last edited by a moderator: May 22, 2013
  4. SuperEgg

    SuperEgg I'm a guy that knows that you know that I know Member

    Joined:
    Oct 17, 2009
    Messages:
    Location:
    THE BEST GOD DAMN STATE OF TEXAS
    This concept isn't that foreign. If you guys have Build A Burger, it's exactly the same thing, except instead of a difficulty setting, it's just choosing layouts between each S2 build. As far as making it character specific, it's not that hard either: it's just a matter of checking which character is in play, and having it choose a layout, beginning path, or whatever. RHS, nineko, and I have posted alternate methods of doing so; of course RHS's is more energy efficient. =P In theory, you could easily just make a table to load each layout, thus de-stressing the whole issue entirely, but I'm too lazy to post up an example.

    edit: Had to add in nineko, 'cause his post came in too late =P
     
    Last edited by a moderator: May 22, 2013
  5. redhotsonic

    redhotsonic Also known as RHS Member

    Joined:
    Aug 10, 2007
    Messages:
    2,969
    Location:
    England
    Here's a much quicker way, and smaller in size too:

    Code:
    PalLoad1:
    		lea	(PalPointers).l,a1	; NEKO PALETTES
    		tst.b	($FFFFC615).w
    		beq.s	PalLoad1Common
    		lea	(PalPointers2).l,a1	; ORIGINAL PALETTES
    		cmpi.b	#$1,($FFFFC615).w
    		beq.s	PalLoad1Common
    		lea	(PalPointers3).l,a1	; STEPHEN PALETTES
    
    PalLoad1Common:
    		lsl.w	#3,d0
    		adda.w	d0,a1
    		movea.l	(a1)+,a2
    		movea.w	(a1)+,a3
    		adda.w	#$80,a3
    		move.w	(a1)+,d7
     
  6. nineko

    nineko I am the Holy Cat Member

    Joined:
    Mar 24, 2008
    Messages:
    1,902
    Location:
    italy
    Thanks, I'm quite bad at 68k ASM, most of the code I wrote is highly inefficient, I hope you'll never find out how I implemented the Feather Monitor and the Emerald Monitor :p
     
  7. Crash

    Crash Well-Known Member Member

    Joined:
    Jul 15, 2010
    Messages:
    302
    Location:
    Australia
    And if you've got a lot of different options, you're better off doing something like this to avoid a million cmpi/beq's:

    Code:
    PalPointerList:
    		dc.l PalPointers, PalPointers2, PalPointers3, PalPointers4, PalPointers5
    		dc.l PalPointers6, PalPointers7, PalPointers8, PalPointers9, PalPointers10
    ; --------------------------------------------------------------------------
    PalLoad1:
    		moveq #0,d1
    		move.b ($FFFFC615).w,d1            ; copy palette number to d1
    		add.w d1,d1
    		add.w d1,d1
    		movea.l PalPointerList(pc,d1.w),a1 ; load palette address from list
    
    		lsl.w #3,d0
    		adda.w d0,a1
    		movea.l (a1)+,a2
    		movea.w (a1)+,a3
    		adda.w #$80,a3
    		move.w (a1)+,d7
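
    One thing to watch with a table like this: if the flag ever holds a value past the end of the list, the lookup reads garbage. A possible guard, as a sketch only (the label PalLoad1_InRange is made up, the 10 matches the ten entries in the list above, and it assumes PalPointerList stays right before PalLoad1 so the pc-relative lookup stays in range):

    Code:
    PalLoad1:
    		moveq	#0,d1
    		move.b	($FFFFC615).w,d1	; copy palette number to d1
    		cmpi.b	#10,d1			; list has 10 entries (0-9)
    		bcs.s	PalLoad1_InRange	; branch if d1 is lower than 10 (unsigned)
    		moveq	#0,d1			; out of range: fall back to the first entry
    PalLoad1_InRange:
    		add.w	d1,d1
    		add.w	d1,d1			; d1 * 4, one longword per entry
    		movea.l	PalPointerList(pc,d1.w),a1 ; load palette address from list

    The rest of the routine (lsl.w #3,d0 onwards) would carry on unchanged.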
     
    Last edited by a moderator: May 23, 2013
  8. redhotsonic

    redhotsonic Also known as RHS Member

    Joined:
    Aug 10, 2007
    Messages:
    2,969
    Location:
    England
    (pc,d4.w),a1? I think you mean (pc,d1.w),a1 =P
     
  9. vladikcomper

    vladikcomper Well-Known Member Member

    Joined:
    Dec 2, 2009
    Messages:
    415
    You can do better, RHS. I've seen some of the optimizations you suggested in other threads recently and unfortunately, you keep making the same optimization mistakes. Not to say these are critical ones. And optimization isn't the most important thing after all. You're certainly thinking in the right direction with the solutions you come up with, and this particular example of yours isn't half bad. However, if you are so concerned with optimizing stuff, as well as helping people out with optimizations of yours, you should consider learning some basic optimization techniques widely used on 68K first, in order to do the job properly.


    In the given example, there's a lot you can do just by tweaking opcodes themselves here and there. This is what I consider to be the basic optimization. Here's how this can be optimized properly without changing the original code logic:


    Code:
    PalLoad1:
    		lea	PalPointers,a1		; NEKO PALETTES
    		move.b	($FFFFC615).w,d7
    		beq.s	PalLoad1Common
    		lea	PalPointers2,a1		; ORIGINAL PALETTES
    		subq.b	#1,d7
    		beq.s	PalLoad1Common
    		lea	PalPointers3,a1		; STEPHEN PALETTES
    
    PalLoad1Common:
    		lsl.w	#3,d0
    		adda.w	d0,a1
    		movea.l	(a1)+,a2
    		movea.w	(a1)+,a3
    		lea	$80(a3),a3
    		move.w	(a1)+,d7

    Let's count the cycles each version takes:

    Old version:


    Code:
    PalLoad1:
    		lea	(PalPointers).l,a1		; 12
    		tst.b	($FFFFC615).w			; 12
    		beq.s	PalLoad1Common			; 10/8
    		lea	(PalPointers2).l,a1		; 12
    		cmpi.b	#$1,($FFFFC615).w		; 16
    		beq.s	PalLoad1Common			; 10/8
    		lea	(PalPointers3).l,a1		; 12
    
    PalLoad1Common:
    		lsl.w	#3,d0				; 12
    		adda.w	d0,a1				; 8
    		movea.l	(a1)+,a2			; 12
    		movea.w	(a1)+,a3			; 8
    		adda.w	#$80,a3				; 12
    		move.w	(a1)+,d7			; 8

    New version:


    Code:
    PalLoad1:
    		lea	PalPointers,a1			; 8 (-4)
    		move.b	($FFFFC615).w,d7		; 12
    		beq.s	PalLoad1Common			; 10/8
    		lea	PalPointers2,a1			; 8 (-4)
    		subq.b	#1,d7				; 4 (-12)
    		beq.s	PalLoad1Common			; 10/8
    		lea	PalPointers3,a1			; 8 (-4)
    
    PalLoad1Common:
    		lsl.w	#3,d0				; 12
    		adda.w	d0,a1				; 8
    		movea.l	(a1)+,a2			; 12
    		movea.w	(a1)+,a3			; 8
    		lea	$80(a3),a3			; 8 (-4)
    		move.w	(a1)+,d7			; 8

    The newer version saves up to 28 cycles. Pretty good number for such a small piece of code, isn't it?

    So, let's break this down:

    1. Optimizing memory accesses


    tst.b ($FFFFC615).w ; 12
    <...>
    cmpi.b #$1,($FFFFC615).w ; 16

    versus


    move.b ($FFFFC615).w,d7 ; 12
    <...>
    subq.b #1,d7 ; 4 (-12)

    Saves up to 12 cycles. This is quite a lot, actually: it's worth a whole lea (xxx).l,an instruction.

    You see, MOVE.B costs the same number of cycles as TST.B, and it sets the zero flag the same way, so the BEQ after it still works. But by keeping the memory value in a register, you don't have to access the same address to retrieve the same value once more, and this saves a bunch of processing time.

    Well, you could just replace


    cmpi.b #$1,($FFFFC615).w ; 16

    with


    cmpi.b #1,d7 ; 8

    ... and that would save you 8 cycles already. But I went a bit further, by optimizing CMPI.B #1 to SUBQ.B #1. They both do virtually the same thing, except that CMPI doesn't actually store the result of the subtraction in the destination operand. Since we won't need the value in D7 anymore, this works great here.
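
    The same trick chains if there are more than three choices. This is only a sketch under assumptions: the PalChoose/PalChooseDone labels and the idea of a fourth palette set (PalPointers4) are hypothetical and not part of nineko's hack.

    Code:
    PalChoose:
    		lea	PalPointers,a1		; flag = 0
    		move.b	($FFFFC615).w,d7	; read the flag from RAM once
    		beq.s	PalChooseDone
    		lea	PalPointers2,a1		; flag = 1
    		subq.b	#1,d7			; d7 = flag - 1
    		beq.s	PalChooseDone
    		lea	PalPointers3,a1		; flag = 2
    		subq.b	#1,d7			; d7 = flag - 2
    		beq.s	PalChooseDone
    		lea	PalPointers4,a1		; flag = 3 or above (hypothetical 4th set)
    PalChooseDone:

    Keep in mind each SUBQ destroys the copy in D7. If you still need the original flag value afterwards, stick with CMPI.B #n,d7 instead.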

    Remember the main rule when optimizing on 68K: use registers as much as possible. This is always many times faster, except for really rare cases (one being the case of reading a word into a register from an odd address -- temporarily writing each byte to a properly aligned memory address, then reading the whole thing as a word, works faster than writing the high byte to a register, shifting it 8 bits left, then reading the low byte. The Kosinski decompressor performs this optimization).

    2. Optimizing addressing modes


    lea (PalPointers).l,a1 ; 12

    versus


    lea PalPointers,a1 ; 8 (-4)

    The first example forces the absolute long addressing mode on the source operand, which takes the longest time to calculate on 68K. Avoid using this addressing mode when possible. In a lot of cases, this actually is possible.

    In the second example, I'm not forcing any particular addressing mode, letting the assembler choose the most appropriate one for me. This is the recommended way of programming in assembly. Professional ASM programmers didn't force addressing modes in cases like this. Look at Yuji Naka's original code: http://pastebin.com/L6W4CHxK (this is Sonic 2's Debug Mode code, known as Edit Mode in the original source code)

    Defining a particular addressing mode on each instruction is the disassembler's way of laying things down. This is necessary to make sure disassembled code can be re-assembled with each opcode keeping its original form. We're humans, not assemblers - there is no need for us to write like that.

    So in the second example, I'm not forcing any addressing modes. I'm giving it 8 cycles though, expecting that the assembler will pick the pc-relative addressing mode here. If optimizations are enabled within your assembler, and the palette pointers data is located within 32 KB of these instructions, the assembler will certainly pick that addressing mode, saving you 4 cycles per LEA.
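
    For reference, this is roughly what the explicit forms look like if you do spell them out. Treat it as a sketch: whether your assembler picks the second form on its own depends on the assembler and its optimization settings.

    Code:
    		lea	(PalPointers).l,a1	; absolute long: 6 bytes, 12 cycles, works anywhere
    		lea	PalPointers(pc),a1	; pc-relative:   4 bytes,  8 cycles, target within 32 KB
    		lea	(PalPointers).w,a1	; absolute short: 4 bytes, 8 cycles, but only valid for
    						; addresses $0000-$7FFF and $FFFF8000-$FFFFFFFF

    The last form is the one already used for RAM in these examples, e.g. ($FFFFC615).w. For ROM data like the palette pointers it only works if the data sits in the first 32 KB of the ROM.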

    3. Adding an immediate value to an address register


    adda.w #$80,a3 ; 12

    versus


    lea $80(a3),a3 ; 8 (-4)

    Using ADDA is not recommended here, since there is a faster equivalent that not only does the same thing, but also gives you more possibilities: you can store the result of adding an immediate value to any address register in a different register, not necessarily the same one as the source.

    Always use LEA when you want to add an immediate number to an address register -- it's considerably faster. The exception is adding values 1 to 8: use ADDQ instead.

    LEA being faster comes down to the 68K architecture. Apart from having a 16-bit arithmetic logic unit (ALU) to perform math operations, the 68K has two 16-bit Address Units (AUs) in its core, used to calculate effective addresses. The AUs work simultaneously, making it possible to perform 32-bit calculations within one machine cycle (address registers always involve 32-bit calculations, even if the instruction has a .w suffix next to it).

    LEA, standing for Load Effective Address as you know, relies on the AUs, while ADDA/SUBA rely on the ALU to perform math. The ALU, being a 16-bit unit, can't process 32-bit operations during one machine cycle; it needs one extra machine cycle to do the high word.
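
    To put the variants from this section side by side (a sketch only; the registers are arbitrary and the cycle counts are the ones quoted above):

    Code:
    		adda.w	#$80,a3		; 12 cycles: a3 = a3 + $80
    		lea	$80(a3),a3	;  8 cycles: same result
    		lea	$80(a3),a4	;  8 cycles: a4 = a3 + $80, a3 left untouched
    		lea	-$80(a3),a3	;  8 cycles: subtracting works the same way
    		addq.w	#8,a3		; for 1 to 8: one-word opcode, 2 bytes instead of 4

    The LEA displacement is a signed 16-bit value, so this covers additions from -$8000 to $7FFF.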
     
  10. MarkeyJester

    MarkeyJester ♡ ! Member

    Joined:
    Jun 27, 2009
    Messages:
    2,867
    Oh my holy shit, we're not going into optimisation philosophy again, are we?  The guy just wants to understand the simple principle behind boolean or multiple decision making, not the quickest or smallest way to do it.

    Sometimes the quickest or smallest ways are the most complex and are likely to confuse the inexperienced rather than help, come on guys, you need to start them off low.
     
  11. JoenickROS

    JoenickROS ROS (bug fixing in progress) Member

    Joined:
    Feb 5, 2012
    Messages:
    929
    Yeah my head just exploded lol


    Edit: but I should come to understand this some time in the future, since I will try to get into computer sciences this year, for college.


    Edit2: Instead of Criminal justice. Why I wanted to get Into that in the first place is a mystery. lol
     
    Last edited by a moderator: May 22, 2013
  12. nineko

    nineko I am the Holy Cat Member

    Joined:
    Mar 24, 2008
    Messages:
    1,902
    Location:
    italy
    I like how some of the lines optimised by vladikcomper were part of the original Sonic 1 code and not code written by me (I pasted the whole subroutine, but the "common" section in my example was unchanged). Which further proves that vladikcomper is better than SEGA itself :U
     
  13. redhotsonic

    redhotsonic Also known as RHS Member

    Joined:
    Aug 10, 2007
    Messages:
    2,969
    Location:
    England
    @vladikcomper: Woah, where did all that come from? =P

    I knew one or two ways to make it better, but you blew that out of the water! I didn't suggest anything better because it's only a 1 time use. Although I might use your information on other stuff that I might want to improve =P
     
  14. Crash

    Crash Well-Known Member Member

    Joined:
    Jul 15, 2010
    Messages:
    302
    Location:
    Australia
    Whoops, fixed :p


    Isn't this code only ever run one time when loading a level? There's not all that much point in optimising it. Useful info anyway, I never realised adda was worse than lea!
     
    Last edited by a moderator: May 23, 2013
  15. SuperEgg

    SuperEgg I'm a guy that knows that you know that I know Member

    Joined:
    Oct 17, 2009
    Messages:
    Location:
    THE BEST GOD DAMN STATE OF TEXAS
    To be frank, optimization isn't the biggest fish to fry. Is it important? Sure, but should it be a main concern? No.

    To me, if the code works, it works. Cleaning it up is great, but when you're introducing somebody new to something like this, they don't need to know optimizations. There are three or four different ways to work this example. I posted my example because it is simple and straightforward, and it shows how basic checks are implemented.

    Which is easier to understand?

    "If this equals this, go here. If it equals this, go here." and so forth.

    As opposed to...

    "Go here to begin with. BUT, test this. If it is anything but A go here. Also, go ahead and test it again, but this time add 1 to the equation, then go here." 

    I know this isn't verbatim what your example means, but knowing the language, it kinda does. Is it more efficient? I suppose. Would it truly make a difference? Once again, not really. When you play BAB and change up what level options you want, is there any lag? Maybe a millisecond, but would the average hacker or player notice? Not really. Of course this is more of a discussion of how to code and personal preference; I just thought it'd be better to introduce the beginners to the straightforward approach.
     