Adding difficulty modes

nineko · May 22, 2013

You can do everything with flags. As for ROM size, layout files and object files don't take up much space (e.g. in Sonic 1 all the layout files sum up to 5772 bytes, and all the object files sum up to 25290 bytes). Most of the space is used by the art; if you use the same art for the different versions of the levels then you're not going to take up too much space. Even so, you're not likely to get anywhere near 4 megabytes, and if you do, mappers exist.

JoenickROS · May 22, 2013

nineko said: ↑

You can do everything with flags.

Oh, so I wouldn't have to do what he did?

nineko · May 22, 2013

If FLAG Then DO SOMETHING Else DO SOMETHING ELSE

If SONIC Then LOAD SONIC LAYOUT Else LOAD KNUCKLES LAYOUT

If EASY MODE Then LOAD EASY MODE Else LOAD HARD MODE

etc.

You check for flags, and you do different things. Of course, if you want to load different layouts you have to duplicate the layout files and tweak the loading code accordingly. I do this, in a very inefficient way, in my hack, where I give 3 palette choices (original, mine, and Stephen's). I triplicated the palette files and made the palette loading code check for the palette flag:
Code:
PalLoad1:
		cmpi.b	#$1,($FFFFC615).w
		beq.s	PalLoad1Orig
		cmpi.b	#$2,($FFFFC615).w
		beq.s	PalLoad1Stephen
		lea	(PalPointers).l,a1	; NEKO PALETTES
		bra.s	PalLoad1Common
PalLoad1Stephen:
		lea	(PalPointers3).l,a1	; STEPHEN PALETTES
		bra.s	PalLoad1Common
PalLoad1Orig:
		lea	(PalPointers2).l,a1	; ORIGINAL PALETTES
PalLoad1Common:
		lsl.w	#3,d0
		adda.w	d0,a1
		movea.l	(a1)+,a2
		movea.w	(a1)+,a3
		adda.w	#$80,a3
		move.w	(a1)+,d7
The basic concept is always the same. Set flag, cmpi.b, beq.b/bne.b
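
Applied to difficulty modes, the same pattern might look like the sketch below. It is only an illustration: Difficulty_Flag, its address, and the Layout_Easy/Layout_Hard labels are made-up names, not taken from any disassembly.

```asm
Difficulty_Flag	equ	$FFFFFF90		; hypothetical: any RAM byte your hack leaves unused

; Returns the layout to load in a1 (flag 0 = easy, anything else = hard).
GetLayoutForDifficulty:
		tst.b	(Difficulty_Flag).w	; is the flag 0?
		beq.s	GetLayout_Easy		; if so, use the easy layout
		lea	(Layout_Hard).l,a1	; otherwise use the hard layout
		rts
GetLayout_Easy:
		lea	(Layout_Easy).l,a1
		rts

; Hypothetical duplicated layout data; in a real hack these would be incbin'd files.
Layout_Easy:	dc.b	0
Layout_Hard:	dc.b	0
		even
```

The real level loading code would then read the layout through a1 instead of a hard-coded label.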

SuperEgg · May 22, 2013

This concept isn't that foreign. If you guys have Build A Burger, it's exactly the same thing, except that instead of a difficulty setting, it's just choosing layouts between the S2 builds. As far as making it character specific, it's not that hard either; it's just a matter of checking which character is in play, and having it choose a layout, beginning path, or whatever. RHS, nineko, and I have posted alternate methods of doing so; of course RHS's is more energy efficient. =P In theory, you could easily just make a table to load each layout, thus de-stressing the whole issue entirely, but I'm too lazy to post up an example.

Edit: Had to add in nineko, because his post came in too late =P

redhotsonic · May 22, 2013

nineko said: ↑

PalLoad1:
cmpi.b #$1,($FFFFC615).w
beq.s PalLoad1Orig
cmpi.b #$2,($FFFFC615).w
beq.s PalLoad1Stephen
lea (PalPointers).l,a1 ; NEKO PALETTES
bra.s PalLoad1Common
PalLoad1Stephen:
lea (PalPointers3).l,a1 ; STEPHEN PALETTES
bra.s PalLoad1Common
PalLoad1Orig:
lea (PalPointers2).l,a1 ; ORIGINAL PALETTES
PalLoad1Common:
lsl.w #3,d0
adda.w d0,a1
movea.l (a1)+,a2
movea.w (a1)+,a3
adda.w #$80,a3
move.w (a1)+,d7

Here's a much quicker way, and smaller in size too:
Code:
PalLoad1:
		lea	(PalPointers).l,a1	; NEKO PALETTES
		tst.b	($FFFFC615).w
		beq.s	PalLoad1Common
		lea	(PalPointers2).l,a1	; ORIGINAL PALETTES
		cmpi.b	#$1,($FFFFC615).w
		beq.s	PalLoad1Common
		lea	(PalPointers3).l,a1	; STEPHEN PALETTES

PalLoad1Common:
		lsl.w	#3,d0
		adda.w	d0,a1
		movea.l	(a1)+,a2
		movea.w	(a1)+,a3
		adda.w	#$80,a3
		move.w	(a1)+,d7

nineko · May 22, 2013

Thanks. I'm quite bad at 68k ASM, and most of the code I wrote is highly inefficient. I hope you'll never find out how I implemented the Feather Monitor and the Emerald Monitor.

Crash · May 23, 2013

And if you've got a lot of different options, you're better off doing something like this to avoid a million cmpi/beq's:

Code:

PalPointerList:
		dc.l PalPointers, PalPointers2, PalPointers3, PalPointers4, PalPointers5
		dc.l PalPointers6, PalPointers7, PalPointers8, PalPointers9, PalPointers10
; --------------------------------------------------------------------------
PalLoad1:
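		moveq #0,d1                        ; clear d1 so stale upper bits can't corrupt the index
		move.b ($FFFFC615).w,d1            ; copy palette number to d1
		add.w d1,d1
		add.w d1,d1                        ; d1 = palette number * 4 (each dc.l entry is 4 bytes)
		movea.l PalPointerList(pc,d1.w),a1 ; load palette address from list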

		lsl.w #3,d0
		adda.w d0,a1
		movea.l (a1)+,a2
		movea.w (a1)+,a3
		adda.w #$80,a3
		move.w (a1)+,d7
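
Note that the lookup trusts the flag: a value past the end of the list would read whatever follows it in ROM. A hedged sketch of a guard for the start of PalLoad1, assuming the list holds the 10 entries shown (PAL_LIST_COUNT and PalIndexOk are names made up here):

```asm
PAL_LIST_COUNT	equ	10			; hypothetical constant: entries in PalPointerList

		moveq	#0,d1
		move.b	($FFFFC615).w,d1	; palette number
		cmpi.w	#PAL_LIST_COUNT,d1	; is it inside the list?
		bcs.s	PalIndexOk		; yes (unsigned d1 < count), keep it
		moveq	#0,d1			; no, fall back to the first entry
PalIndexOk:
		add.w	d1,d1
		add.w	d1,d1			; d1 = index * 4
```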

redhotsonic · May 22, 2013

(pc,d4.w),a1? I think you mean (pc,d1.w),a1 =P

vladikcomper · May 22, 2013

redhotsonic said: ↑

Here's a much quicker way, and smaller in size too:

PalLoad1:
lea (PalPointers).l,a1 ; NEKO PALETTES
tst.b ($FFFFC615).w
beq.s PalLoad1Common
lea (PalPointers2).l,a1 ; ORIGINAL PALETTES
cmpi.b #$1,($FFFFC615).w
beq.s PalLoad1Common
lea (PalPointers3).l,a1 ; STEPHEN PALETTES

PalLoad1Common:
lsl.w #3,d0
adda.w d0,a1
movea.l (a1)+,a2
movea.w (a1)+,a3
adda.w #$80,a3
move.w (a1)+,d7

You can do better, RHS. I've seen some of the optimizations you suggested in other threads recently and, unfortunately, you keep making the same optimization mistakes. Not that these are critical ones, and optimization isn't the most important thing after all. You're certainly thinking in the right direction with the solutions you come up with, and this particular example of yours isn't half bad. However, if you are so concerned with optimizing things, as well as helping people out with optimizations of your own, you should consider learning some basic optimization techniques widely used on the 68K first, in order to do the job properly.

In the given example, there's a lot you can do just by tweaking opcodes themselves here and there. This is what I consider to be the basic optimization. Here's how this can be optimized properly without changing the original code logic:

PalLoad1:
lea PalPointers,a1 ; NEKO PALETTES
move.b ($FFFFC615).w,d7
beq.s PalLoad1Common
lea PalPointers2,a1 ; ORIGINAL PALETTES
subq.b #1,d7
beq.s PalLoad1Common
lea PalPointers3,a1 ; STEPHEN PALETTES

PalLoad1Common:
lsl.w #3,d0
adda.w d0,a1
movea.l (a1)+,a2
movea.w (a1)+,a3
lea $80(a3),a3
move.w (a1)+,d7

Let's count the cycles each version takes:

Old version:

PalLoad1:
lea (PalPointers).l,a1 ; 12
tst.b ($FFFFC615).w ; 12
beq.s PalLoad1Common ; 10/8
lea (PalPointers2).l,a1 ; 12
cmpi.b #$1,($FFFFC615).w ; 16
beq.s PalLoad1Common ; 10/8
lea (PalPointers3).l,a1 ; 12

PalLoad1Common:
lsl.w #3,d0 ; 12
adda.w d0,a1 ; 8
movea.l (a1)+,a2 ; 12
movea.w (a1)+,a3 ; 8
adda.w #$80,a3 ; 12
move.w (a1)+,d7 ; 8

New version:

PalLoad1:
lea PalPointers,a1 ; 8 (-4)
move.b ($FFFFC615).w,d7 ; 12
beq.s PalLoad1Common ; 10/8
lea PalPointers2,a1 ; 8 (-4)
subq.b #1,d7 ; 4 (-12)
beq.s PalLoad1Common ; 10/8
lea PalPointers3,a1 ; 8 (-4)

PalLoad1Common:
lsl.w #3,d0 ; 12
adda.w d0,a1 ; 8
movea.l (a1)+,a2 ; 12
movea.w (a1)+,a3 ; 8
lea $80(a3),a3 ; 8 (-4)
move.w (a1)+,d7 ; 8

The newer version saves up to 28 cycles. Pretty good for such a small piece of code, isn't it?
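
For reference, the 28-cycle figure is the worst case, where neither branch is taken (flag value 2 or higher), and it assumes each LEA in the new version assembles to the 8-cycle PC-relative form. Summing the per-instruction counts listed above (selection code first, then the common part):

```latex
\begin{align*}
\text{old}   &= (12+12+8+12+16+8+12) + (12+8+12+8+12+8) = 80 + 60 = 140\\
\text{new}   &= (8+12+8+8+4+8+8) + (12+8+12+8+8+8) = 56 + 56 = 112\\
\text{saved} &= 140 - 112 = 28
\end{align*}
```

With the flag at 0 the first branch is taken and the saving is 8 cycles (94 vs 86); with the flag at 1 it is 24 cycles (130 vs 106).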

So, let's break this down:

1. Optimizing memory accesses

tst.b ($FFFFC615).w ; 12
<...>
cmpi.b #$1,($FFFFC615).w ; 16

versus

move.b ($FFFFC615).w,d7 ; 12
<...>
subq.b #1,d7 ; 4 (-12)

Saves up to 12 cycles. This is quite a lot, actually: as much as a whole lea (xxx).l,an instruction costs.

You see, MOVE.B costs the same number of cycles as TST.B, but by keeping the memory value in a register, you don't have to access the same address to retrieve the same value again, and this saves a good deal of processing time.

Well, you could just replace

cmpi.b #$1,($FFFFC615).w ; 16

with

cmpi.b #1,d7 ; 8

... and that would save you 8 cycles already. But I went a bit further, by optimizing CMPI.B #1 to SUBQ.B #1. They both do virtually the same thing, except that CMPI doesn't actually store the result of the subtraction in the destination operand. Since we won't need the value in D7 anymore, this works well here.

Remember the main rule when optimizing on the 68K: use registers as much as possible. This is almost always much faster, except in really rare cases. One such case is reading a word into a register from an odd address: temporarily writing each byte to a properly aligned memory address and then reading the whole thing as a word is faster than writing the high byte to a register, shifting it 8 bits left, and then reading the low byte. The Kosinski decompressor performs this optimization.

2. Optimizing addressing modes

lea (PalPointers).l,a1 ; 12

versus

lea PalPointers,a1 ; 8 (-4)

The first example forces the absolute long addressing mode on the source operand, which is one of the slowest addressing modes on the 68K. Avoid using it when possible. In a lot of cases, this is actually possible.

In the second example, I'm not forcing any particular addressing mode, letting the assembler choose the most appropriate one for me. This is the recommended way of programming in assembly. Professional ASM programmers didn't hard-code addressing modes in cases like this. Look at Yuji Naka's original code: http://pastebin.com/L6W4CHxK (this is Sonic 2's Debug Mode code, known as Edit Mode in the original source code)

Specifying a particular addressing mode on each instruction is a disassembler's way of laying things out. It is necessary to make sure disassembled code can be re-assembled with each opcode keeping its original form. We're humans, not assemblers - there is no need for us to write like that.

So in the second example, I'm not forcing any addressing modes. I'm counting it as 8 cycles though, expecting that the assembler will pick the PC-relative addressing mode here. If optimizations are enabled in your assembler, and the palette pointer data is located within 32 KB of these instructions, the assembler should pick that addressing mode, saving you 4 cycles per LEA.
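
If you'd rather not depend on the assembler's optimization settings, 68K assemblers generally let you ask for PC-relative addressing explicitly. A small sketch comparing the two forms, assuming PalPointers sits within range of the instruction:

```asm
		lea	(PalPointers).l,a1	; absolute long: 6 bytes, 12 cycles
		lea	PalPointers(pc),a1	; PC-relative: 4 bytes, 8 cycles
						; (16-bit signed displacement, so roughly +/-32 KB)
```

With the explicit form, the assembler should report an error rather than silently fall back if the target ends up out of range.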

3. Adding an immediate value to an address register

adda.w #$80,a3 ; 12

versus

lea $80(a3),a3 ; 8 (-4)

Using ADDA is not recommended, since there is a faster equivalent that not only does the same thing, but also gives you more possibilities: you can store the sum of an immediate value and any address register in a different address register, not necessarily the same one as the source.

Always use LEA when you want to add an immediate number to an address register -- it's considerably faster. The exception is adding values from 1 to 8: use ADDQ instead.

LEA is faster because of the 68K's architecture. Apart from having a 16-bit arithmetic logic unit (ALU) to perform math operations, the 68K has two 16-bit Address Units (AUs) in its core, used to calculate effective addresses. The AUs work simultaneously, making it possible to perform 32-bit calculations within one machine cycle (address registers always involve 32-bit calculations, even if the instruction has a .w suffix).

LEA, standing for Load Effective Address as you know, relies on the AUs, while ADDA/SUBA rely on the ALU to perform math. The ALU, being a 16-bit unit, can't process a 32-bit operation in one machine cycle; it needs one extra machine cycle to do the high word.
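
A few concrete forms of the LEA/ADDQ/ADDA rule from section 3, as a sketch (the registers and values are arbitrary):

```asm
		lea	$80(a3),a3		; a3 = a3 + $80
		lea	$80(a3),a2		; a2 = a3 + $80, a3 left untouched
		addq.w	#8,a3			; values 1 to 8: ADDQ fits in 2 bytes
		adda.l	#$12345,a3		; outside -$8000..$7FFF: d16(An) can't encode it, so ADDA it is
```

LEA's displacement is a signed 16-bit value, so the trick only covers offsets from -$8000 to $7FFF; a negative displacement gives you the subtraction case.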

MarkeyJester · May 22, 2013

Oh my holy shit, we're not going into optimisation philosophy again, are we? The guy just wants to understand the simple principle behind boolean or multiple decision making, not the quickest or smallest way to do it.

Sometimes the quickest or smallest ways are the most complex and are likely to confuse the inexperienced rather than help. Come on guys, you need to start them off low.

JoenickROS · May 22, 2013

Yeah my head just exploded lol

Edit: but I should come to understand this some time in the future, since I will try to get into computer sciences this year, for college.

Edit2: Instead of Criminal justice. Why I wanted to get Into that in the first place is a mystery. lol

nineko · May 22, 2013

I like how some of the lines optimised by vladikcomper were part of the original Sonic 1 code and not code written by me (I pasted the whole subroutine, but the "common" section in my example was unchanged). Which further proves that vladikcomper is better than SEGA itself :U

redhotsonic · May 22, 2013

@vladikcomper: Woah, where did all that come from? =P

I knew one or two ways to make it better, but you blew that out of the water! I didn't suggest anything better because it's only a 1 time use. Although I might use your information on other stuff that I might want to improve =P

Crash · May 23, 2013

redhotsonic said: ↑

(pc,d4.w),a1? I think you mean (pc,d1.w),a1 =P

Whoops, fixed

Isn't this code only ever run once, when loading a level? There's not all that much point in optimising it. Useful info anyway, I never realised adda was worse than lea!

SuperEgg · May 23, 2013

To be frank, optimization isn't the biggest fish to fry. Is it important? Sure, but should it be a main concern? No.

To me, if the code works, it works. Cleaning it up is great, but someone new who is being introduced to something like this doesn't need to know optimizations. There are three or four different ways to work this example. I posted my example because it is simple and straightforward, and it shows how basic checks are implemented.

Which is easier to understand?

"If this equals this, go here. If it equals this, go here." and so forth.

As opposed to..

"Go here to begin with. BUT, test this. If it is anything but A go here. Also, go ahead and test it again, but this time add 1 to the equation, then go here."

I know this isn't verbatim what your example means, but knowing the language, it kinda does. Is it more efficient? I suppose. Would it truly make a difference? Once again, not really. When you play BAB and change up what level options you want, is there any lag? Maybe a millisecond, but would the average hacker or player notice? Not really. Of course this is more of a discussion of how to code and personal preference; I just thought it'd be better to introduce the beginners to the straightforward approach.
