Sonic CD Gems Collection Linker Disassembly Dumps

Discussion in 'Discussion & Q&A' started by ⸸ devon ⸸, Jun 12, 2022.

  1. ⸸ devon ⸸

    ⸸ devon ⸸ There's nothing left but faith Member

    Joined:
    Aug 26, 2013
    Messages:
    1,174
    So, I've been digging at those ELF files for the Gems Collection version of Sonic CD. While objdump could be used to extract function and variable names, I noticed that it wasn't able to parse the .debug sections. The thing is, there were definitely more symbols to extract, such as structure member names.

    Today, I did some more digging and found this thread, which discussed this exact issue, but for DDRMAX2. The very last post then took me to this post, where someone had successfully extracted data from the .debug section and provided a batch script showing how they did it.

    Gems Collection just so happens to have been compiled and linked with the same tools as DDRMAX2, and I found an archive of those tools on archive.org, with cracks. So, I installed them and ran the Sonic CD ELF files from the PS2 version through the linker's disassembler, and...

    It worked.

    It's pure beauty. A much more complete dump of symbols with disassembled code from Sonic CD is now available.
     
  2. ⸸ devon ⸸

    So, after finding the time to dig through this some more... not only are there more symbols than I initially found, but the dump also covers every function's arguments and local variables, the structures used, their types, and all kinds of other information that was used for debugging.

    Here's an excerpt:
    Code:
    00011781:<116>TAG_compile_unit
    00011787    AT_sibling(000118f9)
    0001178d    AT_low_pc(010075e0)
    00011793    AT_high_pc(010077a0)
    00011799    AT_stmt_list(00003e18)
    0001179f    AT_language(LANG_C)
    000117a5    AT_producer(MW MIPS C Compiler)
    000117ba    AT_name(C:\project\GEMS\application\SonicCD\src\ps2\main\ENEMY.C)
    000117f5:<107>TAG_global_subroutine
    000117fb    AT_sibling(000118f5)
    00011801    AT_low_pc(010075e0)
    00011807    AT_high_pc(010077a0)
    0001180d    AT_fund_type(FT_void)
    00011811    AT_global_refs_block(<8>8f fc 00 00 b2 fc 00 00 )
    0001181d    AT_restore_S0(<6> OP_BASEREG(29) OP_DREF8)
    00011827    AT_restore_S1(<12> OP_BASEREG(29) OP_CONST(16) OP_ADD OP_DREF8)
    00011837    AT_return_addr(<12> OP_BASEREG(29) OP_CONST(32) OP_ADD OP_DREF8)
    00011847    AT_restore_SP(<11> OP_REG(29) OP_CONST(64) OP_ADD)
    00011856    AT_name(ka_move)
    00011860:<45>TAG_formal_parameter
    00011866    AT_sibling(0001188d)
    0001186c    AT_mod_u_d_type(<5>MOD_pointer_to (0000fcd8))
    00011875    AT_location(<11> OP_BASEREG(29) OP_CONST(48) OP_ADD)
    00011884    AT_name(pActwk)
    0001188d:<24>TAG_lexical_block
    00011893    AT_sibling(000118f1)
    00011899    AT_low_pc(010075e0)
    0001189f    AT_high_pc(010077a0)
    000118a5:<42>TAG_local_variable
    000118ab    AT_sibling(000118cf)
    000118b1    AT_mod_u_d_type(<5>MOD_pointer_to (0000fcd8))
    000118ba    AT_location(<5> OP_REG(17))
    000118c3    AT_name(pPlayerwk)
    000118cf:<30>TAG_local_variable
    000118d5    AT_sibling(000118ed)
    000118db    AT_fund_type(FT_signed_short)
    000118df    AT_location(<5> OP_REG(16))
    000118e8    AT_name(d0)
    The numbers on the left are basically the "location" of the information listed on that line. "AT_fund_type" and "AT_mod_u_d_type" indicate the type of a variable. "MOD_pointer_to" means that it's a pointer, and it can be repeated to indicate the number of layers (i.e. "MOD_pointer_to MOD_pointer_to FT_char" is the same as char**). Any type that has a hex number instead is a reference to one of those location numbers found on the left, and that entry will give you the actual type info.

    "AT_name" is the symbol name, of course. In the "TAG_compile_unit" section, the name is the full path of the source file that the subsequent sections were compiled from. Unfortunately, it seems that structure names were not kept...? They're just labelled as "anonX".

    I wonder if I can write myself a quick tool to convert all of this info into something more legible... or maybe something already exists, considering this is definitely DWARF.

    EDIT: It's DWARF v1... apparently that's why there's been some trouble, because some tools don't even support it.
    EDIT 2: Found a tool that does some of what I wanted to do. Will come back with dumps soon-ish.
     
    Last edited: Sep 17, 2022
  3. ⸸ devon ⸸

    Double posting, because I think it's warranted. As I said in the previous post, I found a tool called dwarf2cpp that parses DWARF v1 data and generates C/C++ skeletons from it. That makes it easier to analyze variables, structures, and function prototypes along with their local variables, and it also recreates the folder structure of the source code. No actual code is decompiled; it just dumps those things. Should be a good resource for a possible decompilation in the future, maybe?

    Download
    GitHub Repository

    Some samples of what it generated: [4 images]

    Here's a sample decompilation I did with this information (note: structure names and constant names had to be made up):
    Code:
    void action(void) {
        actwkt *pActwk;
        i32 i;
    
        pActwk = actwk;
        for (i = 0; i < ACTWK_SLOTS; ++i) {
            if (pActwk->actno != 0) {
                act_tbl[pActwk->actno](pActwk);
            }
            ++pActwk;
        }
    }
    
    void speedset(actwkt *pActwk) {
        i32u xpos;
        i32u ypos;
        i16u spd;
    
        ypos = pActwk->yposi;
        xpos = pActwk->xposi;
        spd = pActwk->xspeed;
        xpos.l += (spd.w << 8);
    
        spd = pActwk->yspeed;
        if (!(pActwk->actfree[PLAYCTRL] & 8)) {
            if (spd.w >= 0 ||
                (!(pActwk->actfree[PLAYCTRL] & 2) ||
                spd.w >= -0x800)) {
                if (!(pActwk->actfree[PLAYCTRL] & 4)) {
                    pActwk->yspeed.w += 0x38;
                }
            }
        }
        if (pActwk->yspeed.w >= 0) {
            if (pActwk->yspeed.w >= 0x1000) {
                pActwk->yspeed.w = 0x1000;
            }
        }
        ypos.l += spd.w << 8;
    
        pActwk->xposi.l = xpos.l;
        pActwk->yposi.l = ypos.l;
    }
    
    void speedset2(actwkt *pActwk) {
        i32u xpos;
        i32u ypos;
        i32 spd;
        i32 actwkno;
        i16 d1;
    
        xpos = pActwk->xposi;
        ypos = pActwk->yposi;
    
        spd = pActwk->xspeed.w;
        if (pActwk->cddat & 8) {
            actwkno = pActwk->actfree[PLAYRIDE];
            if (actwk[actwkno].actno == 0x1E) {
                d1 = -0x100;
                if (!(pActwk->cddat & 1)) {
                    d1 = -d1;
                }
                spd += d1;
            }
        }
        spd <<= 8;
        xpos.l += spd;
    
        spd = pActwk->yspeed.w;
        spd <<= 8;
        ypos.l += spd;
        pActwk->xposi = xpos;
        pActwk->yposi = ypos;
    }
     
    Last edited: Sep 17, 2022