So, I've been digging at those ELF files for the Gems Collection version of Sonic CD. While objdump could extract function and variable names, I noticed that it wasn't able to parse the .debug sections. The thing is, there were definitely more symbols to extract, such as structure member names. Today, I did some more digging and found this thread, which talked about my very issue, but for DDRMAX2. The very last post took me to this post, which showed that someone had managed to extract data from the .debug section and provided the batch script they used. Gems Collection happened to have been compiled and linked with the same tools as DDRMAX2, and I found an archive of those tools on archive.org, with cracks. So, I installed them and ran the Sonic CD ELF files from the PS2 version through the linker's disassembler, and... it worked. It's pure beauty. A much more complete dump of symbols with disassembled code from Sonic CD is now available.
So, after finding the time to dig through this some more... not only are there more symbols than what I initially found, but there's also every function's arguments, their local variables, the structures used, and other information attached to them, like their types, plus all kinds of other stuff that was used for debugging. Here's an excerpt:

Code:
00011781:<116>TAG_compile_unit
    00011787 AT_sibling(000118f9)
    0001178d AT_low_pc(010075e0)
    00011793 AT_high_pc(010077a0)
    00011799 AT_stmt_list(00003e18)
    0001179f AT_language(LANG_C)
    000117a5 AT_producer(MW MIPS C Compiler)
    000117ba AT_name(C:\project\GEMS\application\SonicCD\src\ps2\main\ENEMY.C)
000117f5:<107>TAG_global_subroutine
    000117fb AT_sibling(000118f5)
    00011801 AT_low_pc(010075e0)
    00011807 AT_high_pc(010077a0)
    0001180d AT_fund_type(FT_void)
    00011811 AT_global_refs_block(<8>8f fc 00 00 b2 fc 00 00 )
    0001181d AT_restore_S0(<6> OP_BASEREG(29) OP_DREF8)
    00011827 AT_restore_S1(<12> OP_BASEREG(29) OP_CONST(16) OP_ADD OP_DREF8)
    00011837 AT_return_addr(<12> OP_BASEREG(29) OP_CONST(32) OP_ADD OP_DREF8)
    00011847 AT_restore_SP(<11> OP_REG(29) OP_CONST(64) OP_ADD)
    00011856 AT_name(ka_move)
00011860:<45>TAG_formal_parameter
    00011866 AT_sibling(0001188d)
    0001186c AT_mod_u_d_type(<5>MOD_pointer_to (0000fcd8))
    00011875 AT_location(<11> OP_BASEREG(29) OP_CONST(48) OP_ADD)
    00011884 AT_name(pActwk)
0001188d:<24>TAG_lexical_block
    00011893 AT_sibling(000118f1)
    00011899 AT_low_pc(010075e0)
    0001189f AT_high_pc(010077a0)
000118a5:<42>TAG_local_variable
    000118ab AT_sibling(000118cf)
    000118b1 AT_mod_u_d_type(<5>MOD_pointer_to (0000fcd8))
    000118ba AT_location(<5> OP_REG(17))
    000118c3 AT_name(pPlayerwk)
000118cf:<30>TAG_local_variable
    000118d5 AT_sibling(000118ed)
    000118db AT_fund_type(FT_signed_short)
    000118df AT_location(<5> OP_REG(16))
    000118e8 AT_name(d0)

The numbers on the left are basically the "location" of the information listed on that line. "AT_fund_type" and "AT_mod_u_d_type" give the type of a variable.
"MOD_pointer_to" means that it's a pointer, and they can be repeated to indicate the number of layers (i.e. "MOD_pointer_to MOD_pointer_to FT_char" is the same as char**). Any time that has a hex number instead is pointer to one of those location numbers found on the left, and that will give you the actual type info. "AT_name" is the symbol name, of course. In the "TAG_compile_unit" section, the name is the full path name of the source file that the subsequent sections were compiled from. Unfortunately, it seems that structure names were not kept...? They're just labelled as "anonX". I wonder if I can write myself a quick tool to convert all of this info into something more legible... or maybe something already exists, considering this is definitely DWARF. EDIT: It's DWARF v1... apparently that's why there's been some trouble, because some tools don't even support it. EDIT 2: Found a tool that actually does what I wanted to do a bit. Will come back with dumps soon-ish.
Double posting, because I think it's warranted. As I said in the previous post, I found a tool called dwarf2cpp, which parses DWARF v1 data and generates C/C++ skeletons from it. That makes it much easier to analyze variables, structures, and function prototypes along with their local variables, and it also sets up the folder structure of the source code. No actual code is decompiled; it just dumps those things. Should be a good resource for a possible decompilation in the future, maybe?

Download
GitHub Repository

Some samples of what it generated:
Spoiler

Here's a sample decompilation I did with this information (note: structure names and constant names had to be made up):

Code:
void action(void) {
    actwkt *pActwk;
    i32 i;

    pActwk = actwk;
    for (i = 0; i < ACTWK_SLOTS; ++i) {
        if (pActwk->actno != 0) {
            act_tbl[pActwk->actno](pActwk);
        }
        ++pActwk;
    }
}

void speedset(actwkt *pActwk) {
    i32u xpos;
    i32u ypos;
    i16u spd;

    ypos = pActwk->yposi;
    xpos = pActwk->xposi;
    spd = pActwk->xspeed;
    xpos.l += (spd.w << 8);
    spd = pActwk->yspeed;
    if (!(pActwk->actfree[PLAYCTRL] & 8)) {
        if (spd.w >= 0 || (!(pActwk->actfree[PLAYCTRL] & 2) || spd.w >= -0x800)) {
            if (!(pActwk->actfree[PLAYCTRL] & 4)) {
                pActwk->yspeed.w += 0x38;
            }
        }
    }
    if (pActwk->yspeed.w >= 0) {
        if (pActwk->yspeed.w >= 0x1000) {
            pActwk->yspeed.w = 0x1000;
        }
    }
    ypos.l += spd.w << 8;
    pActwk->xposi.l = xpos.l;
    pActwk->yposi.l = ypos.l;
}

void speedset2(actwkt *pActwk) {
    i32u xpos;
    i32u ypos;
    i32 spd;
    i32 actwkno;
    i16 d1;

    xpos = pActwk->xposi;
    ypos = pActwk->yposi;
    spd = pActwk->xspeed.w;
    if (pActwk->cddat & 8) {
        actwkno = pActwk->actfree[PLAYRIDE];
        if (actwk[actwkno].actno == 0x1E) {
            d1 = -0x100;
            if (!(pActwk->cddat & 1)) {
                d1 = -d1;
            }
            spd += d1;
        }
    }
    spd <<= 8;
    xpos.l += spd;
    spd = pActwk->yspeed.w;
    spd <<= 8;
    ypos.l += spd;
    pActwk->xposi = xpos;
    pActwk->yposi = ypos;
}
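In case the "<< 8" in speedset looks odd, here's a small standalone sketch of what I take it to be doing. It assumes positions are 16.16 fixed point (whole pixels in the upper 16 bits) and speeds are 8.8 fixed point, so shifting a speed left by 8 lines it up with the position's fraction. The union layouts and the add_speed/whole_pixels helpers below are made up for this illustration; only the ".l" and ".w" member names come from the decompilation above.

```c
#include <stdint.h>

/* Assumed, simplified layouts: the real unions likely have more members. */
typedef union { int32_t l; } i32u;   /* 16.16 fixed-point position */
typedef union { int16_t w; } i16u;   /* 8.8 fixed-point speed */

/* Adds one frame's worth of speed to a position.
 * Multiplying by 256 has the same effect as "spd.w << 8", but is also
 * well-defined in standard C when the speed is negative. */
void add_speed(i32u *pos, i16u spd)
{
    pos->l += (int32_t)spd.w * 256;
}

/* Returns the whole-pixel part of a position.
 * Right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
int16_t whole_pixels(i32u pos)
{
    return (int16_t)(pos.l >> 16);
}
```

With those assumptions, a speed of 0x180 is 1.5 pixels per frame, so applying it twice to a position of 100 pixels lands exactly on pixel 103.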
I would like to bring attention to a decompilation of the Gems Collection version of Sonic CD that someone by the name of BenoitRen has started, based on the extracted debug information and the skeleton source repository I generated with dwarf2cpp (link to the actual files). The root directory has been decompiled so far.