Konami Compression tools

Discussion in 'Utilities' started by vladikcomper, May 4, 2020.

  1. vladikcomper

    vladikcomper Well-Known Member Member

    Joined:
    Dec 2, 2009
    Messages:
    389
    Location:
    Russia
    tl;dr: Just another compression format, completely cracked, and a fairly good one. So, one more option to potentially pop into your hack or homebrew project, just because~
    This post releases my own cross-platform compressor & decompressor for the Konami compression format as well as the original decompression code for the M68K.



    Oh my, I haven't released anything in a long while! This one isn't anything big though, just a small compression tool for yet another format. I've actually spent some time thinking whether it's even worth releasing. But if some of you guys find it even a tiny bit useful, I'd be more than happy.
    Moreover, I'd be glad to reveal even more research on various Genesis games and their compression formats, but only if this particular release attracts a certain degree of interest (I have a bunch of old stuff lying around, but it'd require quite a lot of time and effort to get everything polished and sorted~)


    How did this happen? (A short historical notice)
    So, back in 2013, by someone's request, I did a quick disassembly of Contra: Hard Corps, a Genesis hit developed by Konami in 1994. The main goal was to figure out basic art and mappings editing. It wasn't long before I discovered samples of compressed data in the ROM and the decompressor code itself. I then quickly cracked and documented the format. All that was left to do was to develop a good compressor for it. It wasn't easy for me 7 years ago, but I did write my own compressor and it did... well, compress. The compression ratio was slightly worse than that of the original compressor by Konami, but it did the job needed to get art swapping to work.

    Back then I decided not to release it anywhere (while I could), because, frankly, I thought it wasn't any good.

    Things suddenly changed a few weeks ago, when I decided to revisit some of my older tools and attempted to compile them under GNU/Linux (it has been my primary OS for quite some time). A spark suddenly hit me, and I decided to improve the code style: the mystery of the underperforming compression still sat at the back of my mind after all those years. I then investigated which part of my original implementation was responsible for the problem, so I could improve it.

    And that brings us here. As you might've guessed, I completely overhauled my algorithm and wrote the compressor from scratch. To my absolute surprise, this time the new compressor had a better compression ratio than Konami's original. So, with the job well done, I thought this implementation might at least be worth some attention~

    Funny thing is, just before I went ahead with releasing it, I found out that the same format had already been cracked a few years after my original research from 2013. I found another compression toolset released in 2015 on romhacking.net by Lab313, and it turned out Konami eventually came up with other variations of this format for its later games (which are also covered by those tools). However, that toolset has a Windows frontend only, and I haven't found any actual documentation on the formats publicly available (which is the gap I'm going to fill now). While that toolset is aimed more at art hacking Konami games, my release mainly targets developers and researchers.

    For the sake of simplicity and compatibility, I decided to re-use the previously established name of the format, LZKN1 (as was defined in the aforementioned toolset; my implementation originally referred to it just as "Konami compression").


    How good is the format, actually? (Compression methods comparison)

    I believe the first thing you might ask when presented with a new format is "how well does it compare to our favorite Kosinski and Nemesis?"

    To answer your question, I performed a quick comparison by uncompressing Sonic 2 level art from the stock disassembly (the art originally used Kosinski compression), then recompressing it into the Nemesis and Kosinski formats using KENSC, as well as into the Konami compression format using my own compressor (lzkn):

    (image: size comparison of Sonic 2 level art compressed with Kosinski, Nemesis and LZKN1)

    NOTICE: The original Kosinski compressor Sonic Team used performed worse than KENSC, hence recompressing the same art results in ~1.8% of ROM space saved.

    As you may notice, Konami's compression, LZKN1, comes pretty close compression-ratio-wise. And Nemesis art, even when compressed by the best compressor to date (the original compressor Sonic Team used did way worse), is clearly out of the competition, given the time and complexity needed to decompress it (which is probably why Sonic Team replaced it with Kosinski in later games).


    What are the advantages exactly? (Brief format documentation)

    Now, another question has probably arisen: so, its compression is slightly worse than Kosinski's, but it may benefit in other aspects, right?

    You're not mistaken. This format is extremely simple and efficient to decompress (when compared with Kosinski). It doesn't use variable-length codes to describe compression flags and all of its data elements have a size of exactly 1 byte.

    The decompression code for the 68k is also elegantly simple and fits in just 38 instructions (see the code in the next section).
    For comparison, my own compression format, Comper, with its emphasis on extreme simplicity, takes just 19 instructions, and the original Kosinski decompressor consists of 66 instructions. While comparing instruction counts (provided the loops are not unrolled) is nowhere near an accurate measure, it may at least give you some idea of these formats' relative decompression complexity.


    Format description

    The compressed data starts with a header, followed by the compressed stream:
    Code:
    <data> = <header> <compressed stream>
    The 2-byte header merely encodes the uncompressed data size, in bytes. The size is a 16-bit big-endian integer, meaning up to 64 KB worth of data may be compressed.
    Code:
    <header> = <uncompressed size>
    The compressed stream consists of description fields, each followed by up to 8 data blocks.

    Description fields are 8-bit bitfields that encode whether the following data blocks are uncompressed bytes or commands:
    • If bit == 0, the next data block to read is considered an uncompressed byte. One byte is read from the compressed stream. This byte is appended to the uncompressed stream directly;
    • If bit == 1, the next data block to read is considered a command. One or two bytes are read from the compressed stream, depending on the first bits of the command itself. A command usually initiates copying of previously decompressed data, if this data is going to be repeated.
    The description field bits are processed right to left (least significant bits first). When all bits are exhausted (and after the last corresponding data block is fetched), the next byte in the compressed stream is read and considered a new description field.

    Thus, the compressed stream may look something like this:
    Code:
    <compressed stream> =
    <description field 1> <data block 1> <data block 2> ... <data block 8>
    <description field 2> <data block 9> <data block 10> ...
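The right-to-left bit order can be sketched with a tiny Python generator (a hypothetical helper of mine, not part of any released tool; in a real decompressor the next description byte is only fetched after the previous field's data blocks, but the bit order is the same):

```python
def description_bits(stream):
    # Yield description-field bits least-significant-bit first; when all
    # 8 bits of a byte are exhausted, the next byte becomes the new field.
    for byte in stream:
        for _ in range(8):
            yield byte & 1
            byte >>= 1
```

For example, the field byte $18 (%00011000) yields the bit sequence 0, 0, 0, 1, 1, 0, 0, 0, so three raw bytes would be followed by two commands.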
    If a data block represents a command, the following formats are possible, depending on the 2 most significant bits of the first byte read (the formats below are represented as bitfields for clarity):
    Code:
        %00011111           Stop flag
    
        %0ddnnnnn dddddddd  Uncompressed stream copy (Mode 1)
                            dd..dddddddd = Displacement (0..-1023)
                            nnnnn = Number of bytes to copy (3..33)
       
        %10nndddd           Uncompressed stream copy (Mode 2)
                            nn = Number of bytes to copy (2..5)
                            dddd = Displacement (0..-15)
                   
        %11nnnnnn           Compressed stream copy
                            nnnnnn = Number of bytes to copy (8..71)
    As seen above, if a command with the value $1F (%00011111) is encountered, decompression stops immediately. Each compressed stream should include the stop command at the end.
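To tie the spec together, here's a minimal Python decompressor I sketched directly from the format description above and the 68k code in the next section (an illustration of mine, not part of the released toolset; the Mode 1 copy count follows the 68k code exactly):

```python
def lzkn1_decompress(data: bytes) -> bytes:
    pos = 2                          # skip the 2-byte uncompressed-size header
    out = bytearray()
    desc, bits = 0, 0                # current description field, bits left in it
    while True:
        if bits == 0:                # fetch a new description field
            desc, bits = data[pos], 8
            pos += 1
        flag, desc, bits = desc & 1, desc >> 1, bits - 1
        if flag == 0:                # bit 0: one raw byte
            out.append(data[pos]); pos += 1
            continue
        b0 = data[pos]; pos += 1     # bit 1: a command follows
        if b0 == 0x1F:               # stop flag
            return bytes(out)
        if b0 < 0x80:                # Mode 1: %0ddnnnnn dddddddd
            b1 = data[pos]; pos += 1
            disp = ((b0 << 3) & 0x300) | b1
            count = (b0 & 0x1F) + 3
        elif b0 < 0xC0:              # Mode 2: %10nndddd
            disp = b0 & 0x0F
            count = ((b0 >> 4) & 0x03) + 2
        else:                        # %11nnnnnn: literal run from the stream
            count = (b0 & 0x3F) + 8
            out += data[pos:pos + count]; pos += count
            continue
        for _ in range(count):       # byte-by-byte so overlapping copies work
            out.append(out[-disp])
```

Feeding it a hand-made stream (size $0009, field $18, raw "abc", Mode 1 command $03 $03, stop $1F) yields "abcabcabc", which also demonstrates the overlapping self-copy.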


    Alright, how to use it? (Compression tools and the 68k decompressor)

    If you really want to give it a try and you know well what you're doing, go ahead and grab yourself the 68k decompressor and the compression tool below. That would be a good start.

    The original decompressor for the M68K
    Code:
    ; ---------------------------------------------------------------
    ; Konami's LZSS variant 1 (LZKN1) decompressor
    ; ---------------------------------------------------------------
    ; INPUT:
    ;       a5      Input buffer (compressed data location)
    ;       a6      Output buffer (decompressed data location)
    ;
    ; USES:
    ;       d0-d2, a0, a5-a6
    ; ---------------------------------------------------------------
    
    KonDec:
            move.w  (a5)+,d0        ; read uncompressed data size   // NOTICE: not used
            bra.s   @InitDecomp
    
    ; ---------------------------------------------------------------
    ; Alternative (unused) decompressor entry point
    ; Skips uncompressed data size at the start of the buffer
    ;
    ; NOTICE: This size is not used during decompression anyways.
    ; ---------------------------------------------------------------
    
    KonDec2:
            addq.l  #2,a5           ; skip data size in compressed stream
    ; ---------------------------------------------------------------
    
    @InitDecomp:
            lea     (@MainLoop).l,a0
            moveq   #0,d7           ; ignore the following DBF
    
    @MainLoop:
            dbf     d7,@RunDecoding ; if bits in description field remain, branch
            moveq   #7,d7           ; set repeat count to 8
            move.b  (a5)+,d1        ; fetch a new description field from compressed stream
    
    @RunDecoding:
            lsr.w   #1,d1           ; shift a bit from the description bitfield
            bcs.w   @DecodeFlag     ; if bit=1, treat current byte as decompression flag
            move.b  (a5)+,(a6)+     ; if bit=0, treat current byte as raw data
            jmp     (a0)            ; back to @MainLoop
    ; ---------------------------------------------------------------
    
    @DecodeFlag:
            moveq   #0,d0
            move.b  (a5)+,d0        ; read flag from a compressed stream
            bmi.w   @Mode10or11     ; if bit 7 is set, branch
            cmpi.b  #$1F,d0
            beq.w   @QuitDecomp     ; if flag is $1F, branch
            move.l  d0,d2           ; d2 = %00000000 0ddnnnnn
            lsl.w   #3,d0           ; d0 = %000000dd nnnnn000
            move.b  (a5)+,d0        ; d0 = Displacement (0..1023)
            andi.w  #$1F,d2         ; d2 = %00000000 000nnnnn
            addq.w  #2,d2           ; d2 = Repeat count (2..33)
            jmp     (@UncCopyMode).l
    ; ---------------------------------------------------------------
    
    @Mode10or11:
            btst    #6,d0
            bne.w   @CompCopyMode   ; if bits 7 and 6 are set, branch
            move.l  d0,d2           ; d2 = %00000000 10nndddd
            lsr.w   #4,d2           ; d2 = %00000000 000010nn
            subq.w  #7,d2           ; d2 = Repeat count (1..4)
            andi.w  #$F,d0          ; d0 = Displacement (0..15)
    
    @UncCopyMode:
            neg.w   d0              ; negate displacement
    
    @UncCopyLoop:
            move.b  (a6,d0.w),(a6)+ ; self-copy block of uncompressed stream
            dbf     d2,@UncCopyLoop ; repeat
            jmp     (a0)            ; back to @MainLoop
    ; ---------------------------------------------------------------
    
    @CompCopyMode:
            subi.b  #$B9,d0         ; d0 = Repeat count (7..70)
    
    @CompCopyLoop:
            move.b  (a5)+,(a6)+     ; copy uncompressed byte
            dbf     d0,@CompCopyLoop; repeat
            jmp     (a0)            ; back to @MainLoop
    ; ---------------------------------------------------------------
    
    @QuitDecomp:
            rts
    


    The compression tool

    lzkn is a tool I created to work with this compression format. It's cross-platform, fully open-source and comes with comprehensive documentation.
    Its repository is available on GitHub: https://github.com/vladikcomper/konami-compression-tools

    As mentioned earlier, my implementation of the compressor is confirmed to have even better compression ratio than the original compressor used by Konami. For that reason, my tool also supports recompression (decompressing already compressed art and compressing it again), just like KENSC.


    Downloads
    For Linux and MacOS users, it's recommended to build the tool from the source code. For build and installation instructions, please see the project's documentation: https://github.com/vladikcomper/kon...uilding-from-the-source-code-and-installation


    Basic usage (Windows users)

    If you're an ordinary Windows user, you may be more comfortable compressing files via drag & drop onto the executable.
    1. Just put lzkn.exe in a folder where it's easy to get to.
    2. Drag & drop the file you want to compress (say, file.bin) onto lzkn.exe.
    3. That's it, the compressed file should appear in the same folder (file.bin.lzkn1 in our example).
    WARNING! Due to limitations of the format itself, only files smaller than 64 KB can be compressed successfully!


    Advanced usage (command line arguments)
    Code:
    lzkn [-c|-d|-r] input_path [output_path]
    The first optional argument, if present, selects operation mode:
    • -c = Compress <input_path>;
    • -d = Decompress <input_path>;
    • -r = Recompress <input_path> (decompress and compress again).
    If the flag is omitted, compression mode is assumed.

    If [output_path] is not specified, it's set as follows:
    • .lzkn1 extension is appended to the <input_path> in compression mode;
    • .unc extension is appended to the <input_path> in decompression mode;
    • In recompression mode, the same filepath is used.
    Examples:

    The following command compresses file.bin to file.bin.lzkn1:
    Code:
    lzkn file.bin
    In decompression mode, the same file.bin.lzkn1 may be decompressed to file.bin.lzkn1.unc by default:
    Code:
    lzkn -d file.bin.lzkn1
    Recompress old-compressed.bin to itself:
    Code:
    lzkn -r old-compressed.bin
    Compress uncompressed.bin to compressed.bin:
    Code:
    lzkn -c uncompressed.bin compressed.bin
    Please note that -c is the default mode and may be omitted.
     
  2. Clownacy

    Clownacy Well-Known Member Staff

    Joined:
    Aug 15, 2014
    Messages:
    894
    Looking at the code, it seems like your compressor always uses the longest compression runs possible.

    You once mentioned to Flamewing that you thought better compression could be achieved by breaking some runs early. As Flamewing describes in his write-up, it is possible to do this, and create 'perfect' LZSS files which are as small as the format could possibly allow. Would you be interested in adding that to your compressor in the future?
     
    vladikcomper and ProjectFM like this.
  3. vladikcomper

    vladikcomper Well-Known Member Member

    Joined:
    Dec 2, 2009
    Messages:
    389
    Location:
    Russia
    @Clownacy, you've brought up a fair point here. The reason I haven't (yet) attempted a more 'brute-force' approach to figuring out the most effective compression chain is that I've done some heuristic tests beforehand and came to the conclusion that the format itself is so carefully designed that rearranging or splitting dictionary references may give little to no improvement to the compression efficiency (we're talking about 0.5-1% of size reduction at best, which comes at the cost of vastly increased memory use and/or computational complexity of the compression algorithm).
    Kosinski, on the other hand, is a slightly different story, as it presents a variety of dictionary reference formats (and sizes) and possible combinations of them to encode the same run of bytes.

    So let me explain myself in detail to you (and possibly anyone else wondering about the idea behind smarter LZSS compression approaches).
    As you mentioned earlier, I indeed made an assumption back in 2013 that Kosinski compression algorithm could be theoretically improved by analyzing future matches in the lookahead buffer and/or carefully splitting some existing matches to see if a more compact combination of short dictionary references exists to encode them (as opposed to a single long one).
    @Flamewing proved my assumption right, eventually creating a more powerful LZSS compression algorithm, which powers KENSC today.

    For everyone else reading this and having a hard time understanding the terminology here, allow me to quickly introduce LZSS, the data compression technique behind Kosinski, LZKN, Saxman, Comper and much more...


    LZ77/LZSS compression basics

    The idea behind this compression is simple as day: find sequences of bytes that occurred earlier and replace them with references to those bytes in the previously read data.

    Given the string:
    Code:
    abcabcdabbcd
    The compressor will identify the repeated fragments (redundancies):
    Code:
    abc[abc]d[ab]b[cd]
    And replace them by the references to the earlier read data:
    Code:
    abc(-3,3)d(-4,2)b(-5,2)
    The references are basically (offset, length) pairs. They're also called dictionary references.
    For instance, (-3,3) from the example above simply tells the decompressor: look 3 bytes back, copy 3 bytes from there. This way, we would copy the "abc" fragment that occurred earlier in the data stream. As long as these references are smaller than the byte sequences they encode, you actually reduce the size of the original data by encoding it more efficiently.

    This is basically how LZ77/LZSS family of algorithms works. Simple and powerful, isn't it?
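As an illustration, a naive greedy LZSS matcher can be sketched in a few lines of Python (a toy of mine, not any of the actual tools; the window size and minimum match length are arbitrary here, and ties prefer the nearest match):

```python
def lzss_tokenize(data: bytes, window: int = 1024, min_len: int = 2):
    # Greedy tokenizer: emit raw bytes or (offset, length) back-references.
    tokens, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # overlapping matches (j + length >= i) are fine: the decompressor
            # copies byte by byte, so the data exists by the time it's needed
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length >= max(best_len, 1):   # ties: prefer the nearest match
                best_off, best_len = j - i, length
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i:i + 1])
            i += 1
    return tokens
```

For instance, tokenizing "abcabcabc" produces three raw bytes followed by a single overlapping reference (-3, 6), which unrolls into the remaining six bytes.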


    Kosinski vs LZKN format

    The key difference between LZSS-based formats is how they encode raw data bytes and dictionary references.
    As you may easily guess, the dictionary references themselves are represented by, well, bytes. So you must also tell the decompressor which bytes in the compressed stream are raw data and which are dictionary references.
    Kosinski, LZKN, Comper and many other formats implement so-called description fields to provide this information.

    Now, Kosinski compression attempts to be pretty complex and inventive with how it stores raw data and dictionary references:
    Code:
    %<1> [dddddddd] - 9 bits - encodes a raw byte (d = data bits);
    %<00cc> [oooooooo] - 12 bits - encodes an inline dictionary reference, e.g. (offset, length) pair (o = offset, length = cc + 2). May encode fragments that are 2 to 5 bytes long;
    %<01> [oooooooo] [oooooccc] - 18 bits - encodes a short dictionary reference (o = offset, length = ccc + 2). May encode fragments that are 2 to 9 bytes long;
    %<01> [oooooooo] [ooooo000] [cccccccc] - 26 bits - encodes a long dictionary reference (o = offset, length = cccccccc + 1). May encode fragments that are 1 to 256 bytes long.
    NOTICE: Bits in <> brackets reside in the description fields and bits in [] brackets reside in separate bytes in the compressed data. While stored separately, I decided to put them next to each other, because this is the exact order in which they're processed, and it also clearly shows how many bits each construction takes.

    The LZKN1 format is designed more simply in the way it handles references and raw data:
    Code:
    %<0> [dddddddd] - 9 bits - encodes a raw byte (d = data bits);
    %<1> [10ccoooo] - 9 bits - short dictionary reference (2 to 5 bytes);
    %<1> [0ooccccc] [oooooooo] - 17 bits - long dictionary reference (3 to 33 bytes);
    %<1> [11cccccc] [...] - 9 bits + 8 * length bits - encodes a sequence of raw bytes (8 to 71 bytes), a more effective way of storing raw data if the run is long enough.
    As you can see, both LZKN and Kosinski provide several ways of encoding a single dictionary reference, each coming at its own cost. While a long reference may effectively represent any reference you're able to find, the compressor may and must choose the shorter formats for increased space efficiency.
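In code, picking the cheapest LZKN1 encoding for a given reference boils down to a simple cost table (a hypothetical helper of mine, with costs in bits taken from the formats listed above):

```python
def lzkn1_ref_cost(disp: int, length: int):
    # Return the cheapest encoding cost in bits for a back-reference,
    # or None if LZKN1 cannot encode it at all.
    costs = []
    if 2 <= length <= 5 and 1 <= disp <= 15:
        costs.append(9)    # short reference: 1 description bit + 1 byte
    if 3 <= length <= 33 and 1 <= disp <= 1023:
        costs.append(17)   # long reference: 1 description bit + 2 bytes
    return min(costs) if costs else None
```

So a nearby 4-byte match costs 9 bits, a distant 10-byte match costs 17 bits, and anything longer than 33 bytes has to be split into several references.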


    Why "smarter" compression matters

    The way most LZSS compressors work is pretty simple: process the source data bytes consecutively and greedily take the longest reference to previously seen data at each position. Because the longer the match, the more bytes we save by shortening it to a single dictionary reference, right?
    Well, it turns out that isn't always the case, because the longest match available is not always the optimal choice! But we'll get back to that in a minute...

    While in LZKN, splitting and remapping dictionary references doesn't contribute much to encoding efficiency (there are no means to encode a reference of the same length differently, unlike Kosinski), Kosinski presents some interesting combinations (compression tricks), where certain non-standard decisions taken by a compressor result in better space efficiency.

    Example 1.
    Given any found reference with the length of 10, for instance:
    Code:
    01234567890123456789
    0123456789[0123456789]
    0123456789(-10,10)
    One may think that the only effective way to encode this reference is to use this format (as described above):
    Code:
    %<01> [oooooooo] [ooooo000] [cccccccc] - 26 bits - a long dictionary reference (1 to 256 bytes long)
    This is what the most basic LZSS compressor will actually do: since no shorter format exists to represent a dictionary reference with a length of 10, we spend a total of 26 bits to encode this reference.

    However, if we decide to split this reference into two halves:
    Code:
    01234567890123456789
    0123456789[01234][56789]
    0123456789(-10,5)(-10,5)
    We may now use the inline reference format for each half:
    Code:
    %<00cc> [oooooooo] - 12 bits - encodes an inline dictionary reference (2 to 5 bytes long)
    Despite having to encode 2 references instead of 1, we now magically spend fewer bits doing so: 12 * 2 = 24 bits. 2 bits saved, which is a ~7.7% improvement!
    Compressors aren't usually programmed to take such unusual decisions.

    Please note that this is an edge case: it works only for the length of 10 and if the source bytes are close enough.
    Something like this isn't possible in LZKN, because its format is more carefully designed.
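The arithmetic above is easy to double-check (costs in bits taken from the Kosinski formats quoted earlier):

```python
LONG_REF = 26      # %<01> [oooooooo] [ooooo000] [cccccccc] - one long reference
INLINE_REF = 12    # %<00cc> [oooooooo] - one inline reference (lengths 2..5)

single = LONG_REF          # encoding (-10,10) as a single long reference
split = 2 * INLINE_REF     # encoding it as (-10,5)(-10,5)
print(f"{single} vs {split} bits")  # splitting saves 2 bits (~7.7%)
```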

    Example 2.
    Given a string:
    Code:
    bcdababcd
    A standard greedy LZSS compressor will always take the longest match available at the current position (the nearest one, on ties):
    Code:
    bcdab[ab][cd]
    bcdab(-2,2)(-6,2)
    In Kosinski, for encoding these two references, we may easily use the most size-effective format we've got:
    Code:
    %<00cc> [oooooooo] - 12 bits - encodes an inline dictionary reference (2 to 5 bytes long)
    So this part takes 24 bits to encode (while it decodes into 4 whole bytes, i.e. 32 bits); pretty effective compression, right?

    Wrong. Let's see what happens if we just skip/sacrifice one byte to let the compressor find another reference it couldn't have seen before:
    Code:
    bcdaba[bcd]
    bcdaba(-6,3)
    Now, to encode the same "abcd" part, we need to emit "a" as a raw byte and the rest of it as a reference:
    Code:
    %<1> [dddddddd] - 9 bits - encodes a raw byte (d = data bits);
    %<00cc> [oooooooo] - 12 bits - encodes an inline dictionary reference (2 to 5 bytes long);
    We now use 9 + 12 = 21 bits. Incredible, isn't it? We suddenly managed to save 3 whole bits, which is a 12.5% improvement!

    Now, obviously, a standard LZSS compressor won't take such smart decisions on its own. This is why the improved KENSC compressors use a variation of 'brute force' to figure out which combination of raw bytes and dictionary references produces the best possible outcome.
    Again, this particular example is not the case for LZKN, because raw bytes and short references are encoded in the same size.

    So far I haven't found cases (at least heuristically) where LZKN could benefit from the 'smarter' but slower compression approach, and the graph idea felt too difficult yet unnecessary to implement. Now, I believe these cases do exist, and I may attempt the 'smarter' way at some point when I have the time. But I expect the compression improvement to be modest at best (0.5-1% for the price of 10-100x longer compression, as I said earlier).
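For the curious, the 'graph' approach being discussed here is essentially a shortest-path (dynamic programming) problem over token boundaries. Below is a rough Python sketch of mine (a toy under assumed costs, not Flamewing's actual implementation): best[i] holds the cheapest bit cost to encode the first i bytes, and every raw byte or achievable reference becomes an edge to a later position:

```python
def optimal_parse_cost(data: bytes, raw_cost, ref_cost, window, max_len):
    # ref_cost(disp, length) returns the encoding cost in bits,
    # or None if the format cannot encode that reference.
    n = len(data)
    INF = float("inf")
    best = [0] + [INF] * n           # best[i] = min bits to encode data[:i]
    for i in range(n):
        if best[i] == INF:
            continue
        # edge 1: emit one raw byte
        best[i + 1] = min(best[i + 1], best[i] + raw_cost)
        # edge 2: emit a back-reference of every achievable length
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < n and length < max_len
                   and data[j + length] == data[i + length]):
                length += 1
            for l in range(2, length + 1):
                c = ref_cost(i - j, l)
                if c is not None:
                    best[i + l] = min(best[i + l], best[i] + c)
    return best[n]
```

With Kosinski-like costs (9-bit raw bytes, 12-bit inline references of lengths 2 to 5), this finds the 66-bit parse of "bcdababcd" from Example 2 above, while the greedy parse costs 69 bits.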
     
    Last edited: May 5, 2020
  4. Flamewing

    Flamewing Elite Hacker Member

    Joined:
    Aug 28, 2011
    Messages:
    37
    Location:
    Brasil
    Looking it over: since this is an LZSS variant, it should be relatively simple to adapt my perfect compressor to work with it. I added a reminder to myself.
     
    TheChosenHacker likes this.
  5. Flamewing

    Flamewing Elite Hacker Member

    Joined:
    Aug 28, 2011
    Messages:
    37
    Location:
    Brasil
    I added the compressor/decompressor in; seems to have no issues so far. Comparisons using S2 level art:
    • vladik's LZKN1 encoder: 111535 bytes (+6719 bytes, average 0.5s to compress)
    • My encoder: 109412 bytes (+4596 bytes, average 0.95s to compress all files)
    • Original Kos: 104816 bytes (+0 bytes, reference)
    • My Kos encoder: 100446 bytes (-4370 bytes, average 5s to compress all files)
    For gains of 2123 bytes over vladik's encoder, or about 1.9%. Were it not for the fact it was very easy to do it with my generic LZSS backend, it would not be worth the effort at all.