Dominic Szablewski, @phoboslab
— Thursday, August 10th 2023

Rewriting wipEout

The source code for the classic PSX launch title wipEout was leaked in 2022. A few months ago I finally sat down to take a look at it. The result is a (nearly) complete rewrite that compiles to Windows, Linux, macOS and WASM.

Thanks to WASM and WebGL you can play wipEout right in your browser!

I'm not the only one who embarked on a path to restore the game. To my knowledge, there are two other efforts ongoing: WipEout Phantom Edition and an as yet unnamed project by XProger. Both offer more features than my rewrite. If you're on Windows and you just want to enjoy the game, these are the better options.

However, neither the Phantom Edition nor XProger's version comes with the source code. Understandably so. The legality of re-distributing the leaked source is questionable at best.

So let's just pretend that the leak was intentional, a rewrite of the source falls under fair use and the whole thing is abandonware anyway:

github.com/phoboslab/wipeout-rewrite

(Aside: I would have loved to work this into an officially sanctioned remaster, but as you can imagine getting a hold of anyone at Sony is impossible. So here we are.)

Remnants of the past

The quality of the leaked source is abysmal. From what I can piece together, it mainly contains the “Wipeout ATI 3D Rage Edition” of the game. A lackluster port for Windows that was bundled with ATI GPUs.

The directories named WipeoutPSX/ and WIN95/ are not the original PlayStation or Win95 versions, but just contain some dysfunctional intermediate copies. The Wipeout98/ directory apparently was the one that was compiled into the ATI Rage Edition, but it still included some header files from the PSX and Win95 directories.

Let me paint you a picture: Even the NTSC version for the PlayStation looks like an afterthought quickly hacked into the PAL version. #ifdefs switched some physics calculations from 25 fps to 30. Later, the PC DOS release, Win95 version and finally the ATI Rage Edition were haphazardly piled on top. Each time by a different set of developers that somehow had to make sense of this mess. The code still contains many of the remains of what came before. There was never any time to clean it up.

Each new release was decidedly worse than the previous one. Compare the original PSX version (video) to the DOS release (video): The vertex lighting on the track is gone, everything looks flat, there is no transparency and the speedometer was redrawn by a programmer. The ATI Rage Edition (video) carried this mess forward, introduced visible seams in the geometry everywhere, somehow corrupted the software z-sorting and then screwed up the text rendering with ghastly letter spacing.

Deep down the source contains some sensible parts. I assume the track rendering and some of the physics were implemented first. What followed is some caffeine-induced nightmare code written under immense time pressure. The 5000 lines of if else that handle the menu state are a striking witness to this insanity.

After the initial release was out the door, this artifact was handed down to the least experienced developers. They had the job to force it into whatever shape would bring some money.

Now, I have enormous respect for the original developers. The code may not be pretty, but the result justifies it all. It was a launch title for the PSX and it still holds up today. The developers faced the unknowns of never before seen hardware and 3D was a whole new dimension to make sense of. They did a tremendous job.

I'm sure the DOS and ATI Rage versions were equally challenging to get working. In the end it was a management decision to produce something that could be sold, instead of producing something good.

Compiling a pile of garbage

It took me a few days to browse through the code, assemble the right set of sources and stub out all the missing functions. After about 500 iterations of calling make and then fixing some problems it finally gave in. The game launched into some glorious printf()'s.

Just a few hours later I had plugged in SDL2, opened a window and was drawing some honest-to-goodness triangles with OpenGL. I hadn't yet implemented any controls, so I just waited for the game's attract loop and stood in awe.

What you see here are polygons that are entirely transformed on the CPU. I.e. all the perspective calculations are handled in software. OpenGL just received the final screen coordinates of each triangle.
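To make "transformed on the CPU" concrete, here is a minimal sketch of a software perspective divide. It is not taken from the wipEout source; vec3f_t, screen_pos_t, project_point() and the focal length parameter are all hypothetical names chosen for illustration.

```c
// Hypothetical types for illustration; not from the leaked source or the rewrite.
typedef struct { float x, y, z; } vec3f_t;     // point in camera space
typedef struct { float x, y; } screen_pos_t;   // resulting screen position

// Project a camera-space point onto the screen with a plain perspective divide.
// Returns 0 if the point is in front of the near plane and should be skipped.
int project_point(vec3f_t p, float focal_len, float screen_w, float screen_h,
                  screen_pos_t *out) {
    const float near_z = 1.0f;
    if (p.z < near_z) {
        return 0;
    }
    // Points farther away (larger z) end up closer to the screen center.
    out->x = screen_w * 0.5f + p.x * focal_len / p.z;
    out->y = screen_h * 0.5f + p.y * focal_len / p.z;
    return 1;
}
```

With every vertex run through something like this, the graphics API only ever sees finished 2D coordinates.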

To understand why, we have to look at the PlayStation.

The PSX devkits came with a library called LIBGPU, which handled the rendering. Since the PSX GPU didn't have a z-buffer, all primitives you wanted to draw (triangles, quads and sprites) needed to be submitted into an “ordering table” or OT for short. This table had (typically) 8192 slots allowing for 8192 different z-levels. The GPU would then rasterize the list back to front. The result would not be perfect (in some situations neither of two polygons will have all pixels in front of the other one), but close enough.

The ordering table necessitates that you know the z-depth of each primitive before you hand it to the GPU for drawing. It is my understanding that the PSX original did the perspective calculations on a co-processor.

(As an aside, the data structure used for this ordering table was just a linked list, but I found the specific use of it rather clever. See this document for the details. Also have a look at the excellent PlayStation Architecture article.)
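A simplified sketch of the ordering table idea, assuming one singly linked list per z-level. This is not the LIBGPU API and not the real memory layout (the hardware walks a single linked list); prim_t, ot_add() and ot_flatten() are hypothetical names.

```c
#include <stddef.h>

#define OT_SLOTS 8192  // number of z-levels, as mentioned in the text

// Hypothetical primitive; a real one would also carry vertices, colors and UVs.
typedef struct prim_t {
    struct prim_t *next;
    int id;
} prim_t;

typedef struct {
    prim_t *slots[OT_SLOTS];  // head of a singly linked list per z-level
} ordering_table_t;

void ot_clear(ordering_table_t *ot) {
    for (int i = 0; i < OT_SLOTS; i++) {
        ot->slots[i] = NULL;
    }
}

// O(1) insert: clamp the depth to the table range and push onto that slot's list.
void ot_add(ordering_table_t *ot, int depth, prim_t *p) {
    if (depth < 0) { depth = 0; }
    if (depth >= OT_SLOTS) { depth = OT_SLOTS - 1; }
    p->next = ot->slots[depth];
    ot->slots[depth] = p;
}

// Collect primitives back to front (largest depth first), i.e. in the order
// a painter's-algorithm rasterizer would draw them. Returns the count written.
int ot_flatten(const ordering_table_t *ot, const prim_t **out, int max) {
    int count = 0;
    for (int i = OT_SLOTS - 1; i >= 0 && count < max; i--) {
        for (const prim_t *p = ot->slots[i]; p != NULL && count < max; p = p->next) {
            out[count++] = p;
        }
    }
    return count;
}
```

Inserting is constant time and no sorting pass is needed; the price is that two primitives in the same slot are drawn in arbitrary relative order.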

LIBGPU understands a bunch of different primitive types: flat or Gouraud shaded, textured or not, triangle or quad and different types of sprites and lines. In total there are about 20 different primitive types and wipEout uses most of them. In fact, the data format for the .prm files that I painstakingly reverse engineered in 2015 is just a dump of the primitive structs that LIBGPU understands. Would've liked to know that before!

So the PSX wipEout used an ordering table and a set of LIBGPU related structs for primitives. Why am I telling you this? Well, the PC version does the same!

wipEout was very specifically written for the PlayStation. There's no abstraction of any drawing routines anywhere in the code. Primitives get constructed and put into the ordering table all over the place. Consolidating this all would've been a tremendous task. So when the game was ported to the PC, the developers re-implemented LIBGPU in software, basically “emulating” the rasterization part of the PSX GPU.

Of course for the ATI version the rasterization was not done on the CPU anymore, but the ordering table, transformation on the CPU and the need to juggle 20 different primitive types still remained.

Rewriting the renderer

After I made a bit of sense of the OT[ CurrentScreen ] + depth incantations sprinkled throughout the code, I tried to understand how the transformations were set up. Remember, this was the 90s and "scene graphs" were all the rage. So wipEout uses a struct called Skeleton that, in theory, could be set up as a tree and handle groups and subgroups of objects, each tied to their parent's transformations.

/* Skeleton Structure ( Hierarchical Coordinate System )
*/
typedef struct Skeleton {
   MATRIX            relative;   /* Relative rotation/translation matrix */
   MATRIX            absolute;   /* Absolute rotation/translation matrix */
   short             update;     /* Set if absolute matrix needs updating ( i.e. parent matrix has been changed ) */
   struct Skeleton*  super;      /* Parent Skeleton */ 
   struct Skeleton*  sub;        /* First Child Skeleton of this Skeleton */
   struct Skeleton*  next;       /* Next Child Skeleton of Parent Skeleton */
} Skeleton;

In practice, this “scene graph” was never more than 2-deep. Hacks were used to shuffle Skeletons around and manipulate the transformation matrices directly. I guess it sounded like a good idea in a book at the time, but if you read between the lines of wipEout's code you begin to understand the developers' pain.

Getting rid of this Skeleton was a lot of work. Preparing the code to do the transformations on the GPU equally so. The new renderer has nothing in common with what was there before. It just speaks triangles and they only come in one shape: Gouraud shaded and textured. If you need it flat shaded, you set all 3 vertices to the same color. If you need it un-textured, you use RENDER_NO_TEXTURE (which internally is a 2x2 texture with all white pixels).
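As a rough sketch of what "one shape only" means in practice, here is how a flat shaded quad could be reduced to two Gouraud triangles. The struct layouts below are assumptions made for this example and may not match the rewrite's actual vec3_t, rgba_t or tris_t; tris_flat() and quad_flat_to_tris() are hypothetical helpers.

```c
#include <stdint.h>

// Assumed layouts for illustration only.
typedef struct { float x, y, z; } vec3_t;
typedef struct { uint8_t r, g, b, a; } rgba_t;
typedef struct { vec3_t pos; float u, v; rgba_t color; } vertex_t;
typedef struct { vertex_t vertices[3]; } tris_t;

// Flat shading on a Gouraud-only renderer: all three vertices share one color.
tris_t tris_flat(vec3_t a, vec3_t b, vec3_t c, rgba_t color) {
    tris_t t;
    t.vertices[0] = (vertex_t){.pos = a, .u = 0.0f, .v = 0.0f, .color = color};
    t.vertices[1] = (vertex_t){.pos = b, .u = 0.0f, .v = 0.0f, .color = color};
    t.vertices[2] = (vertex_t){.pos = c, .u = 0.0f, .v = 0.0f, .color = color};
    return t;
}

// A quad with corners in winding order 0, 1, 2, 3 becomes two triangles
// that share the diagonal between corner 0 and corner 2.
void quad_flat_to_tris(const vec3_t corners[4], rgba_t color, tris_t out[2]) {
    out[0] = tris_flat(corners[0], corners[1], corners[2], color);
    out[1] = tris_flat(corners[0], corners[2], corners[3], color);
}
```

The resulting triangles would then be submitted together with the all-white texture, so the same shader path covers textured and un-textured geometry.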

The goal here was to provide a fairly minimal interface for a renderer:

void render_set_view(vec3_t pos, vec3_t angles);
void render_set_model_mat(mat4_t *m);
void render_push_tris(tris_t tris, uint16_t texture);
void render_push_sprite(vec3_t pos, vec2i_t size, rgba_t color, uint16_t texture);
void render_push_2d(vec2i_t pos, vec2i_t size, rgba_t color, uint16_t texture);

This interface is currently implemented by render_gl.c, but could be swapped out for render_vulkan.c, render_metal.c or render_software.c without any modifications elsewhere in the code. In theory at least.

The current OpenGL renderer uses a single texture atlas of 2048x2048 pixels. It never needs to bind any other texture and it uses the same vertex and fragment shader for everything.

I'm quite happy with how it turned out, but there's lots of room for improvement. The biggest flaw is how you need to submit every triangle individually. Most of wipEout's geometry is static, so it should only need to be uploaded to the GPU once.

Even with the current, extremely naive approach, the game renders at about 6000 fps in fullscreen retina on my M2 MacBook.

Recording from my aging Desktop PC, running on Linux

Rewriting the physics

All the physics in wipEout are tied to the assumption of a 30 fps frame step. The game does measure the delta time from the last frame, though, and re-runs the physics multiple times per frame if needed. Going above 30 fps with this setup is impossible and left me with two options:

Of course I chose the latter and entered a world of pain.

The PSX hardware did not support floating point numbers, so wipEout used fixed point math for everything. With varying precisions. In different contexts.

In any case, the precision would not be enough for higher frame rates, so I opted to change everything to float, too.

I painstakingly went through the source to find all instances where a unit per 30 fps step was assumed, figured out the fixed point precision that was used in each instance and converted it to units per second. I also implemented some basic functions to deal with 3d vectors.

It went from code like this

playerShip->vpivot.vx += playerShip->apivot.vx;      
playerShip->vpivot.vy += playerShip->apivot.vy;
playerShip->vpivot.vz += playerShip->apivot.vz;

playerShip->ppivot.vx += sar(playerShip->vpivot.vx , VELOCITY_SHIFT);
playerShip->ppivot.vy += sar(playerShip->vpivot.vy , VELOCITY_SHIFT);
playerShip->ppivot.vz += sar(playerShip->vpivot.vz , VELOCITY_SHIFT);

to this

self->vel = vec3_add(self->vel, vec3_mulf(self->accel, system_tick()));
self->pos = vec3_add(self->pos, vec3_mulf(self->vel, system_tick()));

I left a bit of a mess in the process, though. I didn't go all the way back to change the ship's attributes (thrust, turn rate etc.) into units per second, but instead changed it in place to something like (0.015625 * 30 * system_tick()) to account for the .6 fixed point and the 30 fps step. So this still needs some cleaning up.
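The factor in that expression can be unpacked as follows: a .6 fixed point value has 6 fractional bits, so one unit is 1/64 = 0.015625, and a per-step value at 30 steps per second is multiplied by 30 to get a per-second value. A small sketch, with hypothetical function names:

```c
#include <stdint.h>

#define FIXED_SHIFT  6      // ".6" fixed point: 6 fractional bits, 1/64 = 0.015625
#define ORIGINAL_FPS 30.0f  // the step rate the original values assume

// Convert a .6 fixed point "units per 30 fps step" value
// into a float "units per second" value.
float fixed6_per_step_to_per_second(int32_t v) {
    return (float)v * (1.0f / (1 << FIXED_SHIFT)) * ORIGINAL_FPS;
}

// Advance a position by a per-second velocity over dt seconds;
// dt plays the role of system_tick() in the snippets above.
float integrate(float pos, float vel_per_second, float dt) {
    return pos + vel_per_second * dt;
}
```

For example, a fixed point value of 64 is 1.0 unit per step, which becomes 30 units per second.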

I hope I didn't change the physics in the process. It seems like I didn't and the game feels the same, whether it's running at 4000 fps or capped to 20. Still, I'm sure I changed some things that are now behaving a little differently from the original game.

It leaves me wondering if this was the right decision. Getting rid of fixed point math certainly was the right call, but a fixed time step has a lot of advantages over the uncertainty that variable time steps bring. I will ponder over this another day.
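For comparison, a fixed time step is commonly driven by an accumulator like the one below. This is a generic sketch, not the original game's loop and not code from the rewrite; fixed_stepper_t, fixed_stepper_advance() and the cap of 8 steps are assumptions.

```c
#define FIXED_DT            (1.0f / 30.0f)  // one physics step at 30 Hz
#define MAX_STEPS_PER_FRAME 8               // arbitrary cap for very slow frames

typedef struct {
    float accumulator;  // frame time not yet consumed by physics steps
} fixed_stepper_t;

// Add the measured frame time and return how many fixed physics steps
// should be run for this frame.
int fixed_stepper_advance(fixed_stepper_t *s, float frame_dt) {
    s->accumulator += frame_dt;
    int steps = 0;
    while (s->accumulator >= FIXED_DT && steps < MAX_STEPS_PER_FRAME) {
        s->accumulator -= FIXED_DT;
        steps++;
    }
    // If the cap was hit, drop the backlog instead of trying to catch up forever.
    if (s->accumulator >= FIXED_DT) {
        s->accumulator = 0.0f;
    }
    return steps;
}
```

The physics stay deterministic with this scheme, but rendering above 30 fps then requires interpolating between the last two physics states.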

Rewriting the game loop

There were multiple game loops! The title screen, main menu and the game itself all came with their own infinite loop. Once you enter them, you give up all outside control.

For what I had planned, that wouldn't fly. I needed a platform-calls-you approach, instead of a you-call-platform one. There are some things that need to happen each frame, regardless of whether we are in the menu or not. Having the platform code do all these things and then call the game's update() function would be much nicer.

Also, a WASM version strictly cannot have an infinite loop. You need to give control back to the browser and wait for it to call your code again. So figuring out where exactly to splice the existing code and how to divide it into a setup() and update() function was quite the slog, but absolutely necessary.
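A minimal sketch of the platform-calls-you shape, assuming Emscripten's emscripten_set_main_loop() for the browser build. The bodies of setup() and update() here are stand-ins, and platform_frame(), platform_run() and the quit flag are hypothetical names.

```c
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Stand-in game state for this sketch.
static int frames_run = 0;
static int quit_requested = 0;

void setup(void) {
    frames_run = 0;
    quit_requested = 0;
}

void update(void) {
    frames_run++;
    if (frames_run >= 3) {
        quit_requested = 1;  // stand-in for the user closing the window
    }
}

// Everything that has to happen once per frame lives on the platform side.
static void platform_frame(void) {
    // input polling and timing would go here
    update();
    // presenting the frame would go here
}

int platform_run(void) {
    setup();
#ifdef __EMSCRIPTEN__
    // The browser calls platform_frame() once per animation frame;
    // with the last argument 0 this call returns immediately.
    emscripten_set_main_loop(platform_frame, 0, 0);
#else
    while (!quit_requested) {
        platform_frame();
    }
#endif
    return frames_run;
}
```

The game code never owns the loop, so the same setup() and update() work for both the native and the browser build.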

While I was at it, I also implemented an extremely simple “scene manager”, where each scene (intro, title, menu and game) is just an init() and update() function.

struct {
    void (*init)();
    void (*update)();
} game_scenes[] = {
    [GAME_SCENE_INTRO]     = {intro_init, intro_update},
    [GAME_SCENE_TITLE]     = {title_init, title_update},
    [GAME_SCENE_MAIN_MENU] = {main_menu_init, main_menu_update},
    [GAME_SCENE_RACE]      = {race_init, race_update},
};

You then call game_set_scene(GAME_SCENE_MAIN_MENU) from anywhere in the code and the next frame switches to that scene.
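One way the deferred switch could work is sketched below. Only game_set_scene() and the scene names come from the text; GAME_SCENE_NONE, game_update() and the counting stand-in scenes are assumptions for this example.

```c
typedef enum {
    GAME_SCENE_INTRO,
    GAME_SCENE_TITLE,
    GAME_SCENE_NONE  // hypothetical sentinel: no scene active or pending
} game_scene_t;

typedef struct {
    void (*init)(void);
    void (*update)(void);
} scene_def_t;

// Stand-in scenes that only count how often they are called.
static int intro_inits, intro_updates, title_inits, title_updates;
static void intro_init(void)   { intro_inits++; }
static void intro_update(void) { intro_updates++; }
static void title_init(void)   { title_inits++; }
static void title_update(void) { title_updates++; }

static const scene_def_t game_scenes[] = {
    [GAME_SCENE_INTRO] = {intro_init, intro_update},
    [GAME_SCENE_TITLE] = {title_init, title_update},
};

static game_scene_t current_scene = GAME_SCENE_NONE;
static game_scene_t next_scene = GAME_SCENE_NONE;

// Only records the request; nothing changes until the next frame starts.
void game_set_scene(game_scene_t scene) {
    next_scene = scene;
}

// Called once per frame by the platform code.
void game_update(void) {
    if (next_scene != GAME_SCENE_NONE) {
        current_scene = next_scene;
        next_scene = GAME_SCENE_NONE;
        game_scenes[current_scene].init();
    }
    if (current_scene != GAME_SCENE_NONE) {
        game_scenes[current_scene].update();
    }
}
```

Deferring the switch means a scene is never torn down in the middle of its own update().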

A further improvement would be to split each scene into a separate update() and draw() function, but I left that for a later date.

Rewriting the HUD, menus & intro

The HUD for wipEout is fairly minimal and mostly consists of a bunch of text in different sizes and colors. Yet, the original source somehow expanded this to a few thousand lines of code. As mentioned before, the original speedometer from the PSX version is nowhere to be found.

I rewrote all the text rendering functionality and put it in a single spot (ui.h/ui.c). With this in place, most of the HUD code fits into 70 lines, with a re-implementation of the PSX speedometer taking another 120.

The original menu “system” was… I don't even know. I never took a close look. It's 5000 lines of spaghetti for the main menu plus another 4000 or so for the in game menu, credits (without the actual text) and win/lose screens. I threw this all away and re-implemented the menus by looking at them in a PSX emulator.

With a new, fairly minimal abstraction shared by all menus, the whole menu code (main menu, ingame menu, credits and win/lose screens) clocks in at 1200 lines now. The new menu system stores a bunch of “pages”. Each page comes with a title and some entries, where each entry can either be a button or a toggle with multiple options.

As a minimal example, here's how a two page menu might be implemented

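menu_t *menu;

// Forward declarations, so the callbacks can be referenced before they are defined
void button_option(menu_t *menu, int data);
void button_submenu(menu_t *menu, int data);
void button_exit_submenu(menu_t *menu, int data);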

void menu_demo_init() {
    menu = mem_bump(sizeof(menu_t));
    menu_reset(menu);

    menu_page_t *page = menu_push(menu, "TITLE GOES HERE", NULL);
    menu_page_add_button(page, 1, "OPTION ONE", button_option);
    menu_page_add_button(page, 2, "OPTION TWO", button_option);
    menu_page_add_button(page, 0, "SUBMENU", button_submenu);
}

void button_option(menu_t *menu, int data) {
    // data is the 2nd argument from menu_page_add_button()
}

void button_submenu(menu_t *menu, int data) {
    menu_page_t *page = menu_push(menu, "TITLE FOR THE SUBMENU", NULL);
    menu_page_add_button(page, 0, "GO BACK", button_exit_submenu);
}

void button_exit_submenu(menu_t *menu, int data) {
    menu_pop(menu);
}

I'm quite happy with how it works; adding new pages and settings in the options menu is a breeze now.

Some of the layout options (where the title goes, how the entries are displayed, etc.) are complicating things a bit. I'm planning to clean this up a little, now that I have implemented all the different menus of the game and know exactly what's needed. As a late realization: writing a truly flexible menu system is quite difficult.

As for the intro movie: the code that plays the intro was missing from the source. I'm not even sure if it was present in the PC version at all. The STR format that was used in the PSX NTSC version is well documented, but I haven't gotten around to implementing a decoder yet. jPSXdec was able to load and export the video losslessly, from which I re-encoded it back into MPEG1 (yeah, yeah, I know…) and just used pl_mpeg to decode it in the game. Good enough.

Rewriting the sound code and mixer

I didn't look at the original implementation and just wrote the sound code from scratch. The sound effects are loaded from the original wipeout.vb file which contains all the samples in an ADPCM format (Sony VAG to be precise).

Most of the calls to play sound effects were quite straightforward and my mixer code reproduces the stereo separation, engine pitch and the doppler effect nicely. However, as a common theme with the leaked source, one sound effect was completely missing in the PC version: the crowd cheers. So I figured out which object names in the scene geometry correspond to the grand stands (of course it's multiple different ones) and implemented those again.
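To illustrate what such a mixer has to do per sound, here is a generic sketch that mixes one mono source into an interleaved stereo buffer with volume, pan and pitch. It is not the rewrite's mixer; sfx_t, its fields and sfx_mix() are hypothetical, and the pitch field is simply where an engine pitch or a doppler shift would be applied.

```c
#include <stdint.h>

// Hypothetical playback state for one sound effect.
typedef struct {
    const int16_t *samples;  // decoded mono samples
    int len;                 // number of samples
    float position;          // fractional read position
    float pitch;             // 1.0 = original speed, 2.0 = one octave up
    float volume;            // 0..1
    float pan;               // -1 = left only, 0 = center, +1 = right only
} sfx_t;

// Add this source to an interleaved stereo float buffer (L, R, L, R, ...).
// Uses nearest-sample resampling for brevity. Returns the frames mixed.
int sfx_mix(sfx_t *s, float *out, int frames) {
    float left  = s->volume * (1.0f - (s->pan > 0.0f ? s->pan : 0.0f));
    float right = s->volume * (1.0f + (s->pan < 0.0f ? s->pan : 0.0f));
    int i;
    for (i = 0; i < frames; i++) {
        int index = (int)s->position;
        if (index >= s->len) {
            break;  // source has finished playing
        }
        float v = s->samples[index] / 32768.0f;
        out[i * 2 + 0] += v * left;
        out[i * 2 + 1] += v * right;
        s->position += s->pitch;
    }
    return i;
}
```

A real mixer would interpolate between samples and clamp the summed output, both of which are left out here.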

For the music I opted to take QOA for a test drive. It all works nicely and for a downloadable release the file size is not a big deal. For the WASM version having 80% of the download for music (120 MB) seems a bit wasteful though. With the music encoded in 128 kbit/s Vorbis it would clock in at 40 MB, but also require a lot more code.

One thing that's still missing compared to the PSX version is the reverb effect when flying through tunnels. That's still on my todo list.

Rewriting the memory management

I'm not sure what the original PSX version did, but the PC version had a lot of malloc() and slightly fewer free() calls scattered around. Now I can assure you that the game doesn't leak any memory, because it never calls malloc().

Instead, there's a fixed size statically allocated uint8_t hunk[MEM_HUNK_BYTES]; of 4mb that is used from both sides:

A bump allocator takes bytes from the front of the hunk. This is used for everything that persists for many frames. When the game starts, it loads a bunch of assets that are needed everywhere (UI graphics, ship models and textures etc.) into this bump allocator and then remembers the high water mark of it. When you load a race track, it loads all assets needed on top. After finishing a race, the bump allocator is reset to the previous high water mark.

On the other side, a temp allocator takes bytes from the end of the hunk. Temporarily allocated objects need to be explicitly released again. This is used when loading a file into memory. The file is read at once and unpacked onto the bump allocated side. When done, the temp memory for the file is released again.

Temporary objects are not allowed to persist over multiple frames. So each frame ends with a check to ensure that the temp allocator is empty.
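The two-sided hunk can be sketched as follows. Only hunk, MEM_HUNK_BYTES and mem_bump() appear in the text; mem_mark(), mem_reset(), mem_temp_alloc(), mem_temp_free() and mem_temp_is_empty() are names assumed for this example, and the temp side is simplified so that the caller passes the size back when releasing.

```c
#include <stdint.h>
#include <stddef.h>

#define MEM_HUNK_BYTES (4 * 1024 * 1024)

static uint8_t hunk[MEM_HUNK_BYTES];
static uint32_t bump_len = 0;  // bytes used from the front
static uint32_t temp_len = 0;  // bytes used from the end

// Round sizes up so that all offsets stay multiples of 8 bytes.
static uint32_t align8(uint32_t n) {
    return (n + 7u) & ~7u;
}

// Long-lived allocation from the front of the hunk.
void *mem_bump(uint32_t size) {
    size = align8(size);
    if (bump_len + temp_len + size > MEM_HUNK_BYTES) {
        return NULL;  // both sides would collide
    }
    void *p = &hunk[bump_len];
    bump_len += size;
    return p;
}

// Remember the current high water mark, e.g. after loading global assets.
uint32_t mem_mark(void) {
    return bump_len;
}

// Drop everything allocated since the mark, e.g. after a race has ended.
void mem_reset(uint32_t mark) {
    if (mark <= bump_len) {
        bump_len = mark;
    }
}

// Short-lived allocation from the end of the hunk.
void *mem_temp_alloc(uint32_t size) {
    size = align8(size);
    if (bump_len + temp_len + size > MEM_HUNK_BYTES) {
        return NULL;
    }
    temp_len += size;
    return &hunk[MEM_HUNK_BYTES - temp_len];
}

// Simplification: releases must happen in reverse order of allocation.
void mem_temp_free(uint32_t size) {
    temp_len -= align8(size);
}

// The end-of-frame check described above.
int mem_temp_is_empty(void) {
    return temp_len == 0;
}
```

Because both sides grow towards each other, the hunk only runs out when the combined long-lived and temporary usage exceeds 4mb.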

Somewhat related, the OpenGL renderer does the same with the textures: It bumps up texture memory (more precisely space in the texture atlas) and resets it to the previous level when a race ends.

Rewriting everything everywhere

There were a lot more rough edges in the original source. A lot of code was just dead. Mostly specific to the PSX version: functions dealing with the neGcon controller were sprinkled all over the code; dozens of header files and functions related to the PSX GPU, Windows related stuff and some ATI Rage related asset loading code. Throwing these all away was quite time consuming, because they were referenced everywhere.

One of the easier, yet also time intensive tasks was just shuffling functionality around. In the original some functions clearly ended up in a file where they had no business being. So that's fixed. The scene animations are now in scene.c; the particle effects in particle.c, etc.

The particle system is a complete rewrite, too. The original was all backwards: it looped through all weapons, looked for active ones and then switch()ed on the weapon type to figure out which particle to spawn. Now the weapons spawn the particles.

Speaking of which, the weapon system was also rewritten. The original was a bunch of spaghetti, beyond untangling.

For both the particles and the weapons, I opted for a very simple approach where each new object is allocated from a fixed-size array. The game keeps track of the number of active objects and just iterates and updates those for each frame. Deactivated objects are swapped in place with the last active one.

int particles_active_len;
particle_t particles[PARTICLES_MAX];

particle_t *particles_new() {
    if (particles_active_len == PARTICLES_MAX) {
        return NULL;
    }

    particle_t *p = &particles[particles_active_len++];
    p->active = true;
    return p;
}

void particles_update() {
    for (int i = 0; i < particles_active_len; i++) {
        particle_t *p = &particles[i];

        particle_update(p);

        if (p->active == false) {
            particles[i] = particles[--particles_active_len];
            i--;
        }
    }
}

This code is not only stupidly simple, but also turned out to be faster than any more involved housekeeping approaches. Not that it matters for this game.

The leaked source has a few hundred global variables scattered all over. Many of these are brought into different source files (using extern int something;) and there was little logic to where they were defined. I removed the need for many of the globals and rewrote the rest into 3 separate structs:

const game_def_t def; containing all definitions for the game: ship attributes, pilots and their names, race tracks, credits, etc.

game_t g; containing the information about the running game: the current race mode and track, lap times, ranks etc.

save_t save; containing the save data that persists between game launches: music and sfx volume, highscores, unlocked tracks etc.

All of these structs are globally accessible everywhere in the game. Now, if you learned in school that global variables are bad (as I did), don't be alarmed. For a game, the use of globals is often the preferable way to do things. You know there can only ever be one race track loaded at a time, so it's fine to just have a global track_t.

Some aspects of this are not as clear cut though: the rewrite currently assumes that there is just one camera. This is obviously true for now, but if we want to have a splitscreen mode, we'd have to supply the current camera_t instance to a bunch of functions instead.

There are many more parts of the game that I re-implemented, but I just want to end this section with a lines of code count:

> cloc WIPESRC/Wipeout98/
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C++                             55           9301           3631          36253
C/C++ Header                    89           1222            889           4370
DOS Batch                        1              0              0             76
-------------------------------------------------------------------------------
SUM:                           145          10523           4520          40699
-------------------------------------------------------------------------------
> cloc rewrite/src/wipeout/
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               21           1301            216           6590
C/C++ Header                    21            267             16           1141
-------------------------------------------------------------------------------
SUM:                            42           1568            232           7731
-------------------------------------------------------------------------------

The rewrite is just one fifth of the size of the original. To be fair, this ignores the Sokol and pl_mpeg libraries used as well as some platform glue code. The whole source tree lands at around 25k LOC.

Platform backends and compiling to WASM

The rewrite comes with a simple abstraction over timing, keyboard/controller input and audio output. The game code itself is agnostic to the platform backend it's running on.

Currently it compiles with two different platform backends: SDL2 and Sokol. Both of these support multiple platforms (e.g. Windows, macOS, Linux, Android, iOS…). Adding a new platform backend – say, for the Nintendo Switch – is straightforward and (in theory) doesn't necessitate any changes to the game code.

I initially developed with the SDL backend and later added the Sokol libraries. Both are an absolute pleasure to work with.

I was especially impressed with how smooth the compilation for WASM with Sokol went. You compile the whole thing with emcc and it just works. Rendering, input, sound, everything was there. Not a single change in the code needed.

Working with C

It's hard to justify writing C these days. It's not the language you'd start a commercial project in. So this wipEout rewrite was a welcome excuse and by far the largest C project I ever worked on. I had an absolute blast cleaning up this mess!

As evidenced by the leaked source, a C project can become extremely unwieldy. The language doesn't give you much guidance on how to structure your code.

And as hopefully evidenced by this rewrite, it is possible to produce a C code base that makes sense, is extensible and fun to work with.

I love C, because it forces you to think about the structure of your code and make conscious decisions about it. Working close to the hardware and thinking about how your code is actually executed is just the icing on the cake. For me, it's a very welcome change from all the scripting languages that I usually work with.

Final thoughts

There are still lots of bugs to fix (both old and new) and more features to implement. If you want to help, please stop by over at github.com/phoboslab/wipeout-rewrite, clone the source and build the game yourself. You'll need the wipeout-data-v01.zip to run it. I will not provide the executables for any platform; please don't ask.

Sony has demonstrated a lack of interest in the original wipEout in the past, so my money is on their continuing absence.

If anyone at Sony is reading this, please consider that you have (in my opinion) two equally good options: either let it be, or shut this thing down and get a real remaster going.

I'd love to help!
