


As many of you may have guessed, I have been working long hours the last few weeks – delaying my plans for posting an update and temporarily slowing progress. Fortunately the pace has died down, so progress can continue more smoothly again. However, this won’t be the last time there are bumps in the road. Unfortunately I’m way past the point at which I originally thought I would be done – a good reason to take any estimates I give with a giant grain of salt (to be fair though, this is true for most developers and most large scale projects – especially hobby projects).

 

Progress

As you might imagine, most of my time lately has been spent getting the game code ready. Most of that has gone into refining my tools and adjusting the resulting code – which then feeds back into refining the tools again. The good news is that the game code is between 80 – 90% complete. There is a fair bit of work left but things have been progressing well. I expect the games to start becoming playable within the next few weeks – “real life” permitting. There is a sizable amount of work to go from playable to the whole package being ready for release, including bug fixes – so this is not a release estimate, merely an indication of the state of the release and progress.

 

Process

As I talked about before, there are several stages that take me from the original executable to piecing the code together and moving towards a fully decompiled result that runs natively on the XL Engine. I thought it would be interesting, in this post, to give a “behind the scenes” look at the intermediate format that sits between the disassembled code and the decompiled code, and to give a little insight into that process.

This skips the processing steps that discover the 32 bit code (for now 16 bit code is ignored), disassemble the code, figure out how the functions are arranged, remove dead code, convert the assembly into an internal format for processing and do other required work. For now we are going to focus on the processing of a single function. Function processing occurs in several global passes and each pass has several steps, such as figuring out function inputs and outputs, but for now we focus on only a few steps.

Listed here is a simple function. At this point in the process the purpose of the function isn’t known.

The first step is to disassemble the function:

000B094C 55               push	ebp
000B094D 89E5             mov	ebp,esp
000B094F 53               push	ebx
000B0950 51               push	ecx
000B0951 52               push	edx
000B0952 56               push	esi
000B0953 57               push	edi
000B0954 81EC00000000     sub	esp,dword 00000000h
000B095A B8603F0300       mov	eax,00033F60h
000B095F E83CF60200       call	0000DFFA0h
000B0964 A1783F0300       mov	eax,[000033F78h]
000B0969 2B05743F0300     sub	eax,[00033F74h]
000B096F C1E002           shl	eax,byte 02h
000B0972 A3983F0300       mov	[000033F98h],eax
000B0977 8B157C3F0300     mov	edx,[00033F7Ch]
000B097D C1E202           shl	edx,byte 02h
000B0980 A1983F0300       mov	eax,[000033F98h]
000B0985 01C2             add	edx,eax
000B0987 8915903F0300     mov	[00033F90h],edx
000B098D A1783F0300       mov	eax,[000033F78h]
000B0992 C1E002           shl	eax,byte 02h
000B0995 A3943F0300       mov	[000033F94h],eax
000B099A A1983F0300       mov	eax,[000033F98h]
000B099F A39C3F0300       mov	[000033F9Ch],eax
000B09A4 8D65EC           lea	esp,[ebp-14h]
000B09A7 5F               pop	edi
000B09A8 5E               pop	esi
000B09A9 5A               pop	edx
000B09AA 59               pop	ecx
000B09AB 5B               pop	ebx
000B09AC 5D               pop	ebp
000B09AD C3               ret
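
As an aside on reading the listing above: the call at 000B095F is encoded as E8 followed by a 32 bit little-endian displacement, and the displacement is relative to the address of the next instruction. A minimal sketch of how a tool could resolve such a target is shown here; the function name resolveCallTarget is hypothetical and not taken from the actual tools.

```c
#include <stdint.h>

/* Resolve the target of a near relative call: opcode E8 followed by a
   little-endian 32 bit displacement. The displacement is relative to the
   address of the next instruction, which starts 5 bytes after the call. */
uint32_t resolveCallTarget(uint32_t address, const uint8_t* bytes)
{
    uint32_t rel32 = (uint32_t)bytes[1]
                   | ((uint32_t)bytes[2] << 8)
                   | ((uint32_t)bytes[3] << 16)
                   | ((uint32_t)bytes[4] << 24);
    /* Unsigned arithmetic wraps, so negative (backward) displacements work too. */
    return address + 5u + rel32;
}
```

Feeding it the bytes E8 3C F6 02 00 at address 000B095F gives 000DFFA0, which matches the call target shown in the disassembly.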

Next the function is converted to an intermediate format. At this point variables are assigned as inputs (%i#), locals (%l#) or globals (%g#). At this stage the tool is only looking at a single function at a time; later, variables will be matched up, merged as appropriate and renamed.

void func_000b094c()
{
	/* variables
	Locals (7)
	EBX, ECX, EDX, ESI, EDI, EAX, [ebp-20]
	Globals (7)
	[33f78], [33f74], [33f98], [33f7c], [33f90], [33f94], [33f9c]
	*/
	%l5 = 0x00033f60;
	func_000dffa0();
	%l5 = %g0;
	%l5 -= %g1;
	%l5 <<= 2;
	%g2 = %l5;
	%l2 = %g3;
	%l2 <<= 2;
	%l5 = %g2;
	%l2 += %l5;
	%g4 = %l2;
	%l5 = %g0;
	%l5 <<= 2;
	%g5 = %l5;
	%l5 = %g2;
	%g6 = %l5;
	return;
}

Then the code is simplified by merging statements into blocks, as discussed in a previous blog post. Block boundaries are defined by certain instructions such as calling a function or writing to global memory. At this point the tool already knows what the function inputs and outputs are and can modify the intermediate format to reflect that.

void func_000b094c()
{
	func_000dffa0( 0x00033f60 );
	%g2 = (%g0 - %g1) << 2;
	%g4 = (%g3 << 2) + %g2;
	%g5 = (%g0 << 2);
	%g6 = %g2;
}

Additional passes are then required to define all of the global variables and types, to try to match strings to variables, functions and/or files and then to finally assign better names to everything. The end result (when everything works) is compiling code which then must be debugged and cleaned up. Once that is done, parts are modified to use the XL Engine services or features (for example using the XL Engine for rendering, input and sound).
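
To make the last step more concrete, here is a rough sketch of how the merged function might look as plain C before any meaningful names are assigned. Everything in it is an assumption for illustration: the globals are simply named after their addresses, they are typed as uint32_t even though the real types and signedness are not known at this stage, and func_000dffa0 is a stub that only records its argument.

```c
#include <stdint.h>

/* Placeholder globals named after their addresses in the listing. */
uint32_t g_33f78, g_33f74, g_33f7c;           /* %g0, %g1, %g3: only read here */
uint32_t g_33f98, g_33f90, g_33f94, g_33f9c;  /* %g2, %g4, %g5, %g6: written   */

/* Stub for the callee; its real behaviour is unknown at this stage. */
uint32_t lastCallArg;
void func_000dffa0(uint32_t arg) { lastCallArg = arg; }

void func_000b094c(void)
{
    func_000dffa0(0x00033f60u);
    g_33f98 = (g_33f78 - g_33f74) << 2;
    g_33f90 = (g_33f7c << 2) + g_33f98;
    g_33f94 = g_33f78 << 2;
    g_33f9c = g_33f98;
}
```

The shifts by 2 suggest values being scaled by 4, which would fit offsets into 32 bit data, but that is a guess until the later passes identify the variables.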

This was posted on the forums a few weeks ago but I neglected to post it on the blog until now.

There have been concerns regarding XLC and supporting different platforms. To address those concerns and describe some changes that I am implementing (or will implement in the future), I decided to make a new post. If you have not already, please read the Introducing XLC – XL Engine Scripting System post. Note that most of the features described here will become available after Beta 1.

Bytecode Compilation

By default, XLC code will be JIT compiled and executed as native code without using a VM. However, bytecode compilation and execution using a virtual machine will be available for platforms that the JIT compiler does not support. While executing scripts using a VM will be slower, it will still be as fast as I can make it. :) This will allow XLC to run on any platform that the XL Engine supports – though I would obviously like to add JIT support to those platforms to maximize script speed. Initially the JIT compiler will support common CPU instruction sets, including x86, x64 and ARM. (Note, this does NOT mean that the XL Engine will support ARM based platforms for Beta 1, and I am not giving any timelines for that – it merely means the JIT compiler will support ARM.)
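
For readers unfamiliar with the VM fallback idea, the following is a minimal sketch of a stack based bytecode interpreter loop. It is not the XLC bytecode format, which is not described here; the opcodes, the stack size and the name vmRun are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define VM_STACK_SIZE 64

/* Hypothetical opcodes, not the real XLC instruction set. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } OpCode;

/* Interprets the program until OP_HALT and returns the top of the stack.
   Returns 0 on stack overflow, stack underflow or an unknown opcode. */
int32_t vmRun(const int32_t* code)
{
    int32_t stack[VM_STACK_SIZE];
    size_t sp = 0;
    size_t pc = 0;
    for (;;)
    {
        switch (code[pc++])
        {
        case OP_PUSH:
            if (sp == VM_STACK_SIZE) { return 0; }
            stack[sp++] = code[pc++];   /* operand follows the opcode */
            break;
        case OP_ADD:
            if (sp < 2) { return 0; }
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2) { return 0; }
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return (sp > 0) ? stack[sp - 1] : 0;
        default:
            return 0;
        }
    }
}
```

Every instruction pays for a fetch, a dispatch and stack traffic, which is the overhead a JIT avoids by emitting native code directly.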

Optional Bytecode Optimization

Whenever iterating on scripts, reloading and JIT compiling XLC scripts is the most efficient way to work. In many cases, just writing the code in a text editor and using it directly should be fast enough. However, in cases where the JIT compiled code isn’t fast enough (or you need to use the VM), it will be possible to pre-compile to a binary format – platform independent bytecode – and then spend much more time during the optimization process. This should allow the script execution speed to get even closer to full native performance. A tool, tentatively called “xlcc”, will be available to generate a binary “.xlo” file for each script file. Since the bytecode is platform independent, it will still be JIT compiled on load – and no additional tools are required (except for xlcc of course).

Later Features

Language Extensions

In addition to the existing language extensions – such as every script being executed as a coroutine and cooperative threading incorporated as a core feature – I plan on adding a few more language extensions. Primarily, I plan on adding HLSL/GLSL like vector and matrix math capabilities – such as swizzles and basic mathematical operations (code such as: float3 a, b, c; a.xy = b.zz * c.xz;). This should support a similar set of types, including float2/3/4, int2/3/4, uint2/3/4, bool2/3/4, float2x2, float2x3, float2x4, and so on. Why? Multiple reasons – the first is to make standard 2D and 3D math operations simpler, which should be useful for the targeted games.
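
To show what the swizzle example saves, here is a sketch of the same statement written out in plain C. It assumes the multiply is component-wise, as it is in HLSL and GLSL; the float3 struct and the function name are stand-ins, since the planned XLC types do not exist in standard C.

```c
/* Stand-in for the planned built-in float3 type. */
typedef struct { float x, y, z; } float3;

/* Scalar equivalent of the swizzle statement: a.xy = b.zz * c.xz; */
void swizzleExample(float3* a, const float3* b, const float3* c)
{
    /* Evaluate the whole right-hand side before writing, so the result
       is still correct if a points at the same vector as b or c. */
    float rx = b->z * c->x;
    float ry = b->z * c->z;
    a->x = rx;
    a->y = ry;
    /* a->z is not named on the left-hand side, so it is left untouched. */
}
```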

Kernels

The second reason for the language extensions is to support “kernels” – optimized JIT compiled code designed to operate on multiple values simultaneously (either 4, 8 or 16) that can be used by the software renderer (i.e. software renderer “shaders”). Kernels are still JIT compiled, but instead of generating scalar code – the compiler will generate data parallel vectorized code that will make the best use of the CPU specific vector instruction set such as SSE2, AVX, AVX2, AVX-512 and NEON. I also plan on using XLC as an API independent shader language, to allow even GPU shaders to be written once and available for OpenGL, OpenGL ES and later Vulkan and other APIs. This will replace the current GLSL shaders and make supporting different platforms very easy for the shader writer. Kernels will be accessible to normal scripts in the cases where the scripter needs to process larger data sets, where the per-element operations are homogeneous and simple. Note that kernels will also process data in batches of a multiple of 16 elements.
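
The batch requirement is easier to picture with an example. Below is a scalar C sketch of a blend kernel that walks its data in batches of 16; it only illustrates the shape of the loop a vectorizing compiler would work from and is not how XLC kernels will be written. KERNEL_WIDTH and blendKernel are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_WIDTH 16  /* assumed batch size, taken from the "multiple of 16" note */

/* out[i] = x[i] + (y[i] - x[i]) * t for every element.
   count must be a multiple of KERNEL_WIDTH. */
void blendKernel(float* out, const float* x, const float* y, float t, size_t count)
{
    assert(count % KERNEL_WIDTH == 0);
    for (size_t base = 0; base < count; base += KERNEL_WIDTH)
    {
        /* Fixed trip count and no branches: one batch maps onto 4 SSE2
           registers, 2 AVX registers or 1 AVX-512 register of floats. */
        for (size_t lane = 0; lane < KERNEL_WIDTH; lane++)
        {
            size_t i = base + lane;
            out[i] = x[i] + (y[i] - x[i]) * t;
        }
    }
}
```

Padding the data to a multiple of 16 means the vectorized code never needs a scalar tail loop for leftover elements.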

Final Words

In conclusion, I have big plans for XLC: to accelerate my own work flow (runtime updating when editing UI code, for example), to give modders a platform independent scripting language with near native performance and even to allow CPU optimized vectorized kernels, which can take advantage of CPU specific features while allowing code to be written in XLC, to provide better than native performance in those cases.

Of course I need to stay focused on Beta 1 so some of these features will not come until later, though probably not as far off as it sounds. :)

In order to facilitate improved modding support and faster iteration times when working on some of the tasks required for the Beta 1 release, I have added scripting support to the engine. Previous builds of DarkXL and DaggerXL already had scripting support but overall I was not happy with the results.

 

Considerations

DarkXL used AngelScript for scripting. AngelScript has some nice advantages over some scripting languages, such as Lua, in how nicely it integrates with C/C++ code, and it provides some nice syntax. However, due to the lack of JIT support, it is very slow when compared to other options such as LuaJIT, JavaScript and C#.

I would like to be able to move some of the game code into scripts – such as AI and weapon code – in addition to using it for UI in both the engine and games. So this presents a few issues with the previous solution – I need the overhead of calling script functions and those functions calling back into the engine or game code to be very small, the scripts must execute quickly (ideally no virtual machine), the scripts should have very fast access to data shared with the engine, scripts must be hot-reloadable so no lengthy compilation or optimization steps, the language should have a common syntax that I don’t mind using (since I will be one of the biggest script writers) and it should be small.

Obviously the DarkXL script system, using AngelScript, does not fulfill these goals. LuaJIT is fast compared to other scripting languages but it can be difficult to find errors (syntax errors can stay hidden until code is executed) and garbage collection can be problematic. There are other reasons I would prefer not to use Lua in this project that I won’t get into. That said, Lua has been successfully used in many projects and has many useful advantages. If you are looking for a scripting language for your own projects, you could do much worse than Lua. Finally C# and JavaScript (using V8 or similar) are just too big for my taste, though C# in particular is a nice language.

 

XLC

XLC stands for XL Engine “C”, which uses JIT compiled “C99” as the basis for the scripting system. In order to improve the scripting experience, the environment is sandboxed (only engine provided functions and services are available) and the API is written in a way that avoids or hides the use of pointers and avoids user memory management. The language is not hobbled, however, and advanced users can use the full power of C. The compiler is built into the engine and compiles code, as needed, directly into memory. This means that the engine can call script functions with very little overhead, memory and data can be shared between scripts and the engine, and scripts can call into the engine with the same cost as calling any C or C++ function using a pointer.
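
As a rough illustration of why the call overhead can be so small, the sketch below shows an engine looking up a compiled script function by name once and then calling it through an ordinary function pointer. This is not the engine's actual API: PublicEntry, findPublicFunction and demoScriptMain are hypothetical, and demoScriptMain stands in for code the JIT would have compiled into memory.

```c
#include <stddef.h>
#include <string.h>

/* Signature assumed for a "public" script function taking three ints. */
typedef void (*PublicScriptFunc)(int arg0, int arg1, int arg2);

typedef struct
{
    const char*      name;
    PublicScriptFunc func;
} PublicEntry;

/* Stand-in for compiled script code. */
static int runCount = 0;
static void demoScriptMain(int arg0, int arg1, int arg2)
{
    (void)arg0; (void)arg1; (void)arg2;
    runCount++;
}

static const PublicEntry publicTable[] =
{
    { "simple_main", demoScriptMain },
};

/* Name lookup happens once; afterwards the engine keeps the pointer. */
PublicScriptFunc findPublicFunction(const PublicEntry* table, size_t count, const char* name)
{
    for (size_t i = 0; i < count; i++)
    {
        if (strcmp(table[i].name, name) == 0) { return table[i].func; }
    }
    return NULL;
}
```

Once the pointer is cached, each call is a plain indirect call with no marshalling of arguments, which is the cost the paragraph above describes.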

The only tool required to write XLC scripts is a text editor. If the engine is running and you edit a script, the engine will automatically hot-reload and recompile the script – which takes a fraction of a second – and you can see the changes immediately. The compilation is fast enough that scripts can easily be included as part of the data – so levels and game areas can have their own scripts in mods. Scripts can also include other scripts in order to import their functionality, variables and structures – allowing scripts to be broken up into files and allowing people to provide encapsulated functionality that anyone can use in their scripts for any game (unless game specific functions are used).
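
One common way to detect an edited script is to poll the file's modification time, sketched below. How the XL Engine actually detects changes is not stated, so treat this as one possible approach: it assumes a POSIX style stat() is available, and WatchedScript and scriptNeedsReload are hypothetical names.

```c
#include <sys/stat.h>
#include <time.h>

typedef struct
{
    const char* path;
    time_t      lastModified;  /* 0 until the script has been seen once */
} WatchedScript;

/* Returns 1 if the file changed since the last check and records the new
   timestamp; returns 0 if it is unchanged or cannot be read. */
int scriptNeedsReload(WatchedScript* script)
{
    struct stat info;
    if (stat(script->path, &info) != 0)
    {
        return 0;  /* missing or unreadable: keep running the current code */
    }
    if (info.st_mtime == script->lastModified)
    {
        return 0;
    }
    script->lastModified = info.st_mtime;
    return 1;
}
```

Calling this once per frame, or every few frames, for each loaded script is cheap, and a return value of 1 would trigger the recompile.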

Obviously extremely fast compilation times come at a cost: the generated code isn’t as fast as Visual Studio C++ or GCC optimized code. However, performance is much better than debug code or interpreted languages, and is even a decent improvement over other JIT compiled scripting languages in most cases. In addition the overhead of passing the “script barrier” is much lower than the alternatives – you would need to call many tens of thousands of script functions per frame before it starts to become a problem.

And honestly having the engine reload scripts that change automatically while running is pretty awesome for development. It’ll make finishing the UI work so much easier. Something doesn’t look right? Make a small tweak in the text editor, save and see the change instantly. :D  So much better than shutting down the program, making the tweak, recompiling, launching, getting back to the same place again just to find out your tweak wasn’t quite right.

To be clear, all of these features have already been implemented and are currently working. I have already started to move the XL Engine UI over to scripts to gain the iteration time benefits I mentioned above. And being “C” at heart – any game code I want to move to scripts requires very little modification to work correctly.

Below is a small test script that I have used to test various features. Lines starting with // are comments, they describe various features being shown. /**/ type comments are also valid. Fixed size types are also included as well as standard C types. Sized types are defined as u/s/f (unsigned/signed/float) + sizeInBits and include: s8,u8,s16,u16,s32,u32,s64,u64,f32,f64. bool is also defined as a type, meaning true or false.

//include any script files that you wish to pull functionality from. All script functions, 
//constants/enums/defines, script global variables and script defined structures will be 
//accessible.
#include "test2.xlc"

//structures defined in the script, this is shorthand for C structure typedefs which 
//can also be used: typedef struct name { ... } name;
Struct(Test2)
{
    int y;
};

Struct(TestStruct)
{
    int x;
    int y;
};

Struct(Vec3)
{
    float x;
    float y;
    float z;
};

//script global variables using both built-in types, structures defined within this
//script and arrays.
Vec3 data0[1024];
Vec3 data1[1024];
Vec3 data2[1024];
int runCount;

//internal script functions - other scripts that include this one can use them but they 
//will not be used by the engine.
f32 blend(f32 t, f32 x, f32 y)
{
    return x + (y-x)*t;
}

void testFunc(string printMe)
{
    xlDebugMessage(printMe);
}

int fib(int n)
{
    if (n <= 2)
    {
        return 1;
    }
    else
    {
        return fib(n-1) + fib(n-2);
    }
}

//"public" functions can be called by the engine. Future tools will be able to list
//all of these, for example you could be editing a level, load a level script and 
//then select these public functions from a list to run them based on various events.

//This function was used to help test a certain kind of performance. If you can't 
//figure out the point, don't worry, it's only meant to test floating point math 
//performance and function call overhead.
public void perfTest(void)
{
    f32 blendfactor = 0.7594f;
    for (s32 k=0; k<1000; k++)
    {
        f32 value = 2.0f + (float)(k-50)*0.01f;
        //step 1. fill the data with values.
        for (s32 i=0; i<1024; i++)
        {
            data0[i].x = value; value *= 1.25987f;
            data0[i].y = value; value *= 2.25987f;
            data0[i].z = value; value /= 2.25987f;

            data1[i].x = value; value *= 2.25987f;
            data1[i].y = value; value *= 7.25987f;
            data1[i].z = value; value /= 20.25987f;
        }

        //step 2. blend between the values.
        for (s32 i=0; i<1024; i++)
        {
            data2[i].x = blend(blendfactor, data0[i].x, data1[i].x);
            data2[i].y = blend(blendfactor, data0[i].y, data1[i].y);
            data2[i].z = blend(blendfactor, data0[i].z, data1[i].z);
        }
        blendfactor *= 1.001f;
    }
}

//Testing a public script function with a different number of arguments.
public void simpleInc(int a, int b, int c, int d)
{
    runCount++;
}

//xl...() functions are provided by the engine and are available to all scripts.
//sqr() and someVar are both defined in "test2.xlc" and are available since that
//script is included.
public void simple_main(int arg0, int arg1, int arg2)
{
    int r = fib(32);
    xlDebugMessage("fib(32) = %d.", r);
    runCount++;

    //MAX_MAPPING_COUNT is an engine provided define.
    xlDebugMessage("MAX_MAPPING_COUNT = %d", MAX_MAPPING_COUNT);
    xlDebugMessage("Clock = %d", xlGetClock());

    //Using a script defined structure.
    Test2 test;
    test.y = 3;

    //Another script defined structure.
    TestStruct test2;
    test2.x = sqr(test.y) + someVar - 2;
    xlDebugMessage("test2.x = %d.", test2.x);

    //testing string passing.
    testFunc("this is a string.");

    xlDebugMessage("Test inputs: %d, %d, %d", arg0, arg1, arg2);
}

As an addendum to the previous post, I uploaded a YouTube video of Shadow Warrior played with the XL Engine. It is best viewed in 1080p60 on YouTube.

Progress Updates

It has been about 2 weeks since my last update but I have not been idle during this time. I decided to scale back on the rate of updates for a variety of reasons but work is ongoing and I check the blog/forums regularly so feel free to comment or ask questions. In the future I plan on posting updates about once a week or so, though the time between updates may occasionally be longer or shorter.

Also remember that you can see the full sized images by clicking on the pictures, they are scaled down to fit the blog format.

 

Decompiling

I have made good strides in the decompilation process, due to a variety of enhancements and fixes to my tools. Previously functions with multiple return statements or complicated flow control with dead code were problematic, sometimes resulting in nonsensical code, missing code or incomplete functions. The tool now properly follows the flow control and prunes unused code. Of course this results in functions where code is ordered based on when a branch was hit, so the resulting code must be reordered at the end of the process. Jumps with dynamic offsets are still problematic though, so there is still more work to be done.

Next I fixed the “root function” determination code so it can successfully find “main” automatically. Previously I had to do it by hand which can result in errors. The unused function removal is now more robust.

Initialized static memory is now automatically mapped to the memory addresses used by the disassembled code, meaning that pre-initialized static data is now directly available. This is great for many reasons, including using the text referenced by the code to help identify the names and functions in many cases. This allows me to move a lot of the code to the correct files, so the structure of the source tree – in some cases anyway – can mimic what the original source might have looked like.

 

Unified Software Renderer

Recently I resurrected the software renderer I was writing for DaggerXL and started re-integrating it into the XL Engine. For the Beta 2 release, it will serve as a basis for the “unified 3D renderer” that can be used not just for Daggerfall but potentially other XnGine games and for model-based elements of 2.5D games (such as Dark Forces). The “unified 3D renderer” will support both hardware and software rendering and – with the original code available – be able to functionally match the original visuals but with better performance for high resolutions and the ability to more easily add new features and fix bugs. Of course the original Daggerfall rendering will be available, at least until the new renderer can match it exactly (minus obvious bugs). Beta 1 probably won’t ship with this renderer but it will be available for Beta 2.

Below you can see some screenshots of Daggerfall using the cylindrically mapped sky and tweaked settings. Note that all of the 2D rendering has been disabled (weapons, UI, etc. – obviously not counting sprites in the 3D world). Also note that the rendering is not 100% correct but that will be fixed – including the wrong ground tiles being used, not a problem for the original renderer of course.

Build, Blood and the Newcomer

When working with the Blood code, one of the things I wanted to do was match up the decompiled source with the Build source in order to reduce the amount of work I had to do. I realized, however, that for best results I should test the Build source integration with something working and complete. So to that end, I added the first – and currently only – game that uses the original source code: Shadow Warrior. The purpose of this integration was as follows:
* Get something working with Build in the XL Engine.

* Start refactoring the code and getting it ready for Blood.

* Test the XL Engine functionality, including game life cycle, XL Engine services, sound, input and other systems.

* Have a complete experience to test with the engine and UI.

* Test the performance and memory usage with a complete game running.

 

When, in previous updates, I mentioned matching up Blood code with the Build code – I already had this working in the engine. In fact Shadow Warrior is 100% playable, including sound, music, controls, memory management and so on in the engine and has been for a month or two.

So why play Shadow Warrior using the XL Engine instead of an existing port? Honestly there aren’t really any compelling reasons; the existing ports do a great job. Of course I plan on changing this with future releases and in the future there will be more reasons to play Shadow Warrior on the XL Engine. Regardless it has helped me tremendously with the engine and Blood – so it’s a worthwhile addition even if no one actually plays it. :D  That said, when it comes time to build the unified sector engine it will be very useful – like Blood it pushes the Build engine and adds Room over Room, drivable vehicles and other features.

This is a surprise I’m sure but, if things go well anyway, Shadow Warrior will not be the biggest surprise in store for the release.

As you can see, the UI itself has gone through some iterations since I last showed it. Of course any effects are optional and only work if you have a GPU capable of OpenGL 2.0 or newer (see the fullscreen view to get a better idea of what I mean). In addition sound effects have been added to the UI to improve the experience.

And some screenshots of Shadow Warrior in the XL Engine. As said above the port is complete and playable now. But I’m still going to hold off on releasing anything until the other games are ready.

 

 

Final Words

The XL Engine is progressing nicely and all of the features needed by the games are already implemented (as shown with Shadow Warrior). In the next update I hope to show screenshots of some of the other games in action. Finally here is the XL Engine Readme file that will be packaged with the release (XLEngine.txt). In it is the copyright notice, description and credits. Take a look and let me know if I am forgetting anything; I want the release to go smoothly when it finally gets here. :)

If you visit the forums you will see that they have been updated. The version of phpBB 3 was seriously out of date so I did a manual update to the latest version (it was too old for the automatic update) so the forum is now running on version 3.1.7-pl1.

As I mentioned in the forum update thread in the Website/Forums section, I decided a new theme was in order to better represent the XL Engine. Given the different types of games the engine will support, which include both sci-fi and fantasy, the previous theme just didn’t fit anymore. In addition the narrow footprint made larger images and lots of text difficult to deal with. Let me know what you think of the new look, of course tweaking or even whole different styles are possible.

On to the news. I recently managed to get Ubuntu installed on my desktop, a surprisingly painful process due to issues with the motherboard and UEFI. So I spent a good chunk of the day Saturday getting the XL Engine to compile and resolving dependencies. While I still have to work on the input and proper fullscreen rendering with multiple monitors – the engine does run and render properly, and sound and music sound good as well.

So this means that Linux support has moved from Beta 4 to the initial build, Beta 1. To make this happen I will probably enlist some help a few weeks before the build with packaging for various Linux distributions.

This is a quick update on the engine development.

The sound system implementation is complete for Beta 1, barring any game specific needs that have not yet arisen.

For music the XL Engine supports midi synthesis using Gravis Ultrasound patches, midi synthesis using Sound Fonts (sf2) and direct music playback using Ogg Vorbis. Beta 1 will support “music mods” – where midi songs are replaced by Ogg files.

Performance of Wild Midi, FluidSynth and Ogg Vorbis is very good if you have a multicore system (which almost everyone has today).

The engine also supports framerate limiting, vsync with low latency input and can limit CPU usage (assuming your CPU is fast enough to start with). Even running a bunch of threads (game code in one thread, input and OpenGL in another, streaming sound in yet another, etc.) – the CPU usage on my system is around 24% with a heavy game where the game framerate is 120 fps and graphics refresh rate is vsynced at 60 fps (even using the frame limiter to set it to 120 has little effect).

Obviously reduced CPU mode can cause a millisecond or two of latency from time to time but I run with it all the time and don’t notice the difference. :)

The engine work is essentially finished for Beta 1, though I have to finish some UI panels still. Now that sound is in and working and the engine support is fleshed out, I’m going to be focused on finishing the game “porting” until all of the games are fully working. Once that is complete I will finish the UI panels and release the Beta 1 build.

After some deliberation and suggestions, I am tackling midi playback a little differently this time around. The system midi players (like the Windows midi player) have issues and seem to be harder to configure in recent versions of Windows. While there is software available to help, such as VirtualMidiSynth, I don’t want to rely on users installing this software on their computer.

Instead the XL Engine will use software midi synth and support Gravis Ultrasound patches and Sound Fonts. In order to accomplish this I have integrated Wild Midi and am in the process of integrating FluidSynth. The release will come with default patches and fonts so people can hear good quality midi from the get go. Of course you will be able to download and use your own patches and sound fonts.

Setup is easy – just pick the format you want to use and select either the config file (for patches) or sf2 file and you are done. The libraries are statically linked with the engine, so no extra DLLs, downloads or installs are needed.

In order to limit the dependencies and code size, audio drivers are being excluded from my build of FluidSynth – it will write raw audio which will be processed by the engine’s sound system, just like Wild Midi. Midi playback is also properly threaded to limit the burden on the main CPU.

Finally I would like to thank Brother Brick (a member of the forums) for suggesting Wild Midi – it is a great library for software midi playback. Unfortunately it does not – yet – have all of the features I want but it does a great job at what it has implemented. :D

Introduction

Recently I have talked about the project and everyone has been able to see progress in the new version of the XL Engine with GitHub. I have talked about work already done and even given some indications as to the amount of work remaining (i.e. all the stuff I should have done before the hiatus – but that’s another story). However I have mentioned working on “all games at the same time” and given a few details of the process – but I have left out many of the concrete details. To make things worse, progress on the game front is not visible since the game code is not in GitHub yet and honestly wouldn’t make much sense even if it was. As a result a lot of work gets done that no one sees yet. While I have my reasons for this decision, this and future posts will help fill in some of the missing details about the development process.

For this post I am going to talk about a specific part of the process – going from disassembled code to C code that will be used by the XL Engine to replace the original executables. I make the following assumptions, which hold true but which I will not go into in detail until a future post: the games are fully disassembled and this code is stored in giant text files, the entry points for the 32 bit code are known for each game and we limit ourselves to 32 bit code (16 bit code can be ignored for the most part). The engine functionality – such as file access, input, sound, graphics and so forth – is assumed.

Originally the process was mostly manual. This was valuable: it allowed me to directly see the execution path, made it easy to see the current contents of memory and was very useful for seeing various patterns and understanding the code. However there was a major problem – it just takes too long. While things moved faster over time since there is a finite set of patterns, even at the limit it’s still too slow. There are several options: reverse engineering the formats and building the functionality from observation and testing (like the original DarkXL/DaggerXL did and Interkarma is doing now), possibly combined with decoding narrow, tricky areas of the code; or building a tool set to help decompile the entire executables and then replacing parts with Engine services (and later unified engine renderers). Since I decided to make these ports “source port” accurate, that really only leaves one good choice.

Why not use existing tools? Actually, for some things, I do. However automatic decompiling is a very difficult task and few tools are useful – I have tried many of them. There are a few that are somewhat useful, but these tend to be very expensive. Fortunately, for my purposes, the scope is limited (DOS, specific types of games, 32 bit) – so I am better served by my own tools.

 

Process

Currently the process consists of manual and tool assisted elements. I have talked about some of the manual processes in the assumptions above – finding the entry points and separating 32 bit and 16 bit code. I will talk about the rest of the manual processes later; for now I’m going to talk about the tool assisted part. All tools discussed in this post are my own.

 

Overview

Global Passes

* Function Discovery Pass
* Function Grouping
* Function Name Generation
* Function Input Discovery
* Function Output Discovery

 

Local Function Passes

* Build Code Blocks: splits the function into code blocks. Roughly a code block represents a concept expressed as one or more instructions.
* Merge Code Blocks: apply merge passes until no more blocks can be merged. The idea is merge dependent blocks into larger blocks when possible.
* Process Code Blocks: update the local variable list, update the global variable list, use “hints” to help determine variable sizes and types, generate block code.
* Generate function code: function header, local variables, body code.

 

Global Code Generation

* Determine Global Variables used by each function group.
* Generate a header file for each group that defines the following:
…. Global variables assigned to this group that are used by other groups.
…. Function headers for functions in this group used by other groups.
* Generate a source file for each group:
…. Write a reference to the group header file.
…. Write includes for external globals and functions from other groups.
…. Write global variables used only by this group (as “static”), include default values.
…. Write forward declarations for internal functions.
…. Write the code for each function.
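
To make the layout above more concrete, here is a rough sketch of what one generated group might look like, collapsed into a single listing. The group name, function names, variable names and function bodies are all made up for illustration; the tool's real naming scheme and output are not shown in this post.

```c
#include <stdint.h>

typedef int32_t s32;  /* assumed definition of the default type */

/* ---- group03.h (hypothetical): only what other groups need ---- */
extern s32 group03_global0;        /* global used by other groups */
s32 group03_func000(s32 arg0);     /* function called by other groups */

/* ---- group03.c (hypothetical) ---- */
/* #include "group03.h" plus the headers of any other groups used */

s32 group03_global0 = 0;

/* Used only inside this group, so it is static and carries its default. */
static s32 group03_global1 = 40;

/* Forward declaration for an internal function. */
static s32 group03_func001(s32 arg0);

s32 group03_func000(s32 arg0)
{
    group03_global0 = group03_func001(arg0);
    return group03_global0;
}

static s32 group03_func001(s32 arg0)
{
    return arg0 * group03_global1;
}
```

Keeping group-private globals and functions static means the header stays small and only exposes what other groups actually reference.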

 

Global Passes

The first task is to process the disassembled code, knowing the information discussed above. In addition there are various tables that are filled in manually over time – function names, variable names and similar data. The processing occurs over many automatic passes.

* The first pass is designed to discover functions, where the entry point address is, how long the function is – and must handle multiple return values. Starting from the entry point function, the tool traverses all possible execution paths in order to eliminate dead code. This will miss functions that are only referenced as function pointers, so this initial set of functions is written to a file that will be modified after additional processing (to add the missing functions).

* In the next pass, the tool groups functions based on locality – with the assumption that functions near each other in the executable are likely near each other in the original source (i.e. in the same object files and thus same source files). These groups may be refined later in the case that functions have debug code that indicates the source file name (something Daggerfall does in many functions). These groups will eventually become files, more on this later.

* Then a human readable name for the function is generated. If the function name is in the name table, meaning I have already assigned a name or know what it is, then that name is assigned. Otherwise a unique name is generated from the group and function index.

* Next the tool determines the function inputs (function arguments). It can determine if a register is used but not set or determine variables passed on the stack. Initially we don’t know the type of variable each input is, so we set it to “unknown” – which defaults to a 32 bit int (s32). Now when functions are called, we will be able to determine which arguments are required and which registers are used.

* Finally the function output / return value is determined. Usually return values, in these games anyway, are returned by register – so the tool detects registers that are set before return and used by the calling code. Returning values on the stack is also possible. Either way the output, if any, for each function is determined and recorded. Again the type is not yet known and defaults to 32 bit int.
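
The "register is used but not set" test from the input discovery pass can be sketched roughly as below. This is a minimal illustration, not the tool's code: the `Instr` record and register masks are assumptions, the scan only handles straight-line code (a real pass has to follow every branch) and it ignores details such as registers that are merely saved and restored.

```c
#include <stddef.h>

/* Hypothetical register bit masks; the tool's internal format is not
   described in this post. */
enum {
    REG_EAX = 1u << 0,
    REG_EBX = 1u << 1,
    REG_ECX = 1u << 2,
    REG_EDX = 1u << 3
};

typedef struct {
    unsigned reads;   /* registers the instruction reads */
    unsigned writes;  /* registers the instruction writes */
} Instr;

/* Returns the registers that are read before anything in the function
   has written them; these are candidate register inputs. */
unsigned find_register_inputs(const Instr *code, size_t count)
{
    unsigned written = 0;
    unsigned inputs = 0;
    for (size_t i = 0; i < count; i++) {
        inputs |= code[i].reads & ~written;  /* read, never set so far */
        written |= code[i].writes;
    }
    return inputs;
}
```

For `mov eax, edx` followed by `add eax, ebx`, the sketch reports edx and ebx as inputs, while eax is not reported because it is set before it is read.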

Now we are ready to start the local function processing, which is its own series of passes per function.

 

Local Function Passes

Simply put, this process consists of two components – finding patterns and then mapping those patterns to code. The tool is responsible for finding the patterns and writing them out. I then take those patterns, figure out how to map them to C code and then the tool takes that data and does the grunt work of applying the mapping. This is an iterative process where each iteration improves both the results and the tool. There are a finite number of patterns, so work on any specific game helps them all overall.

 

Building Code Blocks

Blocks are generated by processing the “block end points”, where a block end point is one of the following:

* Value modification or assignment to a global or local variable.
* Function call
* Jump (jmp, ja, jz, jnz, etc).
* Return
* System call
* Label – if the preceding instruction is NOT a block end point.

It should be obvious that code blocks cannot span jumps, which account for, essentially, 2/3 of the list.
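
The end point list above boils down to a small classification function. The sketch below assumes a simplified, hypothetical `OpKind` tag per instruction (the real internal format is richer) and assumes that a label at the very start of a function counts as an end point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction categories. */
typedef enum {
    OP_STORE_VAR,  /* modifies or assigns a global or local variable */
    OP_CALL,
    OP_JUMP,       /* jmp, ja, jz, jnz, ... */
    OP_RETURN,
    OP_SYSCALL,
    OP_LABEL,
    OP_OTHER       /* register-only work, loads, ... */
} OpKind;

/* A label only ends a block when the previous instruction did not. */
bool is_block_end_point(OpKind kind, bool prev_was_end_point)
{
    switch (kind) {
    case OP_STORE_VAR:
    case OP_CALL:
    case OP_JUMP:
    case OP_RETURN:
    case OP_SYSCALL:
        return true;
    case OP_LABEL:
        return !prev_was_end_point;
    default:
        return false;
    }
}

/* Counts the blocks a function would be split into at this stage. */
size_t count_block_end_points(const OpKind *ops, size_t count)
{
    size_t total = 0;
    bool prev_was_end_point = false;
    for (size_t i = 0; i < count; i++) {
        prev_was_end_point = is_block_end_point(ops[i], prev_was_end_point);
        if (prev_was_end_point)
            total++;
    }
    return total;
}
```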

Once a block end point is reached – a new code block is generated but we need to figure out the dependencies. For functions, for example, this will be the function inputs. Since blocks are processed in order, we don’t have to worry about local or global variables. So for most blocks, dependencies are register assignments. To resolve a dependency, we must walk the code backwards until the register is assigned – all lines that contribute to resolving that dependency become a new dependent code block. That code block will oftentimes have dependencies of its own, so it too will have a dependent code block generated. This continues until all dependencies are resolved. Other blocks, unless they affect the dependencies, are skipped over during this process. If a dependency is modified by another block, then that block becomes a dependency and processing can be halted.
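
The backwards walk can be sketched as follows. This is a simplified illustration under a few assumptions: instructions are reduced to hypothetical read/write register masks, partial register writes (al versus eax) are ignored, and the case where an earlier block already owns the register is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register bit masks and instruction record. */
enum {
    REG_EAX = 1u << 0,
    REG_EBX = 1u << 1,
    REG_ECX = 1u << 2,
    REG_EDX = 1u << 3
};

typedef struct {
    unsigned reads;
    unsigned writes;
} Instr;

/* Walks backwards from `end`, marking in `needed` every line that
   contributes to the registers in `wanted`. Returns the registers that
   are still unresolved at the top (function inputs, for example). */
unsigned collect_dependencies(const Instr *code, size_t end,
                              unsigned wanted, bool *needed)
{
    for (size_t i = 0; i < end; i++)
        needed[i] = false;

    for (size_t i = end; i-- > 0 && wanted != 0; ) {
        if (code[i].writes & wanted) {
            needed[i] = true;
            wanted &= ~code[i].writes;  /* resolved by this line...   */
            wanted |= code[i].reads;    /* ...which needs its sources */
        }
    }
    return wanted;
}
```

Lines that do not touch a wanted register are simply skipped, which mirrors how unrelated blocks are passed over during the walk.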

Blocks track the number of different blocks that depend on them – this will determine, later, if blocks are inlined or assigned to local variables. It also determines whether blocks can be merged later.

If a register is modified between two block end points and it is NOT a dependency, then it will be assigned as a local variable. In this case all local blocks are discarded and the Block processing is restarted for this function with the new information. In this way local variables that are assigned to registers – like loop counters – are handled properly.

Function outputs are also assigned to local variables so that dependencies in the future will depend on that variable rather than the function itself (we never call a function unless the original code explicitly did so).

Whenever a local or global variable is found and the instruction contains clues about its size – e.g. movsx; mov ax, var; etc. – the variable’s type attribute is modified.
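
As a rough sketch of how such hints might be applied: the width of the register on the other side of a mov gives the variable's size, and movsx/movzx hint at signedness. The `VarType` struct and function names are assumptions for illustration only, and operand size specifiers such as "word ptr" are not handled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical variable type attribute; s32 is the default. */
typedef struct {
    int bits;
    bool is_signed;
} VarType;

/* Width in bits of a general purpose x86 register, 0 if unknown. */
int register_bits(const char *reg)
{
    static const char *const regs8[]  = { "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh" };
    static const char *const regs16[] = { "ax", "bx", "cx", "dx", "si", "di", "bp", "sp" };
    static const char *const regs32[] = { "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp" };
    for (int i = 0; i < 8; i++) {
        if (strcmp(reg, regs8[i]) == 0)  return 8;
        if (strcmp(reg, regs16[i]) == 0) return 16;
        if (strcmp(reg, regs32[i]) == 0) return 32;
    }
    return 0;
}

/* Updates `var` from an instruction that moves it to or from `reg`. */
void apply_hint(VarType *var, const char *mnemonic, const char *reg)
{
    if (strcmp(mnemonic, "movsx") == 0) {
        var->is_signed = true;       /* sign extension: signed source */
    } else if (strcmp(mnemonic, "movzx") == 0) {
        var->is_signed = false;      /* zero extension: unsigned source */
    } else if (strcmp(mnemonic, "mov") == 0) {
        int bits = register_bits(reg);
        if (bits != 0)
            var->bits = bits;        /* mov operands share one size */
    }
}
```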

 

Merging Code Blocks

Code blocks can be merged if one is dependent ONLY on one other block. Neighboring code blocks are merged with the new block holding the dependency data for the parent block. Each pass tests the neighbors and merge passes continue until no additional merging occurred during the previous pass (this will usually result in 2 passes).
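
The "repeat until nothing merges" loop might look roughly like the sketch below. The `Block` layout is an assumption made for illustration: each block records how many blocks it depends on and, when that count is one, which block that is. A block is folded into its preceding neighbor when that neighbor is its only dependency, and the merged block keeps the parent's dependency data.

```c
#include <stddef.h>

/* Hypothetical block record. */
typedef struct {
    int id;
    int dep_count;   /* number of blocks this block depends on */
    int only_dep;    /* id of the sole dependency if dep_count == 1 */
    int first_line;
    int last_line;
} Block;

/* One pass over neighboring pairs; returns the new block count. */
static size_t merge_pass(Block *blocks, size_t count)
{
    size_t out = 0;
    for (size_t i = 0; i < count; i++) {
        if (i + 1 < count && blocks[i + 1].dep_count == 1 &&
            blocks[i + 1].only_dep == blocks[i].id) {
            Block merged = blocks[i];  /* keeps the parent's dependencies */
            int absorbed_id = blocks[i + 1].id;
            merged.last_line = blocks[i + 1].last_line;
            /* Later blocks that depended on the absorbed block now
               depend on the merged one. */
            for (size_t j = i + 2; j < count; j++)
                if (blocks[j].dep_count == 1 && blocks[j].only_dep == absorbed_id)
                    blocks[j].only_dep = merged.id;
            blocks[out++] = merged;
            i++;                       /* the neighbor was consumed */
        } else {
            blocks[out++] = blocks[i];
        }
    }
    return out;
}

/* Runs merge passes until a pass changes nothing. */
size_t merge_blocks(Block *blocks, size_t count)
{
    for (;;) {
        size_t merged_count = merge_pass(blocks, count);
        if (merged_count == count)
            return count;
        count = merged_count;
    }
}
```

Because each pass only looks at immediate neighbors, a chain of dependent blocks collapses over a couple of passes rather than in one.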

 

Processing Code Blocks

Once the blocks have been merged the functionality that they express must be discovered. First any local variables referenced are added to the local variable list for the function (if they do not already exist). Second any global variables are added to the global variable list. Any size and type hints are processed for the variables.

Finally the block extracts inputs – local variables, global variables, memory addresses and literals and replaces them with generic input markers (i0 through iN-1, where N = the number of inputs). Then a block key is generated from the resulting text. Finally the key is used to look up the block in a database of code blocks, adding it if it doesn’t exist. This database will be written to disk once processing is complete. If the entry exists it will contain the resulting C code with replacements for inputs. Memory addresses will be replaced by global variables. If the block does not have an entry then the C code is written out as “//Unknown entry #” where # is the block entry index but added to the database so that missing blocks can be added as they are hit.
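
The database lookup described above could be sketched like this. It is only an illustration: the fixed-size array, the `BlockDb` and `BlockEntry` names and the function names are assumptions, keys are stored by pointer (so the caller keeps the strings alive) and writing the database to disk is left out.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64

typedef struct {
    const char *key;     /* block text with generic input markers */
    const char *c_code;  /* C replacement, or NULL if not mapped yet */
} BlockEntry;

typedef struct {
    BlockEntry entries[MAX_ENTRIES];
    int count;
} BlockDb;

/* Returns the entry index for `key`, adding an unmapped entry if it
   is missing. Returns -1 if the database is full. */
int db_find_or_add(BlockDb *db, const char *key)
{
    for (int i = 0; i < db->count; i++)
        if (strcmp(db->entries[i].key, key) == 0)
            return i;
    if (db->count == MAX_ENTRIES)
        return -1;
    db->entries[db->count].key = key;
    db->entries[db->count].c_code = NULL;
    return db->count++;
}

/* Writes the C code for a block, or a placeholder comment naming the
   entry that still needs a mapping. */
void db_emit(BlockDb *db, const char *key, char *out, size_t out_size)
{
    int index = db_find_or_add(db, key);
    if (index >= 0 && db->entries[index].c_code != NULL)
        snprintf(out, out_size, "%s", db->entries[index].c_code);
    else
        snprintf(out, out_size, "//Unknown entry %d", index);
}
```

Since unknown blocks are added on first sight, running the tool over a game produces a list of exactly the patterns that still need a C mapping.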

For example, look at the following code block:
lea eax, [edx * 4]
lea ebx, [edx + eax]
shl ebx, 3

edx is a local variable in this case (call it local0 for example) and ebx is the output, internal registers are labeled from t0 to tN-1 in order found, so this can be re-written as:
lea t0, [i0 * 4]
lea o0, [i0 + t0]
shl o0, 3

which has the following entry for C code (t0 = i0 * 4, so o0 = i0 + t0 = i0 * 5, and the shift left by 3 multiplies that by 8):
i0 * 40

If only one block depends on this block, then whenever the block is required local0 * 40 will be substituted. If, on the other hand, more than one block depends on it then it will be assigned to a local variable which will be referenced instead:
s32 local1 = local0 * 40;

If the type of local0 is known then local1 will be given the same type – s32 is the default type.
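
The substitution step in this example can be sketched as two small helpers: one expands the generic input markers in the database entry, the other decides between inlining and assigning to a local variable based on the number of dependent blocks. The function names and signatures are made up for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Replaces markers i0..iN-1 in `templ` with names[0..N-1]. */
void expand_template(const char *templ, const char *const *names,
                     int name_count, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return;
    for (const char *p = templ; *p != '\0' && used + 1 < out_size; ) {
        /* Only treat 'i' as a marker at the start of a token. */
        int at_boundary = (p == templ) ||
            !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        if (*p == 'i' && at_boundary && isdigit((unsigned char)p[1])) {
            char *end;
            long index = strtol(p + 1, &end, 10);
            if (index < name_count) {
                int n = snprintf(out + used, out_size - used, "%s", names[index]);
                if ((size_t)n >= out_size - used) {  /* truncated */
                    used = out_size - 1;
                    break;
                }
                used += (size_t)n;
                p = end;
                continue;
            }
        }
        out[used++] = *p++;
    }
    out[used] = '\0';
}

/* Inlines the expression for a single dependent block, otherwise
   assigns it to a local variable of the given type. */
void emit_block(const char *expr, int dependents, const char *type,
                const char *local_name, char *out, size_t out_size)
{
    if (dependents > 1)
        snprintf(out, out_size, "%s %s = %s;", type, local_name, expr);
    else
        snprintf(out, out_size, "%s", expr);
}
```

With names[0] set to "local0", the entry i0 * 40 expands to local0 * 40, and with two dependent blocks the second helper produces the s32 local1 assignment shown above.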

 

Final Words

I have not covered every step in detail but this should give you a much better idea of how the process is evolving. Clearly this isn’t the end of the story – a lot of the work is glossed over or not yet mentioned. But this post is long enough. I will continue to talk about the process in future posts.

 

Testing

If you want to help identify bugs that the XL Engine will need to fix, post them in the DOS Bugs section of the forums, each game has its own sub-forum. Don’t forget to read the sticky so you know what to post.

I have been asked for a time frame for the Beta 1 release and to gauge how far along the build is, as a percentage. I have decided to answer the questions, as best I can, while providing a progress update.

As for when, I will say relatively soon, though I’m afraid that I won’t be able to give any dates until I know for sure.

As for % of completeness, that too is hard to say. The engine is basically where it needs to be, with the exception of the sound system – though that already reads Voc files, plays 2D sounds and could easily play 3D and looping sounds – it’s mainly a matter of resource tracking at this point. The UI needs to be finished but that, honestly, will only take a few nights of work. There are additional niceties I would like to add but most of them can wait (like controller support). So engine/UI wise I would say it’s at 90% or so but the other 10% isn’t very time consuming.

Where the majority of the remaining time will go is finishing the game support itself. The good news is that the current approach basically allows me to work towards all of the games in parallel (exactly how is a discussion for another day). So overall I would say I’m around 50% of the way there but the pace is accelerating due to improving methods.

This probably doesn’t sound great until I mention that I basically started from scratch a few months ago, largely due to the long hiatus and change in design – though I move and refactor old code as needed. So it is going well and the pace is very promising for meeting my goals of getting a build out near the beginning of this year. And to be clear that 50% is overall, meaning that all the games have made good progress towards completion (Daggerfall, Dark Forces and Blood), though it obviously, and intentionally, doesn’t indicate which are further along.

However as soon as more concrete promises are made something will happen so… I’m going to avoid making said promises until I know for sure. :)-

Estimated Progress Towards Beta 1

Engine – 90%
Game Support – 50%

 


 

Recent Engine Features

* Vsync now works correctly, though causes noticeable “input lag” at low refresh rates – such as 60. However this “lag” has been decreased and should be small enough for many people.

* Frame rate limiting, as an alternative to vsync, now works correctly and accurately – and has almost no perceivable “input lag” on Windows. This doesn’t completely fix tearing but greatly improves it, especially during explosions and rapid screen movement. This can be set to multiples of the refresh rate – such as 120, 240, etc. to further limit any input issues – and tends to improve the overall experience even at higher multiples. Though, obviously, your system should be able to reliably hit the requested framerate. If the system can’t keep up, the framerate limiter has no effect on performance (unlike traditional vsync). In addition it saves on CPU load which can be useful for weaker systems.

The XL Engine will probably default to a frame limiter of 120 Hz, though obviously you can change it to use vsync instead or have no frame limit (this last option is not recommended though).

* Adaptive Vsync – if available this will allow for synchronization if the system performs well enough but does not if the system underperforms.

* Proper dynamic vertex buffer support so that performance is good for all Graphics “levels.”

* Render targets and improved UI aesthetic.

 

State of the Game Support

* All games have working libraries and entry points. They can all be selected, run and shut down from the UI.

* All games are debuggable and run, though only one is currently playable.

* All games are fully disassembled and partially re-written. Obviously some of the code is easier to understand than other parts.

* Dead code has been identified and removed from all games.

 
