Archive for January 2016
If you visit the forums you will see that they have been updated. The version of phpBB 3 was seriously out of date so I did a manual update to the latest version (it was too old for the automatic update) so the forum is now running on version 3.1.7-pl1.
As I mentioned in the forum update thread in the Website/Forums section, I decided a new theme was in order to better represent the XL Engine. Given the different types of games the engine will support, which include both sci-fi and fantasy, the previous theme just didn’t fit anymore. In addition the narrow footprint made larger images and lots of text difficult to deal with. Let me know what you think of the new look, of course tweaking or even whole different styles are possible.
On to the news. I recently managed to get Ubuntu installed on my desktop, a surprisingly painful process due to issues with the motherboard and UEFI. So I spent a good chunk of the day Saturday getting the XL Engine to compile and resolving dependencies. While I still have to work on the input and proper fullscreen rendering with multiple monitors – the engine does run, render properly and sound and music sound good was well.
So this means that Linux support has moved from Beta 4 to the initial build, Beta 1. To make this happen I will probably enlist some help a few weeks before the build with packaging for various Linux distributions.
This is a quick update on the engine development.
The sound system implementation is complete for Beta 1, barring any game specific needs that have not yet arisen.
For music the XL Engine supports midi synthesis using Gravis Ultrasound patches, midi synthesis using Sound Fonts (sf2) and direct music playback using Ogg Vorbis. Beta 1 will support “music mods” – where midi songs are replaced by Ogg files.
Peformance of Wild Midi, Fluidsynth and Ogg Vorbis is very good if you have a multicore system (which almost everyone has today).
The engine also supports framerate limiting, vsync with low latency input and can limit cpu useage (assuming your CPU is fast enough to start with). Even running a bunch of threads (game code in one thread, input and OpenGL in another, streaming sound in yet another, etc.) – the CPU usage on my system is around 24% with a heavy game where the game framerate is 120 fps and graphics refresh rate is vsynced at 60 fps (even using the frame limiter to set it to 120 has little effect).
Obviously reduced CPU mode can cause a millisecond or two of latency from time to time but I run with it all the time and don’t notice the difference.
The engine work is essentially finished for Beta 1, though I have to finish some UI panels still. Now that sound is in and working and the engine support is fleshed out, I’m going to be focused on finishing the game “porting” until all of the games are fully working. Once that is complete I will finish the UI panels and release the Beta 1 build.
After some deliberation and suggestions, I am tackling midi playback a little differently this time around. The system midi players (like the Windows midi player) have issues and seem to be harder to configure in recent versions of Windows. While there is software available to help, such as VirtualMidiSynth, I don’t want to rely on users installing this software on their computer.
Instead the XL Engine will use software midi synth and support Gravis Ultrasound patches and Sound Fonts. In order to accomplish this I have integrated Wild Midi and am in the process of integrating FluidSynth. The release will come with default patches and fonts so people can hear good quality midi from the get go. Of course you will be able to download and use your own patches and sound fonts.
Setup is easy – just pick the format you want to use and select either the config file (for patches) or sf2 file and you are done. The libraries are statically linked with the engine, so no extra DLLs, downloads or installs to use.
In order to limit the dependencies and code size, audio drivers are being excluded from my build of fluidsynth – it will write raw audio which will be processed by the engine’s sound system, just like Wild Midi. Midi playback is also properly threaded to limit the burden on the main CPU.
Finally I would like to thank Brother Brick (a member of the forums) for suggesting Wild Midi – it is a great library for software midi playback. Unfortunately it does not – yet – have all of the features I want but it does a great job at what it has implemented.
Recently I have talked about the project and everyone has been able to see progress in the new version of the XL Engine with GitHub. I have talked about work already done and even given some indications as to the amount of work remaining (i.e. all the stuff I should have done before the hiatus – but that’s another story). However I have mentioned working on “all games at the same time” and given a few details of the process – but I have left out many of the concrete details. To make things worse, progress on the game front is not visible since the game code is not in GitHub yet and honestly wouldn’t make much sense even if it was. As a result a lot of work gets done that no one sees yet. While I have my reasons for this decision, this and future posts will help fill in some of the missing details about the development process.
For this post I am going to talk about a specific part of the process – the process of going from disassembled code to C code that will be used by the XL Engine to replace the original executables. I make the following assumptions, which are true but will not go into detail until a future post: the games are fully disassembled and this code is stored in giant text files, the entry points for the 32 bit code are known for each and we limit ourselves to 32 bit code (16 bit code can be ignored for the most part). The engine functionality – such as file access, input, sound, graphics and so forth is assumed.
Originally the process was mostly manual. While this was valuable, allowed me to directly see the execution path, made it easy to see the current contents of memory and was very useful to see various patterns and understand the code. However there was a major problem – it just takes too long. While things moved faster over time since there are a finite set of patterns, even at the limit its still too slow. There are several options – reverse engineering the formats and building the functionality from observation and testing (like the original DarkXL/DaggerXL did and Interkarma is doing now) and/or decoding narrow areas of the code for tricky areas or building a tool set in order to help in the process of decompiling the entire executables and then replacing parts with Engine services (and later unified engine renderers). Since I decided to make these ports “source port” accurate, that really only leaves one good choice.
Why not use existing tools? Actually, for some things, I do. However automatic decompiling is a very difficult task and few tools are useful – I have tried many many of them. There are a few that are somewhat useful, but these tend to be very expensive. Fortunately, for my purposes, the scope is limited (DOS, specific types of games, 32 bit) – so I am better served by my own tools.
Currently the process consists of manual and tool assisted elements. I have talked about some of the manual processes in assumptions – finding the entry points and separating 32 bit and 16 bit code. I will talk about the rest of the manual processes later, now I’m going to talk about the tool assisted part. All tools discussed in this post are my own.
* Function Discovery Pass
* Function Grouping
* Function Name Generation
* Function Input Discovery
* Function Output Discovery
Local Function Passes
* Build Code Blocks: splits the function into code blocks. Roughly a code block represents a concept expressed as one or more instructions.
* Merge Code Blocks: apply merge passes until no more blocks can be merged. The idea is merge dependent blocks into larger blocks when possible.
* Process Code Blocks: update the local variable list, update the global variable list, use “hints” to help determine variable sizes and types, generate block code.
* Generate function code: function header, local variables, body code.
Global Code Generation
* Determine Global Variables used by each function group.
* Generate a header file for each group that define the following:
…. Global variables assigned to this group that are used by other groups.
…. Function headers for functions in this group used by other groups.
* Generate a source file for each group:
…. Write a reference to the group header file.
…. Write includes for external globals and functions from other groups.
…. Write global variables used only by this group (as “static”), include default values.
…. Write forward declarations for internal functions.
…. Write the code for each function.
The first task is to process the disassembled code, knowing the information discussed above. In addition there are various tables that are filled in manually over time – function names, variable names and similar data. The processing occurs over many automatic passes.
*The first pass is designed to discover functions, where the entry point address is, how long the function is – and must handle multiple return values. Starting from the entry point function, the tool traverses all possible execution paths in order to eliminate dead code. This will miss functions that are only referenced as function pointers, so this initial set of functions is written to a file that will be modified after additional processing (to add the missing functions).
*In the next pass, the tool groups functions based on locality – with the assumption that functions near each other in the executable are likely near each other in the original source (i.e. in the same object files and thus same source files). These groups may be refined later in the case that functions have debug code that indicate the source file name (something Daggerfall does in many functions). These groups will eventually become files, more on this later.
*Then a human readable name for the function is generated. If the function name is in the name table, meaning I have already assigned a name or know what it is, then that name is assigned. Otherwise a unique name is generated from the group and function index.
*Next the tool determines the function inputs (function arguments). It can determine if a register is used but not set or determine variables passed on the stack. Initially we don’t know the type of variable each input is, so we set it to “unknown” – which defaults to a 32 bit int (s32). Now when functions are called, we will be able to determine which arguments are required and which registers are used.
*Finally the function output / return value is determined. Usually return values, in these games anyway, are returned by register – so the tool detects registers that are set before return and used by the calling code. Returning values on the stack is also possible. Either way the output, if any, for each function is determined and recorded. Again the type is not yet known and defaults to 32 bit int.
Now we are ready to start the local function processing, which is its own series of passes per function.
Local Function Passes
Simply put, this process consists of two components – finding patterns and then mapping those patterns to code. The tool is responsible for finding the patterns and writing them out. I then take those patterns, figure out how to map them to C code and then the tool takes that data and does the grunt work of applying the mapping. This is an iterative process where each iteration improves both the results and the tool. There are a finite number of patterns, so work on any specific game helps them all overall.
Building Code Blocks
Blocks are generated by processing the “block end points”, where a block end point is one of the following:
* Value modification or assignment to a global or local variable.
* Function call
* Jump (jmp, ja, jz, jnz, etc).
* System call
* Label – if the preceeding instruction is NOT a block end point.
It should be obvious the code blocks cannot span jumps which account for, essentially, 2/3 of the list.
Once a block end point is reached – a new code block is generated but we need to figure out the dependencies. For functions, for example, this will be the function inputs. Since blocks are processed in order, we don’t have to worry about local or global variables. So for most blocks, dependencies are register assignments. To resolve a dependency, we must walk the code backwards until the register is assigned – all lines that contribute to resolving that dependency becomes a new dependent code block. That code block will oftentimes have dependencies of its own, so it too will have a dependent code block generated. This continues until all dependencies are resolved. Other blocks, unless they affect the dependencies, are skipped over during this process. If a dependency is modified by another block, then that block becomes a dependency and processing can be halted.
Blocks track the number of different blocks that depend on them – this will determine, later, if blocks are inlined or assigned to local variables. It also determines whether blocks can be merged later.
If a register is modified between two block end points and it is NOT a dependency, then it will be assigned as a local variable. In this case all local blocks are discarded and the Block processing is restarted for this function with the new information. In this way local variables that are assigned to registers – like loop counters – are handled properly.
Function outputs are also assigned to local variables so that dependencies in the future will depend on that variable rather than the function itself (we never call a function unless the original code explicitly did so).
Whenever a local or global variable is found and the instruction contains clues about it’s size – i.e. movsx; mov ax, var; etc. – modify the variable type attribute.
Merging Code Blocks
Code blocks can be merged if one is dependent ONLY on one other block. Neighboring code blocks are merged with the new block holding the dependency data for the parent block. Each pass tests the neighbors and merge passes continue until no additional merging occurred during the previous pass (this will usually result in 2 passes).
Processing Code Blocks
Once the blocks have been merged the functionality that they express must be discovered. First any local variables referenced are added to the local variable list for the function (if they do not already exist). Second any global variables are added to the global variable list. Any size and type hints are processed for the variables.
Finally the block extracts inputs – local variables, global variables, memory addresses and literals and replaces them with generic input markers (i0 through iN-1, where N = the number of inputs). Then a block key is generated from the resulting text. Finally the key is used to look up the block in a database of code blocks, adding it if it doesn’t exist. This database will be written to disk once processing is complete. If the entry exists it will contain the resulting C code with replacements for inputs. Memory addresses will be replaced by global variables. If the block does not have an entry then the C code is written out as “//Unknown entry #” where # is the block entry index but added to the database so that missing blocks can be added as they are hit.
For example, look at the following code block:
lea eax, [edx * 4]
lea ebx, [edx + eax]
shl ebx, 3
edx is a local variable in this case (call it local0 for example) and ebx is the output, internal registers are labled from t0 to tN-1 in order found, so this can be re-written as:
lea t0, [i0 * 4]
lea o0, [i0 + t0]
shl o0, 3
which has the following entry for C code:
i0 * 40
If only one block depends on this block, then whenever the block is required local0 * 40 will be substituted. If, only the other hand, more than one block depends on it then it will be assigned to a local variable which will be referenced instead:
s32 local1 = local0 * 40;
If the type of local0 is known then local1 will be given the same type – s32 is the default type.
I have not covered every step in detail but this should give you a much better idea of how the process is evolving. Clearly this isn’t the end of the story – a lot of the work is glossed over or not yet mentioned. But this post is long enough. I will continue to talk about the process in future posts.
If you want to help identify bugs that the XL Engine will need to fix, post them in the DOS Bugs section of the forums, each game has its own sub-forum. Don’t forget to read the sticky so you know what to post.
I have been asked for a time frame for the Beta 1 release and to gauge how far along the build is, as a percentage. I have decided to answer the questions, as best I can, while providing a progress update.
As for when I will say relatively soon, though I’m afraid that I won’t be able to give any dates until I know for sure.
As for % of completeness, that too is hard to say. The engine is basically where it needs to be, with the exception of the sound system – though that already reads Voc files, plays 2D sounds and could easily play 3D and looping sounds – its mainly a matter of resource tracking at this point. The UI needs to be finished but that, honestly, will only take a few nights of work. There are additional niceties I would like to add but most of them can wait (like controller support). So engine/UI wise I would say its at 90% or so but the other 10% isn’t very time consuming.
Where the majority of the remaining time will go is finishing the game support itself. The good news is that the current approach basically allows me to work towards all of the games in parallel (exactly how is a discussion for another day). So overall I would say I’m around 50% of the way there but the pace is accelerating due to improving methods.
This probably doesn’t sound great until I mention that I basically started from scratch a few months ago, largely due to the long hiatus and change in design – though I move and refactor old code as needed. So it is going well and the pace is very promising for meeting my goals of getting a build out near the beginning of this year. And to be clear that 50% is overall, meaning that all the games are made good progress towards completion (Daggerfall, Dark Forces and Blood), though it obviously, and intentionally, doesn’t indicate which are further along.
However as soon as more concrete promises are made something will happen so… I’m going to avoid making said promises until I know for sure. :)-
Estimated Progress Towards Beta 1
Engine – 90%
Game Support – 50%
Recent Engine Features
* Vsync now works correctly, though causes noticeable “input lag” at low refresh rates – such as 60. However this “lag” has been decreased and should be small enough for many people.
* Frame rate limiting, as an alternative to vsync, now works correctly and accurately – and has almost no perceivable “input lag” on Windows. This doesn’t completely fix tearing but greatly improves it, especially during explosions and rapid screen movement. This can be set to multiples of the refresh rate – such as 120, 240, etc. to further limit any input issues – and tends to improve the overall experience even at higher multiples. Though, obviously, your system should be able to reliably hit the requested framerate. If the system can’t keep up, the framerate limiter has no affect on performance (unlike traditional vsync). In addition it saves on CPU load which can be useful for weaker systems.
The XL Engine will probably default to a frame limiter of 120 Hz, though obviously you can change it to use vsync instead or have no frame limit (this last option is not recommended though).
* Adaptive Vsync – if available this will allow for synchronization if the system performs well enough but does not if the system under performs.
* Proper dynamic vertex buffer support so that performance is good for all Graphics “levels.”
* Render targets and improved UI aesthetic.
State of the Game Support
* All games have working libraries and entry points. They can all be selected, run and shut down from the UI.
* All games are debuggable and run, though only one is currently playable.
* All games are fully disassembled and partially re-written. Obviously some of the code is easier to understand then other parts.
* Dead code has been identified and removed from all games.