
Debugging Wine with x86 hardware debug registers.


Last week, I was working on a crash bug in a game running in Proton. About 90% of the time, the game would crash at some random point during the publisher logo screens. Rarely, it would succeed and get to the game’s main menu. That kind of inconsistent behavior points to some kind of invalid memory bug.

While debugging this, I mentioned to my coworker Paul Gofman that it looked like a bogus write to an important part of memory. Paul suggested that I try setting a hardware write watch breakpoint to determine where the bad write was coming from. I hadn’t done that before, so Paul provided me with example code. I found it really useful, so here’s a walkthrough of how I debugged the issue and used the x86 debug registers to diagnose the problem and develop a fix.

Analyzing the crash

Here’s the fatal exception from a WINEDEBUG log of the game crashing:

0124:0128:trace:seh:dispatch_exception code=c0000005 flags=0 addr=00000001700680B4 ip=00000001700680B4 tid=0128

I then ran the game under Wine’s debugger, winedbg, to discover where the crash was occurring (output heavily trimmed to relevant sections):

[aeikum@aeikum ~]$ /tmp/proton_aeikum/winedbg_run
WineDbg starting on pid 00e0
0x0000000170057a59 ntdll+0x57a59: ret

Wine-dbg>c
...
Unhandled exception: page fault on read access to 0xffffffffffffffff in 64-bit code (0x00000001700440f8).

Wine-dbg>info share 0x1700440f8
Module  Address                                 Debug info      Name
PE      0000000170000000-00000001700a1000       Export          ntdll

This shows that it is crashing in Wine code, somewhere in ntdll. I used some printf-debugging to discover exactly where the crash was occurring. I narrowed it down to the function get_full_path_helper in dlls/ntdll/path.c. Specifically it’s crashing when dereferencing a bogus cd pointer in this code:

case RELATIVE_DRIVE_PATH:   /* c:foo   */
    dep = 2;
    if (wcsnicmp( name, cd->Buffer, 2 ))

Here’s the bogus pointer value from one run (the value changes in every run):

0124:0128:err:file:get_full_path_helper cd: 6F72507865546E8F

That’s obviously garbage (it’s actually the ASCII string "\x8FnTexPro"). Where is this garbage coming from? Earlier in the function, you can find where cd is calculated:

if (NtCurrentTeb()->Tib.SubSystemTib)  /* FIXME: hack */
    cd = &((WIN16_SUBSYSTEM_TIB *)NtCurrentTeb()->Tib.SubSystemTib)->curdir.DosPath;
else
    cd = &NtCurrentTeb()->Peb->ProcessParameters->CurrentDirectory.DosPath;

One more debug line to print the value of SubSystemTib shows the problem:

0124:0128:err:file:get_full_path_helper SubSystemTib: 6F72507865546E6F

As you can read from the Wine code above, a non-NULL SubSystemTib indicates that Wine is dealing with a 16-bit process, so it uses the 16-bit thread information block (TIB) structure instead of the modern Windows TIB. This is obviously not a 16-bit game, so the problem is that SubSystemTib is somehow being set to a non-NULL value. Also note that Wine’s code has a comment indicating that this usage of SubSystemTib is a hack, which means that Wine is probably treating this field differently from how Windows does.
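The two garbage values in the log lines above are easier to recognize as text once they are viewed the way they sit in little-endian memory, lowest byte first. Here is a minimal sketch of that decoding; the helper name decode_le_ascii is hypothetical and not part of Wine:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from Wine): view a 64-bit value as the 8 bytes
 * it occupies in little-endian memory, lowest byte first. Bytes that are
 * not printable ASCII are replaced with '.'. */
void decode_le_ascii(uint64_t value, char out[9])
{
    assert(out != NULL);

    for (int i = 0; i < 8; i++)
    {
        unsigned char byte = (unsigned char)(value >> (8 * i));
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

Passing the logged SubSystemTib value 0x6F72507865546E6F yields "onTexPro". The logged cd value 0x6F72507865546E8F yields ".nTexPro", where the leading dot stands in for the unprintable byte 0x8F.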

The question is, who is setting SubSystemTib to this weird value? Wine only has one line of code that sets SubSystemTib, and it only occurs in 16-bit code, which is definitely not relevant. So it’s either application code writing to SubSystemTib intentionally, or some kind of bogus write like a bad pointer or a buffer overflow in either Wine or the application. We need to find out what code is doing the write to SubSystemTib to understand how to fix the problem. To do that, I turned to the x86 hardware debug registers feature that Paul suggested.

x86 debug registers

x86 processors have a handful of debug registers that can be used to raise an exception under certain conditions, such as when a given address is executed, written, or read. Four of them (DR0 through DR3) hold the addresses to watch, and DR7 controls which of those are enabled and what kind of access triggers them. The processor vendors’ architecture manuals describe the full behavior, so I’ll just describe how I used them here. What I’m interested in is setting a write watch on a certain memory address, namely the location of SubSystemTib in the thread’s TIB. If I can find what code is doing the write to SubSystemTib, I’ll have more information about how to fix this bug.

I only want to set the debug register once, in the thread which will crash, before the bad write occurs. To accomplish this, I inserted the hack into dlls/ntdll/loader.c:alloc_module, which will be run very early in every new thread. From the debug log, I know that the crashing thread is also the first thread that the application creates, so I can easily use the thread ID to gate setting the register only on the main thread.

In Wine, the debug registers can be set with the NtSetContextThread function, which takes a CONTEXT struct containing the debug register values to set as an argument.

Putting all this together, here’s a diff to ntdll to set a write breakpoint on the address containing the SubSystemTib pointer:

--- a/dlls/ntdll/loader.c
+++ b/dlls/ntdll/loader.c
@@ -1353,6 +1353,26 @@ static WINE_MODREF *alloc_module( HMODULE hModule, const UNICODE_STRING *nt_name
     const WCHAR *p;
     const IMAGE_NT_HEADERS *nt = RtlImageNtHeader(hModule);
 
+#ifdef __x86_64__
+    static BOOL dbg_set = FALSE;
+    if (!dbg_set && GetCurrentThreadId() == GetCurrentProcessId() + 4)
+    {
+        CONTEXT context;
+
+        memset(&context, 0, sizeof(context));
+        context.ContextFlags = CONTEXT_DEBUG_REGISTERS;
+        context.Dr0 = (ULONG_PTR)&NtCurrentTeb()->Tib.SubSystemTib;
+        context.Dr7 =
+             3        | //enable Dr0 locally & globally
+            (1 << 16) | //enable write watch on Dr0
+            (2 << 18);  //watch 8 bytes (== sizeof(void*)) starting at value in Dr0
+
+        NtSetContextThread(GetCurrentThread(), &context);
+
+        dbg_set = TRUE;
+    }
+#endif
+
     if (!(wm = RtlAllocateHeap( GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(*wm) ))) return NULL;
 
     wm->ldr.DllBase       = hModule;

Now, when anything writes to the location that contains SubSystemTib, the CPU will cause an EXCEPTION_SINGLE_STEP exception.
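The magic numbers in the Dr7 value in the diff above follow the DR7 layout defined by the x86 architecture: bits 0 to 7 are the local and global enable bits for the four breakpoints, and starting at bit 16 each breakpoint gets a 2-bit access type followed by a 2-bit length. As a rough sketch, the same value could be built for any of the four debug registers like this (the function and enum names are made up for illustration and are not Wine or Windows API):

```c
#include <assert.h>
#include <stdint.h>

/* Access type (R/W field): which kind of access triggers the breakpoint. */
enum dr7_rw  { DR7_RW_EXECUTE = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_READWRITE = 3 };

/* Length (LEN field): note that the encoding is not in size order. */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/* Hypothetical helper: compute the DR7 bits that arm breakpoint `index`
 * (0..3, matching Dr0..Dr3) with the given access type and length. */
uint64_t dr7_watch_bits(int index, enum dr7_rw rw, enum dr7_len len)
{
    uint64_t bits = 0;

    assert(index >= 0 && index < 4);

    bits |= (uint64_t)3   << (index * 2);       /* local + global enable bits */
    bits |= (uint64_t)rw  << (16 + index * 4);  /* R/W field for this index   */
    bits |= (uint64_t)len << (18 + index * 4);  /* LEN field for this index   */
    return bits;
}
```

With index 0, DR7_RW_WRITE and DR7_LEN_8, this produces 0x90003, the same value as the expression in the diff. The watched address has to be aligned to the watched length, which holds here because SubSystemTib is an 8-byte pointer at an 8-byte-aligned offset.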

If the exception is left unhandled, the application will crash. This is good enough for our purposes, since we just need to know where the write to this memory location occurs. But if you want the application to continue running instead of crashing on this write instruction, you can just ignore the exception in the signal handler:

--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -2951,6 +2951,12 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext )
     struct xcontext context;
     ucontext_t *ucontext = sigcontext;
 
+    if (siginfo->si_code == 4)
+    {
+        ERR("ignoring TRAP_HWBKPT exception at %p\n", siginfo->si_addr);
+        return;
+    }
+
     if (handle_syscall_trap( sigcontext )) return;
 
     rec.ExceptionAddress = (void *)RIP_sig(ucontext);

Running the game with these changes prints this to the log before it crashes:

0124:0128:err:unwind:trap_handler ignoring TRAP_HWBKPT exception at 0x38c00ba9

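This tells us that the instruction that is writing to SubSystemTib is the one immediately before 0x38c00ba9: a data breakpoint is reported only after the writing instruction has completed, so the reported address is that of the next instruction. I used winedbg as above to discover that this code lives within the game’s main executable: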

Wine-dbg>info share 0x38c00ba9
Module  Address                                 Debug info      Name
PE      0000000037000000-0000000039dae000       Deferred        Game
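Conceptually, both info share lookups in this post are a range check of an address against each loaded module. The sketch below illustrates that idea using the two ranges printed by winedbg. It is not winedbg’s actual implementation, the struct and function names are hypothetical, and treating the end address as exclusive is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one line of "info share" output. */
struct module_range
{
    const char *name;
    uint64_t    start;  /* first address of the mapping */
    uint64_t    end;    /* assumed to be one past the last address */
};

/* The two ranges that appear in the winedbg output in this post. */
const struct module_range known_modules[] =
{
    { "ntdll", 0x0000000170000000ULL, 0x00000001700a1000ULL },
    { "Game",  0x0000000037000000ULL, 0x0000000039dae000ULL },
};

/* Return the module whose range contains addr, or NULL if none does. */
const struct module_range *find_module(uint64_t addr)
{
    size_t count = sizeof(known_modules) / sizeof(known_modules[0]);

    for (size_t i = 0; i < count; i++)
        if (addr >= known_modules[i].start && addr < known_modules[i].end)
            return &known_modules[i];
    return NULL;
}

/* Offset of addr from the start of its module, as in "ntdll+0x57a59". */
uint64_t module_offset(const struct module_range *mod, uint64_t addr)
{
    assert(mod != NULL && addr >= mod->start && addr < mod->end);
    return addr - mod->start;
}
```

Here find_module(0x38c00ba9) lands in the Game range at offset 0x1c00ba9, and the earlier crash address 0x1700440f8 lands in ntdll at offset 0x440f8.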

Finishing the bug

Now we’ve diagnosed the immediate cause of the crash (someone is writing bogus data into the TIB) and we’ve found the code that is doing the write (the application’s main executable), so it’s time to study what the application is trying to do.

Disassembling the application with objdump and jumping to 0x38c00ba9 gives:

$ objdump -d Game.exe
...
38c00b9b:   65 4c 8b 14 25 30 00 00 00    mov    %gs:0x30,%r10
38c00ba4:   58                            pop    %rax
38c00ba5:   49 89 42 18                   mov    %rax,0x18(%r10)
38c00ba9:   58                            pop    %rax

The first instruction loads the address of the TIB into r10 (the gs register contains the TIB pointer, and offset 0x30 within the TIB contains a pointer to itself). The next instruction pops 8 bytes off the stack into rax. The third instruction is the suspicious write that triggered the exception. It is writing those 8 bytes in rax to offset 0x18 of the TIB. This corresponds to SubSystemTib, as you can see in Wine’s definition of the TIB here:

typedef struct _NT_TIB {                                  // Offset:
    struct _EXCEPTION_REGISTRATION_RECORD *ExceptionList; // 0x0
    PVOID StackBase;                                      // 0x8
    PVOID StackLimit;                                     // 0x10
    PVOID SubSystemTib;                                   // 0x18
    ...
}
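The offsets in the comments above, and the effect of the game’s store instruction, can be checked with a small stand-in structure. This is only a sketch: mock_tib64 is a hypothetical model of the start of the 64-bit TIB, with every pointer field replaced by a uint64_t so the layout is the same on any host, and the store function merely imitates what mov %rax,0x18(%r10) does once r10 holds the TIB address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first fields of the 64-bit NT_TIB. */
struct mock_tib64
{
    uint64_t ExceptionList;         /* 0x00 */
    uint64_t StackBase;             /* 0x08 */
    uint64_t StackLimit;            /* 0x10 */
    uint64_t SubSystemTib;          /* 0x18 */
    uint64_t FiberData;             /* 0x20 (a union with Version in the real struct) */
    uint64_t ArbitraryUserPointer;  /* 0x28 */
    uint64_t Self;                  /* 0x30, the pointer read via %gs:0x30 */
};

/* Imitates "mov %rax,0x18(%r10)": store 8 bytes at offset 0x18 from the
 * TIB base, without going through any named field. */
void store_at_tib_offset_0x18(struct mock_tib64 *tib, uint64_t rax)
{
    unsigned char *r10;

    assert(tib != NULL);
    r10 = (unsigned char *)tib;          /* what %gs:0x30 would have loaded */
    memcpy(r10 + 0x18, &rax, sizeof(rax));
}
```

After calling store_at_tib_offset_0x18 on a zeroed mock_tib64, only the SubSystemTib member changes, and offsetof(struct mock_tib64, SubSystemTib) evaluates to 0x18.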

It’s clear from this analysis that the bug is not due to some bad pointer writing to random memory. The application is definitely writing directly to the TIB’s SubSystemTib field, for whatever reason. Given Wine’s usage of SubSystemTib is a hack, it’s likely that Windows just ignores this field and the game is storing some value there for some unknown internal purpose.

Regardless, the fix is simple. In Proton, we are only concerned with running games that ship with the Steam client, and it’s almost certain that there are no 16-bit applications shipped with Steam, as they won’t even run under modern Windows. So we can just remove Wine’s hack to redirect to the 16-bit TIB and instead always ignore the value contained in SubSystemTib:

--- a/dlls/ntdll/path.c
+++ b/dlls/ntdll/path.c
@@ -519,7 +519,7 @@ static ULONG get_full_path_helper(LPCWSTR name, LPWSTR buffer, ULONG size)
 
     RtlAcquirePebLock();
 
-    if (NtCurrentTeb()->Tib.SubSystemTib)  /* FIXME: hack */
+    if (0 && NtCurrentTeb()->Tib.SubSystemTib)  /* FIXME: hack */
         cd = &((WIN16_SUBSYSTEM_TIB *)NtCurrentTeb()->Tib.SubSystemTib)->curdir.DosPath;
     else
         cd = &NtCurrentTeb()->Peb->ProcessParameters->CurrentDirectory.DosPath;

This is exactly what I committed to Proton’s Wine fork to fix this game. A proper fix would involve rewriting how Wine handles 16-bit applications to acquire the correct TIB.

About Andrew Eikum
Andrew was a Wine developer at CodeWeavers from 2009 to 2022. He worked on all parts of Wine, but specifically supported Wine's audio. He was also a developer on many of CodeWeavers's PortJumps.

The following comments are owned by whoever posted them. We are not responsible for them in any way.

Wow, this is a fascinating read! It really shows how complex our Wine work can be.

What a good read.

For reference,
https://bugs.winehq.org/show_bug.cgi?id=46022

Was this Age of Mythology: Extended Edition? 👀

Awesome detection work. But a fix like this can not be backported to the real Wine, so how does the community benefit from it, or paying CodeWeavers customers for that matter? Has anyone ever tried to just go over the Wine code and work on all the FIXMEs and TODOs instead of fixing bugs the hard way as they appear? 😉

I am assuming that a patch will land in a later development version. 7.0.x at a later date from where a new version of CrossOver will be built.
