Last week, I was working on a crash bug in a game running in Proton. About 90% of the time, the game would crash at some random point during the publisher logo screens. Rarely, it would succeed and get to the game’s main menu. That kind of inconsistent behavior points to some sort of invalid memory access.
While debugging this, I mentioned to my coworker Paul Gofman that it looked like a bogus write to an important part of memory. Paul suggested that I try setting a hardware write watch breakpoint to determine where the bad write was coming from. I hadn’t done that before, so Paul provided me with example code. I found it really useful, so here’s a walkthrough of how I debugged the issue and used the x86 debug registers to diagnose the problem and develop a fix.
Analyzing the crash
Here’s the fatal exception from a WINEDEBUG log of the game crashing:
0124:0128:trace:seh:dispatch_exception code=c0000005 flags=0 addr=00000001700680B4 ip=00000001700680B4 tid=0128
I then ran the game under Wine’s debugger, winedbg, to discover where the crash was occurring (output heavily trimmed to relevant sections):
[aeikum@aeikum ~]$ /tmp/proton_aeikum/winedbg_run
WineDbg starting on pid 00e0
0x0000000170057a59 ntdll+0x57a59: ret
Wine-dbg>c
...
Unhandled exception: page fault on read access to 0xffffffffffffffff in 64-bit code (0x00000001700440f8).
Wine-dbg>info share 0x1700440f8
Module Address Debug info Name
PE 0000000170000000-00000001700a1000 Export ntdll
This shows that it is crashing in Wine code, somewhere in ntdll. I used some printf-debugging to discover exactly where the crash was occurring. I narrowed it down to the function get_full_path_helper in dlls/ntdll/path.c.
Specifically, it’s crashing when dereferencing a bogus cd pointer in this code:
case RELATIVE_DRIVE_PATH: /* c:foo */
dep = 2;
if (wcsnicmp( name, cd->Buffer, 2 ))
Here’s the bogus pointer value from one run (the value changes in every run):
0124:0128:err:file:get_full_path_helper cd: 6F72507865546E8F
That’s obviously garbage (read byte by byte in little-endian order, it’s actually the ASCII string "\x8FnTexPro"). Where is this garbage coming from? Earlier in the function, you can find where cd is calculated:
if (NtCurrentTeb()->Tib.SubSystemTib) /* FIXME: hack */
cd = &((WIN16_SUBSYSTEM_TIB *)NtCurrentTeb()->Tib.SubSystemTib)->curdir.DosPath;
else
cd = &NtCurrentTeb()->Peb->ProcessParameters->CurrentDirectory.DosPath;
One more debug line to print the value of SubSystemTib shows the problem:
0124:0128:err:file:get_full_path_helper SubSystemTib: 6F72507865546E6F
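To double-check the ASCII reading, here’s a small standalone C sketch. The helper le_bytes_to_string and the two macros are my own illustrative names, not Wine code. It prints a 64-bit value’s bytes lowest byte first, as they sit in memory on little-endian x86-64, escaping non-printable bytes. Applied to the two logged values, it shows that SubSystemTib holds the text "onTexPro" and that the bogus cd pointer is exactly 0x20 bytes past it, which is consistent with cd being computed as the address of a field inside whatever SubSystemTib points to.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the two log lines above. */
#define SUBSYSTEM_TIB_VALUE 0x6F72507865546E6FULL
#define BAD_CD_VALUE        0x6F72507865546E8FULL

/* Render the 8 bytes of v in little-endian memory order (lowest byte first),
 * escaping non-printable bytes as \xNN. out needs room for 8 * 4 + 1 chars. */
static void le_bytes_to_string(uint64_t v, char *out)
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(v >> (8 * i));
        if (isprint(b))
            *out++ = (char)b;
        else
            out += sprintf(out, "\\x%02X", b);
    }
    *out = '\0';
}
```

For example, le_bytes_to_string(BAD_CD_VALUE, buf) fills buf with \x8FnTexPro, and BAD_CD_VALUE - SUBSYSTEM_TIB_VALUE is 0x20.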
As you can read from the Wine code above, a non-NULL SubSystemTib indicates that Wine is dealing with a 16-bit process, so it reads the current directory from a 16-bit structure hanging off the thread information block (TIB) instead of from the process parameters in the PEB, as it would for a normal Windows process. This is obviously not a 16-bit game, so the problem is that SubSystemTib is somehow being set to a non-NULL value. Also note that Wine’s code has a comment marking this use of SubSystemTib as a hack, which means that Wine is probably treating this field differently from how Windows does.
The question is, who is setting SubSystemTib to this weird value? Wine only has one line of code that sets SubSystemTib, and it only runs in 16-bit code, which is definitely not relevant. So it’s either application code writing to SubSystemTib intentionally, or some kind of bogus write like a bad pointer or a buffer overflow in either Wine or the application. We need to find out what code is doing the write to SubSystemTib to understand how to fix the problem. To do that, I turned to the x86 hardware debug registers feature that Paul suggested.
x86 debug registers
x86 processors have a handful of debug registers that can be used to raise an exception under certain conditions: Dr0 through Dr3 hold up to four breakpoint addresses, Dr7 controls which of them are enabled and what kind of access triggers each one, and Dr6 reports which one fired. Intel’s Software Developer’s Manual documents them in detail, so I’ll just describe how I used them here. What I’m interested in is setting a write watch on a certain memory address, namely the location of SubSystemTib in the thread’s TIB. If I can find what code is doing the write to SubSystemTib, I’ll have more information about how to fix this bug.
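For reference, here’s a sketch of the DR7 bit layout as described in the “Debug Registers” chapter of Intel’s Software Developer’s Manual, Volume 3. The names (dr7_bits, dr_len_field, dr_addr_aligned, enum dr_cond) are hypothetical helpers of mine, not a Wine or Windows API. Each of the four address slots Dr0-Dr3 has a local and a global enable bit in the low byte of DR7, plus a 2-bit condition field (R/W) and a 2-bit length field (LEN) in the upper half, starting at bit 16. Note the quirk that an 8-byte watch is encoded as 2, not 3.

```c
#include <assert.h>
#include <stdint.h>

/* R/Wn condition field values in DR7. */
enum dr_cond {
    DR_COND_EXEC  = 0, /* break on instruction execution */
    DR_COND_WRITE = 1, /* break on data writes only */
    DR_COND_IO    = 2, /* I/O reads/writes (only meaningful with CR4.DE set) */
    DR_COND_RW    = 3  /* break on data reads or writes */
};

/* LENn field encoding: 8 bytes is 2, 4 bytes is 3. */
static unsigned dr_len_field(unsigned bytes)
{
    switch (bytes) {
    case 1: return 0;
    case 2: return 1;
    case 8: return 2;
    case 4: return 3;
    default:
        assert(!"watch length must be 1, 2, 4 or 8 bytes");
        return 0;
    }
}

/* DR7 bits that arm slot 0..3 (i.e. the address held in Dr0..Dr3). */
static uint64_t dr7_bits(unsigned slot, enum dr_cond cond, unsigned bytes)
{
    assert(slot < 4);
    return ((uint64_t)3 << (2 * slot))                          /* Ln and Gn enable bits */
         | ((uint64_t)cond << (16 + 4 * slot))                  /* R/Wn: which access */
         | ((uint64_t)dr_len_field(bytes) << (18 + 4 * slot));  /* LENn: how many bytes */
}

/* The watched address should be aligned to the watch length. */
static int dr_addr_aligned(uint64_t addr, unsigned bytes)
{
    return (addr & (uint64_t)(bytes - 1)) == 0;
}
```

dr7_bits(0, DR_COND_WRITE, 8) evaluates to 0x90003, the same value the alloc_module hack computes by hand as 3 | (1 << 16) | (2 << 18).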
I only want to set the debug register once, in the thread that will crash, before the bad write occurs. To accomplish this, I inserted the hack into dlls/ntdll/loader.c:alloc_module, which runs whenever a module is loaded, starting very early in the process’s life. From the debug log, I know that the crashing thread is also the first thread that the application creates (the 0124:0128 prefix on each log line is the process ID and thread ID), so I can easily use the thread ID to gate setting the register only on the main thread.
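The gating condition can be checked against the log itself. This hypothetical helper (not part of Wine) parses the two hex fields at the start of a log line, which in these logs are the process ID and thread ID (the tid=0128 in the exception line matches the second field), and tests the same tid == pid + 4 relation that the hack relies on. That relation is simply what this run’s log shows for the first thread; the sketch doesn’t guarantee it in general.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading "pid:tid:" fields (both hex) of a Wine debug log line,
 * e.g. "0124:0128:trace:seh:...". Returns 1 on success, 0 otherwise. */
static int parse_wine_log_prefix(const char *line, unsigned *pid, unsigned *tid)
{
    assert(line && pid && tid);
    return sscanf(line, "%x:%x:", pid, tid) == 2;
}

/* The condition used to gate the debug register hack. */
static int is_first_thread(unsigned pid, unsigned tid)
{
    return tid == pid + 4;
}
```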
In Wine, the debug registers can be set with the NtSetContextThread function, which takes as an argument a CONTEXT struct containing the debug register values to set.
Putting all this together, here’s a diff to ntdll to set a write breakpoint on the address containing the SubSystemTib pointer:
--- a/dlls/ntdll/loader.c
+++ b/dlls/ntdll/loader.c
@@ -1353,6 +1353,26 @@ static WINE_MODREF *alloc_module( HMODULE hModule, const UNICODE_STRING *nt_name
const WCHAR *p;
const IMAGE_NT_HEADERS *nt = RtlImageNtHeader(hModule);
+#ifdef __x86_64__
+ static BOOL dbg_set = FALSE;
+ if (!dbg_set && GetCurrentThreadId() == GetCurrentProcessId() + 4)
+ {
+ CONTEXT context;
+
+ memset(&context, 0, sizeof(context));
+ context.ContextFlags = CONTEXT_DEBUG_REGISTERS;
+ context.Dr0 = (ULONG_PTR)&NtCurrentTeb()->Tib.SubSystemTib;
+ context.Dr7 =
+ 3 | //enable Dr0 locally & globally
+ (1 << 16) | //enable write watch on Dr0
+ (2 << 18); //watch 8 bytes (== sizeof(void*)) starting at value in Dr0
+
+ NtSetContextThread(GetCurrentThread(), &context);
+
+ dbg_set = TRUE;
+ }
+#endif
+
if (!(wm = RtlAllocateHeap( GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(*wm) ))) return NULL;
wm->ldr.DllBase = hModule;
Now, when anything writes to the location that contains SubSystemTib, the CPU will raise a debug exception, which Wine reports as an EXCEPTION_SINGLE_STEP exception.
If the exception is left unhandled, the application will crash. This is good enough for our purposes, since we just need to know where the write to this memory location occurs. But if you want the application to keep running instead of crashing on this write instruction, you can ignore the exception in Wine’s SIGTRAP handler:
--- a/dlls/ntdll/unix/signal_x86_64.c
+++ b/dlls/ntdll/unix/signal_x86_64.c
@@ -2951,6 +2951,12 @@ static void trap_handler( int signal, siginfo_t *siginfo, void *sigcontext )
struct xcontext context;
ucontext_t *ucontext = sigcontext;
+ if (siginfo->si_code == 4) /* TRAP_HWBKPT */
+ {
+ ERR("ignoring TRAP_HWBKPT exception at %p\n", siginfo->si_addr);
+ return;
+ }
+
if (handle_syscall_trap( sigcontext )) return;
rec.ExceptionAddress = (void *)RIP_sig(ucontext);
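The hack above ignores every hardware-breakpoint trap, which is fine with only Dr0 armed. With several slots armed, a handler would need to know which one fired; on x86 that is recorded in DR6, whose low four bits (B0-B3) flag the slot whose condition was met and whose BS bit (bit 14) flags a single-step trap. Below is a hypothetical helper (not Wine code), assuming the handler already has the Dr6 and Dr7 values in hand; how to obtain them depends on the platform. Intel’s manual notes that B0-B3 may be set even for slots not enabled in DR7, so the helper filters against the enable bits.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_BS ((uint64_t)1 << 14) /* set when the trap came from single-stepping (TF) */

/* Return the first slot (0-3) whose breakpoint both fired (DR6.Bn set) and is
 * enabled in DR7 (Ln or Gn set), or -1 if there is none. */
static int dr6_fired_slot(uint64_t dr6, uint64_t dr7)
{
    for (int slot = 0; slot < 4; slot++) {
        int enabled = (int)((dr7 >> (2 * slot)) & 3);
        if (enabled && (dr6 & ((uint64_t)1 << slot)))
            return slot;
    }
    return -1;
}

static int dr6_is_single_step(uint64_t dr6)
{
    return (dr6 & DR6_BS) != 0;
}
```

With the Dr7 value from the hack (0x90003), a Dr6 of 0xFFFF0FF1 (B0 set, plus the bits that always read as 1) maps to slot 0, while a Dr6 with only B1 set maps to -1 because slot 1 isn’t armed.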
Running the game with these changes prints this to the log before it crashes:
0124:0128:err:unwind:trap_handler ignoring TRAP_HWBKPT exception at 0x38c00ba9
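This tells us that the instruction writing to SubSystemTib is the one immediately before 0x38c00ba9: data breakpoints are trap-class exceptions, so the CPU reports them after the write has completed, with the instruction pointer already on the next instruction. I used winedbg as above to discover that this code lives within the game’s main executable: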
Wine-dbg>info share 0x38c00ba9
Module Address Debug info Name
PE 0000000037000000-0000000039dae000 Deferred Game
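winedbg already prints some addresses as module+offset (ntdll+0x57a59 above). The same arithmetic is handy when a module isn’t loaded at its preferred base, so runtime addresses don’t line up with a disassembly. Here’s a minimal sketch with a hypothetical module_range struct filled in by hand from the info share output; treating the end address as exclusive is my assumption.

```c
#include <assert.h>
#include <stdint.h>

/* A loaded module's address range, copied from winedbg's "info share". */
struct module_range {
    const char *name;
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive (assumed) */
};

/* If addr falls inside m, store its offset from the module base and
 * return 1; otherwise return 0 and leave *offset untouched. */
static int addr_to_module_offset(const struct module_range *m, uint64_t addr,
                                 uint64_t *offset)
{
    assert(m && offset);
    if (addr < m->start || addr >= m->end)
        return 0;
    *offset = addr - m->start;
    return 1;
}
```

For the trap address, this gives Game+0x1c00ba9; for the earlier crash, 0x1700440f8 is ntdll+0x440f8.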
Finishing the bug
Now we’ve diagnosed the immediate cause of the crash (someone is writing bogus data into the TIB) and we’ve found the code that is doing the write (the application’s main executable), so it’s time to study what the application is trying to do.
Disassembling the application with objdump and jumping to 0x38c00ba9 gives:
$ objdump -d Game.exe
...
38c00b9b: 65 4c 8b 14 25 30 00 00 00 mov %gs:0x30,%r10
38c00ba4: 58 pop %rax
38c00ba5: 49 89 42 18 mov %rax,0x18(%r10)
38c00ba9: 58 pop %rax
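It can help to see how objdump gets from the raw bytes 49 89 42 18 to mov %rax,0x18(%r10). 0x49 is a REX prefix with W set (64-bit operand) and B set (adds 8 to the base register number), 0x89 is MOV r/m64, r64, and the ModRM byte 0x42 selects mod 01 (an 8-bit displacement follows), reg 000 (rax) and rm 010, which REX.B turns into r10. In the first instruction, 0x65 is the gs segment-override prefix. The hypothetical decoder below handles only this exact instruction shape; it is not a general x86 decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded "mov [base + disp8], r64" store (opcode 0x89). */
struct mov_store {
    unsigned src_reg;  /* register being stored: 0 = rax ... 15 = r15 */
    unsigned base_reg; /* base register of the memory operand */
    int8_t   disp;     /* signed 8-bit displacement */
};

/* Decode only REX.W 89 /r with mod == 01 and no SIB byte (4 bytes total).
 * Returns 1 on success, 0 for any other encoding. */
static int decode_mov_store_disp8(const uint8_t *b, struct mov_store *out)
{
    assert(b && out);
    uint8_t rex = b[0];
    if ((rex & 0xF8) != 0x48 || b[1] != 0x89) /* REX with W=1, then MOV r/m64, r64 */
        return 0;
    uint8_t modrm = b[2];
    unsigned mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;
    if (mod != 1 || rm == 4)  /* mod 01 = disp8; rm 100 would need a SIB byte */
        return 0;
    out->src_reg  = reg | ((rex & 0x4) ? 8 : 0); /* REX.R extends ModRM.reg */
    out->base_reg = rm  | ((rex & 0x1) ? 8 : 0); /* REX.B extends ModRM.rm */
    out->disp     = (int8_t)b[3];
    return 1;
}
```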
The first instruction loads the address of the TIB into r10 (on x86-64 Windows, the gs segment base points to the current thread’s TIB, and offset 0x30 within the TIB contains a pointer to itself). The next instruction pops 8 bytes off the stack into rax. The third instruction is the suspicious write that triggered the exception: it writes those 8 bytes in rax to offset 0x18 of the TIB. This corresponds to SubSystemTib, as you can see in Wine’s definition of the TIB here:
typedef struct _NT_TIB { // Offset:
struct _EXCEPTION_REGISTRATION_RECORD *ExceptionList; // 0x0
PVOID StackBase; // 0x8
PVOID StackLimit; // 0x10
PVOID SubSystemTib; // 0x18
...
}
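The excerpt above stops at SubSystemTib. The full x86-64 NT_TIB has three more pointer-sized members after it (a FiberData/Version union, ArbitraryUserPointer, and Self), which is where the self-pointer at offset 0x30 used by the first instruction comes from. Here’s a sketch that mirrors that layout with fixed-width fields (the struct name nt_tib64 is mine, not Wine’s) and checks both offsets at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the x86-64 NT_TIB with every pointer-sized field spelled as
 * uint64_t, so the offsets come out the same on any build host. */
struct nt_tib64 {
    uint64_t ExceptionList;        /* 0x00 */
    uint64_t StackBase;            /* 0x08 */
    uint64_t StackLimit;           /* 0x10 */
    uint64_t SubSystemTib;         /* 0x18: the field the game overwrites */
    uint64_t FiberDataOrVersion;   /* 0x20: union { PVOID FiberData; DWORD Version; } */
    uint64_t ArbitraryUserPointer; /* 0x28 */
    uint64_t Self;                 /* 0x30: what "mov %gs:0x30,%r10" loads */
};

static_assert(offsetof(struct nt_tib64, SubSystemTib) == 0x18, "SubSystemTib offset");
static_assert(offsetof(struct nt_tib64, Self) == 0x30, "Self offset");
```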
It’s clear from this analysis that the bug is not due to some bad pointer writing to random memory. The application is definitely writing directly to the TIB’s SubSystemTib field, for whatever reason. Given that Wine’s use of SubSystemTib is a hack, it’s likely that Windows just ignores this field and the game is storing some value there for its own internal purposes.
Regardless, the fix is simple. In Proton, we are only concerned with running games distributed through Steam, and it’s almost certain that no 16-bit applications ship on Steam, since they won’t even run on 64-bit versions of Windows. So we can just remove Wine’s hack to redirect to the 16-bit TIB and instead always ignore the value contained in SubSystemTib:
--- a/dlls/ntdll/path.c
+++ b/dlls/ntdll/path.c
@@ -519,7 +519,7 @@ static ULONG get_full_path_helper(LPCWSTR name, LPWSTR buffer, ULONG size)
RtlAcquirePebLock();
- if (NtCurrentTeb()->Tib.SubSystemTib) /* FIXME: hack */
+ if (0 && NtCurrentTeb()->Tib.SubSystemTib) /* FIXME: hack */
cd = &((WIN16_SUBSYSTEM_TIB *)NtCurrentTeb()->Tib.SubSystemTib)->curdir.DosPath;
else
cd = &NtCurrentTeb()->Peb->ProcessParameters->CurrentDirectory.DosPath;
This is exactly what I committed to Proton’s Wine fork to fix this game. A proper fix would involve rewriting how Wine handles 16-bit applications to acquire the correct TIB.
About Andrew Eikum
Andrew was a Wine developer at CodeWeavers from 2009 to 2022. He worked on all parts of Wine, but specialized in Wine's audio support. He was also a developer on many of CodeWeavers's PortJumps.