Blog

banner-asset-med

Reverse Engineering on Windows Without Symbols or Source, Part Fun (One)

Foreword

Application penetration tests are often supported by the customer allowing the testing team access to source code, the product's build chain, or executables and libraries compiled with debugging symbols. These "white box" tests support a comprehensive test intending to maximize testing coverage by increasing testers' understanding of the application. Tests such as this may not always be practical or possible -- whether to maintain source code integrity, protect trade secrets, or conceal internal coding processes and their build chain. 

Another important reason these may be omitted from preparation or analysis during the performance of a test is encompassed by "grey box" or "black box" testing, in which testers are given no prior access to the application and efforts are not supported by internal information. This requires testers to more accurately mimic approaches that might be taken by adversarial hackers. It may narrow the coverage of a test, but it will provide a more realistic assessment of what an adversary might accomplish with similar manpower in a similar time. This is further complicated when approaching an application from the perspective of reverse engineering: not having access to source code or symbol information creates unique challenges, some of which I've decided to document solutions to here.

The main elements of reversing that will be presented from the gray or black box contexts here are:

0.     A quick overview of the tools that are going to be used.

1.     A brief review of calling conventions for C-like compiled languages (specifically focusing on how to identify the boundaries of a function by locating its prologs and epilogs).

2.  How to locate important functions in memory when the executable or library does not have debugging symbols.

3.     How to patch executables and libraries to modify application behavior.

To motivate these techniques being used in combination we're going to show examples of temporarily and permanently modifying the runtimes of OpenSSL and PuTTY and conclude with a walkthrough on how to patch the venerable Windows game Minesweeper so that the player never loses.

I hope you enjoy reading this; just remember that Scorpion Labs does not advocate you taking this information and using it to break open target software, find vulnerabilities, and claim your own CVEs.

Oh, who am I kidding -- we totally do (just don't go breaking the law in the process).

Tools and Approach

The approach here is to couple debugging with reverse engineering. On Linux systems, the classic reversing tool Ghidra -- which you know is trustworthy beyond question since it was developed at and released by the NSA -- can be coupled with the debugger gdb for relatively smooth integration. On Windows systems that process is not as seamless, presents its own set of challenges, and often requires its own workarounds. Here, Ghidra and WinDbg are used as separate but complimentary tools as the lessons to be learned are easily transferable to more stable environments.

An installation walkthrough, documentation, and tutorials on Ghidra are made available by the NSA [1] and the application can be downloaded as a release or compiled from source from GitHub [2]. The techniques presented here assume an installation of these tools and Sysinternals are already present, and the reader already has a basic understanding of each. It is highly recommended that the Window panel in Ghidra's Codebrowser includes the "Decompiler", "Defined Strings", and "Functions" views. These can be enabled by selecting the corresponding options from the "Window" drop-down menu on the top menu bar. These views make jumping between cross-references within the application much easier. In addition, make sure you're using a Dark-mode theme- I mean if you use anything else- what kind of monster are you? (From the primary Project window's drop-down menu and use "Edit" "Themes" "Switch" and select one of the dark modes.)

ghidrawindowpanel

Ghidra's Window panel shows "Defined Strings" and "Functions" alongside the usual decompilation. Note how much easier-on-the-eyes Ghidra is in dark mode.

The other tool directly referenced here is WinDbg, although OllyDbg and x64dbg can be used by carefully matching the commands and view. WinDbg can be downloaded and installed manually, or installed as an app from the Microsoft Store:

ratedeoverestimated

WinDbg is rated "E" for "Everyone". Microsoft is apparently as confident in WinDbg as my parents were in me when I was eleven years old and they got me that chemistry set.

 

I Just Came To Say "Hello" (... to a function)

A complete review of functional calling conventions -- the instructions implemented when an application moves from the execution context of one function to another called from within it -- won't be necessary to understand the techniques presented here (though it is important for other related disciplines like binary exploit development, so links to more complete descriptions are given in the Further Reading section after this post), but reviewing function prologs and epilogs will be. These are the actions taken by the application to enter and exit the execution context of a called function. Knowing how to recognize them can be useful in finding critical points to break on to study and modify functions' input and output, even when the runtime is dynamic enough to move around function addresses.

Each function in a compiled language has one prolog and potentially several epilogs, depending on how the function's logical flow branches internally. In 32-bit Windows systems, the prolog almost always follows the following pattern: The caller pushes the current base pointer onto the stack and saves the current stack pointer in the base pointer. Thus- the base pointer always points to the beginning of the current stack frame. Then the size of the stack is expanded by moving the stack pointer, essentially allocating space for the function's local variables. As the stack "grows down", this is done by subtracting a value -- fixed per function at compilation time -- from the stack pointer, creating a function preamble that would look something like the following:

        55          PUSH    EBP

        8b ec       MOV     ESP

        83 ec 78    SUB     ESP,0x78

Entering a function with a fixed stack size of 0x78 bytes.

The 83 ec 78 bytes indicate that the stack size allocated for local variables of this function is 0x78. Seeing this in the static disassembly and the dynamic runtime is a good example of where the entry point of a targeted function has been located. If the stack size is larger than 0x80 then a different instruction set will be used telling the processor to subtract an entire four-byte word from esp instead of just a byte, resulting in slightly different instructions:

        55          PUSH    EBP

        8b ec       MOV     ESP

        81 ec 94    SUB     ESP,0x194

        01 00 00

Subtracting an entire word from ESP.

When a Windows function implements structured exception handling (SEH) it will push the address of the exception list onto the stack, terminated by a -1. This creates a function prolog resembling the following:

        55          PUSH    EBP

        8b ec       MOV     ESP

        6a ff       PUSH    -0x1

        68 be ba    PUSH    0xCAFEBABE

        fe ca

An Exception List at 0xCAFEBABE being introduced in the function prolog.

In 32-bit Windows, arguments are passed on the stack, so they can be inspected by setting a breakpoint on the entry point (PUSH ESP) and inspecting the contents of the stack (dd esp), including following them as pointers read in little-endian when that breakpoint is hit. For this reason, it is useful to remember the signature of a function prolog, and disassemblers like IDA and Ghidra often use these alongside cross-references with CALL instructions as a locator of a function's entry point.

Before briefly discussing the 64-bit calling convention used by Microsoft, note that it differs slightly from the System V AMD64 ABI convention used by Unix-like systems such as Linux and MacOS. These differences largely concern stack alignment, which registers hold which arguments and how many, and Microsoft's allocation of 'shadow space' to store copies of arguments passed in registers. We specifically focus on Microsoft applications compiled for Intel's 32-bit and 64-bit architectures, but other calling conventions on this architecture and even architectures like ARM and MIPS follow similar conventions so adapting this understanding requires only minor adjustments. Here, we'll just refer to them as x86 (32-bit Windows) and x64 (64-bit Windows) for brevity, but keep that generalization to this article for now.

In x64 the first four arguments are passed in registers (rcx, rdx, r8, and r9, in that order) and additional registers are pushed onto the stack before the call. If the argument-carrying registers are to be used in the function, then their contents are saved to the shadow space on the stack in the prolog. Windows divides the remaining registers into two classes: volatile registers, whose contents may change during the execution of the function; and non-volatile registers, whose contents are expected to be restored when the function returns. The following function prolog shows three parameters saved into the shadow space, followed by five registers copied onto the stack before the base pointer and stack pointer are adjusted:

        4c 89 4c    MOV     dword ptr [RSP + 0x20],R9

        24 20    

        44 89 44    MOV     dword ptr [RSP + 0x18],R8

        24 18

        89 54 24    MOV     dword ptr [RSP + 0x10],RDX

        10       

        55          PUSH    RBP

        53          PUSH    RBX

        56          PUSH    RSI

        57          PUSH    RDI

        41 56       PUSH    R14

        48 8b ec    MOV     RBP,RSP

        58 83 ec    SUB     RSP,0x40

        40

The above function prolog, from notepad.exe on Windows 11, shows how much variation there can be in x64 function prologs.

The added complexity involved in register preservation, parameter storage, and stack alignment creates a variation in function prologs that results in a high degree of uniqueness. Knowing from the disassembly and decompilation which bytes correspond to the prolog of which function is essential to locate its entry point when an application's runtime does not place it in the same location as its static decompilation shows.

... and Now To Say "Goodbye" (again, to a function)

Just as important as being able to locate function prologs to study their entry conditions is locating function epilogs to identify the program state when the function returns. Unlike function prologs, a function can have many epilogs as branching logic may bring it to several different conclusions. Luckily, these are generally simpler and easier to locate; often they can be found by counting the signatures between them and the next function prolog.

In x86 the prolog trims stored argument parameters off of the stack, deallocates the space reserved in the prolog by dropping the stack pointer to the current base pointer, restores the base pointer to its value before the prolog by popping it off of the stack, and returns to the calling function by redirecting execution to the address now at the top of the stack, usually placed there by a CALL instruction. (Overwriting this saved address to redirect execution is at the heart of "stack smashing" attacks.) This results in an epilog that often looks like:

        89 ec       MOV     ESP,EBP

        5d          POP     EBP

        c3          RET

A typical x86 function epilog may be preceded by a series of pop instructions to trim stored arguments from the stack. Instruction cb might be used for RET if the return address is in a different code segment.

Some compilers may support the "leave" instruction, which combines the "move" and "pop" instructions into a single step, resulting in the following epilog:

        c9          LEAVE

        c3          RET

The leave instruction does the same thing as the MOV and POP in just one byte.

Function epilogs in x64 applications differ slightly from their x86 counterparts, but not as much as with prologs. In an x64 function epilog, three things will typically happen:

-    An add RSP,fixed-stack-size or lea RSP,fixed-stack-size[FPReg] where FPReg is the register containing the frame  pointer, to deallocate the stack space claimed for the function,

-     a possibly zero-length sequence of pop instructions to restore the values of non-volatile registers that were saved on the stack in the function prolog, then

-        a ret or jmp instruction to redirect execution -- the latter being much less common.

A typical x64 function epilog which does not require a frame pointer may then look like the following:

        48 83 c4    ADD     RSP,0x28

        28

        5b          POP     RBX

        5e          POP     RSI

        5f          POP     RDI

        5d          POP     RBP

        c3          RET

An x64 function epilog taken from openssl.exe, compiled for and packaged with Git for Windows.

No Symbols? No Problem

One challenge faced when black box reversing or debugging is that it isn't exactly straightforward to figure out how to tell the debugger where to break when entering or leaving an interesting function as symbols are unlikely to be involved. On a recent engagement, Scorpion Labs had to assess the security of an application that used both client and server SSL certificates for authentication and encryption. Valid client certificates were provided by the customer and upstream server resources were out-of-scope, but the behavior of the client was an objective of the test and provided arbitrary input had to bypass the certificate pinning provided through OpenSSL methods compiled into the client application.

Functions will typically return values in one of two ways: changes to structs or objects passed in by reference that remain in memory after the function is complete, or a return value. In both x86 and x64 architectures, the return value is carried in the eax/rax register and will typically be set shortly before the function epilog. This means that during interactive debugging sessions setting a breakpoint on a function's epilog(s) can allow a researcher to control the behavior of the returning function, regardless of its input. In certain situations, the locations where the return value is set can be modified in the executable or library's on-disk binary footprint to permanently alter application behavior.

In bypassing SSL validation this took the form of controlling the behavior of the ssl_verify_cert_chain function sourced from openssl/ssl/ssl_cert.c in OpenSSL's source code [3]. Because those methods became application-specific to that client and test, here I'll present a parallel outcome that can be achieved through the same techniques: bypassing Host Key validation in the esteemed Windows SSH client PuTTY. The following analysis was done with PuTTY Release 0.82, MSC_VER=1932, MSC_FULL_VER=193231329. Addresses in your decompilation may vary from those shown here but following along with this analysis will still produce the correct ones for your installation.

Of course, to control the output of the function we first need to find where it is in the application. PuTTY does not compile OpenSSH functions into it directly, but it does use a code base that is strongly reminiscent of it, so OpenSSH can be our starting point. In OpenSSH the string "The authenticity of host ..." that appears in the function check_host_key of libopenssh/ssh/sshconnect.c is used to warn the user that the remote server's host key couldn't be correlated to any stored locally [ref]. Searching PuTTY's defined strings in Ghidra for strings with similar meaning eventually lands on "Host key not in manually configured list ...", referenced in (x86) PuTTY's FUN_0044b7c0. This function's declaration (as reconstructed by Ghidra) appears similar to check_host_key (although its stack does not) and the strings that appear later in the function ("... either the server administrator has changed the host key ..." or "... or you have actually connected to another computer pretending to be the server") strongly suggest that this function controls host key checking. (In fact, it is do_ssh1_login  in PuTTY's source code base [4]).

Following the decompilation of this function shows that branching logic within this function depends strongly on the local parameters cVar2/iVar3, and that if the call to FUN_00430a50 returns the value of 0 then this function jumps directly to its epilog.

hostkeyscritical_inclusion

The disassembly above shows that if FUN_00430a50 returns 0 (from cVar2 to iVar3) then FUN_0044b7c0 jumps directly to its return. (The conditional jump lands on the instruction mov eax,ebx, so the return value of FUN_00430a50 is not returned by FUN_0044b7c0 to its own caller.)

We can test whether FUN_00430a50 is the function that verifies host keys by testing it dynamically and controlling its return value, and we can do this without debugging symbols or source code. First, we must locate it definitively in a debugging runtime. Here are a few ways how:

1.  Find the function address in the decompilation and calculate its location in the runtime environment.

When an executable is analyzed in Ghidra, the base address is determined either from header information in the file itself, or from the executable format and known defaults. For instance: x86 applications generally default to a base address of 0x00400000 and x64 applications to a base address of 0x140000000. This means that if Ghidra identifies a function as FUN_00430a50 and that function's base address is 0x00400000 then the function will be at binary offset 0x30a50 in the on-disk binary image. (In Ghidra the Image Base can be read at any time by selecting "Memory Map" from the "Window" drop-down menu on the top menu bar.)

When a program is executed, the operating system may not be required to respect the default base address (or the one specified in the image header). Since the binary is often loaded directly into the memory of the running program, finding a function may just be some simple arithmetic: look for the image base that the debugger shows for the executable as highlighted below:

modload_inclusion

The image base addresses in the x64 (frame, gold) and x86 (cutaway, blue) runtimes are not the same as the defaults seen in Ghidra but are offsets to the binary image of the executable in program memory.

Find the offset in the debugging runtime by subtracting the address of the function from its base address (both as labeled in Ghidra) and add it to the image address as reported by the debugger:

long_search_disas_inclusion

FUN_01400589f0 appears at 0x00007ff71e9f89f0 in the debugging runtime of PuTTY (x64). This is consistent with the previous figure showing the image base address at 0x00007ff71e9a0000 and Ghidra assuming it would be loaded at 0x140000000.

2.       Find the function address by locating unique byte sequences and rewinding to its prolog.

The method in (1) should be foolproof, but it may not always work. For example: the decompiler in Ghidra may not correctly reconstruct the function (it has been known to have trouble with binaries compiled using MinGW). In this case, it may be possible to locate a target function's entry point from the debugging runtime alone.

Returning to the x86 executable, recall that FUN_00430a50 was identified by a reference to the string "Host key not in manually configured list ..." in the strings. Launching the program in a debugging context this string can be searched for explicitly:

0:000> s -a 00a60000 00b77000 "Host key not in"

00b3fcc5  48 6f 73 74 20 6b 65 79-20 6e 6f 74 20 69 6e 20  Host key not in

It is unlikely that a compiler will place the same defined string in a binary twice. It is more likely that the defined string will be placed once, and each use of it does it by way of addressing this string.

This search doesn't return an address related to the function that uses this string. Instead, it returns the address of the string itself. Searching again for this address (with respect for endianness!) shows the code locations that reference the string directly:

# 00b3fcc5 (little endian) -> c5 fc b3 00
0:000> s 00a60000 00b77000 c5 fc b3 00

00aab994  c5 fc 5c 00 53 e8 c2 17-ff ff 83 c4 08 e9 ac 04  ..\.S...........

Searching for the address of the string, respecting the little-endian nature of the x86 and x64 architectures.

If searching for a direct reference to the string returns no results, then either find another string or unique sequence of bytes or skip to method (3). If many functions reference the target string, then the search may return more than one code location- proceed with this method for each result.

Lastly, use WinDbg to search backward for the function prolog. In this case, even searching backward 0x200 bytes for the push ebp instruction shows only two candidates for a function prolog:

0:000> s 0053b994 L-200 55

00aab7c0  55 53 57 56 83 ec 20 8b-5c 24 34 8b 74 24 58 8b  USWV.. .\$4.t$X.

00aab8d7  55 ff 51 1c 83 c4 08 89-34 24 8b 76 08 83 c6 02  U.Q.....4$.v....

Searching backward for locations that push the frame pointer is saved to the stack. This can happen for any number of reasons but careful inspection narrows down which of these are candidate function prologs.

If no results are found, then expand the window of the search. If multiple results are found, then examine each carefully for which can be a function prolog. In the example above, the block at 0x00aab7c0 is the more likely function prolog: after pushing the frame pointer onto the stack (push ebp, 55) several registers are saved (push ebx, 53; push edi, 57; push esi, 56) and the stack pointer is adjusted to allocate space for the functionÕs local parameters (sub esp,0x20; 8e ec 20). Arithmetic confirms for us that this is the start of FUN_0044b7c0:

0xaab7c0 - 0x00a60000 + 0x00400000 = 0x0044b7c0

3.       Search for the function prolog explicitly.

The method in (2) might also not work. Comparing the instruction that loads the string "Host key not in manually configured list ..." between the x86 it and x64 PuTTY binary shows an important difference:

        68 c5 fc    PUSH    s_Host_key_not_in_manually_...

        4d 00

The instruction that loads the string "Host key not in manually ..." in PuTTY (x86).

        48 8d 15    LEA     rdx,[rip+0xacb20]

        20 cb 0a

        00

The instruction that loads the string "Host key not in manually ..." in PuTTY (x64).

The functions are compiled this way to respect both the calling conventions and minimize the size of the compiled binary: the 32-bit PUSH instruction pushes the direct four-byte address of the target string onto the stack; instead of compiling a complete eight-byte address into the binary, the 64-bit LEA ("load effective address") instruction loads an address a distance no greater than four bytes in advance of the current instruction pointer directly into an argument register. This optimization means that a direct reference to the string, used in Method (2), won't appear in the function, so WinDbg's search function won't locate it.

Luckily, the x64 calling convention causes function prologs and epilogs to differ much more than their x86 counterparts on account of stack management, the handling of volatile and involatile registers, and the management of arguments between the two. This means that it can often be sufficient to search for a target function's entire prolog directly in the debugger to find its runtime entry point:

0:000> s 00007ff7`1e9a0000 00007ff7`1eae5000 41 57 41 56 41 55 41 54 56 57 55 53 48 81 ec 98 00 00 00 4c 89 cd

00007ff7`1e9f89f0  41 57 41 56 41 55 41 54-56 57 55 53 48 81 ec 98  AWAVAUATVWUSH...

0:000> u 00007ff7`1e9f89f0 L10

putty+0x589f0:

00007ff7`1e9f89f0 4157            push    r15

00007ff7`1e9f89f2 4156            push    r14

00007ff7`1e9f89f4 4155            push    r13

00007ff7`1e9f89f6 4154            push    r12

00007ff7`1e9f89f8 56              push    rsi

00007ff7`1e9f89f9 57              push    rdi

00007ff7`1e9f89fa 55              push    rbp

00007ff7`1e9f89fb 53              push    rbx

00007ff7`1e9f89fc 4881ec98000000  sub     rsp,98h

00007ff7`1e9f8a03 4c89cd          mov     rbp,r9

00007ff7`1e9f8a06 4d89c5          mov     r13,r8

00007ff7`1e9f8a09 4989d4          mov     r12,rdx

00007ff7`1e9f8a0c 4889cb          mov     rbx,rcx

00007ff7`1e9f8a0f 488bbc2428010000 mov     rdi,qword ptr [rsp+128h]

00007ff7`1e9f8a17 4c8bb42408010000 mov     r14,qword ptr [rsp+108h]

00007ff7`1e9f8a1f 488b051a360d00  mov     rax,qword ptr [putty+0x12c040

(00007ff7`1eacc040)]

Searching by bytes for the prolog of FUN_1400589f0 in PuTTY (x64) returns a single, exact result. Searching for three fewer bytes returns eight.

This runs the risk of finding repeats of the prolog, especially when searching through large applications for functions with simple prototypes that don't require much stack allocation, so each "hit" found this way in WinDbg should have its unassembly compared to its decompilation in Ghidra to ensure accuracy.

Anyway, here's the fun part: having found a way to reliably locate FUN_00430a50 (x86) and FUN_140048700 (x64) how do we prove that this function controls host key validation? Simple: dynamically. Start PuTTY under WinDbg and locate the epilog for FUN_00430a50. (This should be simple as there is only one; in my address space it is at 0x00430b25 in Ghidra's decompilation and 0x00a90b25 in the debugging runtime.) With the program paused at its entry point apply the following conditional breakpoint:

        bp 00a90b25 ".if (@eax != 0) {r eax = 0; gc} .else {gc}"

Conditionally break only if the return register is not zero; in this case set it to zero and continue.

This conditional breakpoint will stop execution if the return value is not zero, set it to zero, and resume. Initiate a connection to any SSH server you want; there will be no popups concerning the SSH host keys. (It's actually a bit cooler than that -- there's a different branch in the logic of FUN_0044b7c0 that checks whether or not the remote host key matches a cached one, but the program execution never reaches it because the conditional jump after FUN_00430a50 returns essentially tells the program "the host key wasn't found at all" and skips over that branch.)

... Finally, Here Comes the Tough Fun Part!

Functions will typically return values in one of two ways: changes to structs or objects passed in by reference that remain in memory after the function is complete, or a return value. In both x86 and x64, the return value is carried in the eax/rax register and will typically be set shortly before the function epilog. This means that during interactive debugging sessions setting a breakpoint on a function's epilog can allow a researcher to control the behavior of the calling function, regardless of its input. In certain situations, the locations where the return value is set can be modified in the executable or library's on-disk binary footprint to permanently alter application behavior.

In the previous example, SSH host key validation in PuTTY was bypassed in a debugging environment by ensuring the validation function always returns safely. To make this change permanent this change can be made in the on-disk binary itself. The epilog value for FUN_00430a50 is set here:

...

00430b19    83 c4 04    ADD     ESP,0x4

00430b1c    89 d8       MOV     EAX,EBX

00430b1e    83 c4 10    ADD     ESP,0x10

00430b21    5e          POP     ESI

00430b22    5f          POP     EDI

00430b23    5b          POP     EBX

00430b24    5d          POP     EBP

00430b25    c3          RET

The epilog for FUN_00430a50 in PuTTY (x86) setting the ÒreturnÓ value at 0x00430b1c.

This uses 89 c8 (mov eax,ebx) to set the return value to a local parameter stored in ebx. This presents a challenge because when modifying a compiled executable or library your modifications must match the size of the overwritten segment exactly: unbalanced modifications will offset data and function addresses in the binary that are determined at compile time, destroying its functionality. Any instruction substitution that is too short can be padded on either side with NOPs (opcode 90) to match the size of the substitution target, but a set of instructions longer than the substitution targets will require some creativity. Luckily- necessity is the mother of invention and creation its offspring.

Setting eax from another register will only ever be two bytes. If the calling function interprets NULL or 0 as success then if eax is set from another register the bytes that set it can be overwritten by the bytes seen above which empty the register. If eax is set by more than two bytes this can be padded with 90 (nop) to keep the size of the function and binary unchanged. Moving a 32-bit number or a 64-bit number directly into a register requires at least five bytes, so if a function expects 0 to be returned in case of success then the bytes 31 c0 (xor eax,eax) ensures that the function will always return zero. In the instance above, stomping the bytes 89 d8 with 31 c0 at offset 0x30b1c into the putty.exe will ensure no host key validation ever occurs. This does invalidate its signature, but the Windows tool signtool.exe, packaged with Windows' SDK, can be used to remove it:

signtool.exe remove /s "c:\Program Files (x86)\path\to\putty.exe"

Either remove the signature entirely and hope for the best or become your own certificate authority and re-sign it.

Unless your Windows Defender/EDR settings are paranoid, the binary will run anyway. Alternatively, I have a Gist on GitHub for a laboratory-environment solution to become your local own certificate authority and bypass this entirely [5].

How to Win At Minesweeper Without Really Trying

I had to debate with myself whether to include Minesweeper as an example of application modifications as it's written for the Win32 API that shipped with Windows 3.1 and became mainstream with Windows 95. In short: it is archaic, and when decompiled it doesn't remotely resemble programs that would result from having been written, compiled, and linked under more modern frameworks like .NET. For example: variables are stored statically in pre-allocated writable locations of process memory. Not only does Minesweeper not allocate a heap -- there's no indication that Minesweeper imports an allocator at all! Nevertheless, it was promised in the Foreword, so here goes.

indymeme

Reversing Minesweeper felt a lot like this, just with bits.

All the function names (FUN_0x100XXXX) and data addresses (DAT_0x0100XXXX) in this section are reflective of the version of Minesweeper that I used to perform this analysis [6]. Win32 applications don't change much, and the peak of Mount Everest has changed more recently than Minesweeper [7], so Ghidra should analyze any Minesweeper executable nearly the same way. If you find you have a copy with differing addresses, just follow the strategies from earlier in this article to locate them; it should be very simple to put differing versions in direct correspondence.

First a quick refresher on Minesweeper: an x-by-y grid of covered tiles is laid out on a playing field containing z mines. The player makes a move by selecting a tile, which is uncovered according to the following rules:

1.       Uncovering a tile that is adjacent to mines reveals the number of mined tiles adjacent to that just uncovered.

2.    Uncovering a tile that has no mines adjacent to it uncovers each adjacent tile; this process repeats for each tile uncovered  this way until all uncovered tiles are adjacent to at least one mine or are on the end of the grid.

3.     Uncovering a mined tile ends the game in a loss.

The objective is to use the layout of the field and the adjacency of mines to uncovered tiles to deduce the locations of the mines. The game ends in success when the number of uncovered tiles equals the number of mines. (I can't believe I had to put in this recap, but one of my reviewers had never wasted countless office hours playing played Minesweeper, so I guess after this is published, I'm just going to crawl into a hole with my ColecoVision and let the seasons end.) The strategy to be used here is to patch Rule 3 out of the game. Knowing Rule 2 will help to find the function that implements Rule 3.

rules_included

Rules 1 (left), 2 (center), and 3 (right) as seen in action, in-game. Were one playing by the rules, the losing move seen in the rightmost frame could have been avoided by taking into account the 1 on the tile to the bottom-left of the mined tile.

Because much of the process of developing this strategy involved the classical reverse engineering of the game and the point of this article was exercising the tricks mentioned within, Illl spare you the in-depth reversing details in favor of a high-level overview of how the game works internally and the general strategy behind tampering with it instead. (For extra credit you can use the roadmap sketched here to repeat the entire analysis.) First, play two games of Minesweeper, one resulting in a victory and the other resulting in a loss. Take a memory dump from each (a minidump from Sysinternals' procdump is enough) and open them in WinDbg to inspect the stack traces:

stacktraceloss_inclusion

With no popups or Win32 API calls, the stack trace due to a loss is relatively straightforward.

stacktracesuccess_inclusion

This memory dump was taken the moment the dialog box that notifies the winner that they had the "Fastest time" pops up asking the player to enter their name. The dialog box is not shown but stack trace should make it apparent.

Both stack traces indicate that the application operates primarily inside of a Microsoft Windows "message loop". In this paradigm, a window class object of type WNDCLASSEX is created and populated with various window details such as size, location, icons, and a "window procedure," which is a callback function that is activated when system messages of type MSG are sent to the window. The program registers this window with the RegisterClassEx function and then typically enters a loop around three functions:

-       GetMessage, which looks at an existing system queue of messages that generally track input/output to the program: such as keystrokes, cursor clicks, and system clock ticks, (the MSG object contains the Window ID that was active when the input was generated),

-       TranslateMessage, which is optionally called to convert system input/output into an application-acceptable input such as by translating hardware keystroke input across the keyboard keymap to generate the virtual corresponding character, and

-     DispatchMessage, which activates the window procedure that was registered with the window identified in the message, taking the current message as input.

Looking at the call stacks pictured above, GetMessage is called from WINMINE+0x23ad and DispatchMessageW is called from WINMINE+0x23a4. (Here, WINMINE refers to the base address of the loaded executable, which in this case is the default for Win32 applications: 0x01000000.) Ghidra's decompilation of Minesweeper shows these addresses to be in a function labeled FUN_010021f0, which shows the message loop clearly:

messageloop

FUN_010021f0 demonstrates the classic "message-loop-over-input", a programming paradigm of many Win32 applications. In Minesweeper messages are taken from both user input and system clock ticks.

Lines 25-35 show the creation and registration of the window that the Minesweeper game is played in, with the window procedure being set to FUN_01001bc9 on Line 26. Lines 56-62 are the message loop that populate local_28 with the next message in the queue and dispatch it to the window displaying Minesweeper to the user.

At this point the reversing of the program logic can be accomplished entirely by completing the reversing of FUN_01001bc9 and thorough protocol analysis from setting a breakpoint on DispatchMessageW and inspecting the MSG parameter given a variety of user-specified inputs. This is a marathon, not a sprint, through the maelstrom of tedium that is reverse engineering, and its entirety is left as an exercise to the interested reader. Both approaches can be tricky because system clock updates send messages in this queue alongside user input. I'll just jump to the conclusions.

The first thing that FUN_01001bc9 does is identify the message type and jump to the segment of code that handles it:

FUN_01001bc9-a

The opening of FUN_01001bc9 shows an "if-else" cascade that branches based on Windows message type.

The values passed in param_2 are message identifiers, the meaning of which can be looked up in the Win32 API documentation. If a breakpoint is set on the entry point of FUN_01001bc9 then param_2 can be read from the stack or from esi where it is still temporarily resident. If a breakpoint is set here and a tile is clicked on in the game's field of play, then this function will eventually be called several times with a sequence that resembles the following:

-         Message 0x0201: WM_LBUTTONDOWN
Tells the window that the cursor was left-clicked and where; upon continuation that tile will be impressed.

-         Message 0x0202: WM_LBUTTONUP
Tells the window that the left-clicked cursor was released and the location of the cursor when it was released.

-         Message 0x0215: WM_CAPTURECHANGED
Tells the window that it no longer has control over the cursor context (i.e. the application no longer needs to track the cursor, and it can be dragged off the window without consequence).

If the cursor is dragged while the left-click is engaged, then Message 0x0200: WM_MOUSEMOVE may appear between Messages 0x0201 and 0x0202 -- this is Minesweeper allowing the player to click on one tile and continue gameplay action by releasing the click on another tile. Since gameplay activity happens when the cursor is released, continue through the part of FUN_01001bc9 that handles Message 0x0202:

FUN_0100bc9-2_inclusion

The conditional jumps in the handling of message type 0x0202 suggest that non-default behavior because of this message occurs in FUN_010037e1.

Message 0x0202 does one of two things depending on the value in DAT_01005140: it either jumps directly to switchD_01001f75_caseD_203, which calls the default message handler DefWindowProcW and returns, or it sets DAT_01005140 to 0, releases cursor focus with ReleaseCapture, and conditionally calls FUN_010037e1 before calling the default handler and returning. Since the only non-default behavior concerning releasing the cursor concerns FUN_010037e1, letÕs take a better look at it:

FUN_010037e1

FUN_010037e1 handles the gameplay logic surrounding the release of the cursor on the field of play.

Understanding this function entirely through static reverse engineering can be daunting: it has no internal variable or function names, obviously no comments, and parameters are read from segments of the data section that are written elsewhere. A better approach would be to combine static and dynamic analysis by using the function decompilation to identify critical locations for breakpoints, then play the game in a debugging context to see when they are hit. This analysis will ultimately conclude that any valid move in the field of play -- even a losing move that strikes a mine -- will call FUN_01003512. Let's take a closer look:

FUN_01003512_inclusion

FUN_01003512 controls the logic concerning whether tiles are mined, adjacent-to, or not-adjacent-to mines.

There are a few things about this function that tell us that we're "almost there". For one thing: the arguments to this function -- called only when the cursor is released -- contain the coordinates of the selected grid square:

FUN_01003512_breakpoint_bak

The first word on the stack is the return address and the two following words -- 0x00000008 and 0x00000003 -- indicate the impressed grid square that is being released, counted one-up. DAT_01005340 contains information about the mine and adjacency layout of the field of play.

For another thing: FUN_01003084, called within the if block that opens this function, contains a do/while loop. Placing a breakpoint inside this function demonstrates that it is only encountered if you click and release and release the cursor over a square that has no adjacent mines. Continuing across this breakpoint shows the uncovering behavior on slow-motion:

FUN_01003804_inclusion

Each step in the animation above represents re-reaching the breakpoint at 0x010030b4 in the do/while loop of FUN_01003084, demonstrating the automatic uncovering of safe tiles following Rule (2).

Lastly, a breakpoint set on FUN_01002eab, called towards the end of FUN_01003084, in a different block of the if/else statement than FUN_01003084 will not hit if the player completes the game without striking a mine. My first inclination was to use a binary editor to modify the following lines so that Line 42 sets iVar = 1 instead of iVar = 0:

...

36        iVar1 = iVar1 + 1;

37        puVar3 = puVar3 + 0x20;

38      } while (iVar1 < DAT_01005338);

39      return;

40    }

41    FUN_01002eab(param_1,param_2,0x4c);

42    iVar1 = 0;

43  }

...

Got the right idea, but not quite on the right track.

This turned out not to be the correct thing to do as the modification caused striking the mine to send the program into an endless loop. A better idea was to look at the logic that caused the game to resolve the uncovering of a tile as a "safe" play:

...

 9  if (((&DAT_01005340)[param_1 + param_2 * 0x20] & 0x80 == 0) {

10    FUN_01003084(param_1,param_2);

11    if (DAT_010057a4 != DAT_010057a0) {

12      return;

13    }

14    iVar1 = 1;

15  }

...

In the above decompilation param_1 and param_2 represent the coordinates of the selected grid square. This if statement queries an array at DAT_01005340 for the status of the selected tile; 0x80 must indicate ÒminedÓ.

We know the gameplay logic here results from a "safe" move because the dynamic test showed it can activate a loop in FUN_01003084 to uncover safe tiles. Therefore, perhaps all we need to do is make a modification that makes sure that the left-hand side of the above always equals zero to guarantee that any move will be forced into this block of the if/else statement? Fortunately, the & symbol makes this very easy for us: as y & 0x00 = 0 for any y, a single modification that turns that 0x80 into 0x00 is enough to guarantee this. The location to modify can be found easily from the corresponding machine instructions:

...

01003512    8b 44 24    MOV     EAX,dword ptr [ESP + param_2]

01003516    53          PUSH    EBX

01003517    55          PUSH    EBP

01003518    56          PUSH    ESI

01003519    8b 74 24    MOV     ESI,dword ptr [ESP + param_1]

            10

0100351d    8b c8       MOV     ECX,EAX

0100351f    c1 e1 05    SHL     ECX,0x5

01003522    8d 94 31    LEA     EDX,[ECX + ESI*0x1 + DAT_01005340]

            40 53 00

            01

01003529    f6 02 80    TEST    byte ptr [EDX],0x80

0100352c    57          PUSH    EDI

0100352d    74 66       JZ      LAB_01003595

...

This TEST instruction compares the value of (&DAT_01005340)[param_1 + param_2 * 0x20] with 0x80. The conditional jump two instructions later is taken if they are equal.

The change is as simple as changing the 80 byte in the instruction highlighted above to 00. Ghidra adds the base address specified in the executable or library's image header as an offset to the addresses shown in its display of the instruction code or uses the architecture and platform's default. For Win32 applications this is 0x01000000. Modern runtimes do not need to respect this, and the base image address is often randomized due to address space layout randomization (ASLR) and position-independent execution (PIE), but Minesweeper, along with most other Win32 applications, predates both security features. By simply counting the 80 byte is 352b bytes into the binary. Replacing this byte with 00 and rerunning shows that we've accomplished our goal:

success_resized_inclusion

Have you ever had one of those games of Minesweeper where the mines are spaced in such a way that you click one grid square and it solves the entire game immediately? That must be how Feynman felt when he was showing off safecracking techniques to his coworkers and accidentally guessed the correct combination.

In the playthrough depicted above several tiles are uncovered that the auto-uncover action of FUN_01003084 indicate should very clearly be covering mines. Instead, a number is revealed that counts the number of mines adjacent to that tile, adding 1 for the tile itself. Minesweeper's 'win condition' is to consider an attempt successful when the number of tiles that remain covered equals the number of mines that were placed at the beginning of the attempt. Because the "struck a mine" logic in this example has been bypassed the game ends in victory when ten tiles remain covered, at which point Minesweeper marks all the remaining covered tiles as tiles containing mines and, quixotically, ignores the remaining "safe" tiles and the tiles that were uncovered but should have revealed mines.

Cheating at a Win32 game built for an operating system released in 1992 may not seem like much, but methods such as these were once used to "crack" video games when the technological protection measures (TPMs) were as simple as making sure that the installation media was accessible in the drive from which it was installed. These days more complicated TPMs exist -- such as ensuring that an installed application is always in communication with a licensing server or using a file hash to decrypt machine code corresponding to core functionality -- which can render these methods difficult if not completely ineffective for this purpose. Regardless, they can sometimes be used to bypass security instrumentation in the application that prevents its analysis, as Scorpion Labs did with OpenSSL primitives as described earlier.

It should be noted that just as cracked games from unreliable sources can be used to distribute malware, tampering with application binaries in the ways described here can disable protections within the binary. For instance: one of the things that is often required is removing a digital certificate as modifying an executable will change its file hash, rendering an existing certificate invalid. It is not recommended to use any programs that have been tampered with outside of dedicated settings isolated from sensitive data.

Bonus: Tracing Child Processes Across Architectures

One rare circumstance our team encountered during a test recently was how to trace a 32-bit process that spawns a 64-bit subprocess. The usual WinDbg option to trace child processes causes unpredictable behavior here because a 32-bit context cannot continue tracing into a 64-bit context. The workaround we used to bypass this was to launch a new debugging context as soon as the subprocess was launched. Here's how to do it:

1.     Ensure that the full path to the binary is in the PATH variable of both the system and the debugging user's environment.

2.   Modify the HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options registry key in the following way:

-        Create a new entry with the filename of the subprocess (if there isnÕt one already here).

-        Add a string value (type REG_SZ) to this key named DEBUGGER. Set the data to be the full path to the WinDbg binary.

Now whenever the targeted process spawns it will appear in a new debugging context.

The DEBUGGER command specified in Step 2 will take command-line arguments. If you wanted to have the subprocess continue upon launch and break when it creates the first thread, you could set the data to the registry value to be C:\full\path\to\windbg.exe -c "bp NtCreateThreadEx; bp CreateRemoteThread; g". Entire scripts can be specified this way to control the post-launch behavior of a subprocess when trapped by a debugger. (This was especially important during our recent test, in which application behavior was strongly affected by synchronicity in inter-process communication between subprocesses.)

Exercise caution when modifying Image File Execution Options and be sure to do so carefully: control of pre- and post-launch behavior has been used by attackers as a malware persistence method [8].

Conclusions

A brief note should be made about the effect some of these techniques have on digital signatures. While tampering with the process memory of a running executable or loaded library does not affect its digital signature, tampering with the on-disk binaries can invalidate any digital signatures attached to them. I've written a GitHub gist on how to create your own root certificate authority, create intermediate certificate authorities with privileges to sign code signing or key encipherment/client-server authentication certificates, and use them to sign your binaries or server certificates to man-in-the-middle web traffic [ref]. When it comes to code signing this just trades an invalid certificate for an untrusted one unless the root certificate authority is added to the trusted certificate store of the device in use, but this work is all intended to be done in a testing environment anyway.

This post wasn't meant to be a guided introduction to reverse engineering- it started as a set of notes kept for sharing among teammates so we wouldn't have to re-learn these things every time we took an application test that was heavily reversing-oriented. I hope that this will be the first in a series of blog posts that build into a Swiss Army Knife of tips on reversing across Windows- and Unix-based applications. Still, to varying levels of difficulty, the skills contained herein alone allow you to do important things like disabling application security instrumentation, bypassing certain counter-reverse-engineering measures, or patching  SkiFree so that the snow monster doesn't eat you after 2000 meters.

snowmonster_resized_inclusion

Not today, abominable snowman. Not today.

Further Reading

I used this resource heavily to check and double-check the conversion between assembly and machine code and will continue to do so. Some of these code blocks might not have happened without it:

The Windows Message API was well-used in Win32 programs and remains supported in modern releases. There's more to be read, along with a programmatic resource (that was used in double-checking this post) and the existing Win32API documentation, at the following:

Techniques like what was presented here in Minesweeper brush shoulders with game-hacking techniques: the original Game Genie and GameShark worked by detecting memory addresses read from game cartridges and replacing what was read with values patched to alter game behavior. Bringing it to its logical extreme and assuming you have way too much time on your hands this can even be used to reverse-engineer anti-cheat mechanisms in online modern games, like you see here:

There's a long way between this post and that, and if you go that route I hope to read the work you publish along the way.

References

[1] https://www.ghidra-sre.org/

[2] https://www.github.com/nationalsecurityagency/ghidra.git

[3] https://github.com/openssl/openssl/blob/master/ssl/ssl_cert.c

[4] https://github.com/github/putty/blob/master/ssh1login.c

[5] https://gist.github.com/db44k/a651a85e7202953815020c83d3999ba6

[6] http://minesweepergame.com/download/windows-xp-minesweeper.php

[7] https://en.wikipedia.org/wiki/Hillary_Step#2015_alteration

[8] https://attack.mitre.org/techniques/T1546/012/

 

Special thanks to Adobe Firefly for generating this post's cover artwork.