Running code before main in Rust

Lynx March 21, 2025 #MalDev #Rust

Today, I will explore the potential of executing user-defined code before the main function in Rust. This can be accomplished using Thread-Local Storage (TLS) callbacks or by leveraging C Runtime (CRT) behavior. I will walk through the implementation of TLS callbacks in Rust and delve into the details of the CRT, demonstrating how to use it to run custom code.

Thread Local Storage Callbacks

According to the official Microsoft documentation, TLS callbacks are functions that support the construction and destruction of data objects for Thread-Local Storage. This might sound cryptic at first, but callbacks mainly let you manage the lifetime of objects that are accessed at the thread level. You can think of them as a kind of constructor/destructor for such objects, called whenever a thread is spawned or terminated. This means that callbacks can execute code before execution reaches the main function. Malware developers often exploit this fact to implement various techniques, such as anti-debugging or even actual malicious code execution.

The responsibility for calling these functions lies with the Windows Loader. Callback functions are stored in an array, which is reached through a pointer stored in the so-called TLS Directory, which in turn is referenced from the Data Directories of the PE Optional Header.
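To make the "Data Directories" part more tangible, here is a minimal sketch of how a tool could locate the TLS directory entry in a PE header. It relies only on the documented PE layout (e_lfanew at offset 0x3C, data directory index 9 for TLS). The function names tls_directory_entry and synthetic_pe64 are hypothetical helpers made up for this illustration, and the synthetic image is just a zero-filled header, not a loadable executable.

```rust
/// Index of the TLS entry in the optional header's data directory array.
pub const IMAGE_DIRECTORY_ENTRY_TLS: usize = 9;

fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    let bytes = buf.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let bytes = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Returns the (RVA, size) pair of the TLS directory, or None when the
/// buffer is not a PE image or the image has no TLS directory.
pub fn tls_directory_entry(image: &[u8]) -> Option<(u32, u32)> {
    if !image.starts_with(b"MZ") {
        return None;
    }
    // e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    let pe = read_u32(image, 0x3C)? as usize;
    if !image.get(pe..)?.starts_with(b"PE\0\0") {
        return None;
    }
    // The optional header follows the 4-byte signature and 20-byte COFF header.
    let opt = pe + 4 + 20;
    // The data directories start at offset 96 (PE32) or 112 (PE32+).
    let dirs = match read_u16(image, opt)? {
        0x10B => opt + 96,
        0x20B => opt + 112,
        _ => return None,
    };
    // NumberOfRvaAndSizes sits right before the directory array.
    let count = read_u32(image, dirs - 4)? as usize;
    if count <= IMAGE_DIRECTORY_ENTRY_TLS {
        return None;
    }
    let entry = dirs + IMAGE_DIRECTORY_ENTRY_TLS * 8;
    let rva = read_u32(image, entry)?;
    let size = read_u32(image, entry + 4)?;
    if rva == 0 { None } else { Some((rva, size)) }
}

/// Builds a tiny fake PE32+ header for demonstration (not a loadable image).
pub fn synthetic_pe64(tls_rva: u32, tls_size: u32) -> Vec<u8> {
    let pe = 0x40usize;
    let opt = pe + 24;
    let dirs = opt + 112;
    let mut image = vec![0u8; dirs + 16 * 8];
    image[..2].copy_from_slice(b"MZ");
    image[0x3C..0x40].copy_from_slice(&(pe as u32).to_le_bytes());
    image[pe..pe + 4].copy_from_slice(b"PE\0\0");
    image[opt..opt + 2].copy_from_slice(&0x20Bu16.to_le_bytes());
    image[dirs - 4..dirs].copy_from_slice(&16u32.to_le_bytes());
    let entry = dirs + IMAGE_DIRECTORY_ENTRY_TLS * 8;
    image[entry..entry + 4].copy_from_slice(&tls_rva.to_le_bytes());
    image[entry + 4..entry + 8].copy_from_slice(&tls_size.to_le_bytes());
    image
}
```

Feeding the bytes of a real executable to tls_directory_entry should give the RVA at which the TLS Directory struct lives. Turning that RVA into a file offset requires walking the section table, which this sketch leaves out.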

The TLS Directory is a simple struct that contains a field named AddressOfCallbacks. This field is a pointer to an array of defined TLS callbacks. Each element of this array is the address of one callback function. The Windows loader traverses this array and invokes these functions in a defined order.
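As a rough illustration of the paragraph above, the 64-bit TLS Directory can be mirrored in Rust as shown below. The field names follow IMAGE_TLS_DIRECTORY64 from winnt.h. walk_callbacks is a hypothetical stand-in for what the loader does with the array (call each entry in order until the terminating null); it is not the loader's real code, and demo_callback exists only so the walk has something to call.

```rust
use std::ffi::c_void;
use std::ptr::null_mut;
use std::sync::atomic::{AtomicU32, Ordering};

/// Mirrors the layout of IMAGE_TLS_DIRECTORY64.
#[repr(C)]
pub struct ImageTlsDirectory64 {
    pub start_address_of_raw_data: u64,
    pub end_address_of_raw_data: u64,
    pub address_of_index: u64,
    /// Virtual address of a null-terminated array of callback pointers.
    pub address_of_callbacks: u64,
    pub size_of_zero_fill: u32,
    pub characteristics: u32,
}

pub type TlsCallback = extern "system" fn(*mut c_void, u32, *mut c_void);

pub const DLL_PROCESS_ATTACH: u32 = 1;

/// Hypothetical model of the loader: invoke entries in array order and stop
/// at the first null slot. Returns how many callbacks were invoked.
pub fn walk_callbacks(callbacks: &[Option<TlsCallback>], reason: u32) -> usize {
    let mut invoked = 0;
    for entry in callbacks {
        match *entry {
            Some(callback) => {
                callback(null_mut(), reason, null_mut());
                invoked += 1;
            }
            // A null pointer terminates the array.
            None => break,
        }
    }
    invoked
}

pub static CALLS: AtomicU32 = AtomicU32::new(0);

/// Demo callback that only counts how often it was called.
pub extern "system" fn demo_callback(_dll: *mut c_void, _reason: u32, _reserved: *mut c_void) {
    CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Number of times demo_callback has run so far.
pub fn calls() -> u32 {
    CALLS.load(Ordering::SeqCst)
}
```

Option around a function pointer has the same size as the bare pointer, with None represented as null, so a slice of Option<TlsCallback> is a fair model of the on-disk array.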

As Microsoft documentation states, most typical programs will have only one callback function, if any. The screenshot below shows a default Rust Hello World program compiled in release mode. As you can see, there is one callback function defined at address 0x140009AE0, which was also shown in Ghidra. This function is the "default" callback function generated by the Rust compiler. To the best of my knowledge, it is always present in Rust binaries, at least when default settings are applied.

TLS Callback for Rust hello world program

After checking references to that callback function, I was led to a subsection named .CRT$XLB.

From there, this symbol was referenced by three functions:

std::rt::lang_start_internal,
std::sys::thread_local::destructors::list::register,
std::thread::current::init_current.

I believe the names of these functions are self-explanatory, and they clearly show their connection with threads. lang_start_internal is the function that runs the actual main function (the one that prints "Hello world"), and it is launched after the CRT has done its job.

So, now that we know what callbacks are and where they are placed (in the TLS Directory and the .CRT$XL subsection), let's move on to the implementation details.

Implementing TLS callbacks

I'll show you two approaches for creating TLS callbacks in Rust. The first is inspired by methods commonly seen in C language implementations, and the second is based on solutions found in the rust-ctor crate. I'll start with the C-like implementation and explain details about the CRT subsections and their naming conventions.

C-like way

For the C-like approach, the callback function should have the following prototype, which is enforced by the PIMAGE_TLS_CALLBACK type:

typedef VOID (NTAPI *PIMAGE_TLS_CALLBACK) (PVOID DllHandle, DWORD Reason, PVOID Reserved);

In Rust, the callback function will look like this:

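use std::ffi::c_void;
use windows::core::w;
use windows::Win32::UI::WindowsAndMessaging::{MessageBoxW, MESSAGEBOX_STYLE};

// The signature mirrors PIMAGE_TLS_CALLBACK; the arguments are unused here.
extern "system" fn custom_tls_callback_1(
    _dll_handle: *mut c_void,
    _reason: u32,
    _reserved: *mut c_void,
) {
    unsafe {
        // MESSAGEBOX_STYLE(1) corresponds to MB_OKCANCEL.
        MessageBoxW(
            None,
            w!("Hello from callback"),
            w!("Info"),
            MESSAGEBOX_STYLE(1),
        );
    }
}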

The important thing is to mark the callback function as extern "system". By doing so, the compiler will generate this function in a way that is compatible with the Windows ABI. To register this function as a TLS callback, define the following static variable:

#[link_section = ".CRT$XLB"]
#[used]
static TLS_CALLBACK_1: PIMAGE_TLS_CALLBACK = Some(custom_tls_callback_1);

Let's begin with its type. PIMAGE_TLS_CALLBACK is wrapped in an Option in the windows crate, which is why I've used Some to assign the function to the TLS_CALLBACK_1 variable. Now, let's talk about the two attributes. #[used] ensures that the variable won't be optimized out, especially in release mode. #[link_section = ".CRT$XLB"] guarantees that the variable will be placed in the designated section of the PE file.

In this case, I'm using the special CRT section, which is associated with the C Runtime, as the name suggests. After the $ character comes the group name or subsection name (you can refer to it either way). Normally the compiler generates the subsection name, but when writing a TLS callback, you must define it yourself. Choosing a name can be problematic since Microsoft doesn't provide a list of names or a detailed explanation of the CRT's inner workings. However, based on the research presented in the previous chapter, I stuck with the XL subsection. I would also advise avoiding subsections ending with A or Z (e.g., XLA), as they mark the beginning and end of the subsection and may have special significance.
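The registration pattern can also be written without the windows crate, which makes the role of Option easier to see. The sketch below is an illustration under a few assumptions. The names TlsCallbackFn, noop_tls_callback and TLS_CALLBACK_NOOP are hypothetical, the XLD subsection is an arbitrary pick inside the XL range, and the #[cfg(windows)] gate is only there so the file still compiles on other platforms.

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Same shape as the function pointer behind PIMAGE_TLS_CALLBACK.
pub type TlsCallbackFn = unsafe extern "system" fn(*mut c_void, u32, *mut c_void);

/// Nullable variant, like the Option-wrapped type in the windows crate.
pub type OptionalTlsCallback = Option<TlsCallbackFn>;

/// Hypothetical callback that does nothing.
pub unsafe extern "system" fn noop_tls_callback(_: *mut c_void, _: u32, _: *mut c_void) {}

// Only emitted on Windows targets, where the .CRT$XL* contributions end up in
// the TLS callback array. On the 2024 edition the attribute has to be written
// as #[unsafe(link_section = "...")].
#[cfg(windows)]
#[link_section = ".CRT$XLD"]
#[used]
pub static TLS_CALLBACK_NOOP: OptionalTlsCallback = Some(noop_tls_callback as TlsCallbackFn);

/// True when the Option wrapper adds no space, i.e. the static occupies
/// exactly one pointer-sized slot in the section.
pub fn slot_is_pointer_sized() -> bool {
    size_of::<OptionalTlsCallback>() == size_of::<usize>()
}
```

Because Option of a function pointer uses the null value for None, Some(callback) is stored as the plain function address, which is what the callback array expects.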

And basically, that's all. After running the program, you should first see a message box, and then, when you dismiss it, "Hello from main!" will be printed to the console.

As you can see, the callback was executed twice: once before and once after the program printed the "Hello from main" message. This occurs because the loader invokes TLS callbacks both on attach (when the process or a thread starts) and on detach (when it terminates), and it passes the kind of event in the Reason argument.
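If the second invocation is unwanted, the reason argument can be used to tell the events apart. The sketch below uses the documented DLL_* reason values. startup_only_callback and the STARTUP_RUNS counter are hypothetical names for this illustration, and the counter stands in for whatever the callback would really do.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicU32, Ordering};

pub const DLL_PROCESS_DETACH: u32 = 0;
pub const DLL_PROCESS_ATTACH: u32 = 1;
pub const DLL_THREAD_ATTACH: u32 = 2;
pub const DLL_THREAD_DETACH: u32 = 3;

/// Human-readable name of a callback reason, handy for logging.
pub fn reason_name(reason: u32) -> &'static str {
    match reason {
        DLL_PROCESS_DETACH => "DLL_PROCESS_DETACH",
        DLL_PROCESS_ATTACH => "DLL_PROCESS_ATTACH",
        DLL_THREAD_ATTACH => "DLL_THREAD_ATTACH",
        DLL_THREAD_DETACH => "DLL_THREAD_DETACH",
        _ => "unknown",
    }
}

pub static STARTUP_RUNS: AtomicU32 = AtomicU32::new(0);

/// Does its work only when the process starts and ignores thread events
/// and process teardown.
pub extern "system" fn startup_only_callback(
    _dll_handle: *mut c_void,
    reason: u32,
    _reserved: *mut c_void,
) {
    if reason != DLL_PROCESS_ATTACH {
        return;
    }
    // Placeholder for the real pre-main work.
    STARTUP_RUNS.fetch_add(1, Ordering::SeqCst);
}

/// Number of times the startup branch has executed.
pub fn startup_runs() -> u32 {
    STARTUP_RUNS.load(Ordering::SeqCst)
}
```

Registered through a static in the .CRT$XL range, as shown earlier, a callback written this way would show its message box only once.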

Ctor way

Before I dive into reversing binaries, I'll show you the second approach. It differs in that I don't split the global variable and function definition. Instead, everything is stored in a single block:

#[link_section = ".CRT$XLC"]
#[used]
// extern "system" keeps the ABI consistent with PIMAGE_TLS_CALLBACK on every Windows target.
static TLS_CALLBACK_2: extern "system" fn(dll_handle: *mut c_void, reason: u32, reserved: *mut c_void) = {
    extern "system" fn custom_tls_callback_2(_dll_handle: *mut c_void, _reason: u32, _reserved: *mut c_void) {
        unsafe {
            MessageBoxW(
                None,
                w!("Hello from yet another callback!"),
                w!("Info"),
                MESSAGEBOX_STYLE(1),
            );
        }
    }
    custom_tls_callback_2
};

At the end of the day, both methods produce the same results when it comes to callbacks. So, without further ado, let's move on to the next part.

Reversing

I've created a Hello World program that involves the use of two callbacks, which utilize the implementations described earlier. Analyzing the TLS Directory shows that there are three callbacks:

Here are two custom callbacks in Ghidra. As you can see, the addresses 0x140001040 and 0x140001064 match those presented on the previous screen.

Now, going to the .CRT$XL subsection reveals that the callbacks I defined are also present here:

Custom TLS callbacks in CRT XL subsection

There aren't any additional references to custom callbacks in the CRT subsection. However, when you look again at the previous screenshot, you'll see that both callbacks are referenced by Entry Point(*), which is an external reference, meaning the functions are called by other programs - most likely by the Windows Loader.

Additionally, as you may have observed earlier, these callbacks are invoked twice - once before and once after main. This reflects the initialization/destruction behavior of the callbacks. Furthermore, callbacks appear to be executed in alphabetical order based on the CRT subsection names, which Microsoft has confirmed, as I will demonstrate in the next section.
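The alphabetical ordering comes from the linker: contributions named SECTION$SUFFIX are merged into one section, sorted by the part after the $ sign. The sketch below only models that sorting step. link_order and group_suffix are hypothetical helpers for illustration and have nothing to do with the linker's real implementation.

```rust
/// Returns the part after '$' (the group name), or "" when there is none.
fn group_suffix(name: &str) -> &str {
    name.split_once('$').map(|(_, suffix)| suffix).unwrap_or("")
}

/// Models how grouped section contributions are laid out: sorted by the
/// group suffix. The sort is stable, so entries sharing a suffix keep
/// their input order.
pub fn link_order<'a>(contributions: &[&'a str]) -> Vec<&'a str> {
    let mut sorted = contributions.to_vec();
    sorted.sort_by(|a, b| group_suffix(a).cmp(group_suffix(b)));
    sorted
}
```

With this model, a callback placed in .CRT$XLB lands in front of one placed in .CRT$XLC, and both sit between the XLA and XLZ markers, which matches the order in which the message boxes appeared.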

Earlier, I mentioned that I stuck with the .CRT$XL subsection. But what would happen if you chose a different subsection, like .CRT$AAA or something else? Well, if you're lucky, you might hit a "free" subsection, and your code will still work. However, your functions may disappear from the callback array (though they might still be invoked). You may wonder how this is even possible. Well, here's the next part to explain that.

CRT and pre-main code

When trying to search for information about CRT subsections, you may notice that rust-ctor uses .CRT$XCU. You might even come across this article, where it's stated that the Microsoft C++ compiler uses the XCU subsection for global initializers. Furthermore, the same article says the following:

The names .CRT$XCT and .CRT$XCV aren't used by either the compiler or the CRT library right now, but there's no guarantee that they'll remain unused in the future. And, your variables could still be optimized away by the compiler. Consider the potential engineering, maintenance, and portability issues before adopting this technique.

Aside from that, there is nothing more I could find about subsection names in the publicly available documentation. Based on the available documentation, we know that Microsoft uses XCU, and other names may or may not be reserved. Before you start "spraying and praying" when choosing a subsection name, let me present my own spraying results.

I observed some tendencies between subsection names and function behavior:

Functions defined in groups from XLB to XLZ were visible in the callbacks array and were executed twice. These are actual TLS callbacks.
Functions defined in groups from XCB to XCZ weren't visible in the callbacks array and were executed only once.
Functions defined in the XIZ group were executed, but the program then crashed with STATUS_ACCESS_VIOLATION.

As you can see, picking random names isn't very reliable, and results may vary depending on your OS version or the toolchain used to build the executable. The problem is that we don’t know exactly what each subsection stores, so considering that Microsoft uses XCU and that the authors of the rust-ctor crate also use that subsection, we could simply stick with it, as well as XLZ, and accept those results. However, I wasn't satisfied and wanted more answers to my questions.

At this point, I compiled the program again, this time with two callbacks, but both defined in the XCU subsection. You can store multiple pointers in one subsection, as the following listing from Microsoft's documentation suggests:

.CRT$XCA
            __xc_a
.CRT$XCU
            Pointer to Global Initializer 1
            Pointer to Global Initializer 2
.CRT$XCZ
            __xc_z

Keep in mind that from now on, every screenshot I show will reference a program with "callbacks" (though, in this case, they aren't actually callbacks, but this will be explained later) stored in the .CRT$XCU subsection.

As I dug deeper into this topic, I came across this article, which led me to analyze the CRT source code. In this source code, there is a small list that explains what each particular subsection stores:

extern _CRTALLOC(".CRT$XIA") _PIFV __xi_a[];
extern _CRTALLOC(".CRT$XIZ") _PIFV __xi_z[];    /* C initializers */
extern _CRTALLOC(".CRT$XCA") _PVFV __xc_a[];
extern _CRTALLOC(".CRT$XCZ") _PVFV __xc_z[];    /* C++ initializers */
extern _CRTALLOC(".CRT$XPA") _PVFV __xp_a[];
extern _CRTALLOC(".CRT$XPZ") _PVFV __xp_z[];    /* C pre-terminators */
extern _CRTALLOC(".CRT$XTA") _PVFV __xt_a[];
extern _CRTALLOC(".CRT$XTZ") _PVFV __xt_z[];    /* C terminators */

As you can see, Microsoft didn’t lie in their documentation when they said that XCU stores global initializers placed there by the C++ compiler. However, this is based on an older source, and Microsoft now uses something called UCRT (Universal C Runtime). I found the UCRT source code on my disk, located at X:\Windows Kits\10\Source\10.0.22621.0\ucrt (for Visual Studio 2022 installation). When searching for .CRT occurrences, there wasn’t a general list like in the older version. Instead, I got more specific insights into what is stored in particular subsections.

I decided to search for the __xc_a identifier, which marks the beginning of the XC subsection, and this led me to the _initterm function.

Next, I opened Ghidra and found references to the _initterm function, which brought me to __scrt_common_main_seh, where _initterm is invoked:

Now, jumping to the __xc_a symbol, here they are - pointers to the defined callback functions.

Callback functions in the initializer list

Now, regarding the initterm function code, it is as follows:

// Calls each function in [first, last).  [first, last) must be a valid range of
// function pointers.  Each function is called, in order.
extern "C" void __cdecl _initterm(_PVFV* const first, _PVFV* const last)
{
    for (_PVFV* it = first; it != last; ++it)
    {
        if (*it == nullptr)
            continue;

        (**it)();
    }
}

It becomes obvious that initterm traverses the XC subsection range (which includes XCU) as an array and invokes each non-null function pointer stored in it. Since initterm, as part of the CRT startup, runs before the main function, it makes it possible to execute user-defined code before main.
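For readers more comfortable with Rust than C++, the same loop can be sketched as follows. This is a plain translation for illustration, not part of any CRT. The Pvfv alias, the TRACE counter and the demo initializers are hypothetical names, and the table is an ordinary array instead of a linker-built section.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Nullable pointer to a function taking no arguments and returning nothing,
/// comparable to _PVFV in the CRT sources.
pub type Pvfv = Option<extern "C" fn()>;

/// Calls every non-null entry in order and returns how many were called.
pub fn initterm(table: &[Pvfv]) -> usize {
    let mut called = 0;
    for entry in table {
        // Null slots are skipped rather than treated as a terminator.
        if let Some(function) = *entry {
            function();
            called += 1;
        }
    }
    called
}

/// Records call order as decimal digits, e.g. 12 means "1 then 2".
pub static TRACE: AtomicU32 = AtomicU32::new(0);

fn record(id: u32) {
    let seen = TRACE.load(Ordering::SeqCst);
    TRACE.store(seen * 10 + id, Ordering::SeqCst);
}

pub extern "C" fn first_initializer() {
    record(1);
}

pub extern "C" fn second_initializer() {
    record(2);
}

/// Demo table: null slots surround and separate the real entries.
pub fn demo_table() -> [Pvfv; 4] {
    [
        None,
        Some(first_initializer as extern "C" fn()),
        None,
        Some(second_initializer as extern "C" fn()),
    ]
}

/// Current value of the call-order trace.
pub fn trace() -> u32 {
    TRACE.load(Ordering::SeqCst)
}
```

Note the difference from the TLS callback array: there a null pointer ends the list, while here null entries are simply skipped and the end of the range is given by the last pointer.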

As I mentioned earlier, when a function is stored in the XCU subsection, it’s not visible as a callback function in the TLS Directory:

This is because, technically, those functions aren’t callbacks, and they are stored in a different part of the file. If you want your function to be an actual callback, its address must be present in the array pointed to by the AddressOfCallbacks field, which is stored in the TLS Directory. To do so, you must define your callback inside the XL subsection, as explained in the TLS callback chapter.

When you define your functions in the .CRT$XCU subsection, they shouldn’t take any arguments because initterm doesn’t pass arguments to them.

#[link_section = ".CRT$XCU"]
#[used]
static INIT_FUNCTION_1: extern "C" fn() = init_function_1;

extern "C" fn init_function_1() {
    unsafe {
        MessageBoxW(
            None,
            w!("Hello from initterm"),
            w!("Info"),
            MESSAGEBOX_STYLE(1),
        );
    }
}

#[link_section = ".CRT$XCU"]
#[used]
static INIT_FUNCTION_2: extern "C" fn() = {
    extern "C" fn init_custom_function_2() {
        unsafe {
            MessageBoxW(
                None,
                w!("Hello from yet another initterm!"),
                w!("Info"),
                MESSAGEBOX_STYLE(1),
            );
        }
    }
    init_custom_function_2
};

Defining functions in different areas of the XC subsection, like XCB or XCW, will also involve the initterm function. However, depending on the last letter, custom functions may be called either before or after the initialization of C++ globals (XCU subsection).

Similarly to initterm, there is the initterm_e function, which operates on the XI subsection.
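The practical difference is that functions in the XI range return an int, and initterm_e stops at the first non-zero result and hands it back to the caller. A minimal Rust sketch of that behavior is shown below. It is an illustration based on the UCRT sources, and Pifv, failing_table and the three demo functions are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Nullable pointer to a function returning an int status, comparable to
/// _PIFV in the CRT sources.
pub type Pifv = Option<extern "C" fn() -> i32>;

/// Calls non-null entries in order. Stops at the first non-zero result and
/// returns it. Returns 0 when every initializer succeeded.
pub fn initterm_e(table: &[Pifv]) -> i32 {
    for entry in table {
        if let Some(function) = *entry {
            let result = function();
            if result != 0 {
                return result;
            }
        }
    }
    0
}

pub static LATE_RAN: AtomicBool = AtomicBool::new(false);

pub extern "C" fn succeeds() -> i32 {
    0
}

pub extern "C" fn fails() -> i32 {
    7
}

pub extern "C" fn late() -> i32 {
    LATE_RAN.store(true, Ordering::SeqCst);
    0
}

/// Demo table in which the third entry reports an error.
pub fn failing_table() -> [Pifv; 4] {
    [
        Some(succeeds as extern "C" fn() -> i32),
        None,
        Some(fails as extern "C" fn() -> i32),
        Some(late as extern "C" fn() -> i32),
    ]
}

/// Whether the entry after the failing one was ever executed.
pub fn late_ran() -> bool {
    LATE_RAN.load(Ordering::SeqCst)
}
```

So a function placed in the XI range should have the `extern "C" fn() -> i32` shape and return 0 on success.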

Summary

To summarize, in order to execute code before main, you can use either TLS Callbacks or the initterm function.

TLS Callbacks are defined within the .CRT$XL subsection and must be stored as pointers of the PIMAGE_TLS_CALLBACK type. These callbacks are called twice: once when the thread starts and once when it ends.
Functions executed by initterm are stored in the .CRT$XC subsection and are processed as pointers to functions that do not return any value and do not take any arguments. You can define multiple functions within one subsection.
Different CRT subsections are called in alphabetical order.