Beyond Beacon: Writing BOF and a Native Rust COFF Loader

Lynx August 10, 2025 #MalDev #Rust #BOF #COFF

Today, I'm excited to dive into Beacon Object Files (BOFs) and the COFF loaders that bring them to life. I'll peel back the layers of the COFF file specification, and walk you through a process of building a custom COFF loader from scratch, written entirely in pure Rust, specifically for Windows ARM. Much like my previous technical deep-dives, this article will thoroughly explore the low-level details and unique specifications for both Windows and ARM architectures. By the end of this guide, you'll not only understand what BOF files are and how they're executed but also how to implement both a BOF-like format and its corresponding loader in Rust.

Beacon Object Files

Beacon Object Files (BOFs) are binary files comprising snippets of compiled code that use the Windows API and a loader API to implement custom logic defined by the programmer. In their raw form, BOFs are similar to ordinary object files; however, they're compiled as position-independent code (PIC), which means they don't rely on specific memory addresses and may be loaded anywhere in memory. To be executed, BOFs require a special loader that resolves all dependencies and runs the file in memory. Moreover, the BOF loader provides an additional API that exposes logic not covered by the Windows API.

To load a Beacon Object File (BOF), the loader must process the COFF format, which is typically used for object files. This involves performing various operations similar to those executed by the linker and Windows loader. Specifically, the COFF loader will load COFF sections into memory, perform relocations, resolve external symbols, and more. Ultimately, the loader will locate the entry point of the BOF and call it, triggering the execution of the actual BOF logic. Such loaders can run as standalone programs or be integrated into other applications like agents or beacons generated by exploitation frameworks.

The concept of Beacon Object Files (BOFs) was first introduced by the Cobalt Strike team and implemented within their framework. A BOF is designed to run in a beacon process – a process running on a compromised machine that acts as an agent, communicating with the attackers' server (also known as the C2 server). Cobalt Strike provides a loader and API that enables the creation of custom BOFs. While most operations performed by a BOF can be expressed using Windows API, some operations are implemented in the loader and exposed to the BOF file through an API – typically achieved via C/C++ header files. Several existing implementations, including those in Rust, have created BOF files and loaders that aim to be compatible with the Cobalt Strike framework, heavily relying on the C ABI.

In contrast, I chose a distinct approach, implementing both the loader and BOF independently of the Cobalt Strike framework. This allowed me to create a more "Rusty" implementation without relying on the C ABI to provide a common interface for the Cobalt Strike loader.

Before I dive into the nitty-gritty details of the implementation, let's take a step back and gain a higher-level understanding of how BOF files are structured.

Anatomy of BOF

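Beacon Object Files (BOFs) are built around three fundamental pillars: an entry point, access to APIs (the Windows API and the loader API), and the BOF's own API, meaning the arguments it accepts and the value it returns.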

Following the original design, BOFs should not implement long-running tasks. Instead, they should be considered small modules that extend the functionality of the beacon on demand, akin to plugins for an agent process that communicates with a C2 server. Whenever additional functionality is needed, the BOF can be uploaded to the compromised machine and executed by the agent.

This also allows for a more staged attack, in which not all capabilities are exposed on the target from the start. That said, I don't see any barrier to implementing long-running operations in my implementation. In fact, writing a custom COFF loader opens up possibilities beyond the original design's constraints.

Now that the entry point is defined, it's time to provide access to various APIs. This may involve accessing both Windows API and loader API, or only the loader API if the Windows API is implemented or re-exported by the loader API.

The Windows API provides access to operations related to the operating system, such as writing to remote process memory. In contrast, the loader API offers more complex operations like injecting, logging, or communicating with a remote server. The actual design of the loader API depends on your own ideas, but following existing implementations, BOF logic should be built based on common operations implemented within the loader and Windows OS. Think of it like building a program from small, independent components that work together to achieve a specified goal. For example, you might create a BOF that takes a screenshot of the victim's screen and sends it to an attacker's server.

Finally, the BOF's own API should be considered. A BOF may take arguments that the loader passes to the entry point for the BOF to use. This is especially important in languages like Rust, which have no concept of the header files known from C. Similarly, the entry point function may return something more than void or (), notifying the loader about errors that occurred during BOF execution.
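As a rough sketch of what such an interface could look like, the entry point below accepts a loader API object plus a byte slice of arguments and returns a Result. The LoaderApi trait, the BofError enum, the args parameter, and the RecordingApi type are hypothetical names used only for illustration; the entry point built later in this article has a simpler signature.

```rust
use std::cell::RefCell;

/// Hypothetical loader-side interface handed to the BOF.
pub trait LoaderApi {
    fn output(&self, data: &str);
}

/// Hypothetical error type the entry point reports back to the loader.
#[derive(Debug, PartialEq)]
pub enum BofError {
    MissingArgument,
    InvalidUtf8,
}

/// Entry point sketch: arguments arrive as raw bytes and failures are
/// returned to the loader instead of being silently swallowed.
pub fn launch(api: &dyn LoaderApi, args: &[u8]) -> Result<(), BofError> {
    if args.is_empty() {
        return Err(BofError::MissingArgument);
    }
    let text = core::str::from_utf8(args).map_err(|_| BofError::InvalidUtf8)?;
    api.output(text);
    Ok(())
}

/// Stand-in implementation that records output so it can be inspected.
#[derive(Default)]
pub struct RecordingApi {
    pub lines: RefCell<Vec<String>>,
}

impl LoaderApi for RecordingApi {
    fn output(&self, data: &str) {
        self.lines.borrow_mut().push(data.to_string());
    }
}
```

Because both sides are Rust, the loader can match on the returned error directly, with no error codes or C-style out-parameters involved.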

The following diagram provides a high-level overview of the BOF structure:

BOF overview

And now, let's dive into implementing a Beacon Object File in Rust!

BOF Implementation

Following the structure presented earlier, I'll start with the entry point. Based on my design assumptions, I've declared a classic Rust function without any extern declarations. This is possible because my entire stack will be implemented in Rust and there's no need to provide C ABI compatibility. The important thing to note is that the name of the function shouldn't be mangled, as the loader will be searching for a particular symbol's name. To achieve this, I decorated the entry point function with the no_mangle attribute.

#[no_mangle]
fn launch() {
}

This function will hold all of the BOF's logic and will call functions from the loader and the Windows API.

However, the default entry point for Rust programs is the main function, and trying to compile the above code will fail with an error saying that the main function is not defined. To resolve this, add the #![no_main] attribute at the beginning of your .rs file.

#![no_main]

#[no_mangle]
fn launch() {
}

Yet this is not the end. Now, during compilation, you will see a message that ends like this:

= note: LINK : fatal error LNK1561: entry point must be defined

The linker doesn't know about your custom entry point, so you have to instruct it how to locate it. To do this, create a .cargo/config.toml file in your Rust project with the following content:

[build]
rustflags = [
    "-C", "link-arg=/ENTRY:launch",
    "-C", "link-arg=/SUBSYSTEM:CONSOLE"
]

As you can see, specifying the entry point wasn't enough to resolve all building issues. Now, when you try to build your sample BOF, you'll encounter a lot of linking errors like:

error LNK2001: unresolved external symbol memcpy
error LNK2001: unresolved external symbol memmove

There are several ways to resolve this error, such as instructing the linker to link against the required C runtime libraries or implementing the missing symbols yourself. However, I've decided to go with no_std to keep things simple and smooth.

Choosing no_std requires defining a custom panic handler, which is why I've also added it to my BOF code. Additionally, I've added the following options for both release and debug profiles in my Cargo.toml file:

panic = "abort"
opt-level = 0
lto = "off"
codegen-units = 1

The following listing presents the BOF code with all adjustments:

#![no_main]
#![no_std]

use core::panic::PanicInfo;

#[no_mangle]
fn launch() {
}

#[panic_handler]
#[no_mangle]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

Now, running cargo b should complete without issues. However, it produces an executable (EXE) file which is already linked and ready to run. As the name "Beacon Object File" might suggest, it's based on object files. Therefore, it's necessary to emit an object file during compilation. In Rust, this can be achieved with the following command:

cargo rustc --release -- --emit=obj

The generated object file will be available in the target/release/deps directory. It can be inspected with hex editors or the dumpbin tool, which is provided with Microsoft Visual Studio.

Now that the base form of the BOF is ready and compiles, it's time to provide access to Windows and loader API. I'll begin with Windows API, as it's easier to implement.

Access to Windows API can be achieved by using crates like windows or declaring functions inside an extern block yourself. When using external crates, it's important to remember that my BOF template uses a no_std environment, so potential dependencies should also provide a no_std version. This time, I've opted for a simpler solution and decided to declare the required functions myself. Taking the MessageBoxA function as an example, I've written the following code:

use core::ffi::c_void;

type LPCSTR = *const u8;
type HWND = *mut c_void;

#[link(name = "User32")]
unsafe extern "system" {
    unsafe fn MessageBoxA(hWnd: HWND, lpText: LPCSTR, lpCaption: LPCSTR, uType: u32) -> i32;
}

The extern block contains the MessageBoxA prototype and is decorated with the link attribute, which instructs the linker to find the listed functions in the specified library. Additionally, the block was defined as extern "system" to ensure compatibility with the Windows API, as stated in the Rust documentation.

Now, the function can be used in the launch function:

#[no_mangle]
fn launch() {
    unsafe {
        let _ = MessageBoxA(
            core::ptr::null_mut(),
            "Test test\0".as_ptr(),
            "Test window\0".as_ptr(),
            1,
        );
    };
}

When you check the generated object file with the dumpbin tool, it will reveal __imp_MessageBoxA symbols that indicate the usage of MessageBoxA.

> dumpbin /disasm .\target\release\deps\bof_payload.o
Microsoft (R) COFF/PE Dumper Version 14.42.34436.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file .\target\release\deps\bof_payload.o

File Type: COFF OBJECT

launch:
  0000000000000000: F81F0FFE  str         lr,[sp,#-0x10]!
  0000000000000004: 90000001  adrp        x1,__unnamed_1
  0000000000000008: 91000021  add         x1,x1,__unnamed_1
  000000000000000C: 90000002  adrp        x2,__unnamed_2
  0000000000000010: 91000042  add         x2,x2,__unnamed_2
  0000000000000014: 90000008  adrp        x8,__imp_MessageBoxA
  0000000000000018: F9400108  ldr         x8,[x8,__imp_MessageBoxA]
  000000000000001C: AA1F03E0  mov         x0,xzr
  0000000000000020: 52800023  mov         w3,#1
  0000000000000024: D63F0100  blr         x8
  0000000000000028: F84107FE  ldr         lr,[sp],#0x10
  000000000000002C: D65F03C0  ret

rust_begin_unwind:
  0000000000000000: 14000001  b           0000000000000004
  0000000000000004: 14000000  b           0000000000000004

When it comes to accessing the loader API, things take a different turn because both the BOF and loader are written in Rust, so there's no need for providing C ABI compatibility for the launch function. Therefore, access to the Loader API can be achieved in a very "Rusty" way.

What I have in mind are trait objects. The entry point is a regular Rust function, so it can accept arbitrary arguments, including a trait object that provides access to functions implemented in the loader. For now, let's assume there's an external no_std dependency that provides the BeaconApi type. BeaconApi is a trait, so it describes an interface that some type may implement. By passing a reference to a value that implements this interface (a trait object), the launch function can use that value's functionality.

The following listing shows the full code of my example Beacon Object File:

#![no_main]
#![no_std]

use bof_api::BeaconApi;
use core::{ffi::c_void, panic::PanicInfo};

type LPCSTR = *const u8;
type HWND = *mut c_void;
const MB_OKCANCEL: u32 = 1;

#[link(name = "User32")]
unsafe extern "system" {
    unsafe fn MessageBoxA(hWnd: HWND, lpText: LPCSTR, lpCaption: LPCSTR, uType: u32) -> i32;
}

#[no_mangle]
fn launch(api: &dyn BeaconApi) {
    unsafe {
        let _ = MessageBoxA(
            core::ptr::null_mut(),
            "Test test\0".as_ptr(),
            "Test window\0".as_ptr(),
            MB_OKCANCEL,
        );
    };

    api.beacon_output("Hello from BOF");
}

#[panic_handler]
#[no_mangle]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

And here's the Cargo.toml file:

[package]
name = "bof-payload"
version = "0.1.0"
edition = "2021"

[dependencies]
bof-api = { path = "../bof-api" }

[profile.dev]
panic = "abort"
opt-level = 0
lto = "off"
codegen-units = 1

[profile.release]
panic = "abort"
opt-level = 0
lto = "off"
codegen-units = 1

With the draft of the Beacon Object File in place, it's time to dive into the COFF format and implement the loader. Before that, I'll show how to define the BeaconApi trait, because it's pretty straightforward and the loader will also use it as a dependency.

Loader API

The API of the Loader will provide access to additional functionalities that aren't covered by Windows API or are more complex yet pretty common, so having to write them each time in BOF may be a waste of time. What is more, access to Windows API may also be provided through Loader API and not by declaring required Windows functions inside BOF.

This approach has both advantages and disadvantages. The advantage is that the BOF file doesn't contain direct references to Windows functions, so it may be easier to evade detection by security products. For example, when the BOF file contains symbols like VirtualAlloc, WriteProcessMemory, or CreateRemoteThread, an AV solution may assume that this file will do something related to injections. Yet, if you replace those functions with references to Loader API, where names of the functions may be whatever you want, it will be harder to predict what the file is actually doing. Therefore, providing access to WinAPI through loader API may serve as an obfuscation technique. On the other hand, the disadvantage lies in the need to replicate functionality that already exists in the Windows API - even if the implementation is realized through simple function re-exporting.

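Going into implementation details, the loader's API will be split between two places: a shared library crate that defines the interface, and the loader itself, which provides the implementation.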

In practice, this means that the library will store a trait, and the loader will implement that trait for a given struct and pass such an object to the BOF's entry point.

To implement this, I've created a Rust library named bof-api, and inside its lib.rs file I've defined the BeaconApi trait.

#![no_main]
#![no_std]

pub trait BeaconApi {
    fn beacon_output(&self, data: &str);
}

My trait will only provide one function, which will implement printing to stdout. Notice that I've defined the bof-api library as no_std. This is because the BOF itself is no_std, so it may only use dependencies that also do not rely on the Rust standard library.

Now, inside the BOF project, I've added the following dependency in the Cargo.toml:

[dependencies]
bof-api = { path = "../bof-api" }

Such a dependency will also be added in the BOF Loader project, so the loader will be able to provide an implementation for the BeaconApi trait.
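To make this division of labour concrete, here is a minimal, std-based sketch of the loader side. The BeaconApi trait is repeated locally so the snippet stands alone (in the real project it would come from the bof-api crate), while the ConsoleApi struct, its prefix field, and the fake_launch function are hypothetical stand-ins for the loader's implementation and the BOF's entry point.

```rust
/// Same shape as the trait defined in bof-api; repeated here so the
/// sketch compiles on its own.
pub trait BeaconApi {
    fn beacon_output(&self, data: &str);
}

/// Hypothetical loader-side type that implements the interface.
pub struct ConsoleApi {
    pub prefix: &'static str,
}

impl ConsoleApi {
    /// Formatting is split out so it can be checked without capturing stdout.
    pub fn format(&self, data: &str) -> String {
        format!("{} {}", self.prefix, data)
    }
}

impl BeaconApi for ConsoleApi {
    fn beacon_output(&self, data: &str) {
        println!("{}", self.format(data));
    }
}

/// Stand-in for the BOF entry point: it only sees the trait object,
/// never the concrete loader type.
pub fn fake_launch(api: &dyn BeaconApi) {
    api.beacon_output("Hello from BOF");
}
```

The BOF never learns which concrete type sits behind the reference, so the loader is free to swap printing to stdout for, say, sending the data to a remote server without touching the BOF.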

And that's it for the API. Now it's time for the main part, the BOF loader. But before writing it, you have to understand the COFF format.

COFF Format

The COFF format, like the PE format that is based on it, describes the layout of an object file, which contains compiled code and various metadata generated by the compiler. If you look into the Common Object File Format specification, you'll see that some headers and structures apply to both object and image files, while others apply to only one of them.

This is because an object file and an image file are two different things, despite being strongly related with each other. An object file is generated by the compiler and cannot be executed directly; first, it has to be processed by the linker which will resolve external dependencies and produce an image file that can be executed by the system loader.

The object files generated here have a .o extension, and image files on Windows have a .exe or .dll extension. The two formats are similar in that object files are the blueprints from which image files are built, so they follow the same design principles. For example, both are organized into sections that hold different kinds of data, such as code in the .text section and read-only data in the .rdata section.

From the perspective of building a COFF loader, the most important parts of the COFF format are the file and section headers, the Symbol and String Tables, and the Relocation Table.

The following chapters will present the most important aspects of COFF files from the perspective of writing a COFF loader. First, I'll discuss all the theory, and then I'll present the COFF format in more practical aspects.

The practical analysis will be performed using the dumpbin tool, which is supplied with Visual Studio.

File and Section Headers

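In an object file, there is only one file header: the COFF File Header. This header sits at the very start of the file and contains general information about it. From the perspective of building a COFF loader, the most important fields are Machine (the target architecture), NumberOfSections (the number of entries in the Section Table), and PointerToSymbolTable together with NumberOfSymbols (the location and size of the Symbol Table).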

After the COFF File Header, there is the Section Table, which consists of Section Headers. Each section header describes a separate section in the file. Sections contain actual data, such as code that will be executed. In some cases, there may be multiple headers describing sections with the same name, which is due to how modern compilers organize data.

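The fields of interest in the Section Header are Name, SizeOfRawData, PointerToRawData, PointerToRelocations, NumberOfRelocations, and Characteristics (the flags that describe, among other things, the memory permissions a section needs).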

When parsing the COFF file, the process will look as follows:

  1. Read the COFF File Header, check the file architecture, and get the number of sections.
  2. Move the reading pointer by the size of the File Header to get to the beginning of the Section Table.
  3. Read the declared number of sections and for each section:
    1. Allocate memory with read-write permissions and copy section's content into that memory
    2. Apply all relocations for the current section
    3. Adjust memory permission for allocated sections (some sections are read-only, some have to be executable)
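The parsing half of these steps can be sketched without any crates. The snippet below checks the Machine field, skips to the Section Table, and reads the header fields a loader cares about. The offsets follow the PE/COFF layout (a 20-byte file header followed by 40-byte section headers); the SectionInfo struct and the function names are my own illustrative choices, not the structures used by the loader in this article, which relies on the object crate instead.

```rust
const IMAGE_FILE_MACHINE_ARM64: u16 = 0xAA64;
const FILE_HEADER_SIZE: usize = 20;
const SECTION_HEADER_SIZE: usize = 40;

/// Hypothetical, trimmed-down view of a section header.
#[derive(Debug, PartialEq)]
pub struct SectionInfo {
    pub name: [u8; 8],
    pub size_of_raw_data: u32,
    pub pointer_to_raw_data: u32,
    pub pointer_to_relocations: u32,
    pub number_of_relocations: u16,
    pub characteristics: u32,
}

fn u16_at(data: &[u8], off: usize) -> Option<u16> {
    let bytes = data.get(off..off + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

fn u32_at(data: &[u8], off: usize) -> Option<u32> {
    let bytes = data.get(off..off + 4)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Validates the architecture and reads every section header.
/// Returns None for truncated input or a non-ARM64 object file.
pub fn read_section_headers(data: &[u8]) -> Option<Vec<SectionInfo>> {
    if u16_at(data, 0)? != IMAGE_FILE_MACHINE_ARM64 {
        return None;
    }
    let section_count = u16_at(data, 2)? as usize;
    // Object files normally carry no optional header, but honouring
    // SizeOfOptionalHeader keeps the offset right if one is present.
    let optional_header_size = u16_at(data, 16)? as usize;
    let table_start = FILE_HEADER_SIZE + optional_header_size;

    let mut sections = Vec::with_capacity(section_count);
    for index in 0..section_count {
        let base = table_start + index * SECTION_HEADER_SIZE;
        let mut name = [0u8; 8];
        name.copy_from_slice(data.get(base..base + 8)?);
        sections.push(SectionInfo {
            name,
            size_of_raw_data: u32_at(data, base + 16)?,
            pointer_to_raw_data: u32_at(data, base + 20)?,
            pointer_to_relocations: u32_at(data, base + 24)?,
            number_of_relocations: u16_at(data, base + 32)?,
            characteristics: u32_at(data, base + 36)?,
        });
    }
    Some(sections)
}
```

Allocation, relocation, and permission changes are left out on purpose, since they depend on the Windows API and are platform-specific.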

Symbol and String Tables

The Symbol Table, as its name suggests, contains all symbols defined within the object file. This includes, among other things, internal and external functions, references to read-only data like string literals, and more. For example, this is where information about the BOF entry point can be found.

The fields of interest in each Symbol Table entry are Name (an inline short name or an offset into the String Table), Value, SectionNumber, and StorageClass.

The COFF String Table follows the COFF Symbol Table and contains the names of all symbols. If the name of a symbol is longer than 8 bytes, then the Name field in the Symbol's Table entry will contain an offset to the String Table. All strings in the String Table are null-terminated.
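The name lookup described above can be sketched as follows. Per the PE/COFF specification, a long name is signalled by the first four bytes of the Name field being zero, with the remaining four bytes holding an offset measured from the start of the String Table (whose first four bytes store the table's size). The symbol_name function is a hypothetical helper, not part of the loader presented here.

```rust
/// Resolves a symbol's name from its 8-byte Name field.
/// `string_table` must start at the table's 4-byte size field, because
/// offsets stored in symbols are relative to that position.
pub fn symbol_name<'a>(name_field: &'a [u8; 8], string_table: &'a [u8]) -> Option<&'a str> {
    let bytes = if name_field[..4].iter().all(|&b| b == 0) {
        // Long name: the last four bytes are a little-endian offset.
        let offset = u32::from_le_bytes([
            name_field[4],
            name_field[5],
            name_field[6],
            name_field[7],
        ]) as usize;
        let tail = string_table.get(offset..)?;
        let end = tail.iter().position(|&b| b == 0)?;
        &tail[..end]
    } else {
        // Short name: stored inline, NUL-padded unless exactly 8 bytes long.
        let end = name_field.iter().position(|&b| b == 0).unwrap_or(8);
        &name_field[..end]
    };
    core::str::from_utf8(bytes).ok()
}
```

With the symbols from this article, launch fits inline, while rust_begin_unwind and __imp_MessageBoxA are too long and would be stored in the String Table.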

Relocation Table

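The final table is the Relocation Table, which specifies how section data should be modified when loaded into memory. Each relocation entry contains only three fields: VirtualAddress (the offset within the section of the item to patch), SymbolTableIndex (the index of the symbol the relocation refers to), and Type (how the patch should be computed).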
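Each relocation entry occupies 10 bytes on disk: a 4-byte VirtualAddress, a 4-byte SymbolTableIndex, and a 2-byte Type. A minimal sketch of decoding a section's relocation block is shown below. The Relocation struct and the parse_relocations function are hypothetical names, and the constants are the ARM64 relocation type values from the PE/COFF specification that appear later in this article.

```rust
pub const RELOCATION_SIZE: usize = 10;

// ARM64 relocation types used by the object file analysed in this article.
pub const IMAGE_REL_ARM64_ADDR32NB: u16 = 0x0002;
pub const IMAGE_REL_ARM64_PAGEBASE_REL21: u16 = 0x0004;
pub const IMAGE_REL_ARM64_PAGEOFFSET_12A: u16 = 0x0006;
pub const IMAGE_REL_ARM64_PAGEOFFSET_12L: u16 = 0x0007;

/// Hypothetical in-memory form of one relocation entry.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Relocation {
    /// Offset of the item to patch, relative to the start of the section.
    pub virtual_address: u32,
    /// Index into the Symbol Table.
    pub symbol_table_index: u32,
    /// Relocation type; named `kind` because `type` is a Rust keyword.
    pub kind: u16,
}

/// Decodes a raw relocation block; trailing bytes that do not form a
/// complete 10-byte entry are ignored.
pub fn parse_relocations(raw: &[u8]) -> Vec<Relocation> {
    raw.chunks_exact(RELOCATION_SIZE)
        .map(|entry| Relocation {
            virtual_address: u32::from_le_bytes([entry[0], entry[1], entry[2], entry[3]]),
            symbol_table_index: u32::from_le_bytes([entry[4], entry[5], entry[6], entry[7]]),
            kind: u16::from_le_bytes([entry[8], entry[9]]),
        })
        .collect()
}
```

The raw block for a section starts at its PointerToRelocations offset and holds NumberOfRelocations entries.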

Now that you have a theoretical overview of the COFF format, it's time to play with the generated BOF file and try to understand all those structures in practice.

Practical insight

When the compiler generates an object file, it doesn't resolve addresses to external symbols like Windows API functions or addresses to strings placed in the .rdata section. Instead, it produces information in the Relocation Table that such relocations have to be applied by the linker. So, it's the linker's responsibility to resolve the address of an external function or calculate the address of a global variable that will be used in a function invocation.

Let's take a look at the following code:

let _ = MessageBoxA(
    core::ptr::null_mut(),
    "Test test\0".as_ptr(),
    "Test window\0".as_ptr(),
    MB_OKCANCEL,
);

There's a call to an external function, MessageBoxA, that will be resolved by the linker, and the two text arguments, Test test and Test window, will be placed in .rdata sections. The linker has to calculate the addresses of these two strings and feed them to the appropriate registers. Such a function invocation therefore references three symbols that need relocating. On ARM64, each reference is materialized with a pair of instructions (ADRP followed by ADD or LDR), so every symbol accounts for two entries in the Relocation Table.

Now let's look at this disassembly dump of the launch function:

launch:
  0000000000000000: D10083FF  sub         sp,sp,#0x20
  0000000000000004: F9000BFE  str         lr,[sp,#0x10]
  0000000000000008: F90003E1  str         x1,[sp]
  000000000000000C: F90007E0  str         x0,[sp,#8]
  0000000000000010: 90000001  adrp        x1,__unnamed_1
  0000000000000014: 91000021  add         x1,x1,__unnamed_1
  0000000000000018: 90000002  adrp        x2,__unnamed_2
  000000000000001C: 91000042  add         x2,x2,__unnamed_2
  0000000000000020: 90000008  adrp        x8,__imp_MessageBoxA
  0000000000000024: F9400108  ldr         x8,[x8,__imp_MessageBoxA]
  0000000000000028: AA1F03E0  mov         x0,xzr
  000000000000002C: 52800023  mov         w3,#1
  0000000000000030: D63F0100  blr         x8
  0000000000000034: F94003E1  ldr         x1,[sp]
  0000000000000038: F94007E0  ldr         x0,[sp,#8]
  000000000000003C: F9400C28  ldr         x8,[x1,#0x18]
  0000000000000040: 90000001  adrp        x1,__unnamed_3
  0000000000000044: 91000021  add         x1,x1,__unnamed_3
  0000000000000048: 528001C9  mov         w9,#0xE
  000000000000004C: 2A0903E2  mov         w2,w9
  0000000000000050: D63F0100  blr         x8
  0000000000000054: F9400BFE  ldr         lr,[sp,#0x10]
  0000000000000058: 910083FF  add         sp,sp,#0x20
  000000000000005C: D65F03C0  ret

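Following the calling convention on Windows ARM, the first integer arguments are passed in registers x0 through x7. In the listing above, x0 is zeroed (the null hWnd), x1 and x2 receive the addresses of __unnamed_1 and __unnamed_2 (lpText and lpCaption), w3 holds the value 1 (uType), and x8 is loaded with the address of MessageBoxA through the __imp_MessageBoxA symbol before the blr x8 call.

The symbols referenced here can be examined with the following dumpbin command: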

PS D:\malware-decoded\rust\BOFs\bof-payload\target\release\deps> dumpbin.exe /symbols .\bof_payload.o
Microsoft (R) COFF/PE Dumper Version 14.42.34436.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file .\bof_payload.o

File Type: COFF OBJECT

COFF SYMBOL TABLE
000 00000000 SECT1  notype       Static       | .text
002 00000000 SECT2  notype       Static       | .data
004 00000000 SECT3  notype       Static       | .bss
006 00000000 SECT4  notype       Static       | .text
008 00000000 SECT4  notype ()    External     | launch
009 00000000 SECT9  notype       Static       | .xdata
00B 00000000 SECT5  notype       Static       | .text
00D 00000000 SECT5  notype ()    External     | rust_begin_unwind
00E 00000000 SECT6  notype       Static       | .rdata
010 00000000 SECT6  notype       Static       | __unnamed_1
011 00000000 SECT7  notype       Static       | .rdata
013 00000000 SECT7  notype       Static       | __unnamed_2
014 00000000 SECT8  notype       Static       | .rdata
016 00000000 SECT8  notype       Static       | __unnamed_3
017 00000000 SECTA  notype       Static       | .pdata
019 00000000 ABS    notype       Static       | @feat.00
01A 00000000 UNDEF  notype       External     | __imp_MessageBoxA
01B 00000000 DEBUG  notype       Filename     | .file

You can see that all __unnamed_* symbols were placed in the .rdata section. This is because those symbols refer to string literals and they are read-only, so by default, the compiler places them in a section that contains read-only data.

The __imp_MessageBoxA symbol is marked as External and its section is UNDEF, which tells the linker that the symbol is not defined in this object file and has to be found elsewhere, in this case in an external DLL.
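A loader that stands in for the linker has to make the same distinction. The sketch below classifies a symbol from its name, section number, and storage class; the Resolution enum and the classify function are hypothetical, while the numeric values (section number 0 for UNDEF, storage class 2 for External) come from the PE/COFF specification. Note that the symbol name alone does not say which DLL exports the function, so that part has to be decided by the loader.

```rust
pub const IMAGE_SYM_UNDEFINED: i16 = 0;
pub const IMAGE_SYM_CLASS_EXTERNAL: u8 = 2;

/// Hypothetical classification of a symbol from the loader's point of view.
#[derive(Debug, PartialEq)]
pub enum Resolution<'a> {
    /// Defined inside the object file, in the given 1-based section.
    Local { section: i16 },
    /// Has to be looked up in a DLL export table under this name.
    Import { function: &'a str },
    /// Anything else (absolute or debug symbols, other undefined externals).
    Other,
}

pub fn classify<'a>(name: &'a str, section_number: i16, storage_class: u8) -> Resolution<'a> {
    if section_number > 0 {
        return Resolution::Local { section: section_number };
    }
    if section_number == IMAGE_SYM_UNDEFINED && storage_class == IMAGE_SYM_CLASS_EXTERNAL {
        // The __imp_ prefix marks a pointer to an imported function.
        if let Some(function) = name.strip_prefix("__imp_") {
            if !function.is_empty() {
                return Resolution::Import { function };
            }
        }
    }
    Resolution::Other
}
```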

To ensure that the __unnamed_* symbols refer to strings, you can examine the .rdata section with the following command:

PS D:\malware-decoded\rust\BOFs\bof-payload\target\release\deps> dumpbin /section:.rdata /rawdata .\bof_payload.o
Microsoft (R) COFF/PE Dumper Version 14.42.34436.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file .\bof_payload.o

File Type: COFF OBJECT

SECTION HEADER #6
  .rdata name
       0 physical address
       0 virtual address
       A size of raw data
     264 file pointer to raw data (00000264 to 0000026D)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40101040 flags
         Initialized Data
         COMDAT; sym= __unnamed_1
         1 byte align
         Read Only

RAW DATA #6
  00000000: 54 65 73 74 20 74 65 73 74 00                    Test test.

SECTION HEADER #7
  .rdata name
       0 physical address
       0 virtual address
       C size of raw data
     26E file pointer to raw data (0000026E to 00000279)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40101040 flags
         Initialized Data
         COMDAT; sym= __unnamed_2
         1 byte align
         Read Only

RAW DATA #7
  00000000: 54 65 73 74 20 77 69 6E 64 6F 77 00              Test window.

SECTION HEADER #8
  .rdata name
       0 physical address
       0 virtual address
       E size of raw data
     27A file pointer to raw data (0000027A to 00000287)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40101040 flags
         Initialized Data
         COMDAT; sym= __unnamed_3
         1 byte align
         Read Only

RAW DATA #8
  00000000: 48 65 6C 6C 6F 20 66 72 6F 6D 20 42 4F 46        Hello from BOF

As you can see, the strings are present in the .rdata sections, and dumpbin even shows which symbol refers to each string.

Now, back to relocations. In an object file, instructions that reference symbols don't contain the actual addresses of the data; they store a placeholder value instead. Take another look at this piece of assembly code:
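The flags value printed in these headers (40101040) is a bit field, and dumpbin's "Initialized Data", "1 byte align", and "Read Only" lines are just its decoded form. A minimal sketch of that decoding, limited to the bits a loader needs, could look like this. The constants are taken from the PE/COFF specification, while the Protection enum and both function names are illustrative assumptions.

```rust
pub const IMAGE_SCN_MEM_EXECUTE: u32 = 0x2000_0000;
pub const IMAGE_SCN_MEM_READ: u32 = 0x4000_0000;
pub const IMAGE_SCN_MEM_WRITE: u32 = 0x8000_0000;

/// Hypothetical, platform-neutral description of the final page protection.
#[derive(Debug, PartialEq)]
pub enum Protection {
    NoAccess,
    ReadOnly,
    ReadWrite,
    ExecuteRead,
    ExecuteReadWrite,
}

/// Maps the memory bits of a section's Characteristics to a protection.
pub fn protection(characteristics: u32) -> Protection {
    let read = characteristics & IMAGE_SCN_MEM_READ != 0;
    let write = characteristics & IMAGE_SCN_MEM_WRITE != 0;
    let execute = characteristics & IMAGE_SCN_MEM_EXECUTE != 0;
    match (execute, write, read) {
        (true, true, _) => Protection::ExecuteReadWrite,
        (true, false, _) => Protection::ExecuteRead,
        (false, true, _) => Protection::ReadWrite,
        (false, false, true) => Protection::ReadOnly,
        (false, false, false) => Protection::NoAccess,
    }
}

/// Decodes the alignment stored in bits 20..=23: a value n in 1..=14
/// means 2^(n-1) bytes; other values carry no alignment information.
pub fn alignment(characteristics: u32) -> Option<u32> {
    let n = (characteristics >> 20) & 0xF;
    if (1..=14).contains(&n) {
        Some(1 << (n - 1))
    } else {
        None
    }
}
```

On Windows, a loader would translate such a value into the matching page protection constant when adjusting permissions after relocations have been applied.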

0000000000000010: 90000001  adrp        x1,__unnamed_1
0000000000000014: 91000021  add         x1,x1,__unnamed_1

The base opcode of the ADRP instruction is 0x90000000. The listing above shows 0x90000001, so only the destination register x1 is encoded and the immediate is still zero. As described in my previous post, ADRP computes its result from its own address and a page offset encoded in the instruction. Here, however, the file isn't loaded into memory yet and the position of the ADRP is only an offset from the beginning of its .text section, so the calculation can't produce a meaningful address.
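To see that the immediate really is empty, the instruction word can be split into its fields. The sketch below follows the A64 encoding of ADRP (Rd in bits 0..=4, immhi in bits 5..=23, immlo in bits 29..=30); the Adrp struct and the decode_adrp function are hypothetical helpers written for this illustration.

```rust
/// Hypothetical decoded form of an ADRP instruction.
#[derive(Debug, PartialEq)]
pub struct Adrp {
    /// Destination register number (x0..x30).
    pub rd: u8,
    /// Signed distance in 4 KiB pages between the instruction's page
    /// and the target's page.
    pub page_delta: i64,
}

/// Decodes an ADRP instruction word; returns None for anything else.
pub fn decode_adrp(instruction: u32) -> Option<Adrp> {
    // ADRP has bit 31 set and bits 24..=28 equal to 0b10000.
    if instruction & 0x9F00_0000 != 0x9000_0000 {
        return None;
    }
    let rd = (instruction & 0x1F) as u8;
    let immlo = ((instruction >> 29) & 0x3) as i64;
    let immhi = ((instruction >> 5) & 0x7_FFFF) as i64;
    let imm21 = (immhi << 2) | immlo;
    // Sign-extend the 21-bit immediate to 64 bits.
    let page_delta = (imm21 << 43) >> 43;
    Some(Adrp { rd, page_delta })
}
```

For the word 0x90000001 this yields register 1 and a page delta of zero, which is exactly the placeholder state described above.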

Therefore, when the loader places the above code in memory, it will traverse a special structure called the Relocation Table. This table can be viewed with the following command:

PS D:\malware-decoded\rust\BOFs\bof-payload\target\release\deps> dumpbin.exe /relocations .\bof_payload.o
Microsoft (R) COFF/PE Dumper Version 14.42.34436.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file .\bof_payload.o

File Type: COFF OBJECT

RELOCATIONS #4
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000010  PAGEBASE_REL21             90000001        10  __unnamed_1
 00000014  PAGEOFFSET_12A             91000021        10  __unnamed_1
 00000018  PAGEBASE_REL21             90000002        13  __unnamed_2
 0000001C  PAGEOFFSET_12A             91000042        13  __unnamed_2
 00000020  PAGEBASE_REL21             90000008        1A  __imp_MessageBoxA
 00000024  PAGEOFFSET_12L             F9400108        1A  __imp_MessageBoxA
 00000040  PAGEBASE_REL21             90000001        16  __unnamed_3
 00000044  PAGEOFFSET_12A             91000021        16  __unnamed_3

RELOCATIONS #A
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000000  ADDR32NB                   00000000         6  .text
 00000004  ADDR32NB                   00000000         9  .xdata

The Relocation Table contains information about all relocations that have to be performed. Take a look at the first relocation:

RELOCATIONS #4
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000010  PAGEBASE_REL21             90000001        10  __unnamed_1

This relocation belongs to section #4, the .text section that holds the launch function.

SECTION HEADER #4
   .text name
       0 physical address
       0 virtual address
      60 size of raw data
     1A4 file pointer to raw data (000001A4 to 00000203)
     204 file pointer to relocation table
       0 file pointer to line numbers
       8 number of relocations
       0 number of line numbers
60301020 flags
         Code
         COMDAT; sym= launch
         4 byte align
         Execute Read

The relocation is applied to the instruction at offset 0x10, specifically the ADRP instruction in the launch function:

launch:
  0000000000000000: D10083FF  sub         sp,sp,#0x20
  0000000000000004: F9000BFE  str         lr,[sp,#0x10]
  0000000000000008: F90003E1  str         x1,[sp]
  000000000000000C: F90007E0  str         x0,[sp,#8]
  0000000000000010: 90000001  adrp        x1,__unnamed_1

As you can see, the output from the dumpbin tool contains the ADRP instruction's opcode. There is also the relocation type, PAGEBASE_REL21, which Microsoft's PE format documentation describes as "The page base of the target, for ADRP instruction".

Finally, there is information about which symbol this relocation refers to. In this case, it's the __unnamed_1 symbol with index 10 (dumpbin prints symbol indexes in hexadecimal), and if you take a look at the symbol table, you'll see that __unnamed_1 is indeed placed at index 010.

COFF SYMBOL TABLE
010 00000000 SECT6  notype       Static       | __unnamed_1
013 00000000 SECT7  notype       Static       | __unnamed_2
016 00000000 SECT8  notype       Static       | __unnamed_3

The symbol __unnamed_1 is stored in the section number 6, which corresponds to the .rdata section:

SECTION HEADER #6
  .rdata name
       0 physical address
       0 virtual address
       A size of raw data
     264 file pointer to raw data (00000264 to 0000026D)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40101040 flags
         Initialized Data
         COMDAT; sym= __unnamed_1
         1 byte align
         Read Only

RAW DATA #6
  00000000: 54 65 73 74 20 74 65 73 74 00                    Test test.

To summarize, code in an object file contains placeholder values wherever it refers to symbols, and the linker overwrites them. The linker traverses the Relocation Table, which stores information about where in the code a relocation is needed and which symbol it refers to.

When processing each relocation entry, the linker resolves the address of the symbol and patches the instruction so it will point to the address where the required symbol is stored, for example in the .rdata section that was loaded into memory.
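For the three relocation types that appear in section #4, the patching itself can be sketched as follows. This is a simplified illustration under a few assumptions: the placeholder immediates are zero (as they are in this object file), the target lies within ADRP's range, and PAGEOFFSET_12L is applied to a plain integer LDR whose size field sits in bits 30..=31. The function names are hypothetical and are not taken from the loader described in this article.

```rust
/// PAGEBASE_REL21: writes the page distance between the instruction and
/// the target into ADRP's immlo (bits 29..=30) and immhi (bits 5..=23).
pub fn apply_pagebase_rel21(instruction: u32, instruction_addr: u64, target: u64) -> u32 {
    let delta = ((target >> 12) as i64).wrapping_sub((instruction_addr >> 12) as i64);
    let immlo = (delta & 0x3) as u32;
    let immhi = ((delta >> 2) & 0x7_FFFF) as u32;
    // Keep the opcode bits and Rd, replace both immediate fields.
    (instruction & 0x9F00_001F) | (immlo << 29) | (immhi << 5)
}

/// PAGEOFFSET_12A: writes the low 12 bits of the target into the
/// immediate of an ADD instruction (bits 10..=21).
pub fn apply_pageoffset_12a(instruction: u32, target: u64) -> u32 {
    let imm12 = (target & 0xFFF) as u32;
    (instruction & !(0xFFF << 10)) | (imm12 << 10)
}

/// PAGEOFFSET_12L: same field as above, but an LDR scales its immediate
/// by the access size, so the page offset is shifted right first.
pub fn apply_pageoffset_12l(instruction: u32, target: u64) -> u32 {
    let scale = instruction >> 30; // 3 means an 8-byte load
    let imm12 = ((target & 0xFFF) as u32) >> scale;
    (instruction & !(0xFFF << 10)) | (imm12 << 10)
}
```

As a usage example with made-up addresses, patching the ADRP word 90000001 located at 0x10000010 for a target at 0x10003264 gives F0000001 (three pages forward), and the following ADD word 91000021 becomes 91099021 (page offset 0x264).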

The following figure illustrates the process of performing relocations:

Processing relocations

BOF Loader

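As stated earlier in this blog post, Beacon Object Files (BOFs) cannot be executed directly and require a loader to run. To execute a BOF, a loader must stand in for both the linker and the system loader. As outlined at the beginning, this means loading the BOF's sections into memory, resolving external symbols, applying relocations, and finally locating the entry point and calling it.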

The most complex part of this process is parsing relocations. However, with proper crates and Microsoft documentation, it's not a daunting task to implement a relocation parser.

The bof-loader project has been created with the following Cargo.toml file:

[package]
name = "bof-loader"
version = "0.1.0"
edition = "2021"

[dependencies]
bof-api = { path = "../bof-api" }
object = "0.32.1"
anyhow = "1.0.98"
windows = { version = "0.61.1", features = ["Win32", "Win32_System", "Win32_System_Diagnostics", "Win32_System_Diagnostics_Debug", "Win32_System_Diagnostics_ToolHelp", "Win32_System_LibraryLoader", "Win32_System_Memory", "Win32_System_ProcessStatus", "Win32_System_Threading"] }
colog = "1.3.0"
log = "0.4.27"

To begin, the bof-loader project relies on three crucial crates: bof-api, which provides access to the loader interface; object, which offers logic and structures for COFF parsing; and windows, which provides access to the Windows API. With these tools in place, the first step is to load the BOF file into memory.

Loading BOF into memory

Loading a BOF into memory is a straightforward process that consists of two steps: first, reading the content of the object file into a buffer; and second, traversing section headers to copy each section's content into memory.

In my implementation, I have defined a CoffLoader structure that implements two functions with self-descriptive names.

pub struct CoffLoader { }

impl CoffLoader {
    pub fn load_coff_file(&mut self, path: &Path) -> Result<Vec<u8>> {
        let binary_data = fs::read(path)?;
        Ok(binary_data)
    }

    pub fn allocate_sections<'a>(
        &self,
        coff_binary: &'a [u8],
    ) -> Result<Vec<AllocatedSection<'a>>> {
        let coff = ParsedCoff::new(coff_binary)?;
        let mut sections = vec![];

        for section in coff.section_table.iter() {
            if section.size_of_raw_data.get(Default::default()) != 0 {
                let allocated_section =
                    AllocatedSection::allocate(section, coff.string_table, coff_binary)?;
                sections.push(allocated_section);
            }
        }

        Ok(sections)
    }
}

In the allocate_sections function, a ParsedCoff structure is created which acts as a simple wrapper around types from the object crate.

struct ParsedCoff<'a> {
    pub coff_header: &'a ImageFileHeader,
    pub string_table: StringTable<'a>,
    pub symbol_table: SymbolTable<'a>,
    pub section_table: SectionTable<'a>,
}

impl<'a> ParsedCoff<'a> {
    fn new(binary_data: &'a [u8]) -> Result<Self> {
        let mut coff_file_header_offset = 0;

        let coff_header = ImageFileHeader::parse(binary_data, &mut coff_file_header_offset)?;

        // Parse the symbol table once; the string table is borrowed from it
        let symbol_table: object::coff::SymbolTable<'_> = coff_header.symbols(binary_data)?;

        let string_table = symbol_table.strings();

        let section_table = coff_header.sections(binary_data, coff_file_header_offset)?;

        Ok(Self {
            coff_header,
            string_table,
            symbol_table,
            section_table,
        })
    }
}

The ParsedCoff structure serves as a convenient wrapper for important types from the object crate, making it easier to access these types. Since all these structs rely on borrowed data, creating this structure does not involve additional copies, which helps maintain efficiency.
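To give a feeling for what the object crate does on our behalf, here is a minimal sketch that reads the fixed 20-byte COFF file header by hand. The RawCoffHeader type and the helper functions are hypothetical names used only for illustration; the field offsets and the 18-byte symbol record size follow the PE/COFF specification.

```rust
/// Machine type for ARM64 objects (IMAGE_FILE_MACHINE_ARM64).
const IMAGE_FILE_MACHINE_ARM64: u16 = 0xAA64;
/// Size of the COFF file header in bytes.
const COFF_HEADER_SIZE: usize = 20;
/// Size of a single symbol table record in bytes.
const SYMBOL_SIZE: usize = 18;

#[derive(Debug, PartialEq)]
struct RawCoffHeader {
    machine: u16,
    number_of_sections: u16,
    pointer_to_symbol_table: u32,
    number_of_symbols: u32,
    size_of_optional_header: u16,
}

fn read_u16(data: &[u8], offset: usize) -> Option<u16> {
    let bytes = data.get(offset..offset + 2)?;
    Some(u16::from_le_bytes([bytes[0], bytes[1]]))
}

fn read_u32(data: &[u8], offset: usize) -> Option<u32> {
    let bytes = data.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Parses the header fields a loader needs; returns None if the buffer is too short.
fn parse_coff_header(data: &[u8]) -> Option<RawCoffHeader> {
    Some(RawCoffHeader {
        machine: read_u16(data, 0)?,
        number_of_sections: read_u16(data, 2)?,
        // Offset 4 holds the 4-byte TimeDateStamp, which is skipped here.
        pointer_to_symbol_table: read_u32(data, 8)?,
        number_of_symbols: read_u32(data, 12)?,
        size_of_optional_header: read_u16(data, 16)?,
    })
}

impl RawCoffHeader {
    /// Section headers follow the file header (and the optional header,
    /// which is normally absent in object files).
    fn section_table_offset(&self) -> usize {
        COFF_HEADER_SIZE + self.size_of_optional_header as usize
    }

    /// The string table starts immediately after the symbol table.
    fn string_table_offset(&self) -> usize {
        self.pointer_to_symbol_table as usize + self.number_of_symbols as usize * SYMBOL_SIZE
    }
}
```

In practice I would still rely on the object crate, which also validates the data and exposes sections, symbols and strings through typed, borrowed views.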

After initial COFF file parsing, it's time to iterate through all sections. To optimize memory usage, I've opted to allocate memory only for non-zero-sized sections. The AllocatedSection structure, implemented as a RAII-compliant type, handles this allocation in the following way:

pub struct AllocatedSection<'a> {
    pub name: String,
    pub base_address: *mut c_void,
    pub header: &'a ImageSectionHeader,
}

impl<'a> Drop for AllocatedSection<'a> {
    fn drop(&mut self) {
        unsafe {
            VirtualFree(self.base_address, 0, MEM_RELEASE)
                .expect("Failed to deallocate section in the loader's address space");
        };
    }
}

impl<'a> AllocatedSection<'a> {
    pub fn allocate(
        section: &'a ImageSectionHeader,
        strings: StringTable<'a>,
        file_raw_data: &'a [u8],
    ) -> Result<Self> {
        let raw_name = section.name(strings).unwrap();
        let section_name = String::from_utf8_lossy(raw_name);
        let section_size = section.size_of_raw_data.get(LittleEndian::default());

        debug!(
            "Section name {} | Section size: {}",
            section_name, section_size
        );

        if section_size == 0 {
            return Err(anyhow!("Cannot allocate section with size 0"));
        }

        let section_base_address = unsafe {
            VirtualAlloc(
                None,
                section_size as usize,
                MEM_COMMIT | MEM_RESERVE,
                PAGE_EXECUTE_READWRITE,
            )
        };

        if section_base_address.is_null() {
            return Err(anyhow!("Failed to allocate memory for section"));
        }

        debug!(
            "Allocated memory for {} at {:p}",
            section_name, section_base_address
        );

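        let section_data = section.coff_data(file_raw_data).unwrap();

        // Sections with uninitialized data have no bytes in the file, so copy only
        // what is actually present; memory from VirtualAlloc is already zero-filled
        let bytes_to_copy = section_data.len().min(section_size as usize);

        unsafe {
            std::ptr::copy(
                section_data.as_ptr(),
                section_base_address as *mut u8,
                bytes_to_copy,
            )
        };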

        let allocated_section = AllocatedSection {
            base_address: section_base_address,
            name: section_name.to_string(),
            header: section,
        };

        Ok(allocated_section)
    }
}

The AllocatedSection structure uses VirtualAlloc and VirtualFree for memory management under the hood. All sections are allocated with full permissions (read, write and execute) to accommodate processing relocations, which requires writing to memory. Later, permissions may be adjusted based on the Characteristics field in the Section Header.
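The permission adjustment mentioned above is not implemented in this post. A minimal sketch of how the Characteristics flags could be translated into a page protection value might look as follows. The function name is hypothetical; the numeric constants mirror the documented IMAGE_SCN_MEM_* and PAGE_* values, and the result would typically be passed to VirtualProtect once all relocations are applied.

```rust
// Section characteristics flags (PE/COFF specification).
const IMAGE_SCN_MEM_EXECUTE: u32 = 0x2000_0000;
const IMAGE_SCN_MEM_READ: u32 = 0x4000_0000;
const IMAGE_SCN_MEM_WRITE: u32 = 0x8000_0000;

// Memory protection constants (Windows API).
const PAGE_NOACCESS: u32 = 0x01;
const PAGE_READONLY: u32 = 0x02;
const PAGE_READWRITE: u32 = 0x04;
const PAGE_EXECUTE: u32 = 0x10;
const PAGE_EXECUTE_READ: u32 = 0x20;
const PAGE_EXECUTE_READWRITE: u32 = 0x40;

/// Hypothetical helper: picks the narrowest page protection that still
/// satisfies the section's read/write/execute flags.
fn protection_from_characteristics(characteristics: u32) -> u32 {
    let execute = (characteristics & IMAGE_SCN_MEM_EXECUTE) != 0;
    let read = (characteristics & IMAGE_SCN_MEM_READ) != 0;
    let write = (characteristics & IMAGE_SCN_MEM_WRITE) != 0;

    match (execute, read, write) {
        (true, _, true) => PAGE_EXECUTE_READWRITE,
        (true, true, false) => PAGE_EXECUTE_READ,
        (true, false, false) => PAGE_EXECUTE,
        (false, _, true) => PAGE_READWRITE,
        (false, true, false) => PAGE_READONLY,
        (false, false, false) => PAGE_NOACCESS,
    }
}
```

For the .rdata section shown earlier (flags 40101040), this mapping selects PAGE_READONLY, which matches the "Read Only" line in the dump.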

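The allocation process involves locating the section data in the file and copying it into the allocated memory using the std::ptr::copy function. Next, an AllocatedSection structure is created, storing the section name, the base address of the allocated memory, and a reference to the section header.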

All allocated sections are stored in a Vec, which is returned by the allocate_sections function. This allows the program to utilize these sections later on.

At this point, the main function takes shape as follows:

let file_to_load = &PathBuf::from("target/bof_payload.o");
let mut coff_loader = CoffLoader {};

info!("Loading file '{}'", file_to_load.display());
let binary = coff_loader.load_coff_file(&file_to_load)?;
info!("Object file '{}' was loaded successfully", file_to_load.display());

info!("Allocating COFF sections");
let allocated_sections = coff_loader.allocate_sections(&binary)?;
info!("Successfully allocated all non-zero sized sections");

With all sections loaded into memory, it's now time to tackle relocations!

Processing relocations

Relocations are processed exclusively for allocated sections. The AllocatedSection structure contains a reference to an ImageSectionHeader, allowing direct access to relocations for a given section.

Before processing relocations, my implementation first calculates the delta between the actual section number in the COFF file and the sections that were allocated in memory. This is necessary because the loader only allocates sections with non-zero SizeOfRawData. As a result, this introduces a shift when referencing sections via the symbol table. For example, if a COFF file has 5 sections but only 3 are allocated, then if the symbol table contains a reference to section number 4, it will actually refer to the section at index 1 in the vector of AllocatedSection.

COFF Sections In Object File        COFF Sections In Memory
Number  Name    Size                Number  Name    Size    Position in vector
1       .text   0                   2       .data   8       0
2       .data   8                   4       .text   32      1
3       .bss    0                   5       .rdata  16      2
4       .text   32                  
5       .rdata  16                  
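The mapping in the table above can also be expressed as an explicit lookup table instead of a single delta. The sketch below is an alternative to what my loader does, not a copy of it: build_section_index_map and allocated_index are hypothetical helpers, and the only input they assume is the raw-data size of every section in file order.

```rust
/// Hypothetical helper: for every section in the file (in order), stores the
/// position of that section in the vector of allocated sections, or None if
/// the section has size 0 and was never allocated.
fn build_section_index_map(section_sizes: &[u32]) -> Vec<Option<usize>> {
    let mut next_slot = 0;

    section_sizes
        .iter()
        .map(|&size| {
            if size == 0 {
                None
            } else {
                let slot = next_slot;
                next_slot += 1;
                Some(slot)
            }
        })
        .collect()
}

/// Translates a 1-based COFF section number into an index in the vector of
/// allocated sections. Returns None for section number 0, for numbers past
/// the end of the table, and for sections that were not allocated.
fn allocated_index(map: &[Option<usize>], section_number: usize) -> Option<usize> {
    section_number
        .checked_sub(1)
        .and_then(|index| map.get(index).copied().flatten())
}
```

For the sizes from the table (0, 8, 0, 32, 16), section number 4 maps to index 1, exactly as with the delta. Section number 2, however, maps to index 0 here, while the delta formula would compute 2 - 2 - 1 and underflow.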

The handle_relocations function begins with the delta calculation. It also creates a Patcher struct, which is responsible for modifying the instructions affected by relocations.

pub fn handle_relocations(
    &mut self,
    sections: &[AllocatedSection],
    coff_binary: &[u8],
) -> Result<()> {
    let coff = ParsedCoff::new(coff_binary)?;

    // Later I'll store Patcher inside CoffLoader
    let mut patcher = Patcher::new();

    // Sections of size 0 aren't allocated. Tracking the difference between the number of sections in the COFF
    // and the number of allocated sections allows adjusting the Section Number that tells where a given symbol is stored
    let section_delta = coff.section_table.len() - sections.len();
    
    //...
}

The delta calculation is relatively naive: it gives the right index only when every skipped (zero-sized) section comes before the section being looked up, so it does not handle more complex layouts. However, it works sufficiently well for my simple BOF example. Once the delta is calculated, it's time to enumerate relocations. For each section, the function checks if there are any relocations. If so, the function processes them:

pub fn handle_relocations(
    &mut self,
    sections: &[AllocatedSection],
    coff_binary: &[u8],
) -> Result<()> {
    // ...

    for section in sections {
        let relocations = section.header.coff_relocations(coff_binary)?;

        if relocations.len() > 0 {
            debug!("Relocations for {}", section.name);

            for relocation in relocations {

            }
        }
    }

    Ok(())
}

Each relocation is processed according to the following steps:

  1. The program retrieves the address of the symbol referenced by the relocation.
    • If the symbol is external, the program resolves that symbol and returns its address.
    • If the symbol is static or internal, the program locates it in a proper section that is loaded in memory.
  2. The program obtains the address of the instruction where the relocation needs to be applied.
  3. The program patches the instruction so it references the proper address.

Without delving into the specifics of obtaining the symbol address, the procedure follows this outline:

for relocation in relocations {
    let relocation = CoffRelocation::new(relocation);
    let symbol = CoffSymbol::new(relocation.symbol_table_index as usize, &coff)?;

    debug!(
        "Processing relocation for '{}' symbol",
        symbol.name,
    );

    trace!(
        "Name: {} | Address: {:#x} | Symbol Table Index: {:#x} | Type: {:?}",
        symbol.name,
        relocation.virtual_address,
        relocation.symbol_table_index,
        relocation.kind,
    );

    let symbol_ptr = if symbol.is_symbol_section_number_special() {
        // Get address of the external symbol
    } else {
        // Get address of the internal symbol
    };

    let address_to_read =
        (section.base_address as usize) + relocation.virtual_address as usize;
    let instruction_to_patch_ptr = address_to_read as *const u8;

    patcher.patch_instruction(
        instruction_to_patch_ptr.cast(),
        symbol_ptr,
        relocation.kind,
    )?;
}

Here, additional wrapping structures CoffRelocation and CoffSymbol were introduced.
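The wrappers themselves are not listed in this post. To illustrate what a relocation wrapper has to decode, here is a minimal sketch that parses a raw 10-byte COFF relocation record. RawRelocation and parse_relocation are hypothetical names, and the mapping from the ARM64 relocation type values to the RelocationKind variants is my assumption about which types the four variants stand for.

```rust
// ARM64 relocation type values from the PE/COFF specification.
const IMAGE_REL_ARM64_ADDR32NB: u16 = 0x0002;
const IMAGE_REL_ARM64_PAGEBASE_REL21: u16 = 0x0004;
const IMAGE_REL_ARM64_PAGEOFFSET_12A: u16 = 0x0006;
const IMAGE_REL_ARM64_PAGEOFFSET_12L: u16 = 0x0007;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RelocationKind {
    Rva32bit,  // assumed to correspond to IMAGE_REL_ARM64_ADDR32NB
    Adrp,      // assumed to correspond to IMAGE_REL_ARM64_PAGEBASE_REL21
    AddOffset, // assumed to correspond to IMAGE_REL_ARM64_PAGEOFFSET_12A
    LdrOffset, // assumed to correspond to IMAGE_REL_ARM64_PAGEOFFSET_12L
}

#[derive(Debug, PartialEq)]
struct RawRelocation {
    virtual_address: u32,
    symbol_table_index: u32,
    kind: RelocationKind,
}

/// Decodes one relocation record: VirtualAddress (4 bytes),
/// SymbolTableIndex (4 bytes) and Type (2 bytes), all little-endian.
fn parse_relocation(record: &[u8; 10]) -> Result<RawRelocation, String> {
    let virtual_address = u32::from_le_bytes([record[0], record[1], record[2], record[3]]);
    let symbol_table_index = u32::from_le_bytes([record[4], record[5], record[6], record[7]]);
    let raw_type = u16::from_le_bytes([record[8], record[9]]);

    let kind = match raw_type {
        IMAGE_REL_ARM64_ADDR32NB => RelocationKind::Rva32bit,
        IMAGE_REL_ARM64_PAGEBASE_REL21 => RelocationKind::Adrp,
        IMAGE_REL_ARM64_PAGEOFFSET_12A => RelocationKind::AddOffset,
        IMAGE_REL_ARM64_PAGEOFFSET_12L => RelocationKind::LdrOffset,
        other => return Err(format!("unsupported relocation type {:#06x}", other)),
    };

    Ok(RawRelocation {
        virtual_address,
        symbol_table_index,
        kind,
    })
}
```

Returning an error for unknown types keeps the loader from silently skipping a relocation it cannot apply.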

External symbols are resolved by delegating all work to the Resolver structure, which will be discussed later.

if !symbol.is_external_symbol() {
    panic!("Program supports only external symbols!");
}

let symbol_name = symbol.get_name_without_prefix();

debug!("Resolving address for external symbol {}", symbol_name);
self.resolver.resolve_symbol(symbol_name)?
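The helper methods used in this snippet are not listed in the post, so the following is only a sketch of what they might check. SymbolInfo is a hypothetical stand-in for CoffSymbol, and the assumption that the stripped prefix is __imp_ (the prefix compilers put on imported symbols) is mine. The special section numbers and the storage class value come from the PE/COFF specification.

```rust
// Special section numbers (PE/COFF specification).
const IMAGE_SYM_UNDEFINED: i16 = 0;
const IMAGE_SYM_ABSOLUTE: i16 = -1;
const IMAGE_SYM_DEBUG: i16 = -2;
// Storage class of externally visible symbols.
const IMAGE_SYM_CLASS_EXTERNAL: u8 = 2;

/// Hypothetical stand-in for the CoffSymbol wrapper.
struct SymbolInfo<'a> {
    name: &'a str,
    section_number: i16,
    storage_class: u8,
}

impl<'a> SymbolInfo<'a> {
    /// A special section number means the symbol is not defined in any
    /// section of this object file.
    fn is_symbol_section_number_special(&self) -> bool {
        matches!(
            self.section_number,
            IMAGE_SYM_UNDEFINED | IMAGE_SYM_ABSOLUTE | IMAGE_SYM_DEBUG
        )
    }

    fn is_external_symbol(&self) -> bool {
        self.storage_class == IMAGE_SYM_CLASS_EXTERNAL
    }

    /// Assumption: imported functions are referenced as `__imp_<name>`,
    /// while GetProcAddress expects the bare export name.
    fn get_name_without_prefix(&self) -> &'a str {
        self.name.strip_prefix("__imp_").unwrap_or(self.name)
    }
}
```

With these definitions, an undefined external symbol named __imp_GetLastError would be looked up in the loaded DLLs as GetLastError.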

Internal symbols' addresses are resolved by locating the base addresses of the sections in which these symbols are stored. Here, each symbol resides in a different section - different in the sense that each symbol is placed in a section with a separate Section Header, even if multiple symbols belong to sections with the same name (e.g. .rdata). As a result, the loader allocates a separate memory region (aka AllocatedSection) for each symbol, and the address of that region becomes the symbol’s address. In this implementation, the base address stored in the AllocatedSection is effectively the address of the symbol itself.

trace!(
    "Symbol section {:?} | Symbol value: {} | Symbol storage class: {}",
    symbol.get_symbol_section_name(&coff),
    symbol.value,
    symbol.storage_class
);

let section_difference = symbol.section_number - section_delta - 1;

let allocated_symbol_section = sections
    .get(section_difference)
    .expect("Cannot find allocated section");

allocated_symbol_section.base_address

The complete handle_relocations function appears as follows:

impl CoffLoader {
    // previous functions ...   

    pub fn handle_relocations(
        &mut self,
        sections: &[AllocatedSection],
        coff_binary: &[u8],
    ) -> Result<()> {
        let coff = ParsedCoff::new(coff_binary)?;
        let mut patcher = Patcher::new();

        // Sections of size 0 aren't allocated. Tracking the difference between the number of sections in the COFF
        // and the number of allocated sections allows adjusting the Section Number that tells where a given symbol is stored
        let section_delta = coff.section_table.len() - sections.len();

        for section in sections {
            let relocations = section.header.coff_relocations(coff_binary)?;

            if relocations.len() > 0 {
                debug!("Relocations for {}", section.name);

                for relocation in relocations {
                    let relocation = CoffRelocation::new(relocation);
                    let symbol = CoffSymbol::new(relocation.symbol_table_index as usize, &coff)?;

                    debug!(
                        "Processing relocation for '{}' symbol",
                        symbol.name,
                    );

                    trace!(
                        "Name: {} | Address: {:#x} | Symbol Table Index: {:#x} | Type: {:?}",
                        symbol.name,
                        relocation.virtual_address,
                        relocation.symbol_table_index,
                        relocation.kind,
                    );

                    let symbol_ptr = if symbol.is_symbol_section_number_special() {
                        if !symbol.is_external_symbol() {
                            panic!("Program supports only external symbols!");
                        }

                        let symbol_name = symbol.get_name_without_prefix();

                        debug!("Resolving address for external symbol {}", symbol_name);
                        self.resolver.resolve_symbol(symbol_name)?
                    } else {
                        trace!(
                            "Symbol section {:?} | Symbol value: {} | Symbol storage class: {}",
                            symbol.get_symbol_section_name(&coff),
                            symbol.value,
                            symbol.storage_class
                        );

                        let section_difference = symbol.section_number - section_delta - 1;

                        let allocated_symbol_section = sections
                            .get(section_difference)
                            .expect("Cannot find allocated section");

                        allocated_symbol_section.base_address
                    };

                    print_memory_at_address(
                        symbol_ptr.cast::<u8>(),
                        8,
                        Some("Symbol is available at address"),
                    );

                    let address_to_read =
                        (section.base_address as usize) + relocation.virtual_address as usize;
                    let instruction_to_patch_ptr = address_to_read as *const u8;

                    print_memory_at_address(
                        instruction_to_patch_ptr,
                        4,
                        Some("Relocation to patch address"),
                    );

                    patcher.patch_instruction(
                        instruction_to_patch_ptr.cast(),
                        symbol_ptr,
                        relocation.kind,
                    )?;
                }
            }
        }

        Ok(())
    }
}

Now that the relocation process has been covered, let's take a closer look at two essential components that haven't been discussed yet: the Symbol Resolver and the Instruction Patcher. These crucial parts will play a vital role in resolving external symbols and patching instructions accordingly.

Resolving external symbols

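External symbols are resolved by the Resolver structure, which contains two components: a ModuleResolver, which tracks the DLLs loaded into the process, and a SymbolResolver, which looks up symbol addresses in those DLLs.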

The Resolver implements a concept known as preloading, which loads specified DLLs at program start, ensuring that they will be available when particular symbols are searched. This approach simplifies implementation because there is no need for runtime binding of symbols to the DLL from which those symbols originate.

impl Resolver {
    pub fn new() -> Self {
        Self {
            symbol_resolver: SymbolResolver::new(),
            module_resolver: ModuleResolver::new(),
        }
    }

    pub fn new_with_preloaded_modules(modules: &[&str]) -> Result<Self, Error> {
        debug!("Loader invoked with preloaded modules. Checking required modules");
        let mut module_resolver = ModuleResolver::new();

        for &module in modules {
            let _ = module_resolver.resolve_module(module)?;
        }

        Ok(Self {
            symbol_resolver: SymbolResolver::new(),
            module_resolver,
        })
    }

    pub fn resolve_symbol(&mut self, name: &str) -> Result<*const c_void> {
        let modules = self.module_resolver.get_modules();
        self.symbol_resolver.resolve_symbol(name, modules)
    }
}

Module Resolver

The ModuleResolver, under the hood, relies on EnumProcessModules and LoadLibraryA as the core of its operations, along with internal vector enumeration. The basic idea is simple: it checks which DLLs are loaded into process memory using EnumProcessModules, and adds information about those DLLs to an internal vector that serves as a cache. When a new DLL is loaded, the module resolver refreshes its internal cache to preserve information about the newly loaded DLL.

In some places, the code was generated by LLMs, so it may look unusual compared to my usual coding style.

pub struct Module {
    pub name: String,
    pub handle: HMODULE,
}
pub struct ModuleResolver {
    modules: Vec<Module>,
}

impl ModuleResolver {
    pub fn new() -> Self {
        ModuleResolver {
            modules: Vec::new(),
        }
    }

    pub fn enumerate_modules(&mut self) -> Result<(), Error> {
        self.modules.clear();

        let h_process = unsafe { GetCurrentProcess() };
        let mut cb_needed: u32 = 0;

        // First call to get the required buffer size
        unsafe { EnumProcessModules(h_process, ptr::null_mut(), 0, &mut cb_needed) }?;

        if cb_needed == 0 {
            return Err(anyhow!("Failed to get module count"));
        }

        let module_count = cb_needed / std::mem::size_of::<HMODULE>() as u32;
        let mut h_modules: Vec<HMODULE> = vec![HMODULE::default(); module_count as usize];

        // Second call to get the actual modules
        let result = unsafe {
            EnumProcessModules(h_process, h_modules.as_mut_ptr(), cb_needed, &mut cb_needed)
        };

        if result.is_err() {
            return Err(anyhow!(
                "EnumProcessModules failed with error: {:?}",
                unsafe { GetLastError() }
            ));
        }

        // Get module file names
        for &h_module in &h_modules {
            if h_module.is_invalid() {
                continue;
            }

            let mut module_name = vec![0u8; MAX_PATH as usize];
            let name_len = unsafe { GetModuleFileNameA(Some(h_module), &mut module_name) };

            if name_len > 0 {
                // Convert to Rust string
                if let Some(null_pos) = module_name.iter().position(|&x| x == 0) {
                    module_name.truncate(null_pos);
                }

                if let Ok(name_str) = String::from_utf8(module_name) {
                    self.modules.push(Module {
                        name: name_str,
                        handle: h_module,
                    });
                }
            }
        }

        Ok(())
    }

    /// Check if a specific DLL is loaded (case-insensitive filename check)
    fn get_loaded_module(&self, dll_name: &str) -> Option<&Module> {
        let dll_name_lower = dll_name.to_lowercase();

        for module in &self.modules {
            // Extract filename from full path
            if let Some(filename) = module.name.split('\\').last() {
                if filename.to_lowercase() == dll_name_lower {
                    return Some(&module);
                }
            }
        }

        None
    }

    pub fn load_dll(&self, dll_path: &str) -> Result<HMODULE, Error> {
        debug!("Loading '{}'", dll_path);
        let dll_path_cstring = CString::new(dll_path)
            .map_err(|_| anyhow!("Invalid DLL path (contains null bytes)".to_string()))?;

        unsafe {
            let h_module = LoadLibraryA(PCSTR(dll_path_cstring.as_ptr().cast()))?;

            if h_module.is_invalid() {
                let error_code = GetLastError();
                return Err(anyhow!("LoadLibraryA failed with error: {:?}", error_code));
            }

            debug!("Successfully loaded module at {:#X}", h_module.0 as isize);

            Ok(h_module)
        }
    }

    pub fn resolve_module(&mut self, dll_name: &str) -> Result<HMODULE, Error> {
        self.enumerate_modules()?;

        if let Some(module) = self.get_loaded_module(dll_name) {
            debug!("DLL '{}' is already loaded at: {:#X}", dll_name, module.handle.0 as isize);
            return Ok(module.handle);
        }

        debug!("DLL '{}' not found", dll_name);
        let handle = self.load_dll(dll_name)?;

        self.enumerate_modules()?;

        Ok(handle)
    }

    pub fn get_modules(&self) -> &[Module] {
        &self.modules
    }
}

Symbol Resolver

The Symbol Resolver implementation is relatively simple. It uses GetProcAddress to obtain the address of a symbol from a loaded DLL. The resolver tries to find defined symbols in all loaded DLLs, although this may not be an optimal solution. However, it is at least a straightforward approach.

In addition, the Symbol Resolver caches already resolved symbols, so there is no need for multiple searches if a symbol was resolved in the past.

pub struct SymbolResolver {
    symbols: HashMap<String, *const c_void>,
}

impl SymbolResolver {
    pub fn new() -> Self {
        Self {
            symbols: HashMap::new(),
        }
    }

    pub fn resolve_symbol(&mut self, name: &str, modules: &[Module]) -> Result<*const c_void> {
        if let Some(proc) = self.symbols.get(name) {
            debug!("Found already resolved symbol '{}'", name);
            return Ok(*proc);
        }

        debug!("Symbol '{}' isn't already resolved", name);

        let c_name = CString::new(name)?;

        for module in modules {
            trace!(
                "Searching for symbol in '{}' loaded at {:#X}",
                module.name, module.handle.0 as isize
            );

            let proc_address =
                unsafe { GetProcAddress(module.handle, PCSTR(c_name.as_bytes().as_ptr())) };

            if let Some(address) = proc_address {
                self.symbols.insert(name.into(), address as *const c_void);
                return Ok(address as *const c_void);
            }
        }

        Err(anyhow!(
            "Failed to resolve address for {} symbol with error {:?}",
            c_name.to_string_lossy(),
            unsafe { GetLastError() }
        ))
    }
}

Patching instructions

The Instruction Patcher is responsible for modifying instructions influenced by relocations. The main function of the patcher is patch_instruction, which takes three arguments: a pointer to the instruction to patch, a pointer to the resolved symbol, and the relocation type.

The instruction patching process involves encoding the symbol address within the underlying instruction. This is done by deconstructing the assembly instruction, modifying its bytes, and writing it back into memory.

The process is similar to the one described previously, with the difference that the patched instruction cannot be replaced by other instructions that would have a similar effect, as described here.
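As a small illustration of "deconstructing the instruction, modifying its bytes, and writing it back", the sketch below replaces only the 12-bit immediate of an A64 ADD (immediate) instruction and leaves every other bit untouched. The function names are hypothetical, and this is a slightly different strategy from the patcher shown later, which rebuilds the instruction from a fixed opcode.

```rust
/// In A64 ADD (immediate), the 12-bit immediate occupies bits 10..=21.
const IMM12_SHIFT: u32 = 10;
const IMM12_MASK: u32 = 0xFFF << IMM12_SHIFT;

/// Writes the low 12 bits of `target_address` (its offset within a 4 KiB
/// page) into the imm12 field, preserving opcode and register fields.
fn set_add_imm12(instruction: u32, target_address: u64) -> u32 {
    let page_offset = (target_address & 0xFFF) as u32;
    (instruction & !IMM12_MASK) | (page_offset << IMM12_SHIFT)
}

/// Reads the imm12 field back from an encoded instruction.
fn get_add_imm12(instruction: u32) -> u32 {
    (instruction & IMM12_MASK) >> IMM12_SHIFT
}
```

For example, patching `add x0, x8, #0` (0x91000100) for a target whose address ends in 0xABC yields 0x912AF100: the immediate becomes 0xABC, while the destination register x0 and the source register x8 stay as they were.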

In my implementation, I've decided to cover only 4 types of relocation, with 3 of them requiring instruction patching. These cases include ADRP patching, ADD patching, and LDR patching. However, only ADRP patching requires special handling due to distance limits enforced by the ADRP instruction.

Therefore, when the symbol is out of range, the function that patches ADRP tries to allocate a new memory page within the range of the ADRP instruction. This is done by the allocate_thunk_page_near function, which searches for a free memory page around the page where the ADRP instruction is located. The search is performed in both directions, above and below the page in which the ADRP lies.

fn allocate_thunk_page_near(target: usize) -> *const c_void {
    const PAGE_SIZE: usize = 0x1000;
    const ONE_GB: usize = 1 << 30;
    const SEARCH_STEP: usize = 0x10000;

    // Try upward
    let mut addr = target & !(SEARCH_STEP - 1);
    let limit_up = addr.saturating_add(ONE_GB - PAGE_SIZE);

    while addr < limit_up {
        let ptr = unsafe {
            VirtualAlloc(
                Some(addr as *const c_void),
                PAGE_SIZE,
                MEM_COMMIT | MEM_RESERVE,
                PAGE_READWRITE,
            )
        };

        if !ptr.is_null() {
            return ptr;
        }

        addr += SEARCH_STEP;
    }

    // Try downward
    let mut addr = target & !(SEARCH_STEP - 1);
    let limit_down = addr.saturating_sub(ONE_GB);

    while addr > limit_down {
        let ptr = unsafe {
            VirtualAlloc(
                Some(addr as *const c_void),
                PAGE_SIZE,
                MEM_COMMIT | MEM_RESERVE,
                PAGE_READWRITE,
            )
        };

        if !ptr.is_null() {
            return ptr;
        }

        addr = addr.saturating_sub(SEARCH_STEP);
    }

    panic!(
        "Failed to allocate a thunk page within ±1GB of {:#x}",
        target
    );
}

When a new memory page is allocated for the ADRP instruction, the address of the symbol is written to that page. The patched ADRP then points at the new page, and the following LDR instruction loads the symbol's address from it, after which the symbol itself can be referenced. This can be described as a pointer-to-pointer scenario.

The final implementation of the Patcher is presented below:

pub struct Patcher {
    relocated_symbols: HashMap<isize, isize>,
}

impl Patcher {
    pub fn new() -> Self {
        Self {
            relocated_symbols: HashMap::new(),
        }
    }

    pub fn patch_instruction(
        &mut self,
        instruction_ptr: *const c_void,
        symbol_ptr: *const c_void,
        relocation_type: RelocationKind,
    ) -> Result<()> {
        let instruction_address = instruction_ptr as isize;
        let symbol_address = symbol_ptr as isize;
        let instruction = unsafe { std::slice::from_raw_parts(instruction_ptr.cast::<u32>(), 1) };

        debug!("Patching {:?} instruction", relocation_type);

        let patched_bytes = match relocation_type {
            RelocationKind::Rva32bit => {
                let val32 = (symbol_address & 0xFFFF_FFFF) as u32;
                [val32]
            }
            RelocationKind::Adrp => {
                self.patch_adrp(instruction[0], instruction_address, symbol_address)
            }
            RelocationKind::AddOffset => self.patch_add(instruction[0], symbol_address),
            RelocationKind::LdrOffset => self.patch_ldr(instruction[0], symbol_address),
        };

        let process_handle = unsafe { GetCurrentProcess() };

        print_memory_at_address(instruction_ptr.cast::<u32>(), 1, Some("Before patching"));

        unsafe {
            WriteProcessMemory(
                process_handle,
                instruction_ptr,
                patched_bytes.as_ptr().cast(),
                4,
                None,
            )
        }?;

        print_memory_at_address(instruction_ptr.cast::<u32>(), 1, Some("After patching"));

        debug!("Instruction successfully patched");

        Ok(())
    }

    fn patch_adrp(
        &mut self,
        instruction: u32,
        instruction_address: isize,
        symbol_address: isize,
    ) -> [u32; 1] {
        const INSTRUCTION_OPCODE: u32 = 0x90000000;
        let rd = instruction & 0b11111;

        let pc_page = (instruction_address as u64) & !0xfff;
        let symbol_page = (symbol_address as u64) & !0xfff;
        let page_delta = (symbol_page as i64 - pc_page as i64) >> 12;

        if page_delta < -(1 << 20) || page_delta >= (1 << 20) {
            debug!("Symbol address is out of ADRP range. Trying to allocate memory in the range of ADRP instruction");

            let thunk_page = allocate_thunk_page_near(instruction_address as usize);

            self.relocated_symbols
                .insert(symbol_address, thunk_page as isize);

            unsafe {
                *(thunk_page as *mut usize) = symbol_address as usize;
            }

            let thunk_page_page = (thunk_page as usize) & !0xfff;

            let thunk_page_delta = (thunk_page_page as i64 - pc_page as i64) >> 12;

            if thunk_page_delta < -(1 << 20) || thunk_page_delta >= (1 << 20) {
                panic!("Thunk page also out of range (should never happen)");
            }

            debug!("Successfully allocated memory for ADRP at {:#X}", thunk_page as isize);

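            // Mask immhi to 19 bits so a negative delta cannot spill into the opcode bits
            let imm = thunk_page_delta as u32;
            let imm_lo = (imm & 0b11) << 29;
            let imm_hi = ((imm >> 2) & 0x7FFFF) << 5;
            let patched_adrp = INSTRUCTION_OPCODE | imm_lo | imm_hi | rd;

            return [patched_adrp];
        }

        // Mask immhi to 19 bits so a negative delta cannot spill into the opcode bits
        let imm = page_delta as u32;
        let imm_lo = (imm & 0b11) << 29;
        let imm_hi = ((imm >> 2) & 0x7FFFF) << 5;
        let patched_adrp = INSTRUCTION_OPCODE | imm_lo | imm_hi | rd;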

        [patched_adrp]
    }

    fn patch_add(&self, instruction: u32, symbol_address: isize) -> [u32; 1] {
        const INSTRUCTION_OPCODE: u32 = 0x91000000;
        let symbol_address = *self
            .relocated_symbols
            .get(&symbol_address)
            .unwrap_or(&symbol_address);

        let rd = instruction & 0b11111;
        // Keep the original source register; it is not always the same as the destination
        let rn = (instruction >> 5) & 0b11111;
        let offset = (symbol_address as u64) & 0xfff;

        if offset >= (1 << 12) {
            panic!("ADD immediate out of 12-bit range");
        }

        let imm12 = ((offset & 0xFFF) << 10) as u32;
        let patched_add = INSTRUCTION_OPCODE | imm12 | (rn << 5) | rd;

        [patched_add]
    }

    fn patch_ldr(&self, instruction: u32, symbol_address: isize) -> [u32; 1] {
        const INSTRUCTION_OPCODE: u32 = 0xF9400000;
        let symbol_address = *self
            .relocated_symbols
            .get(&symbol_address)
            .unwrap_or(&symbol_address);

        let rd = instruction & 0b11111;
        // Keep the original base register; it is not always the same as the destination
        let rn = (instruction >> 5) & 0b11111;

        let offset = (symbol_address as u64) & 0xfff;
        if offset & 0b111 != 0 {
            panic!("symbol offset must be 8-byte aligned");
        }

        let imm12 = ((offset >> 3) & 0xFFF) as u32;
        let ldr = INSTRUCTION_OPCODE | (imm12 << 10) | (rn << 5) | rd;

        [ldr]
    }
}

To keep track of symbols that were redirected to a thunk page, the Patcher contains an internal HashMap that maps each original symbol address to the address of its thunk. This allows the ADD and LDR patching functions to use the new address for symbols that were relocated during ADRP patching.
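The ADRP arithmetic is easy to get wrong, especially for targets located below the instruction, so it helps to check it in isolation. The following is a minimal, self-contained sketch with hypothetical function names: it encodes an ADRP for a given instruction address and target address, and decodes it back. The 21-bit signed page immediate gives ADRP a reach of roughly ±4 GiB.

```rust
const ADRP_OPCODE: u32 = 0x9000_0000;
/// ADRP holds a signed 21-bit page delta: -2^20 ..= 2^20 - 1 pages.
const ADRP_PAGE_RANGE: i64 = 1 << 20;

/// Encodes `adrp <rd>, <page of target>`; returns None when the target page
/// is out of the instruction's range.
fn encode_adrp(rd: u32, instruction_address: u64, target_address: u64) -> Option<u32> {
    let pc_page = (instruction_address >> 12) as i64;
    let target_page = (target_address >> 12) as i64;
    let page_delta = target_page - pc_page;

    if page_delta < -ADRP_PAGE_RANGE || page_delta >= ADRP_PAGE_RANGE {
        return None;
    }

    // Keep only the 21-bit two's complement representation of the delta.
    let imm = (page_delta as u32) & 0x1F_FFFF;
    let imm_lo = (imm & 0b11) << 29; // low 2 bits go to bits 29..=30
    let imm_hi = (imm >> 2) << 5; // high 19 bits go to bits 5..=23

    Some(ADRP_OPCODE | imm_lo | imm_hi | (rd & 0b11111))
}

/// Computes the page address an encoded ADRP produces at `instruction_address`.
fn decode_adrp_target(instruction: u32, instruction_address: u64) -> u64 {
    let imm_lo = (instruction >> 29) & 0b11;
    let imm_hi = (instruction >> 5) & 0x7_FFFF;
    let imm21 = (imm_hi << 2) | imm_lo;

    // Sign-extend the 21-bit value to 64 bits.
    let page_delta = (((imm21 << 11) as i32) >> 11) as i64;
    let pc_page = (instruction_address & !0xFFF) as i64;

    (pc_page + (page_delta << 12)) as u64
}
```

A round trip through both functions should always return the 4 KiB page of the original target, both for positive and for negative page deltas. The remaining low 12 bits are supplied by the ADD or LDR instruction that follows the ADRP.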

Locating BOF entry point

After loading the BOF into memory and processing all relocations, the BOF's entry point is located so that the launch function can be called.

This is done by the following function:


type LaunchFn = fn(api: &dyn BeaconApi);

impl CoffLoader {
    // ...
    pub fn get_entrypoint_function(
        &self,
        function_name: &str,
        sections: &[AllocatedSection],
        coff_binary: &[u8],
    ) -> Result<LaunchFn> {
        let coff = ParsedCoff::new(coff_binary)?;
        let section_delta = coff.section_table.len() - sections.len();

        let entrypoint_symbol_ptr =
            get_symbol_address(function_name, &coff, sections, section_delta)?;

        let launch_fn: LaunchFn =
            unsafe { std::mem::transmute(entrypoint_symbol_ptr as *const ()) };

        Ok(launch_fn)
    }
}
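
The pointer-to-function conversion can also be tried in isolation. The sketch below is an assumption-laden miniature, not the loader itself: the BeaconApi trait is redeclared locally as a stand-in for the one in bof_api, Recorder, symbol_pointer and to_launch_fn are hypothetical helpers, and the "BOF" is an ordinary function in the same binary. Because LaunchFn uses the default Rust ABI, which is not guaranteed to be stable across compiler versions, this approach presumably requires the loader and the BOF to be built with the same toolchain.

```rust
use std::cell::RefCell;
use std::ffi::c_void;

// Hypothetical stand-in for the trait exported by the bof_api crate.
trait BeaconApi {
    fn beacon_output(&self, data: &str);
}

type LaunchFn = fn(api: &dyn BeaconApi);

/// Records output instead of printing it, so the call can be observed.
struct Recorder {
    lines: RefCell<Vec<String>>,
}

impl BeaconApi for Recorder {
    fn beacon_output(&self, data: &str) {
        self.lines.borrow_mut().push(data.to_string());
    }
}

// Plays the role of the BOF's `launch` symbol.
fn launch(api: &dyn BeaconApi) {
    api.beacon_output("hello from launch");
}

/// Erases the function type, mimicking what a symbol lookup returns.
fn symbol_pointer(f: LaunchFn) -> *const c_void {
    f as *const c_void
}

/// Turns an untyped address back into a callable function pointer.
///
/// # Safety
/// `ptr` must point to executable code with exactly the `LaunchFn`
/// signature and ABI; anything else is undefined behavior.
unsafe fn to_launch_fn(ptr: *const c_void) -> LaunchFn {
    unsafe { std::mem::transmute::<*const c_void, LaunchFn>(ptr) }
}

fn main() {
    let api = Recorder { lines: RefCell::new(Vec::new()) };

    let ptr = symbol_pointer(launch);
    let entrypoint = unsafe { to_launch_fn(ptr) };
    entrypoint(&api);

    println!("{:?}", api.lines.borrow());
}
```

The compiler cannot verify anything about the target of such a pointer, so the correctness of the call rests entirely on the symbol lookup returning the right address.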

Locating the entry point comes down to calling the get_symbol_address function and converting the returned pointer into a LaunchFn that the loader can call later. The lookup itself is implemented in the following function:

pub fn get_symbol_address(
    symbol_name: &str,
    coff: &ParsedCoff<'_>,
    sections: &[AllocatedSection],
    section_delta: usize,
) -> Result<*const c_void> {
    for (_idx, symbol) in coff.symbol_table.iter() {
        let name = String::from_utf8_lossy(symbol.name(coff.string_table)?);

        if name == symbol_name {
            // COFF section numbers are 1-based; 0 marks an undefined symbol.
            let symbol_section_idx = symbol.section_number.get(Default::default()) as usize;
            // checked_sub avoids an underflow for symbols without a loaded section.
            let allocated_symbol_section = symbol_section_idx
                .checked_sub(section_delta + 1)
                .and_then(|idx| sections.get(idx))
                .ok_or_else(|| anyhow!("Cannot find allocated section of the {} symbol", symbol_name))?;

            // For a defined symbol, the value is its offset within the section.
            let symbol_offset = symbol.value.get(Default::default()) as usize;

            let symbol_address =
                unsafe { allocated_symbol_section.base_address.add(symbol_offset) };

            return Ok(symbol_address);
        }
    }

    Err(anyhow!(
        "Failed to find address of the {} symbol",
        symbol_name
    ))
}

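The function traverses the COFF Symbol Table, searching for a symbol that matches the provided name. Once the symbol is found, its section number is translated into an index of the allocated sections: one is subtracted because COFF section numbers are 1-based, and section_delta accounts for the sections that were not allocated. The symbol's value, which is its offset within the section, is then added to the section's base address, and the resulting pointer is returned.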
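
The address arithmetic is easy to verify with a simplified model. In the sketch below, Symbol and LoadedSection are hypothetical stand-ins for the parsed COFF symbol and AllocatedSection, addresses are plain integers, and the names and values are made up for illustration.

```rust
/// Simplified stand-in for a parsed COFF symbol.
struct Symbol<'a> {
    name: &'a str,
    section_number: u16, // 1-based; 0 means the symbol is undefined (external)
    value: u32,          // offset of the symbol inside its section
}

/// Simplified stand-in for a section that was loaded into memory.
struct LoadedSection {
    base_address: usize,
}

fn symbol_address(
    name: &str,
    symbols: &[Symbol<'_>],
    sections: &[LoadedSection],
    section_delta: usize,
) -> Result<usize, String> {
    let symbol = symbols
        .iter()
        .find(|s| s.name == name)
        .ok_or_else(|| format!("symbol {name} not found"))?;

    // Translate the 1-based COFF section number into an index of `sections`.
    let section = (symbol.section_number as usize)
        .checked_sub(section_delta + 1)
        .and_then(|idx| sections.get(idx))
        .ok_or_else(|| format!("symbol {name} has no loaded section"))?;

    Ok(section.base_address + symbol.value as usize)
}

fn main() {
    let sections = [
        LoadedSection { base_address: 0x1000_0000 },
        LoadedSection { base_address: 0x2000_0000 },
    ];
    let symbols = [
        Symbol { name: "external_fn", section_number: 0, value: 0 },
        Symbol { name: "launch", section_number: 2, value: 0x40 },
    ];

    // `launch` lives 0x40 bytes into the second section.
    println!("{:#x?}", symbol_address("launch", &symbols, &sections, 0));
    // An undefined symbol has no section, so the lookup reports an error.
    println!("{:?}", symbol_address("external_fn", &symbols, &sections, 0));
}
```

Returning an error for a symbol without a loaded section keeps the decision about how to react with the caller, which is convenient when the loader is embedded in a larger program.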

Preparing entry point arguments

Once the entry point has been located, its arguments have to be prepared. In my case, the BOF entry point takes only one argument: a reference to an object that implements the BeaconApi trait. For this purpose, I defined a struct that implements the trait and prints the BOF's output to the console.

use bof_api::BeaconApi;

struct Bof;

impl BeaconApi for Bof {
    fn beacon_output(&self, data: &str) {
        println!("[BOF] {}", data);
    }
}

Calling BOF entry point

With all the pieces in place, the BOF entry point can finally be called. The complete main function that ties together all the described components is shown below.

fn main() -> Result<()> {
    colog::init();

    let file_to_load = &PathBuf::from("target/bof_payload.o");
    let resolver = Resolver::new_with_preloaded_modules(&["user32.dll", "kernel32.dll"])?;
    let mut coff_loader = CoffLoader::new_with_resolver(resolver);

    info!("Loading file '{}'", file_to_load.display());
    let binary = coff_loader.load_coff_file(&file_to_load)?;
    info!("Object file '{}' was loaded successfully", file_to_load.display());

    info!("Allocating COFF sections");
    let allocated_sections = coff_loader.allocate_sections(&binary)?;
    info!("Successfully allocated all non-zero sized sections");

    info!("Performing COFF relocations");
    coff_loader.handle_relocations(&allocated_sections, &binary)?;
    info!("All relocations were processed successfully");

    info!("Retrieving BOF entry point");
    let entrypoint = coff_loader.get_entrypoint_function("launch", &allocated_sections, &binary)?;
    info!("Entry point found at address: {:#x}", entrypoint as isize);

    let api = Bof;

    wait_for_user("Press ENTER to run BOF")?;

    entrypoint(&api);

    info!("BOF successfully executed. Exiting loader...");

    Ok(())
}

Once the BOF returns, the loader cleans up the remaining artifacts, including any allocated memory. The memory is released in the Drop implementations of structures such as AllocatedSection.

impl<'a> Drop for AllocatedSection<'a> {
    fn drop(&mut self) {
        unsafe {
            // MEM_RELEASE frees the whole region, so the size must be 0.
            VirtualFree(self.base_address, 0, MEM_RELEASE)
                .expect("Failed to deallocate section in the loader's address space");
        };
    }
}
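
The same ownership pattern can be tried without the Windows API. The following sketch swaps VirtualAlloc and VirtualFree for the standard library allocator so that it runs on any platform; OwnedRegion and its released counter are hypothetical names introduced only to make the drop observable.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::cell::Cell;
use std::rc::Rc;

/// Owns one page-aligned allocation and frees it when dropped.
struct OwnedRegion {
    base_address: *mut u8,
    layout: Layout,
    released: Rc<Cell<usize>>, // only here to make the drop observable
}

impl OwnedRegion {
    fn new(size: usize, released: Rc<Cell<usize>>) -> Option<Self> {
        if size == 0 {
            return None; // zero-sized regions are never allocated
        }
        let layout = Layout::from_size_align(size, 4096).ok()?;
        // SAFETY: the layout has a non-zero size.
        let base_address = unsafe { alloc_zeroed(layout) };
        if base_address.is_null() {
            return None;
        }
        Some(Self { base_address, layout, released })
    }
}

impl Drop for OwnedRegion {
    fn drop(&mut self) {
        // SAFETY: base_address came from alloc_zeroed with this exact layout.
        unsafe { dealloc(self.base_address, self.layout) };
        self.released.set(self.released.get() + 1);
    }
}

fn main() {
    let released = Rc::new(Cell::new(0));
    {
        let _sections = vec![
            OwnedRegion::new(4096, released.clone()).expect("allocation failed"),
            OwnedRegion::new(8192, released.clone()).expect("allocation failed"),
        ];
        // The regions are usable here; nothing has been freed yet.
        assert_eq!(released.get(), 0);
    }
    // Leaving the scope dropped the Vec and, with it, every region.
    println!("released regions: {}", released.get());
}
```

Tying the release to Drop means the memory is freed on every exit path, including early returns caused by the ? operator.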

The following animation illustrates the step-by-step process of launching a BOF file using the loader:

[Animation: Processing relocations]

Summary

In this article, I walked step by step through writing a BOF-like file in Rust that mimics the structure and functionality of a real Beacon Object File (BOF).

I also covered the implementation of a COFF (Common Object File Format) loader that turns an object file into code that can be executed directly in memory.

Along the way, the article explored the COFF format from the perspective of building a loader and presented solutions to several challenges that come up when writing one.

The implementation is not yet complete, as certain aspects of the COFF format were left unexplored. Still, I have tried to give a solid understanding of what a COFF loader involves and how it can be implemented in Rust. I have also identified opportunities for improvement, such as adapting the form of the BOF file and minimizing relocations, which will be discussed in subsequent posts.

I hope this article proves a useful resource for anyone who wants to create their own BOF-like files in Rust.