2026-04-16 05:33:37 · Cybersecurity · Article source: ZONE.CI

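Summary: This article examines the implementation and use of GumTrace, an ARM64 dynamic instruction tracing tool built on the Frida Gum engine. The tool is a C++ shared library that calls the Stalker instrumentation API directly, avoiding the overhead of the JavaScript layer and reaching roughly 333 MB of trace output per second. The article covers build and deployment, architecture, instrumentation internals, and offline taint analysis, with worked examples for both Android and iOS, giving mobile reverse engineers instruction-level visibility. Overall score: 88. Categories: reverse engineering, mobile security, security tools, binary security, penetration testing.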


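ARM64 Dynamic Instruction Tracing Tool: Usage and Implementation Analysis

Original

非虫

Software Security and Reverse Engineering (软件安全与逆向分析)

April 14, 2026, 10:31, Hubei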

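Based on the source code of the open-source GumTrace project, this article takes a close look at dynamic instruction tracing on ARM64. It works through the engineering details layer by layer, from using the tool to the engine internals, and from instruction parsing to taint propagation.

The project is open source at: https://github.com/patchcore-framework/GumTrace

Author: 非虫 ([email protected])

1 Introduction

In mobile security research, reverse engineers often hit the same wall: when the target is heavily obfuscated native code, such as white-box cryptography, VM-based protection, or a custom protocol implementation, static analysis is close to useless and function-level hooking is too coarse. What is needed is a true "instruction-level microscope".

Traditional tracing approaches each have their shortcomings.

GumTrace takes a different route: it is injected into the target process as a C++ shared library and calls the C API of the Frida Gum engine directly for Stalker instrumentation, bypassing the JavaScript layer entirely. This pushes tracing throughput close to the practical limit: the project author measured about 1 GB of trace log every 3 seconds.

Using the GumTrace source as a blueprint, this article presents the design and implementation of an ARM64 dynamic instruction tracer along seven dimensions: usage, core architecture, the instrumentation engine, log format, function identification, platform adaptation, and offline taint analysis.

2 Architecture Overview

Before diving into the code, it helps to form a picture of the overall architecture. The system has three major parts: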

┌────────────────────────────────────────────────────────────────┐
│                  Frida injection layer (JavaScript)            │
│  dlopen(libGumTrace.so) → init() → run() → unrun()             │
└──────────────────────────┬─────────────────────────────────────┘
                           │ C ABI
┌──────────────────────────▼─────────────────────────────────────┐
│                  GumTrace core engine (C++)                    │
│                                                                │
│  ┌──────────────┐  ┌─────────────────┐  ┌──────────────────┐   │
│  │ GumTrace     │  │ CallbackContext │  │ FuncPrinter      │   │
│  │ trace control│  │ context pool    │  │ argument printing│   │
│  └──────┬───────┘  └─────────────────┘  └──────────────────┘   │
│         │                                                      │
│  ┌──────▼───────────────────────────────────────┐              │
│  │        Frida Gum Stalker C API               │              │
│  │  gum_stalker_follow / transform / callout    │              │
│  └──────────────────────────────────────────────┘              │
└──────────────────────────┬─────────────────────────────────────┘
                           │ trace.log
┌──────────────────────────▼─────────────────────────────────────┐
│                  Offline analysis tools                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐      │
│  │ TraceParser  │  │ TaintEngine  │  │ TaintTracker.1sc │      │
│  │ log parser   │  │ taint engine │  │ 010 Editor plugin│      │
│  └──────────────┘  └──────────────┘  └──────────────────┘      │
└────────────────────────────────────────────────────────────────┘

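Design philosophy: GumTrace follows the principle of "maximum performance while tracing, offline processing for analysis". The tracing phase records only the essential information, and all heavier analysis (such as taint tracking) is deferred to the offline phase.

3 Quick Start

3.1 Building

GumTrace supports both Android and iOS. The build depends on the Frida Gum static library, which is bundled in the libs/ directory, so only a standard cross-compilation environment is required.

Android build:

# Edit build_android.sh and point ANDROID_NDK_HOME at your local NDK path
vim build_android.sh

./build_android.sh
# Output: build_android/libGumTrace.so

At its core, the build script configures cross-compilation through CMake's Android toolchain file: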

cmake .. \
    -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-24 \
    -DCMAKE_BUILD_TYPE=Release

iOS build:

./build_ios.sh
# Output: build_ios/libGumTrace.dylib

The iOS build uses Xcode's iphoneos SDK, targets arm64, and supports iOS 12.0 as the minimum version. The result is a dynamic library (.dylib). Because code signing is disabled (CODE_SIGNING_ALLOWED=NO), it must be used on a jailbroken device.

Building the taint analysis tool:

cd src/taint
mkdir -p build && cd build
cmake .. && cmake --build .
# Output: taint_tracker

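3.2 Deployment and Running

GumTrace is injected into the target process by loading it from a Frida script. Taking Android as the example, the full workflow is as follows.

Step 1: push the shared library to the device

adb push build_android/libGumTrace.so /data/local/tmp/

Note: if loading the library fails (dlopen returns NULL), the usual cause is SELinux blocking shared libraries from being loaded out of /data/local/tmp/. Switch SELinux to permissive mode first (this requires root):
adb shell setenforce 0

Step 2: write the Frida script

GumTrace exports three C functions: init, run, and unrun. Load the library and obtain the function pointers through dlopen/dlsym from Frida: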

let traceSoName = 'libGumTrace.so'
let targetSo = 'libtarget.so'

let gumtrace_init = null
let gumtrace_run = null
let gumtrace_unrun = null

function loadGumTrace() {
    let dlopen = new NativeFunction(
        Module.findGlobalExportByName('dlopen'), 'pointer', ['pointer', 'int'])
    let dlsym = new NativeFunction(
        Module.findGlobalExportByName('dlsym'), 'pointer', ['pointer', 'pointer'])

    // Flag 2 is RTLD_NOW
    let soHandle = dlopen(
        Memory.allocUtf8String('/data/local/tmp/' + traceSoName), 2)
    // Fail early instead of passing a NULL handle to dlsym
    if (soHandle.isNull())
        throw new Error('dlopen failed for ' + traceSoName)

    gumtrace_init = new NativeFunction(
        dlsym(soHandle, Memory.allocUtf8String('init')),
        'void', ['pointer', 'pointer', 'int', 'pointer'])
    gumtrace_run = new NativeFunction(
        dlsym(soHandle, Memory.allocUtf8String('run')), 'void', [])
    gumtrace_unrun = new NativeFunction(
        dlsym(soHandle, Memory.allocUtf8String('unrun')), 'void', [])
}

function startTrace() {
    loadGumTrace()

    let moduleNames = Memory.allocUtf8String(targetSo)
    let outputPath = Memory.allocUtf8String(
        '/data/data/com.example.app/trace.log')
    let threadId = 0 // 0 = current thread
    let options = Memory.alloc(8)
    options.writeU64(0) // 0 = Standard, 1 = DEBUG, 2 = Stable

    gumtrace_init(moduleNames, outputPath, threadId, options)
    gumtrace_run()
}

function stopTrace() {
    gumtrace_unrun()
}

Step 3: start tracing while the target function executes

The typical pattern is to hook the target function, start tracing in onEnter, and stop in onLeave:

let isTrace = false
function hook() {
    let dlopen_ext = Module.getGlobalExportByName('android_dlopen_ext')
    Interceptor.attach(dlopen_ext, {
        onEnter(args) {
            // args[0] is the library path and may be NULL
            let path = args[0].isNull() ? null : args[0].readCString()
            if (path !== null && path.indexOf(targetSo) > -1)
                this.can = true
        },
        onLeave() {
            if (this.can) {
                let targetModule = Process.findModuleByName(targetSo)
                // 0x1234 is a placeholder for the offset of the function to trace
                Interceptor.attach(targetModule.base.add(0x1234), {
                    onEnter() {
                        if (!isTrace) {
                            isTrace = true
                            startTrace()
                            this.tracing = true
                        }
                    },
                    onLeave() {
                        if (this.tracing) stopTrace()
                    }
                })
            }
        }
    })
}

setImmediate(hook)

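Step 4: run and pull the log

frida -U -f com.example.app -l hook.js
# After tracing has finished
adb pull /data/data/com.example.app/trace.log .

3.3 Using GumTrace on iOS

The iOS workflow is similar to Android. The main differences are the paths and how the library is loaded: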

let traceSoName = 'libGumTrace.dylib'

// On iOS the log is stored under the app sandbox
function getSandboxPath(filename) {
    const homePath = ObjC.classes.NSString
        .stringWithString_("~")
        .stringByExpandingTildeInPath().toString()
    return homePath + '/Documents/' + filename
}

function loadGumTrace() {
    let dlopen = new NativeFunction(
        Module.findGlobalExportByName('dlopen'), 'pointer', ['pointer', 'int'])
    let soHandle = dlopen(
        Memory.allocUtf8String('/var/jb/var/root/' + traceSoName), 2)
    // ... the rest is the same as on Android
}

The iOS build additionally supports ObjC message tracing: it intercepts objc_msgSend automatically and resolves the class name, the selector, and the contents of ObjC objects (NSDictionary, NSArray, NSString, and so on).

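3.4 API Reference

GumTrace exposes three C interfaces:

| Interface | Signature | Description |
| --- | --- | --- |
| init | void init(const char* module_names, char* trace_file_path, int thread_id, GUM_OPTIONS* options) | Initialize the tracer |
| run | void run() | Start tracing |
| unrun | void unrun() | Stop tracing |

init parameters, as they are used elsewhere in this article:

module_names: comma-separated names of the modules to trace
trace_file_path: path of the output log file
thread_id: the thread to trace; 0 means the current thread
options: pointer to a GUM_OPTIONS structure that carries the run mode

Run modes:

| Mode | Value | Behavior |
| --- | --- | --- |
| Standard | 0 | Standard mode; flushes the log every 20 seconds; suited to large-scale tracing |
| DEBUG | 1 | Debug mode; flushes every 20 instructions so the log is visible in real time |
| Stable | 2 | Stable mode; enables memory range checks and a higher trust threshold to reduce the risk of crashes |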

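4 Log Format

Understanding the log format is the basis for all later analysis. GumTrace logs are plain text, and each instruction takes up one or more lines.

4.1 Instruction line

[module] 0x<absolute address>!0x<relative offset> mnemonic operands; reg=value mem_r=address mem_w=address

The part before the semicolon describes the instruction itself (module, address, disassembly). The part after it is the runtime state (register values and memory access addresses).

4.2 Write-back line

For instructions that write a register (for example an ldr load or an add), the next line starts with -> and records the new value of the destination register after the instruction executes:

[libtarget.so] 0x7a3c001890!0x1890 ldr x0, [x1, #0x10]; x1=0x7a3c050000 mem_r=0x7a3c050010
-> x0=0x12345678

Splitting the record this way lets the log parser tell the register state before the instruction apart from the state after it, which gives taint analysis the data-flow information it needs.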
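The instruction-line layout described in 4.1 can be split with a few string searches. The following is a minimal sketch, not GumTrace's TraceParser: the ParsedInsn type, its field names, and parse_insn_line are hypothetical names introduced here for illustration, and the sketch assumes the module name itself contains no ']' character.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical record type; the field names are illustrative only.
struct ParsedInsn {
    std::string module;     // text between '[' and ']'
    uint64_t abs_addr = 0;  // absolute address, before '!'
    uint64_t rel_addr = 0;  // module-relative offset, after '!'
    std::string disasm;     // mnemonic and operands, before ';'
    std::string state;      // register/memory annotations, after ';'
};

// Parses one instruction line of the documented form.
// Returns false when the line does not have that shape (for example a "->" line).
inline bool parse_insn_line(const std::string& line, ParsedInsn& out) {
    if (line.empty() || line[0] != '[') return false;
    const std::size_t rb = line.find(']');
    if (rb == std::string::npos) return false;
    const std::size_t bang = line.find('!', rb);   // separates absolute and relative address
    if (bang == std::string::npos) return false;
    const std::size_t sp = line.find(' ', bang);   // end of the relative offset
    if (sp == std::string::npos) return false;
    const std::size_t semi = line.find(';', sp);   // start of the runtime state, if any

    out.module = line.substr(1, rb - 1);
    // strtoull skips the leading space, accepts the "0x" prefix, and stops at '!' or ' '.
    out.abs_addr = std::strtoull(line.c_str() + rb + 1, nullptr, 16);
    out.rel_addr = std::strtoull(line.c_str() + bang + 1, nullptr, 16);

    if (semi == std::string::npos) {
        out.disasm = line.substr(sp + 1);
        out.state.clear();
    } else {
        out.disasm = line.substr(sp + 1, semi - sp - 1);
        std::size_t st = semi + 1;
        while (st < line.size() && line[st] == ' ') ++st;  // drop spaces after ';'
        out.state = line.substr(st);
    }
    return true;
}
```

A caller would feed each log line to parse_insn_line and treat a false result as "not an instruction line", then handle write-back and call lines separately.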

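4.3 Function call line

When a BL/BLR/BR/B instruction is detected and the jump target is a known symbol, a function call record is generated:

call func: strcmp(0x7a3c050010, 0x7a3c060000)
args0: hello
args1: world
ret: 0xffffffffffffffff

For JNI calls the format differs slightly:

call jni func: FindClass(0x7a3c000100, 0x7a3c070000)
args1: com/example/MyClass
ret: 0x7a3c080000

4.4 System call line

For system calls triggered by an SVC instruction, the function name is looked up from the system call number in the x8 register (openat is number 56, that is 0x38, on Linux aarch64):

[libtarget.so] 0x7a3c002000!0x2000 svc #0; x8=0x38 ...
call func: openat(0xffffff9c, 0x7a3c090000, 0x0, 0x0)
args1: /proc/self/maps
ret: 0x3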

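5 Initialization Flow

The init() function is the entry point of the tracer. It performs all the preparation, from creating the engine to enumerating modules. The source is in src/main.cpp.

5.1 Creating the Gum engine and Stalker

gum_init();

GumTrace *instance = GumTrace::get_instance();
instance->_stalker = gum_stalker_new();
gum_stalker_set_trust_threshold(instance->_stalker, 0);
gum_stalker_set_ratio(instance->_stalker, 2);

gum_init() initializes the Frida Gum runtime. A Stalker instance is then created. Stalker is Frida's code tracing engine, and it instruments the target by dynamically recompiling (JIT) its code.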

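Two key parameters:

trust_threshold: how many times a block must execute before Stalker assumes its code will not change. According to Frida's documentation, -1 means never trust (slow), 0 means trust code from the first execution, and N means trust it after N executions. Setting it to 0 therefore lets Stalker reuse compiled blocks without re-checking them, which favors speed but can miss code that modifies itself.
ratio: the expansion ratio of Stalker's internal code cache. The default is conservative, and it is set to 2 here to reduce reallocation.

In Stable mode, these two parameters take different values:

if (instance->options.mode == GUM_OPTIONS_MODE_STABLE) {
    gum_process_enumerate_ranges(GUM_PAGE_RW, on_range_found, nullptr);
    // ... sort ranges ...
    gum_stalker_set_trust_threshold(instance->_stalker, 2);
    gum_stalker_set_ratio(instance->_stalker, 5);
}

With a trust threshold of 2, Stalker keeps re-validating a block until it has executed twice, which is more conservative toward code that changes shortly after it first runs, at the cost of some extra work. Stable mode also enumerates all readable and writable memory ranges, and later uses them as a safety check when reading strings and producing hexdumps, so that an invalid address does not crash the process.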

5.2 Loading the target modules

auto module_names_vector = Utils::str_split(module_names, ',');
for (const auto &module_name: module_names_vector) {
    auto *gum_module = gum_process_find_module_by_name(module_name.c_str());
    // ...
    gum_module_enumerate_symbols(gum_module, module_symbols_cb, nullptr);
    gum_module_enumerate_dependencies(gum_module, module_dependency_cb, nullptr);
    // Record the module base address and size
    module_map["base"] = gum_module_range->base_address;
    module_map["size"] = gum_module_range->size;
}

For each specified module, GumTrace does three things:

Enumerate symbols: walk the module's symbol table and build an address → function name map (func_maps). This is the basis for recognizing function calls later.
Enumerate dependencies: recursively enumerate the symbols of the libraries the module depends on. This way, when the target module calls strcmp in libc, the symbol name can still be matched.
Record the range: store the module's base address and size, used to decide quickly whether a PC address belongs to a target module.

5.3 Module exclusion strategy

gum_process_enumerate_modules(module_enumerate, nullptr);

On Android, GumTrace walks all modules in the process and explicitly excludes from Stalker those that do not need tracing:

if (strncmp(module_path, "/system/", 8) == 0 ||
    strncmp(module_path, "/apex/", 6) == 0 ||
    strncmp(module_path, "/vendor/", 8) == 0 ||
    strstr(module_path, "libGumTrace.so") != nullptr ||
    strstr(module_path, ".odex") != nullptr ||
    strstr(module_path, "memfd") != nullptr) {
    gum_stalker_exclude(instance->_stalker, gum_module_range);
}

Code in an excluded module does not pass through Stalker's JIT engine and runs at native speed. This is one of the keys to GumTrace's performance: only the target modules are instrumented, and system libraries are let through untouched.

On iOS the strategy is simpler and inverted: only the specified modules are left in, and everything else is excluded.

5.4 Obtaining the JNI environment (Android)

auto libart_module = gum_process_find_module_by_name("libart.so");
GumAddress JNI_GetCreatedJavaVMs_addr =
    gum_module_find_symbol_by_name(libart_module, "JNI_GetCreatedJavaVMs");
// ... several lookup strategies ...

auto *jni_get_created_vms =
    reinterpret_cast<JNI_GetCreatedJavaVMs_t>(JNI_GetCreatedJavaVMs_addr);
jint result = jni_get_created_vms(vms, vm_count, &vm_count);
if (result == JNI_OK && vm_count > 0) {
    instance->java_vm = vms[0];
}

To support JNI function tracing, GumTrace obtains the address of JNI_GetCreatedJavaVMs from libart.so during initialization. The lookup has three fallback levels: first the symbol table, then the export table, and finally the global exports. Once the JavaVM is available, GetEnv yields a JNIEnv pointer, which is then used to resolve JNI strings, class names, and other objects.

5.5 Initializing the system call table

for (const auto& svc_name : svc_names) {
    auto svc_name_vector = Utils::str_split(svc_name, ' ');
    instance->svc_func_maps[std::stoi(svc_name_vector.at(1))] = svc_name_vector.at(0);
}

GumTrace ships with a complete Linux aarch64 system call table (defined in Utils.cpp). During initialization the number → name mapping is loaded into svc_func_maps. When an SVC instruction is traced, the system call name is obtained by looking up the value of x8 in this table.
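As a standalone illustration of that lookup, here is a minimal sketch that builds a number → name map from "name number" strings and resolves an x8 value. The helper names build_svc_map and svc_name are hypothetical, the four entries are only a small excerpt of the real Linux aarch64 table, and, unlike the std::stoi call in the excerpt above, malformed entries are skipped rather than throwing.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <vector>

// Builds number -> name from entries of the form "name number".
// Entries without a space, without a name, or with a non-numeric number are skipped.
inline std::unordered_map<int, std::string>
build_svc_map(const std::vector<std::string>& entries) {
    std::unordered_map<int, std::string> out;
    for (const auto& entry : entries) {
        const std::size_t space = entry.find(' ');
        if (space == std::string::npos || space == 0) continue;
        const char* num = entry.c_str() + space + 1;
        char* end = nullptr;
        const long n = std::strtol(num, &end, 10);
        if (end == num || *end != '\0') continue;  // no digits, or trailing junk
        out[static_cast<int>(n)] = entry.substr(0, space);
    }
    return out;
}

// Resolves the syscall number found in x8; returns an empty string when unknown.
inline std::string svc_name(const std::unordered_map<int, std::string>& table,
                            uint64_t x8) {
    const auto it = table.find(static_cast<int>(x8));
    return it == table.end() ? std::string() : it->second;
}
```

With the entries "openat 56" and "write 64" loaded, an x8 value of 0x38 resolves to openat and 0x40 to write.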

6 The Stalker Instrumentation Engine

The instrumentation engine is the heart of GumTrace. It decides where to instrument and what to do at each instrumentation point.

6.1 The transform callback

Whenever Stalker needs to compile a new code block, it calls transform_callback:

void GumTrace::transform_callback(GumStalkerIterator *iterator,
                                  GumStalkerOutput *output,
                                  gpointer user_data) {
    const auto self = get_instance();
    cs_insn *p_insn;
    auto *it = iterator;

    while (gum_stalker_iterator_next(it, (const cs_insn **) &p_insn)) {
        const std::string *module_name_ptr = self->in_range_module(p_insn->address);
        if (module_name_ptr == nullptr) {
            gum_stalker_iterator_keep(it);
            continue;
        }

        if (Utils::is_lse(p_insn) == false) {
            auto callback_ctx = self->callback_context_instance->pull(
                p_insn, gum_stalker_iterator_get_capstone(it),
                module_name_ptr->c_str(), module.at("base"));

            gum_stalker_iterator_put_callout(it, callout_callback,
                                             callback_ctx, nullptr);
        }

        gum_stalker_iterator_keep(it);
    }
}

The flow of this code is:

Iterate instruction by instruction: Stalker disassembles each ARM64 instruction of the target block with Capstone and hands it to the iterator.
Filter by module: in_range_module checks whether the instruction address belongs to a target module. Instructions outside the target modules are simply kept as they are.
Skip atomic instructions: LSE (Large System Extensions) atomic instructions and exclusive load/store instructions must not be instrumented, because doing so would break atomicity and lead to deadlock.
Insert the callout: for each instruction to be traced, gum_stalker_iterator_put_callout inserts a callback point in front of it.

6.2 Optimizing module lookup

in_range_module uses a one-entry cache to speed up the lookup:

const std::string *GumTrace::in_range_module(size_t address) {
    // Cache hit: consecutive instructions are almost always in the same module
    if (last_module_cache.name != nullptr &&
        address >= last_module_cache.base &&
        address < last_module_cache.end) {
        return last_module_cache.name;
    }

    // Walk all modules
    for (const auto &pair: modules) {
        const auto &module_map = pair.second;
        size_t base = module_map.at("base");
        size_t size = module_map.at("size");
        size_t end = base + size;
        if (address >= base && address < end) {
            last_module_cache = {&pair.first, base, end};
            return &pair.first;
        }
    }
    return nullptr;
}

Because of the spatial locality of code, consecutively executed instructions are almost always in the same module. With the CachedModule cache, the great majority of lookups finish after a single range check.
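The effect of the one-entry cache is easy to see in isolation. The sketch below is a simplified stand-in rather than GumTrace's code: ModuleRange, ModuleLookup, and the cache_hits counter are hypothetical names, and the modules live in a plain vector instead of the map used in the excerpt.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct ModuleRange {
    std::string name;
    uint64_t base;
    uint64_t size;
};

// Range lookup with a one-entry "last hit" cache.
class ModuleLookup {
public:
    explicit ModuleLookup(std::vector<ModuleRange> mods) : modules_(std::move(mods)) {}

    // Returns the owning module's name, or nullptr when the address is outside all modules.
    const std::string* find(uint64_t address) {
        if (cached_ >= 0 && contains(modules_[cached_], address)) {
            ++hits_;                       // fast path: same module as the previous hit
            return &modules_[cached_].name;
        }
        for (std::size_t i = 0; i < modules_.size(); i++) {
            if (contains(modules_[i], address)) {
                cached_ = static_cast<int>(i);  // remember for the next lookup
                return &modules_[i].name;
            }
        }
        return nullptr;                    // a miss leaves the cache unchanged
    }

    std::size_t cache_hits() const { return hits_; }

private:
    static bool contains(const ModuleRange& m, uint64_t a) {
        return a >= m.base && a < m.base + m.size;  // half-open range [base, base + size)
    }

    std::vector<ModuleRange> modules_;
    int cached_ = -1;                      // index of the last matching module, -1 if none
    std::size_t hits_ = 0;
};
```

Storing an index rather than a pointer keeps the cache valid if the object is copied. The counter exists only to make the fast path observable in a test.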

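6.3 Detecting atomic instructions

static bool is_lse(cs_insn *insn);
static bool is_exclusive_load(cs_insn *insn);

ARM64 atomic operations fall into two groups:

LSE atomic instructions: ldadd, ldclr, ldset, ldeor, swp, cas, and so on, together with their width variants (b/h) and memory-ordering variants (a/l/al).
Exclusive load/store: ldxr/stxr, ldaxr/stlxr, and other instructions used in pairs.

These instructions rely on hardware atomicity guarantees to work correctly. A callout inserted inside an exclusive load/store sequence disturbs the state of the exclusive monitor, so the store keeps failing and the code retries forever or deadlocks. GumTrace identifies these instructions in the transform phase and skips them, which is a key measure for stability.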

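7 The Callout Callback: Instruction-Level Recording

callout_callback is the core function called before each instruction executes. It reads register values, computes memory addresses, and writes the log. The source is in src/GumTrace.cpp.

7.1 Buffer management

char *buff = self->buffer;
int &buff_n = self->buffer_offset;

if (buff_n > BUFFER_SIZE - 1024) {
    self->trace_file.write(buff, buff_n);
    buff_n = 0;
}

GumTrace uses a 50 MB in-memory buffer (BUFFER_SIZE = 1024 * 1024 * 50) to cut down on file I/O. All log content is written to the buffer first, and the buffer is written to the file in one go only when less than 1 KB of space remains. This batching greatly reduces system call overhead.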
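The same "check headroom once per record, then append freely" pattern can be sketched as a small class. This is an illustration under assumptions, not the project's code: BatchBuffer and its methods are hypothetical names, a std::string stands in for the trace file, and the oversized-append guard is an extra safety net that the excerpt above does not show.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Batches small appends into one large write. `sink` stands in for the trace file.
class BatchBuffer {
public:
    // headroom must not exceed capacity.
    BatchBuffer(std::size_t capacity, std::size_t headroom, std::string& sink)
        : buf_(capacity), headroom_(headroom), sink_(sink) {}

    // Call once before emitting a record: flush if less than `headroom` bytes remain.
    void begin_record() {
        if (used_ > buf_.size() - headroom_) flush();
    }

    // Appends raw bytes. A record is expected to fit in the headroom;
    // the two checks below only guard against oversized input.
    void append(const char* data, std::size_t n) {
        if (n > buf_.size() - used_) flush();
        if (n > buf_.size()) { sink_.append(data, n); return; }
        std::memcpy(buf_.data() + used_, data, n);
        used_ += n;
    }

    // Writes the buffered bytes to the sink in a single call.
    void flush() {
        if (used_ == 0) return;
        sink_.append(buf_.data(), used_);
        used_ = 0;
        ++flushes_;
    }

    std::size_t flush_count() const { return flushes_; }

private:
    std::vector<char> buf_;
    std::size_t headroom_;
    std::string& sink_;
    std::size_t used_ = 0;
    std::size_t flushes_ = 0;
};
```

With a 64-byte buffer and 16 bytes of headroom, six 10-byte records cause exactly one write of 50 bytes; the sixth record stays buffered until the next flush.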

7.2 Handling write-back registers

if (self->write_reg_list.num > 0) {
    for (int i = 0; i < self->write_reg_list.num; i++) {
        __uint128_t reg_value = 0;
        if (Utils::get_register_value(self->write_reg_list.regs[i],
                                      cpu_context, reg_value)) {
            if (i == 0) Utils::append_string(buff, buff_n, "-> ");
            // Write the register name and value
        }
    }
    Utils::append_char(buff, buff_n, '\n');
    self->write_reg_list.num = 0;
}

This code exploits a timing offset: by the time the callout of the current instruction runs, the previous instruction has already finished executing. write_reg_list holds the destination registers of the previous instruction, so reading their current values inside the current callout yields exactly the result of the previous instruction.

7.3 Operand parsing and memory address calculation

The core of the callout is a multi-branch walk over the Capstone disassembly result. Operands are handled according to their access type (CS_AC_READ/CS_AC_WRITE) and operand type (ARM64_OP_REG/ARM64_OP_MEM):

for (int i = 0; i < callback_ctx->instruction_detail.arm64.op_count; i++) {
    cs_arm64_op &op = callback_ctx->instruction_detail.arm64.operands[i];

    if ((op.access & CS_AC_READ) && op.type == ARM64_OP_REG) {
        // Register read: record the current value
    }
    else if ((op.access & CS_AC_WRITE) && op.type == ARM64_OP_MEM) {
        // Memory write: compute base + (index << shift) + disp
        uintptr_t shifted_index = Utils::apply_shift(index, op.shift.type,
                                                     op.shift.value);
        uintptr_t write_address = base + shifted_index + op.mem.disp;
        // Record mem_w=address
    }
    else if ((op.access & CS_AC_READ) && op.type == ARM64_OP_MEM) {
        // Memory read: compute the effective address the same way
        // Record mem_r=address
    }
    else if ((op.access & CS_AC_WRITE) && op.type == ARM64_OP_REG) {
        // Register write: add to write_reg_list, read by the next instruction's callout
    }
}

The address calculation covers ARM64's addressing modes: base + (index << shift) + displacement. In post-index ([base], #imm) and pre-index ([base, #imm]!) modes the base register itself is also updated, so it is added to the write-back list as well.
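The difference between the three addressing forms is worth spelling out, because post-index accesses memory at the unmodified base and only then applies the immediate. The sketch below models the architectural behavior only; MemOperand, IndexMode, accessed_address, and base_after are hypothetical names, it handles just LSL-shifted index registers, and it makes no claim about how Capstone or GumTrace represent post-index operands.

```cpp
#include <cstdint>

enum class IndexMode { Offset, PreIndex, PostIndex };

// Hypothetical, simplified memory operand: [base, index, lsl #lsl] or [base, #disp].
struct MemOperand {
    uint64_t base;   // value of the base register before the instruction
    uint64_t index;  // value of the index register, 0 if absent
    unsigned lsl;    // left shift applied to the index, 0..63
    int64_t disp;    // signed immediate displacement
    IndexMode mode;
};

// Address the instruction actually reads or writes.
inline uint64_t accessed_address(const MemOperand& m) {
    if (m.mode == IndexMode::PostIndex) return m.base;  // [base], #imm uses base as-is
    return m.base + (m.index << m.lsl) + static_cast<uint64_t>(m.disp);
}

// Value of the base register after the instruction (the write-back value).
inline uint64_t base_after(const MemOperand& m) {
    if (m.mode == IndexMode::Offset) return m.base;     // plain offset: no write-back
    return m.base + static_cast<uint64_t>(m.disp);      // pre- and post-index both add imm
}
```

For the sample line in 4.2, base 0x7a3c050000 with displacement 0x10 in plain offset mode gives 0x7a3c050010, which matches the mem_r value in the log.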

7.4 Shift calculation

ARM64 supports several shift types, and GumTrace covers them in the apply_shift function:

static inline uintptr_t apply_shift(__uint128_t value,
                                    arm64_shifter type,
                                    unsigned int amount) {
    uintptr_t val = (uintptr_t)value;
    switch (type) {
        case ARM64_SFT_LSL: return val << amount;
        case ARM64_SFT_LSR: return val >> amount;
        case ARM64_SFT_ASR: return (uintptr_t)((intptr_t)val >> amount);
        case ARM64_SFT_ROR: {
            // Mask the amount: a rotate by 0 would otherwise shift left by 64,
            // which is undefined behavior in C++
            unsigned int r = amount & 63;
            return r == 0 ? val : (val >> r) | (val << (64 - r));
        }
        case ARM64_SFT_MSL: return (val << amount) | ((1ULL << amount) - 1);
        default: return val;
    }
}

MSL (Masked Shift Left) is the least common of these. It shifts left and then fills the vacated low bits with ones, and it appears mainly in the immediate encoding of SIMD instructions.

8 CallbackContext: Object Pool Design

The callout of each instruction needs a context object that holds the disassembly result. Frequent malloc/free calls would slow tracing down badly, so GumTrace uses a ring-shaped object pool.

8.1 Preallocation

#define CALLBACK_CTX_SIZE 102400

CallbackContext::CallbackContext() {
    list = (CALLBACK_CTX*)calloc(CALLBACK_CTX_SIZE, sizeof(CALLBACK_CTX));
}

All 102400 CALLBACK_CTX objects are allocated at once during initialization. Each object contains the full Capstone disassembly result (cs_insn, cs_detail) plus the module name and base address.

8.2 Ring reuse

CALLBACK_CTX* CallbackContext::pull(const cs_insn* _instruction, csh _handle,
                                    const char* module_name,
                                    uint64_t module_base) {
    if (curr_index >= CALLBACK_CTX_SIZE) {
        curr_index = 0;  // wrap around
    }

    CALLBACK_CTX *ctx = &list[curr_index++];
    ctx->handle = _handle;
    ctx->module_name = module_name;
    ctx->module_base = module_base;
    memcpy(&ctx->instruction, _instruction, sizeof(cs_insn));
    if (_instruction->detail) {
        memcpy(&ctx->instruction_detail, _instruction->detail, sizeof(cs_detail));
    }
    return ctx;
}

pull takes the next slot from the pool and fills it with the disassembly data via memcpy. When the index reaches the end, it wraps around to 0. The design rests on one assumption: a callout context handed out while Stalker compiles a block is not overwritten before that block is discarded. 102400 slots are considered enough to cover Stalker's working window.

After the one-time calloc at startup, the pool performs no heap allocation at all. Every operation is an array index plus a memcpy, which is one of the foundations of the engine's high throughput.
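The wrap-around behavior, and the aliasing hazard that comes with it, can be shown with a generic ring pool. This is a minimal sketch, assuming a fixed slot count greater than zero; RingPool is a hypothetical template, not the CallbackContext class.

```cpp
#include <cstddef>
#include <vector>

// Fixed-size ring of preallocated slots. pull() never allocates after construction.
template <typename T>
class RingPool {
public:
    // slots must be greater than zero.
    explicit RingPool(std::size_t slots) : slots_(slots) {}

    // Returns the next slot, wrapping to the first one after the last.
    // The caller overwrites the slot's contents; older pointers to it become stale.
    T* pull() {
        if (next_ >= slots_.size()) next_ = 0;
        return &slots_[next_++];
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<T> slots_;  // allocated once, in the constructor
    std::size_t next_ = 0;
};
```

After capacity() pulls, the next call hands out the first slot again. That is harmless only if nothing still refers to the old contents, which is the assumption stated in the paragraph on pull above.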

9 Function Call Identification

GumTrace does more than record instructions. It also recognizes function calls automatically and prints their arguments and return values. This is implemented by the FuncPrinter class.

9.1 Resolving the jump target

In callout_callback, GumTrace checks for four kinds of branch instruction:

if (callback_ctx->instruction.id == ARM64_INS_BL &&
    callback_ctx->instruction_detail.arm64.operands[0].type == ARM64_OP_IMM) {
    jump_addr = callback_ctx->instruction_detail.arm64.operands[0].imm;
}
else if (callback_ctx->instruction.id == ARM64_INS_BLR &&
         operands[0].type == ARM64_OP_REG) {
    Utils::get_register_value(operands[0].reg, cpu_context, jump_addr);
}
else if (callback_ctx->instruction.id == ARM64_INS_BR && ...) { ... }
else if (callback_ctx->instruction.id == ARM64_INS_B && ...) { ... }

BL: direct call; the immediate operand is the target address.
BLR: indirect call; the target address is read from a register (common for virtual calls and calls through function pointers).
BR: indirect jump; the address is likewise read from a register (common for tail-call optimization and jump tables).
B: direct jump; in a tail-call situation it is equivalent to a function call.

Once jump_addr is known, it is looked up in func_maps. A match triggers argument printing.

9.2 Configuration-driven argument printing

GumTrace uses a declarative configuration to describe the argument layout of every known function:

const std::unordered_map<std::string, BeforeFuncConfig> func_configs = {
    // String operations
    {"strcmp", {PARAMS_NUMBER_TWO, {STR_INDEX_ZERO, STR_INDEX_ONE}, {}}},
    {"strlen", {PARAMS_NUMBER_ONE, {STR_INDEX_ZERO}, {}}},

    // Memory operations
    {"memcpy", {PARAMS_NUMBER_THREE, {}, {{HEX_INDEX_ONE, HEX_INDEX_TWO}}}},
    {"memcmp", {PARAMS_NUMBER_THREE, {},
        {{HEX_INDEX_ZERO, HEX_INDEX_TWO}, {HEX_INDEX_ONE, HEX_INDEX_TWO}}}},

    // File operations
    {"open", {PARAMS_NUMBER_TWO, {STR_INDEX_ZERO}, {}}},
    {"read", {PARAMS_NUMBER_THREE, {}, {{HEX_INDEX_ONE, HEX_INDEX_TWO}}}},

    // Dynamic linking
    {"dlopen", {PARAMS_NUMBER_TWO, {STR_INDEX_ZERO}, {}}},
    {"dlsym",  {PARAMS_NUMBER_TWO, {STR_INDEX_ONE}, {}}},
    // ...
};

The BeforeFuncConfig structure contains:

params_number: the number of arguments, which decides that x0 through x(n-1) are printed.
string_indices: which arguments are strings whose memory contents should be read.
hexdump_indices: which argument pairs need a hexdump, in the form {address register index, length register index}.
special_handler: a special handling function, for example for syscall, which needs a second round of parsing.

With this configuration-driven design, supporting a new function takes a single line of configuration and no change to the printing logic.
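A stripped-down version of the idea, with only the argument count and string indices, might look like the following. It is a sketch under assumptions: ArgConfig, arg_configs, and format_call are hypothetical names, hexdump pairs and special handlers are left out, and string arguments are dereferenced directly, which is safe only when the pointers are valid in the current process.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for BeforeFuncConfig: argument count plus string positions.
struct ArgConfig {
    int param_count;
    std::vector<int> string_indices;
};

// Declarative table: supporting another function means adding one line here.
inline const std::unordered_map<std::string, ArgConfig>& arg_configs() {
    static const std::unordered_map<std::string, ArgConfig> table = {
        {"strcmp", {2, {0, 1}}},
        {"strlen", {1, {0}}},
        {"dlsym",  {2, {1}}},
    };
    return table;
}

// Formats a call record from register values x0..x(n-1), following the log layout in 4.3.
inline std::string format_call(const std::string& name, const ArgConfig& cfg,
                               const uintptr_t* regs) {
    std::string out = "call func: " + name + "(";
    char hex[32];
    for (int i = 0; i < cfg.param_count; i++) {
        std::snprintf(hex, sizeof(hex), "0x%llx",
                      static_cast<unsigned long long>(regs[i]));
        if (i > 0) out += ", ";
        out += hex;
    }
    out += ")\n";
    for (int idx : cfg.string_indices) {
        if (idx < 0 || idx >= cfg.param_count) continue;  // ignore bad config entries
        const char* s = reinterpret_cast<const char*>(regs[idx]);
        out += "args" + std::to_string(idx) + ": " + (s ? s : "(null)") + "\n";
    }
    return out;
}
```

The printing loop never mentions strcmp or dlsym by name; everything function-specific lives in the table, which is the property the section describes.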

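9.3 Built-in function coverage

GumTrace has built-in automatic parsing for several categories of functions.

Note that architecture-specific optimized variants such as __memcpy_aarch64_simd and __strncmp_aarch64 are also recognized correctly.

9.4 Capturing return values

A function's return value is not available before the call. It has to be read from x0 in the callout of the next instruction. GumTrace passes this state across instructions through last_func_context:

// Before the call: record the function information and set call = true
self->last_func_context.name = func_maps[jump_addr].c_str();
memcpy(&self->last_func_context.cpu_context, cpu_context, sizeof(GumCpuContext));
self->last_func_context.call = true;
FuncPrinter::before(&self->last_func_context);

// In the callout of the next instruction:
if (self->last_func_context.call) {
    self->last_func_context.call = false;
    FuncPrinter::after(&self->last_func_context, cpu_context);
    // Write the return value information
}

before prints the function name and arguments when the call happens, and after prints the return value (x0) once the call has returned. For JNI functions, after also resolves the contents of JNI objects.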

10 Android JNI Tracing

For Android reverse engineering, the ability to trace JNI functions is one of GumTrace's highlights.

10.1 Parsing the JNI function table

Once the JNIEnv pointer is available, GumTrace walks the JNI function table and builds an address → function name map:

auto jni_func_table = (uint64_t)jni_env->functions;
int index = 0;
for (const auto &func_name: jni_func_names) {
    auto func_addr_ptr = (void **)(jni_func_table + index * sizeof(void *));
    auto func_addr = (uint64_t)(*func_addr_ptr);
    jni_func_maps[func_addr] = func_name;
    index++;
}

The jni_func_names array contains the names of all JNI interface functions, in the order of the JNI function table. The actual address of each function is read straight from the table with pointer arithmetic.

10.2 Caching class and method names

When a FindClass or GetMethodID call is traced, GumTrace caches the mapping from the returned jclass or jmethodID to its name:

if (strcmp(func_context->name, "FindClass") == 0) {
    char jclass_name[1024] = {0};
    int jclass_name_n = 0;
    read_string(jclass_name_n, jclass_name,
                (char*)func_context->cpu_context.x[1]);
    instance->jni_classes[curr_cpu_context->x[0]] = jclass_name;
}

Later, when functions such as CallObjectMethod are called, the cache can be queried with x1 (jclass) and x2 (jmethodID), so the log shows the Java class and method names instead of opaque raw pointer values.

10.3 Resolving JNI strings

For functions that involve JNI strings, GumTrace calls the JNI API directly to read the string contents:

auto jstr = (jstring)(func_context->cpu_context.x[reg_index]);
const char *cstr = instance->jni_env->GetStringUTFChars(jstr, nullptr);
// Write to the log
instance->jni_env->ReleaseStringUTFChars(jstr, cstr);

This approach is simple and direct, but the timing of the call matters: it must run while the JNI environment is valid, otherwise the process crashes.

11 iOS ObjC Tracing

On iOS, GumTrace additionally supports in-depth parsing of Objective-C messages.

11.1 Intercepting objc_msgSend

All ObjC method calls are ultimately dispatched through objc_msgSend. After intercepting this function, GumTrace parses its two fixed arguments:

if (func_name_str == "objc_msgSend") {
    uint64_t selector_ptr = func_context->cpu_context.x[1];
    const char *selector_name = sel_getName((SEL)selector_ptr);
    id target = (id)func_context->cpu_context.x[0];
    const char* gotClassName = get_class_name(target);
    // Formatted as [ClassName selectorName]
}

x0: the receiver object (self)
x1: the selector (method selector)

The ObjC runtime APIs sel_getName and object_getClassName provide the readable class and method names.

11.2 Serializing ObjC objects

GumTrace can serialize common ObjC object types into a readable form:

void FuncPrinter::print_ios_object(int& buff_n, char* buff, id obj,
                                   int indent_level) {
    if (obj == nil) { /* null */ }
    const char *class_name = object_getClassName(obj);

    if (strstr(class_name, "Dictionary"))
        print_ios_dictionary(buff_n, buff, obj, class_name, indent_level);
    else if (strstr(class_name, "Array"))
        print_ios_array(buff_n, buff, obj, class_name, indent_level);
    else if (strstr(class_name, "String"))
        print_ios_string(buff_n, buff, obj, class_name, indent_level);
    else if (strstr(class_name, "Data"))
        print_ios_data(buff_n, buff, obj, class_name, indent_level);
    else if (strstr(class_name, "Number"))
        print_ios_number(buff_n, buff, obj, class_name, indent_level);
    // ...
}

An NSDictionary is expanded into an indented JSON-style structure with keys sorted alphabetically. An NSArray is expanded into a list. An NSString is printed as text (truncated beyond 1024 characters). NSData is hexdumped. An NSNumber is printed with its value and a type annotation (int/long/double/float/bool).

Because print_ios_object is called recursively, nested structures (such as a Dictionary containing an Array) are expanded correctly too.

12 Reading Register Values

GumTrace reads register values by using the Capstone register ID to index directly into Gum's CPU context structure.

12.1 General-purpose registers

bool Utils::get_register_value(arm64_reg reg, _GumArm64CpuContext *ctx,
                               __uint128_t &value) {
    // x0-x28 → ctx->x[0..28]
    if (reg >= ARM64_REG_X0 && reg <= ARM64_REG_X28) {
        value = ctx->x[reg - ARM64_REG_X0];
        return true;
    }
    // w0-w28 → low 32 bits of ctx->x[0..28]
    if (reg >= ARM64_REG_W0 && reg <= ARM64_REG_W28) {
        value = (uint32_t)ctx->x[reg - ARM64_REG_W0];
        return true;
    }
    // sp, fp(x29), lr(x30), pc, nzcv
    // ...
}

12.2 SIMD/floating-point registers

GumTrace fully supports the ARM64 SIMD register file. A q register is 128 bits wide, and d/s/h/b are views of its low 64/32/16/8 bits:

// q0-q31 → ctx->v[0..31] (128-bit)
if (reg >= ARM64_REG_Q0 && reg <= ARM64_REG_Q31) {
    int idx = reg - ARM64_REG_Q0;
    memcpy(&value, &ctx->v[idx], sizeof(__uint128_t));
    return true;
}
// d0-d31 → low 64 bits of v[n]
if (reg >= ARM64_REG_D0 && reg <= ARM64_REG_D31) {
    int idx = reg - ARM64_REG_D0;
    memcpy(&value, &ctx->v[idx], sizeof(uint64_t));
    return true;
}

Hexadecimal formatting of 128-bit values is done by format_uint128_hex, which splits the __uint128_t into its high and low 64 bits, prints them separately, and skips leading zeros to keep the log compact.

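13 Performance Engineering

GumTrace's design target is "1 GB every 3 seconds", and it is optimized for performance at several levels to get there.

13.1 An allocation-free hot path

In the callout (executed once per instruction), GumTrace avoids heap allocation entirely:

String operations: everything goes through append_string/append_char/append_uint64_hex, which write directly into the preallocated buffer. Neither std::string nor sprintf is used.
Context objects: taken from the preallocated ring pool, with no call to malloc.
Number formatting: hand-written hexadecimal conversion using a per-nibble lookup table, avoiding the heavyweight snprintf implementation.

static inline void append_uint64_hex(char* buff, int& counter, uint64_t val) {
    // Hand-written, allocation-free hexadecimal output that skips leading zeros
}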
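The body of that function is not shown in the article. One way such a routine could be written is sketched below, under assumptions: append_hex64 and hex64 are hypothetical names, the "0x" prefix is emitted by the routine itself (the log samples show prefixed values, but where the prefix is added in GumTrace is not stated), and the caller is responsible for leaving at least 18 free bytes in the buffer.

```cpp
#include <cstdint>
#include <string>

// Appends "0x" followed by the lowercase hex digits of val, without leading zeros.
// Writes at most 18 bytes at buff[counter...] and advances counter; no allocation.
static inline void append_hex64(char* buff, int& counter, uint64_t val) {
    static const char kDigits[] = "0123456789abcdef";
    buff[counter++] = '0';
    buff[counter++] = 'x';
    if (val == 0) {                // zero still needs one digit
        buff[counter++] = '0';
        return;
    }
    int shift = 60;                // start at the most significant nibble
    while (((val >> shift) & 0xF) == 0) shift -= 4;   // skip leading zero nibbles
    for (; shift >= 0; shift -= 4)
        buff[counter++] = kDigits[(val >> shift) & 0xF];
}

// Convenience wrapper for testing only; the hot path would call append_hex64 directly.
inline std::string hex64(uint64_t val) {
    char tmp[18];
    int n = 0;
    append_hex64(tmp, n, val);
    return std::string(tmp, n);
}
```

The loop that skips leading zeros always terminates because val is non-zero at that point, so some nibble at or below bit 60 is set.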

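13.2 Batched I/O

With a 50 MB in-memory buffer, file writes in Standard mode happen only when the buffer fills up: several times per second at the quoted peak rate of about 333 MB/s, and once every few seconds at lower rates. A background thread additionally performs a flush every 20 seconds:

void* thread_function(void* arg) {
    while (true) {
        instance->trace_file.flush();
        usleep(1000 * 1000 * 20);  // 20 seconds
    }
}

In DEBUG mode the flush interval is shortened to 1 millisecond and a write is triggered every 20 instructions, so the log is visible in real time, at the cost of a marked drop in performance.

13.3 Stalker exclusion

As described earlier, excluding system modules from Stalker is the single most important performance optimization. Every instruction in a module that is not excluded goes through JIT recompilation and a callout, while excluded modules run at native speed. In a typical scenario (tracing a 1 MB target SO), system libraries account for more than 99% of the process's code, so excluding them yields an order-of-magnitude improvement.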

14 Offline Taint Analysis

GumTrace comes with a standalone offline taint analysis tool that performs data-flow tracking over trace logs. It is the most powerful analysis component in the toolchain.

14.1 Design rationale

Taint analysis is not done live during tracing. It runs as an offline post-processing step, which has two benefits:

The tracing phase carries no analysis overhead and keeps the highest possible recording speed.
During analysis, different queries can be run repeatedly without tracing again.

14.2 The log parser (TraceParser)

TraceParser parses the text log into a compact binary representation, compressing each instruction into a TraceLine structure of about 64 bytes:

struct TraceLine {
    int line_number = 0;
    InsnCategory category = InsnCategory::OTHER;

    uint8_t num_dst = 0;
    uint8_t num_src = 0;
    RegId dst_regs[4];
    RegId src_regs[8];

    uint64_t mem_read_addr = 0;
    uint64_t mem_write_addr = 0;
    uint64_t mem_write_addr2 = 0;   // second write address of STP
    uint64_t mem_read_addr2 = 0;    // second read address of LDP
    uint64_t rel_addr = 0;

    bool has_mem_read = false;
    bool has_mem_write = false;
    bool sets_flags = false;        // adds/subs etc. implicitly write NZCV

    long file_offset = 0;           // file offset, used to re-read the original line
    int line_len = 0;
};

Allocation-free design: the parser stores register names as numeric RegId enum values instead of strings, and pre-classifies instruction types with the InsnCategory enum. Register name parsing is entirely hand-written and does not depend on any string library:

RegId TraceParser::parse_reg_name(const char* s, int len) {
    switch (s[0]) {
        case 'x': case 'X':
            if (len == 2 && s[1] >= '0' && s[1] <= '9')
                return (RegId)(REG_X0 + (s[1] - '0'));
            // ...
        case 'w': case 'W':
            // w→x is normalized directly
            return (RegId)(REG_X0 + n);
    }
}

Register normalization ensures that w0 and x0 are treated as the same entity, that fp maps to x29 and lr to x30, and that d0/s0/h0/b0 are all normalized to q0.
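The normalization rules in that sentence can be captured in a short function. The version below is a sketch and differs from the real parser in ways that are assumptions for brevity: it takes a std::string rather than a pointer and length, it handles lowercase names only, the numeric constants are invented for the example, and names it does not recognize (such as xzr) simply map to REG_INVALID.

```cpp
#include <cstddef>
#include <string>

// Hypothetical numbering for the sketch: x0..x30 → 0..30, sp → 31, q0..q31 → 32..63.
constexpr int REG_X0 = 0;
constexpr int REG_SP = 31;
constexpr int REG_Q0 = 32;
constexpr int REG_INVALID = -1;

// Parses a 1- or 2-digit decimal index starting at s[from]; returns -1 if invalid or > max.
inline int parse_index(const std::string& s, std::size_t from, int max) {
    if (from >= s.size() || s.size() - from > 2) return -1;
    int n = 0;
    for (std::size_t i = from; i < s.size(); i++) {
        if (s[i] < '0' || s[i] > '9') return -1;
        n = n * 10 + (s[i] - '0');
    }
    return n <= max ? n : -1;
}

// Maps a register name to its canonical id: w→x, fp→x29, lr→x30, d/s/h/b→q.
inline int normalize_reg(const std::string& name) {
    if (name == "sp") return REG_SP;      // must be checked before the 's' prefix
    if (name == "fp") return REG_X0 + 29;
    if (name == "lr") return REG_X0 + 30;
    if (name.empty()) return REG_INVALID;
    int n;
    switch (name[0]) {
        case 'x': case 'w':               // 32-bit view shares the 64-bit register
            n = parse_index(name, 1, 30);
            return n < 0 ? REG_INVALID : REG_X0 + n;
        case 'q': case 'd': case 's': case 'h': case 'b':  // views of one SIMD register
            n = parse_index(name, 1, 31);
            return n < 0 ? REG_INVALID : REG_Q0 + n;
        default:
            return REG_INVALID;
    }
}
```

Treating the narrower views as the full register over-approximates taint slightly (a write to w0 is recorded as touching all of x0), which is the usual trade-off for a simple register model.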

14.3 指令分类

解析器将ARM64助记符分为12个类别，使得污点引擎可以按类别处理而非逐指令匹配：

14.4 污点传播引擎（TaintEngine）

污点引擎支持正向和反向两种追踪模式。

正向传播：从初始污点源出发，沿执行顺序追踪数据如何被传播和变换。核心规则：

voidTaintEngine::propagate_forward(const&nbsp;TraceLine& line){
switch&nbsp;(line.category) {
case&nbsp;InsnCategory::DATA_MOVE:
case&nbsp;InsnCategory::ARITHMETIC: {
// 源操作数中有污点 → 目标操作数标记污点
// 源操作数全部干净 → 目标操作数清除污点
boolsrc_t&nbsp;=&nbsp;any_src_tainted(line);
for&nbsp;(int&nbsp;i =&nbsp;0; i < line.num_dst; i++) {
if&nbsp;(src_t)&nbsp;taint_reg(line.dst_regs[i]);
elseuntaint_reg(line.dst_regs[i]);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }
// adds/subs 还会隐式传播到 NZCV
if&nbsp;(line.sets_flags) {
if&nbsp;(src_t)&nbsp;taint_reg(REG_NZCV);
elseuntaint_reg(REG_NZCV);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }
break;
&nbsp; &nbsp; &nbsp; &nbsp; }
case&nbsp;InsnCategory::LOAD: {
// 内存地址被污染 → 加载到的寄存器标记污点
boolmem_t&nbsp;= line.has_mem_read &&
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;tainted_mem_.count(line.mem_read_addr);
for&nbsp;(int&nbsp;i =&nbsp;0; i < line.num_dst; i++) {
if&nbsp;(mem_t)&nbsp;taint_reg(line.dst_regs[i]);
elseuntaint_reg(line.dst_regs[i]);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }
break;
&nbsp; &nbsp; &nbsp; &nbsp; }
case&nbsp;InsnCategory::STORE: {
// 数据寄存器被污染 → 写入的内存地址标记污点
boolsrc_t&nbsp;=&nbsp;is_reg_tainted(line.src_regs[0]);
if&nbsp;(src_t) tainted_mem_.insert(line.mem_write_addr);
else&nbsp;tainted_mem_.erase(line.mem_write_addr);
break;
&nbsp; &nbsp; &nbsp; &nbsp; }
case&nbsp;InsnCategory::IMM_LOAD:
// 立即数加载清除目标的污点
for&nbsp;(int&nbsp;i =&nbsp;0; i < line.num_dst; i++)
untaint_reg(line.dst_regs[i]);
break;
// ...
&nbsp; &nbsp; }
}

反向传播：从结果出发，逆执行顺序追溯数据的来源。规则与正向互为镜像：

void TaintEngine::propagate_backward(const TraceLine& line) {
    switch (line.category) {
        case InsnCategory::ARITHMETIC: {
            // A destination register is tainted → move the taint to the source operands
            if (any_dst_tainted(line)) {
                for (int i = 0; i < line.num_dst; i++)
                    untaint_reg(line.dst_regs[i]);
                for (int i = 0; i < line.num_src; i++)
                    taint_reg(line.src_regs[i]);
            }
            break;
        }
        case InsnCategory::STORE: {
            // The written memory address is tainted → move the taint to the data register
            if (tainted_mem_.count(line.mem_write_addr)) {
                tainted_mem_.erase(line.mem_write_addr);
                taint_reg(line.src_regs[0]);
            }
            break;
        }
        // ...
    }
}
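To make the backward rules concrete, here is a minimal, self-contained sketch that replays them over a three-instruction trace. The `Kind`, `Insn` and `Taint` types are simplified stand-ins invented for this illustration, not GumTrace's `TraceLine` or `TaintEngine`. The LOAD rule in particular is an assumption: the excerpt above elides it, so the sketch simply mirrors the forward LOAD rule.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Simplified stand-ins for the article's InsnCategory / TraceLine (hypothetical).
enum class Kind { Arith, Load, Store };

struct Insn {
    Kind kind;
    std::vector<int> dst;   // destination register numbers
    std::vector<int> src;   // source register numbers
    std::uint64_t addr;     // address touched by Load/Store, 0 otherwise
};

struct Taint {
    std::set<int> regs;
    std::set<std::uint64_t> mem;
};

// Undo one instruction: move taint from its outputs back to its inputs.
void step_backward(Taint& t, const Insn& in) {
    switch (in.kind) {
    case Kind::Arith: {
        bool hit = false;
        for (int r : in.dst) hit = hit || t.regs.count(r) > 0;
        if (!hit) break;
        for (int r : in.dst) t.regs.erase(r);   // clear first, so a register that is
        for (int r : in.src) t.regs.insert(r);  // both dst and src ends up tainted
        break;
    }
    case Kind::Store:
        assert(!in.src.empty());
        if (t.mem.erase(in.addr) > 0) t.regs.insert(in.src[0]);
        break;
    case Kind::Load: {
        // Assumption: mirror of the forward LOAD rule (not shown in the excerpt).
        bool hit = false;
        for (int r : in.dst) hit = t.regs.erase(r) > 0 || hit;
        if (hit) t.mem.insert(in.addr);
        break;
    }
    }
}

// Walk the whole trace from the last instruction to the first.
Taint trace_back(Taint t, const std::vector<Insn>& trace) {
    for (auto it = trace.rbegin(); it != trace.rend(); ++it) step_backward(t, *it);
    return t;
}
```

For a trace equivalent to `ldr x1, [0x1000]; add x2, x1, x3; str x2, [0x2000]`, starting from tainted memory at 0x2000, the walk ends with x3 and the memory at 0x1000 marked as the origins of the stored value.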

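LDP/STP pair operations: the taint engine handles ARM64's paired load/store instructions as a special case. LDP loads two registers and STP stores two registers, and the two operands of each are tracked independently: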

case InsnCategory::LOAD: {
    if (line.has_mem_read2 && line.num_dst >= 2) {
        // LDP: each of the two read addresses maps to its own destination register
        bool mem_t1 = tainted_mem_.count(line.mem_read_addr);
        bool mem_t2 = tainted_mem_.count(line.mem_read_addr2);
        if (mem_t1) taint_reg(line.dst_regs[0]);
        else untaint_reg(line.dst_regs[0]);
        if (mem_t2) taint_reg(line.dst_regs[1]);
        else untaint_reg(line.dst_regs[1]);
    }
}
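The excerpt shows only the LDP half. The STP half could look roughly like the sketch below, which applies the forward STORE rule once per register/address pair. This is an illustration under assumptions, not GumTrace's code: the `PairStore` and `TaintState` types are invented here, and the `mem_write_addr2` field is a hypothetical name chosen by analogy with `mem_read_addr2`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical, simplified view of one STP trace line.
struct PairStore {
    int src_regs[2];                 // the two data registers being stored
    std::uint64_t mem_write_addr;    // address written from src_regs[0]
    std::uint64_t mem_write_addr2;   // address written from src_regs[1] (assumed field)
};

struct TaintState {
    bool reg_taint[256] = {};
    std::unordered_set<std::uint64_t> tainted_mem;
};

// Forward rule for STP: each register taints (or cleans) only its own slot.
void propagate_stp_forward(TaintState& st, const PairStore& line) {
    const std::uint64_t addrs[2] = {line.mem_write_addr, line.mem_write_addr2};
    for (int i = 0; i < 2; i++) {
        const int reg = line.src_regs[i];
        assert(reg >= 0 && reg < 256);
        if (st.reg_taint[reg]) st.tainted_mem.insert(addrs[i]);
        else st.tainted_mem.erase(addrs[i]);
    }
}
```

Tracking the two slots separately matters for precision: with `stp x0, x1, [sp]` and only x0 tainted, the second slot is cleaned rather than tainted, so later loads from it do not produce false positives.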

14.5 Taint State Management

Register taint is stored in a 256-element boolean array that covers every RegId enum value, so each operation is O(1):

bool reg_taint_[256] = {};
int tainted_reg_count_ = 0;

inline void taint_reg(RegId id) {
    auto nid = TraceParser::normalize(id);
    if (!reg_taint_[nid]) { reg_taint_[nid] = true; tainted_reg_count_++; }
}

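Memory taint is stored in an unordered_set<uint64_t>, because tainted memory addresses are usually sparse.

On every propagation event, the engine records a ResultEntry containing a full snapshot of the taint state, which is used for the final output.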
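The excerpt above shows only `taint_reg`. A minimal sketch of the complete state container might look like the following. It is an assumption-laden illustration: the class name and the `untaint_reg`, memory and `empty` members are written here for the example, register IDs are plain `int`s, and the `TraceParser::normalize` step is left out because its behavior is not described in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Illustrative taint state: dense array for registers, sparse set for memory.
class TaintState {
public:
    void taint_reg(int id) {
        assert(valid(id));
        if (!reg_[id]) { reg_[id] = true; ++reg_count_; }
    }
    void untaint_reg(int id) {
        assert(valid(id));
        if (reg_[id]) { reg_[id] = false; --reg_count_; }
    }
    bool is_reg_tainted(int id) const {
        assert(valid(id));
        return reg_[id];
    }

    void taint_mem(std::uint64_t addr)   { mem_.insert(addr); }
    void untaint_mem(std::uint64_t addr) { mem_.erase(addr); }
    bool is_mem_tainted(std::uint64_t addr) const { return mem_.count(addr) > 0; }

    int tainted_reg_count() const { return reg_count_; }

    // True when nothing is tainted any more; keeping a counter makes this O(1)
    // instead of a scan over all 256 register slots.
    bool empty() const { return reg_count_ == 0 && mem_.empty(); }

private:
    static bool valid(int id) { return id >= 0 && id < 256; }

    bool reg_[256] = {};
    int reg_count_ = 0;
    std::unordered_set<std::uint64_t> mem_;
};
```

The guard around each counter update is what keeps `tainted_reg_count()` correct when the same register is tainted or cleaned twice in a row, which happens constantly in real traces.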

14.6 Termination Conditions

The engine stops on one of three conditions:

enum class StopReason {
    ALL_TAINT_CLEARED,    // all taint has been cleared
    END_OF_TRACE,         // reached the end of the log (forward) or its beginning (backward)
    SCAN_LIMIT_REACHED    // 1,000,000 consecutive lines without a propagation event
};

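SCAN_LIMIT_REACHED keeps the engine from wasting computation on long stretches of unrelated code. The default of 1,000,000 lines can be changed with set_max_scan_distance.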
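How the three conditions might interact in the main loop can be sketched as follows. This is a hedged illustration rather than the engine's actual loop: `StepResult`, `scan_trace` and the callback shape are invented for the example, and the order in which the conditions are checked is an assumption.

```cpp
#include <cassert>
#include <cstddef>

enum class StopReason { ALL_TAINT_CLEARED, END_OF_TRACE, SCAN_LIMIT_REACHED };

// What one processed trace line reports back (hypothetical).
struct StepResult {
    bool propagated;   // did this line change the taint state?
    bool taint_empty;  // is the taint state empty after this line?
};

// Drive `step` over line indices 0..num_lines-1 until a stop condition fires.
template <class StepFn>
StopReason scan_trace(std::size_t num_lines, std::size_t max_scan_distance, StepFn step) {
    assert(max_scan_distance > 0);
    std::size_t idle = 0;  // consecutive lines without a propagation event
    for (std::size_t i = 0; i < num_lines; ++i) {
        const StepResult r = step(i);
        if (r.taint_empty) return StopReason::ALL_TAINT_CLEARED;
        if (r.propagated) {
            idle = 0;  // any event resets the idle counter
        } else if (++idle >= max_scan_distance) {
            return StopReason::SCAN_LIMIT_REACHED;
        }
    }
    return StopReason::END_OF_TRACE;
}
```

The point of the idle counter is that the limit applies to consecutive uneventful lines, not to the total number of lines scanned, so a long trace with regular propagation events is still processed to the end.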

14.7 Command-Line Usage

# Forward tracking: start from register x0 at line 100
./taint_tracker -i trace.log -o result.log -f x0 -l 100

# Backward tracking: trace register x0 back from line 500
./taint_tracker -i trace.log -o result.log -b x0 -l 500

# Track a memory address
./taint_tracker -i trace.log -o result.log -f mem:0x1000 -l 100

# Locate the start point by relative address
./taint_tracker -i trace.log -o result.log -f x0 -a 0x1890

# Locate the start point by byte offset (suited to very large log files)
./taint_tracker -i trace.log -o result.log -b x0 -p 1048576
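The examples above pass the tracking target either as a register name (`x0`) or as `mem:` followed by a hexadecimal address. A parser for that argument could be sketched as below. The `TaintTarget` struct and `parse_target` function are hypothetical names for this illustration and are not taken from the taint_tracker source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical parsed form of the value given to -f / -b.
struct TaintTarget {
    bool is_memory = false;
    std::uint64_t address = 0;  // valid when is_memory is true
    std::string reg;            // valid when is_memory is false
};

// Parse "x0"-style register names and "mem:0x1000"-style addresses.
// std::stoull throws std::invalid_argument if nothing after "mem:" is hex.
TaintTarget parse_target(const std::string& spec) {
    assert(!spec.empty());
    TaintTarget t;
    const std::string prefix = "mem:";
    if (spec.compare(0, prefix.size(), prefix) == 0) {
        t.is_memory = true;
        // Base 16 accepts the address with or without a leading "0x".
        t.address = std::stoull(spec.substr(prefix.size()), nullptr, 16);
    } else {
        t.reg = spec;
    }
    return t;
}
```

This sketch does not validate register names; a real tool would presumably map the name to a register ID and reject unknown ones.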

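Backward tracking uses the load_range optimization: only the data up to the target line is loaded, which avoids reading an entire multi-gigabyte log into memory: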

if (mode == TrackMode::BACKWARD && start_line > 0) {
    parser.load_range(input_file, start_line);
} else {
    parser.load(input_file);
}
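The idea behind `load_range` can be illustrated with a deliberately simple sketch: stop reading as soon as the target line has been reached. The function below is not the parser's implementation (which is described elsewhere as zero-allocation); `load_first_lines` is a hypothetical name, it copies each line into a `std::string`, and it assumes 1-based line numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Read at most `last_line` lines from `in`; everything after that stays unread.
std::vector<std::string> load_first_lines(std::istream& in, std::size_t last_line) {
    std::vector<std::string> lines;
    std::string line;
    // The size check comes first, so no extra line is consumed once the limit is hit.
    while (lines.size() < last_line && std::getline(in, line)) {
        lines.push_back(line);
    }
    assert(lines.size() <= last_line);
    return lines;
}
```

Backward tracking from line N never looks at anything after N, so for a start point early in a large log this bounds both the I/O and the memory use by the position of the start line rather than by the size of the file.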

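14.8 010 Editor Integration

For interactive analysis, GumTrace ships a TaintTracker.1sc script that can be used directly inside 010 Editor:

1. Open the trace log in 010 Editor.
2. Move the cursor to the instruction line you want to analyze.
3. Run the script and choose the tracking direction and the target register.
4. The script invokes taint_tracker automatically and opens the result file.

The script parses the line under the cursor to pick a default tracking target (the first register that appears on it) and locates the start line from the cursor's byte offset, so no line number has to be entered by hand.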

15 Platform Adaptation

GumTrace separates its Android and iOS code paths with compile-time macros.

15.1 Conditional Compilation

// platform.h
#ifdef __APPLE__
#define PLATFORM_IOS 1
#define PLATFORM_ANDROID 0
#else
#define PLATFORM_IOS 0
#define PLATFORM_ANDROID 1
#endif

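Throughout the codebase, #if PLATFORM_ANDROID and #if PLATFORM_IOS select the platform-specific code paths. The core tracing logic (Stalker instrumentation, operand parsing, buffer management) is fully shared.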
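One detail of this scheme is easy to get wrong: both macros are always defined (as 0 or 1), so they have to be tested with `#if`, not `#ifdef`. The sketch below repeats the detection logic and uses it in two helpers. The helpers `artifact_name` and `is_platform_artifact` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Same detection as platform.h: both macros are always defined, as 0 or 1.
#ifdef __APPLE__
#define PLATFORM_IOS 1
#define PLATFORM_ANDROID 0
#else
#define PLATFORM_IOS 0
#define PLATFORM_ANDROID 1
#endif

static_assert(PLATFORM_IOS + PLATFORM_ANDROID == 1,
              "exactly one platform must be selected");

// Hypothetical helper: file name of the build artifact on this platform.
inline const char* artifact_name() {
#if PLATFORM_ANDROID   // "#ifdef PLATFORM_ANDROID" would be true on iOS as well
    return "libGumTrace.so";
#else
    return "libGumTrace.dylib";
#endif
}

// Hypothetical helper: does `filename` name this platform's artifact?
inline bool is_platform_artifact(const char* filename) {
    assert(filename != nullptr);
    return std::strcmp(filename, artifact_name()) == 0;
}
```

Note that with this detection any non-Apple compiler, including one targeting a Linux desktop, takes the Android branch.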

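15.2 Platform Differences at a Glance

| Aspect | Android | iOS |
| --- | --- | --- |
| Build artifact | libGumTrace.so | libGumTrace.dylib |
| Logging | __android_log_print | NSLog |
| JNI tracing | ✅ | N/A |
| ObjC tracing | N/A | ✅ |
| Module exclusion | Path-prefix matching | Keep only the specified modules |
| Build tooling | NDK CMake toolchain | Xcode iphoneos SDK |
| Minimum version | Android API 24 | iOS 12.0 |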

15.3 Frida Gum Library

GumTrace links against the Frida Gum static library (version 17.8.3), which comes in an Android and an iOS variant:

libs/
├── FridaGum-Android-17.8.3-fix.a   # Android arm64
├── FridaGum-Android-17.8.3.h       # Android header
├── FridaGum-IOS-17.8.3-fix.a       # iOS arm64
└── FridaGum-IOS-17.8.3.h           # iOS header

The -fix suffix in the file names suggests a modified build, possibly patched for specific scenarios. Static linking makes the resulting library self-contained, with no dependency on a Frida environment on the device.

16 Project Structure

GumTrace/
├── CMakeLists.txt              # Main build script (both platforms)
├── build_android.sh            # Android build script
├── build_ios.sh                # iOS build script
├── example.js                  # Frida usage example for Android
├── example_ios.js              # Frida usage example for iOS
├── libs/                       # Frida Gum static libraries and headers
└── src/
    ├── main.cpp                # Entry point: exported init/run/unrun functions
    ├── GumTrace.h/cpp          # Core engine: Stalker callbacks, instruction parsing
    ├── CallbackContext.h/cpp   # Context object pool (ring buffer)
    ├── FuncPrinter.h/cpp       # Function argument/return value printing (incl. JNI and ObjC)
    ├── Utils.h/cpp             # Utilities: register reads, hex formatting
    ├── platform.h              # Platform detection macros
    └── taint/                  # Offline taint analysis tool
        ├── CMakeLists.txt
        ├── main.cpp            # Command-line entry point
        ├── TraceParser.h/cpp   # Log parser (zero-allocation design)
        ├── TaintEngine.h/cpp   # Taint propagation engine (forward/backward)
        └── TaintTracker.1sc    # Interactive 010 Editor script

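17 Summary

GumTrace pushes both the performance and the feature set of ARM64 trace tooling close to their practical limits. Its core design decisions, in review:

C++ native engine instead of JavaScript: bypassing Frida's JS layer and calling the Gum C API directly yields an order-of-magnitude performance gain.

Stalker exclusion strategy: only the target modules are instrumented while system libraries run natively, which brings trace speed to a practical level.

Ring object pool + 50MB buffer + zero-allocation hot path: together these keep the per-instruction processing overhead to a minimum.

Configuration-driven function recognition: a new function needs only one line of configuration, with no change to the printing logic.

Offline taint analysis: tracing and analysis are decoupled, so the trace run only records and the analysis can be repeated as often as needed.

Bidirectional taint tracking: forward tracking shows where data goes, backward tracking shows where it came from, covering the typical needs of security research.

For security researchers, GumTrace fills the gap between "function-level hooks are too coarse" and "hardware tracing is too hard". It lets researchers capture a complete instruction-level execution trace on a real device and, combined with the Trace UI visualization tool and offline taint analysis, forms a complete toolchain for ARM64 dynamic analysis.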

Reposted from: 软件安全与逆向分析 (非虫非虫), "ARM64 Dynamic Instruction Tracing Tool: Usage and Implementation Analysis" (《ARM64动态指令追踪工具使用与实现分析》)