2025-12-14 01:20:25 Cybersecurity article, source: ZONE.CI Global Network

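Article summary: This article analyzes the Linux kernel vulnerability CVE-2020-14386 in detail. It is a memory corruption bug in the af_packet subsystem that can lead to privilege escalation and Docker escape. The article gives complete environment setup steps and examines the technical details, in particular the PACKET_RESERVE option and the packet_setsockopt function. The author traces the bug to a miscalculation in tpacket_rcv that can make macoff overflow, and proposes turning it into a UAF by manipulating the reference count of an sctp_shared_key object. Overall score: 90. Categories: vulnerability analysis, binary security, kernel security, privilege escalation, container security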


CVE-2020-14386: Reproduction and Analysis

Original

gosh

N0 Fl4g

July 26, 2025, 18:52, Jiangsu

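Overview

The discovery of CVE-2020-14386, a memory corruption vulnerability in the af_packet subsystem of the Linux kernel, highlights a persistent challenge in security: the intersection of privilege escalation, container security (Docker escape), and kernel-level vulnerabilities.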

Environment Setup

Install the build dependencies:

sudo apt update
sudo apt install -y build-essential libncurses-dev bison flex libssl-dev libelf-dev \
    bc dwarves zstd git

Download the Linux kernel 5.7.1 source:

wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.7.1.tar.xz
tar -xvf linux-5.7.1.tar.xz
cd linux-5.7.1

Kernel configuration options:

make defconfig
scripts/config \
    --set-str CONFIG_LOCALVERSION "-cve-2020-14386" \
    --enable CONFIG_DEBUG_INFO \
    --enable CONFIG_DEBUG_INFO_BTF \
    --enable CONFIG_KALLSYMS \
    --enable CONFIG_TMPFS \
    --enable CONFIG_TMPFS_XATTR \
    --enable CONFIG_DEVTMPFS \
    --enable CONFIG_DEVTMPFS_MOUNT \
    --enable CONFIG_E1000 \
    --disable CONFIG_SYSTEM_TRUSTED_KEYS \
    --disable CONFIG_SYSTEM_REVOCATION_KEYS

Build the kernel:

# Clean up any previous build
make clean
make -j$(nproc) bzImage

Next, use the create-image.sh script to build a root filesystem image (bullseye.img). Configure QEMU to use user-mode networking, then boot QEMU.

Once the environment has booted, create an unprivileged account. I created one named gosh:

adduser gosh
usermod -aG sudo gosh
# Verify
groups gosh

Environment

Technical Details

PACKET_RESERVE is a packet socket option that was introduced in 2008. The man page on man7.org describes it as follows:

PACKET_RESERVE (with PACKET_RX_RING)
       By default, a packet receive ring writes packets
       immediately following the metadata structure and alignment
       padding.  This integer option reserves additional headroom.

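In short, it reserves additional headroom. The relevant handling lives in packet_setsockopt(): optlen must be exactly 4 (sizeof(val)), otherwise the call fails with -EINVAL. The code then calls copy_from_user() to copy the value from user space into val, so val is fully user-controlled, and rejects values above INT_MAX. Next it takes the socket lock with lock_sock() and, provided neither the RX ring nor the TX ring has been allocated yet, stores val in po->tp_reserve.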

case PACKET_RESERVE:
{
    unsigned int val;

    if (optlen != sizeof(val))
        return -EINVAL;
    if (copy_from_user(&val, optval, sizeof(val)))
        return -EFAULT;
    if (val > INT_MAX)
        return -EINVAL;
    lock_sock(sk);
    if (po->rx_ring.pg_vec || po->tx_ring.pg_vec) {
        ret = -EBUSY;
    } else {
        po->tp_reserve = val;
        ret = 0;
    }
    release_sock(sk);
    return ret;
}
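The validation above can be restated as a small userspace model. This is only a sketch: `reserve_model` and its parameters are hypothetical names rather than kernel code, and the `ring_allocated` flag stands in for the two `pg_vec` checks. It illustrates that any value up to INT_MAX is accepted as long as no ring exists yet.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace model of the PACKET_RESERVE branch shown above.
 * On success, *tp_reserve receives the user-supplied value. */
static int reserve_model(size_t optlen, unsigned int val,
                         int ring_allocated, unsigned int *tp_reserve)
{
    if (optlen != sizeof(val))
        return -EINVAL;         /* the option must be exactly 4 bytes */
    if (val > INT_MAX)
        return -EINVAL;         /* values above INT_MAX are rejected */
    if (ring_allocated)
        return -EBUSY;          /* must be set before a ring is created */
    *tp_reserve = val;
    return 0;
}
```

For instance, `reserve_model(4, 0xffb4, 0, &r)` returns 0 and stores 0xffb4 (an illustrative value, not one taken from the debugging session), while the same call with `ring_allocated` set to 1 returns -EBUSY.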

Further debugging shows that the subsequent call flow reaches packet_set_ring(), as shown below:

call packet_set_ring()

gef> register
$rcx   : 0x0000000000000000
$rdx   : 0x0000000000000000
$rsp   : 0xffffc90000213e40  ->  0x0000000000000000
$rbp   : 0xffffc90000213ee8  ->  0x0000000000000010
$rsi   : 0xffffc90000213e70  ->  0x0000000100800000
$rdi   : 0xffff8881394ed000  ->  0x0000000000000000
$rip   : 0xffffffff819fb092 <packet_set_ring+0x2>  ->  0x544155415641c889

Debugging shows that (int)req->tp_block_size is 0x00800000, so neither of the following conditions is met and execution does not jump to out:

if (unlikely((int)req->tp_block_size <= 0))
    goto out;
if (unlikely(!PAGE_ALIGNED(req->tp_block_size)))
    goto out;

Execution then reaches the following snippet:

min_frame_size = po->tp_hdrlen + po->tp_reserve;

This computes the minimum frame size; debugging shows that po->tp_hdrlen is 0x34. Next come the following checks, where req->tp_frame_size is 0x00011000.

if (unlikely(req->tp_frame_size < min_frame_size))
    goto out;
if (unlikely(req->tp_frame_size & (TPACKET_ALIGNMENT - 1)))
    goto out;
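To see how the observed values pass these checks, the four conditions can be collected into one predicate. This is a rough userspace sketch, not kernel code: `ring_req_ok` and the `MODEL_*` constants are hypothetical names, and it assumes 4 KiB pages and a `TPACKET_ALIGNMENT` of 16.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE         4096u   /* assumption: 4 KiB pages (x86-64) */
#define MODEL_TPACKET_ALIGNMENT 16u     /* TPACKET_ALIGNMENT in <linux/if_packet.h> */

/* Hypothetical model of the sanity checks quoted from packet_set_ring(). */
static bool ring_req_ok(unsigned int block_size, unsigned int frame_size,
                        unsigned int tp_hdrlen, unsigned int tp_reserve)
{
    unsigned int min_frame_size = tp_hdrlen + tp_reserve;

    if ((int)block_size <= 0)
        return false;
    if (block_size & (MODEL_PAGE_SIZE - 1))          /* !PAGE_ALIGNED() */
        return false;
    if (frame_size < min_frame_size)
        return false;
    if (frame_size & (MODEL_TPACKET_ALIGNMENT - 1))
        return false;
    return true;
}
```

With the debugged values (block size 0x00800000, frame size 0x00011000, tp_hdrlen 0x34) the predicate holds for tp_reserve = 0, and it still holds for an illustrative tp_reserve of 0xffb4, because 0x34 + 0xffb4 = 0xffe8 is below 0x11000. A frame size of 0x1000 with the same tp_reserve would be rejected.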

Then, after some block-related calculations, execution reaches the allocation step:

order = get_order(req->tp_block_size);
pg_vec = alloc_pg_vec(req, order);

alloc_pg_vec() allocates each block through alloc_one_pg_vec_page(), which is implemented as follows:

static char *alloc_one_pg_vec_page(unsigned long order)
{
    char *buffer;
    gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
                      __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;

    buffer = (char *) __get_free_pages(gfp_flags, order);
    if (buffer)
        return buffer;

    /* __get_free_pages failed, fall back to vmalloc */
    buffer = vzalloc(array_size((1 << order), PAGE_SIZE));
    if (buffer)
        return buffer;

    /* vmalloc failed, lets dig into swap here */
    gfp_flags &= ~__GFP_NORETRY;
    buffer = (char *) __get_free_pages(gfp_flags, order); // order = 0xb
    if (buffer)
        return buffer;

    /* complete and utter failure */
    return NULL;
}

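In short, alloc_one_pg_vec_page() uses __get_free_pages() to allocate the pages of each block. Once the blocks are allocated, the pg_vec array is stored in the packet_ring_buffer structure, which is embedded in the packet_sock structure that represents the socket.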
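The order of 0xb seen while debugging follows directly from the block size. The helper below is a minimal stand-in for the kernel's get_order(), assuming 4 KiB pages; `order_for_size` is a hypothetical name used only for this illustration.

```c
/* Minimal stand-in for get_order(), assuming 4 KiB pages: returns the
 * smallest order such that (4096 << order) >= size. The loop is capped at
 * order 31 so that an absurdly large size cannot spin forever. */
static unsigned int order_for_size(unsigned long size)
{
    unsigned int order = 0;
    unsigned long span = 4096ul;

    while (span < size && order < 31) {
        span <<= 1;
        order++;
    }
    return order;
}
```

A block size of 0x00800000 (8 MiB) is 2048 pages, so `order_for_size(0x00800000)` evaluates to 11, that is 0xb.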

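Vulnerability Analysis

The "Technical Details" section showed how the setsockopt path allocates memory, which lets us approach the vulnerable function tpacket_rcv() with a clear target. Stepping through it in the debugger passes a lot of code whose purpose is not immediately obvious, but we eventually find that maclen is 0xe. If the PACKET_VNET_HDR option is set on the socket, sizeof(struct virtio_net_hdr) (0xa) is added to netoff to make room for the virtio_net_hdr structure, which is placed immediately before the Ethernet header. Finally, the code computes the offset of the Ethernet header and stores it in macoff.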

if (sk->sk_type == SOCK_DGRAM) {
    macoff = netoff = TPACKET_ALIGN(po->tp_hdrlen) + 16 +
        po->tp_reserve;
} else {
    unsigned int maclen = skb_network_offset(skb);
    netoff = TPACKET_ALIGN(po->tp_hdrlen +
        (maclen < 16 ? 16 : maclen)) +
        po->tp_reserve;
    if (po->has_vnet_hdr) {
        netoff += sizeof(struct virtio_net_hdr);
        do_vnet = true;
    }
    macoff = netoff - maclen; // may overflow
}
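The arithmetic in the else branch can be replayed in userspace to see how a large tp_reserve produces a tiny macoff. The sketch below is a model, not kernel code: `model_macoff` and the `MODEL_*` macros are hypothetical, and it assumes that netoff and macoff are declared as unsigned short in the 5.7 source, so the sum is truncated to 16 bits.

```c
#include <stdbool.h>

#define MODEL_TPACKET_ALIGN(x) (((x) + 15u) & ~15u)  /* TPACKET_ALIGNMENT is 16 */
#define MODEL_VNET_HDR_LEN     10u                   /* sizeof(struct virtio_net_hdr) */

/* Hypothetical model of the non-SOCK_DGRAM branch above.
 * Assumption: netoff and macoff are 16-bit, so each result wraps. */
static unsigned short model_macoff(unsigned int tp_hdrlen, unsigned int maclen,
                                   unsigned int tp_reserve, bool has_vnet_hdr)
{
    unsigned short netoff, macoff;

    netoff = (unsigned short)(MODEL_TPACKET_ALIGN(tp_hdrlen +
                              (maclen < 16 ? 16 : maclen)) + tp_reserve);
    if (has_vnet_hdr)
        netoff = (unsigned short)(netoff + MODEL_VNET_HDR_LEN);
    macoff = (unsigned short)(netoff - maclen);
    return macoff;
}
```

With the values seen while debugging (tp_hdrlen = 0x34, maclen = 0xe) and tp_reserve = 0, the model yields macoff = 76. With an illustrative tp_reserve of 0xffb4, the intermediate sum 80 + 0xffb4 wraps to 4, netoff becomes 14 after the virtio header is added, and macoff ends up as 0, which is smaller than sizeof(struct virtio_net_hdr).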

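Next, as the following snippet shows, the code uses virtio_net_hdr_from_skb() to write the virtio_net_hdr structure into the ring buffer. Here h.raw points to the currently free frame in the ring buffer (the ring buffer itself is allocated in alloc_pg_vec()).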

if (do_vnet &&
    virtio_net_hdr_from_skb(skb, h.raw + macoff -
                            sizeof(struct virtio_net_hdr),
                            vio_le(), true, 0))
    goto drop_n_account;

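In principle, we could control po->tp_reserve to overflow netoff and make macoff very large. In practice, however, the following check runs before the call above, so an oversized macoff is caught.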

if (macoff + snaplen > po->rx_ring.frame_size) {
    if (po->copy_thresh &&
            atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf) {
        if (skb_shared(skb)) {
            copy_skb = skb_clone(skb, GFP_ATOMIC);
        } else {
            copy_skb = skb_get(skb);
            skb_head = skb->data;
        }
        if (copy_skb)
            skb_set_owner_r(copy_skb, sk);
    }
    snaplen = po->rx_ring.frame_size - macoff;
    if ((int)snaplen < 0) {
        snaplen = 0;
        do_vnet = false;
    }
}

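However, the expression h.raw + macoff - sizeof(struct virtio_net_hdr) can still go wrong. If macoff < sizeof(struct virtio_net_hdr), the resulting pointer lands before h.raw, so the write starts in front of the frame. This case is reachable, and again we get there by manipulating netoff. Earlier debugging showed that struct virtio_net_hdr is 0xA (10) bytes long, so we can write at most 10 bytes out of bounds, and they are 10 zero bytes: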

static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
    struct virtio_net_hdr *hdr,
    bool little_endian,
    bool has_data_valid,
    int vlan_hlen)
{
    memset(hdr, 0, sizeof(*hdr)); /* no info leak */

    if (skb_is_gso(skb)) {
    // …
    if (skb->ip_summed == CHECKSUM_PARTIAL) {
    // …

If no other buffer has been allocated in front of the frame at this point, the write will most likely trigger a page fault, causing the kernel to panic and crash.
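The effect of the pointer arithmetic can be simulated safely in userspace. In the sketch below, a local array stands in for kernel memory and the "frame" (h.raw in the text) begins a few bytes into it, so a write that starts before the frame stays inside the array. All names are hypothetical, and the memset only mirrors the first statement of virtio_net_hdr_from_skb().

```c
#include <string.h>

#define MODEL_VNET_HDR_LEN 10    /* sizeof(struct virtio_net_hdr) */
#define MODEL_GUARD        16    /* bytes placed in front of the frame */
#define MODEL_FRAME        128   /* size of the simulated frame */

/* Userspace simulation: returns how many bytes in front of the frame are
 * zeroed by the header write, or -1 if macoff is outside the model. */
static int bytes_clobbered_before_frame(unsigned short macoff)
{
    unsigned char region[MODEL_GUARD + MODEL_FRAME];
    unsigned char *raw = region + MODEL_GUARD;   /* plays the role of h.raw */
    int i, clobbered = 0;

    if (macoff > MODEL_FRAME)
        return -1;

    memset(region, 0xff, sizeof(region));
    /* same arithmetic as h.raw + macoff - sizeof(struct virtio_net_hdr) */
    memset(raw + macoff - MODEL_VNET_HDR_LEN, 0, MODEL_VNET_HDR_LEN);

    for (i = 0; i < MODEL_GUARD; i++)
        if (region[i] == 0)
            clobbered++;
    return clobbered;
}
```

In this model, macoff = 0 zeroes all 10 bytes in front of the frame, macoff = 4 zeroes 6 of them, and any macoff of 10 or more leaves the preceding bytes untouched.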

call page_fault

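Exploitation Approach

The idea is to turn this primitive into a use-after-free (UAF). To get there, consider lowering the reference count of some object. For example, suppose an object's reference count is 0x10001. If the out-of-bounds write in front of the buffer zeroes its upper bytes, the reference count becomes 1, and after one more release the object is freed. For this to happen, the following conditions must hold: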

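The refcount must lie within the last 1 to 10 bytes of the object.
The object must be allocated at the end of a page, because get_free_pages returns page-aligned addresses.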
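The 0x10001 example can be checked with a few lines of C. This sketch is illustrative only: `zero_top_bytes` is a hypothetical helper, and it models a little-endian layout explicitly, where the most significant bytes of the counter sit at the highest addresses and are therefore the ones closest to the end of the object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: lay a 32-bit counter out in little-endian byte order,
 * zero its `nbytes` highest-addressed bytes, and read the value back. */
static uint32_t zero_top_bytes(uint32_t refcount, size_t nbytes)
{
    unsigned char raw[4];
    uint32_t out = 0;
    size_t i;

    if (nbytes > sizeof(raw))
        nbytes = sizeof(raw);
    for (i = 0; i < sizeof(raw); i++)            /* little-endian layout */
        raw[i] = (unsigned char)(refcount >> (8 * i));
    memset(raw + sizeof(raw) - nbytes, 0, nbytes);
    for (i = 0; i < sizeof(raw); i++)
        out |= (uint32_t)raw[i] << (8 * i);
    return out;
}
```

Zeroing the top two bytes of 0x10001 leaves 1, which matches the scenario described above; zeroing all four bytes would leave 0 instead.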

After some analysis, the following object meets these conditions:

struct sctp_shared_key {
    struct list_head key_list;
    struct sctp_auth_bytes *key;
    refcount_t refcnt;
    __u16 key_id;
    __u8 deactivated;
};
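One way to sanity-check where refcnt sits is to mirror the struct in userspace and measure it with offsetof. The mirror below is an approximation under stated assumptions: an LP64 ABI with 8-byte pointers, list_head modelled as two pointers, and refcount_t modelled as a 4-byte wrapper. The `model_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct above, for layout inspection only. */
struct model_list_head { void *next, *prev; };
struct model_refcount  { int refs; };         /* assumption: 4-byte counter */

struct model_sctp_shared_key {
    struct model_list_head key_list;
    void *key;                                /* struct sctp_auth_bytes * */
    struct model_refcount refcnt;
    uint16_t key_id;
    uint8_t deactivated;
};

/* Distance in bytes from the end of the object back to the start of refcnt. */
static size_t refcnt_distance_from_end(void)
{
    return sizeof(struct model_sctp_shared_key) -
           offsetof(struct model_sctp_shared_key, refcnt);
}
```

On x86-64 the mirror is 32 bytes with refcnt at offset 24, so the function returns 8 and the whole 4-byte counter falls within the last 10 bytes of the object.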

This object appears to satisfy our constraints: