2026-03-19 16:24:27 · Cybersecurity articles · Source: ZONE.CI

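Summary: This article studies the design of an AI-driven automated penetration testing platform. To address the pain points of traditional testing, it proposes a cognition-driven, evidence-first approach. The core architecture uses the P-E-R three-role separation framework and a causal graph reasoning mechanism, combined with a decoupled three-layer design (Agent, Skills, Tools) and the MCP protocol, to suppress hallucinations and support the accumulation of experience. Through an in-depth comparison of PentAGI, CyberStrikeAI, Strix, and LuaN1ao (鸾鸟), it summarizes architecture choices and implementation paths, offering an engineering reference for building intelligent, extensible penetration testing systems. Overall score: 90. Categories: Penetration Testing, AI Security, Security Engineering, Security Tools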


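Design Study of an AI-Driven Automated Penetration Testing Platform

Original

whoami0002

SecurityPaper

March 17, 2026, 21:05, Jiangsu

This article records my thinking, research, and architectural decisions while designing an AI-driven penetration testing platform.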

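I. Why Such a Platform Is Needed

1.1 The Predicament of Traditional Penetration Testing

As a security practitioner, I have felt several pain points of traditional penetration testing firsthand:

Efficiency bottleneck: A complete penetration test can take days or even weeks, and much of that time goes into repetitive work such as information gathering and vulnerability verification. A security team's headcount never keeps pace with the growth of business systems.

Knowledge gaps: Security knowledge changes very quickly, with new vulnerability classes, attack techniques, and bypass tricks appearing constantly. Even senior researchers cannot cover every area, and the capability gap between newcomers and experts is large.

Incomplete coverage: Manual testing inevitably has blind spots, especially in large, complex systems. Fatigue, time pressure, and subjective judgment can all cause critical vulnerabilities to be missed.

1.2 Limitations of Existing Automation Tools

Automated scanners on the market (such as AWVS, Nessus, and Nuclei) improve efficiency, but they have fundamental shortcomings:

Rule-driven: they rely on predefined detection rules and are helpless against 0-days, logic flaws, and complex chained attacks
High false-positive rate: they lack intelligent judgment, so large numbers of false positives consume manual triage time
No contextual understanding: they cannot understand business logic, so deeper issues such as authentication bypass and privilege escalation are hard to find
Broken attack chains: each tool runs in isolation and cannot carry out a coherent multi-step attack the way a real attacker would

1.3 Why an AI Agent Architecture

The arrival of large language models changed this picture. The reasoning, code comprehension, and tool-calling abilities shown by models such as GPT-4 and Claude made it seem possible to build an "AI penetration testing expert".

An AI Agent is not a simple rule executor. It is an agent with the following capabilities:

Understanding intent: turning a test objective described in natural language into an executable test plan
Dynamic decision-making: adjusting strategy based on intermediate results, responding move by move as a human would
Knowledge transfer: applying known attack patterns to new scenarios
Self-reflection: learning from failures and avoiding repeated mistakes

This is exactly what I want to build: an AI system that thinks, acts, and evolves like a real penetration testing expert.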

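II. How I Designed It

2.1 Core Design Principles

Before writing any code, I settled on three core design principles:

1. Cognition-driven, not script-driven

Traditional automation tools are essentially scripts: given input A, run B, output C. An AI penetration testing platform should instead model the human cognitive process: Observation → Hypothesis → Verification → Reflection.

This parallels the OODA loop (Observe-Orient-Decide-Act) from military decision theory, and echoes the hypothesis-testing paradigm of the scientific method.

2. Evidence first, no hallucinations

The most damaging problem with LLMs is "hallucination": stating nonsense with complete confidence. In security testing this is unacceptable. A vulnerability that does not exist can waste significant resources and even lead to wrong remediation decisions.

I therefore introduced a "Causal Graph Reasoning" mechanism: every attack step must rest on solid evidence, forming a traceable chain of Evidence → Hypothesis → Vulnerability → Exploit. A hypothesis without supporting evidence cannot advance to the next step.

3. Decoupled architecture, extensible capabilities

Penetration testing spans a very wide range of domains: web security, internal network penetration, cloud security, binary reverse engineering, mobile security, and more. No team or individual can master all of them.

I therefore designed the platform as a three-layer "Agent + Skills + Tools" architecture:

Agent: the "brain" responsible for cognition and decisions
Skills: "skill packs" that encapsulate domain knowledge
Tools: the "hands and feet" that carry out concrete operations

This design makes integrating new capabilities simple: adding a new tool only requires writing an MCP configuration, and adding a new skill only requires writing a Markdown document.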

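2.2 System Architecture

Based on the principles above, I designed the following architecture:

2.3 Key Design Decisions

Decision 1: Why separate the three P-E-R roles?

Many AI Agent projects use a single-agent architecture in which one LLM handles planning, execution, and reflection at once. After my research, I concluded that this is a poor choice.

The reason is "role conflict": when the same agent must both set macro-level strategy and attend to micro-level execution details, its reasoning becomes muddled. It is like asking one person to be general and soldier at the same time: the usual result is neither a good strategy nor well-executed tactics.

Advantages of the P-E-R framework:

Planner: only needs to care about "what to do", making strategic decisions based on the global causal graph
Executor: only needs to care about "how to do it", focusing on tool calls and result analysis for a single subtask
Reflector: only needs to care about "how well it went", performing independent auditing and attribution analysis

The three roles each do their own job and collaborate through an event bus, which avoids role confusion.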
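
The role separation described above can be sketched as three small classes that only communicate through a publish/subscribe bus. This is a minimal illustration, not the platform's implementation: the `EventBus` class, the topic names `task.planned` and `task.executed`, and the stubbed tool output are all assumptions made for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Event:
    topic: str      # hypothetical topic name, e.g. "task.planned"
    payload: dict


class EventBus:
    """Tiny synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, event):
        self._queue.append(event)

    def run(self):
        # Drain the queue; handlers may publish follow-up events.
        while self._queue:
            event = self._queue.popleft()
            for handler in self._handlers[event.topic]:
                handler(event)


class Planner:
    """Decides WHAT to do; it never calls tools itself."""

    def __init__(self, bus):
        self.bus = bus

    def plan(self, goal):
        task = f"enumerate services for {goal}"
        self.bus.publish(Event("task.planned", {"task": task}))


class Executor:
    """Decides HOW to do one subtask; a real one would call tools here."""

    def __init__(self, bus):
        self.bus = bus
        bus.subscribe("task.planned", self.on_task)

    def on_task(self, event):
        result = {"task": event.payload["task"], "output": "stub output", "ok": True}
        self.bus.publish(Event("task.executed", result))


class Reflector:
    """Audits HOW WELL it went, independently of the other two roles."""

    def __init__(self, bus):
        self.audits = []
        bus.subscribe("task.executed", self.on_result)

    def on_result(self, event):
        verdict = "accepted" if event.payload["ok"] else "needs-replan"
        self.audits.append((event.payload["task"], verdict))


bus = EventBus()
planner, executor, reflector = Planner(bus), Executor(bus), Reflector(bus)
planner.plan("example.test")
bus.run()
print(reflector.audits)
```

Because each role only sees events, a role can be swapped out (for example, a different LLM prompt per role) without the others knowing.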

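Decision 2: Why is a causal graph needed?

The causal graph is the most important design idea I learned from the LuaN1ao (鸾鸟) project. It addresses a core question: how do you stop the AI from "guessing wildly"?

A traditional AI Agent might work like this:

User: Test this website
AI: I found port 3306 open, let me try weak MySQL passwords... (this may actually be a false positive)

The causal-graph-driven approach looks like this:

Evidence: nmap scan shows 3306/tcp open, banner is "mysql"
    ↓ (confidence: 0.9)
Hypothesis: the target runs a MySQL service and may have weak passwords or unauthenticated access
    ↓ (needs verification)
Action: brute-force passwords with hydra
    ↓ (verification result)
Vulnerability: the root account uses an empty password
    ↓ (attempt exploitation)
Exploit: connected successfully and read data

Every step is backed by evidence, and every edge carries a confidence score. If the confidence of a step is too low, the system asks for more evidence instead of pushing ahead blindly.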
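
A minimal sketch of this gating idea follows. It models only the four-stage chain Evidence → Hypothesis → Vulnerability → Exploit and refuses to add an edge whose confidence is below a threshold. The class name `CausalChain`, the 0.7 threshold, and the `InsufficientEvidence` exception are assumptions for illustration; the article does not specify how LuaN1ao or this platform stores the graph.

```python
from dataclasses import dataclass, field

STAGES = ("evidence", "hypothesis", "vulnerability", "exploit")


class InsufficientEvidence(Exception):
    """Raised when an edge's confidence is below the gate."""


@dataclass
class CausalChain:
    min_confidence: float = 0.7                  # assumed threshold
    nodes: list = field(default_factory=list)    # (stage, summary)
    edges: list = field(default_factory=list)    # (parent, child, confidence)

    def add_evidence(self, summary):
        self.nodes.append(("evidence", summary))
        return len(self.nodes) - 1

    def advance(self, parent, stage, summary, confidence):
        """Add the next-stage node, but only with enough confidence."""
        parent_stage = self.nodes[parent][0]
        position = STAGES.index(parent_stage)
        if position + 1 >= len(STAGES) or STAGES[position + 1] != stage:
            raise ValueError(f"{parent_stage} cannot lead directly to {stage}")
        if confidence < self.min_confidence:
            raise InsufficientEvidence(
                f"confidence {confidence} < {self.min_confidence}: gather more evidence"
            )
        self.nodes.append((stage, summary))
        child = len(self.nodes) - 1
        self.edges.append((parent, child, confidence))
        return child

    def trace(self, node):
        """Walk back from a node to the evidence it rests on."""
        path = [self.nodes[node][1]]
        while True:
            parents = [p for p, c, _ in self.edges if c == node]
            if not parents:
                break
            node = parents[0]
            path.append(self.nodes[node][1])
        return list(reversed(path))


chain = CausalChain()
ev = chain.add_evidence("nmap: 3306/tcp open, banner 'mysql'")
hyp = chain.advance(ev, "hypothesis", "MySQL may allow weak or empty passwords", 0.9)
vuln = chain.advance(hyp, "vulnerability", "root account accepts an empty password", 0.95)
print(chain.trace(vuln))
```

`trace` is what makes a finding auditable: any reported vulnerability can be walked back to the raw observation that justified it.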

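Decision 3: Why choose MCP as the tool protocol?

MCP (Model Context Protocol) is a tool-calling protocol introduced by Anthropic, and it is becoming the de facto standard for tool integration in AI Agents.

Reasons for choosing MCP:

Standardization: a unified tool description format, so nmap and sqlmap alike are called through the same interface
Multiple transports: supports HTTP, stdio, and SSE, fitting different deployment scenarios
Ecosystem compatibility: interoperates directly with other MCP-capable software (such as Claude Desktop and Cursor IDE)
Easy to extend: adding a new tool only requires writing a simple configuration file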
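
To make the "same interface for every tool" point concrete, here is a small sketch of what an MCP tool description and a `tools/call` request look like on the wire. The field names (`name`, `description`, `inputSchema`, and the JSON-RPC 2.0 envelope) follow the MCP specification; the tool name `nmap_scan`, its schema, and the helper `make_tool_call` are hypothetical and exist only for this example. No MCP SDK is used, so nothing is actually sent.

```python
import json

# Hypothetical tool description, in the shape an MCP server returns from tools/list.
NMAP_TOOL = {
    "name": "nmap_scan",
    "description": "Run a TCP service scan against a single authorized target.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "target": {"type": "string", "description": "IP or domain"},
        },
        "required": ["target"],
    },
}


def make_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request for the given tool."""
    required = tool["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }


request = make_tool_call(1, NMAP_TOOL, {"target": "scanme.example"})
print(json.dumps(request, indent=2))
```

A sqlmap tool would differ only in its `name` and `inputSchema`; the agent-side calling code stays the same.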

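Decision 4: The difference between Skills and Tools

This is an important conceptual distinction in my design:

Tools: atomic operational capabilities, such as "run nmap", "send an HTTP request", or "run Python code". A tool does not care about business logic; it only executes.

Skills: encapsulated domain knowledge, such as "SQL injection detection", "XSS auditing", or "cloud-native security". A skill contains:

Domain knowledge (What is SQL injection? What types exist?)
Detection methodology (How do you test for SQL injection systematically?)
Tool usage guides (What are the common sqlmap parameters?)
Bypass techniques (How do you get past a WAF?)
Real cases (analysis of historical vulnerabilities)

The Agent can load Skills dynamically when needed (through RAG retrieval) instead of stuffing all knowledge into the system prompt.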
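
The on-demand loading described above might look like the following sketch. To stay self-contained it replaces vector-based RAG with plain keyword overlap, and it keeps the Markdown skill documents in an in-memory dictionary instead of files. The skill names, their contents, and the functions `select_skills` and `build_system_prompt` are all assumptions for illustration.

```python
import re

# Stand-ins for Markdown skill documents that would normally live on disk.
SKILLS = {
    "sql-injection": (
        "# SQL Injection\n"
        "Detect error-based, boolean-based and time-based SQL injection in query parameters."
    ),
    "xss-audit": (
        "# XSS Audit\n"
        "Find reflected and stored cross-site scripting in HTML output and JavaScript sinks."
    ),
    "cloud-native": (
        "# Cloud-Native Security\n"
        "Review Kubernetes RBAC, container escape paths and exposed metadata endpoints."
    ),
}


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_skills(task, skills, top_k=1):
    """Rank skills by keyword overlap with the task (a stand-in for vector search)."""
    task_tokens = tokenize(task)
    scored = [(len(task_tokens & tokenize(body)), name) for name, body in skills.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:top_k]]


def build_system_prompt(base_prompt, task, skills, top_k=1):
    """Append only the retrieved skills to the prompt, not the whole library."""
    chosen = select_skills(task, skills, top_k)
    return "\n\n".join([base_prompt] + [skills[name] for name in chosen])


task = "Check the login form for SQL injection in the username parameter"
prompt = build_system_prompt("You are a penetration testing agent.", task, SKILLS)
print(select_skills(task, SKILLS))
```

Swapping `select_skills` for an embedding-based retriever would not change `build_system_prompt`, which is the point of keeping retrieval behind one function.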

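2.4 Memory System Design

To let the AI learn from experience, I designed a three-tier memory system:

1. Working Memory

Context of the current task
Operations already executed and their results
Active hypotheses and items awaiting verification
Lifecycle: cleared when the task ends

2. Episodic Memory

Historical attack records
Successful attack paths
Failed attempts and their causes
Lifecycle: persisted and queryable

3. Semantic Memory

Vectorized knowledge base
Vulnerability patterns, payloads, bypass techniques
Domain documentation and best practices
Lifecycle: accumulates continuously, updated periodically

Key innovation: automatic distillation of experience. After each task completes, the Reflector automatically extracts successful attack patterns and lessons from failures and writes them into semantic memory. The next time a similar scenario comes up, the system retrieves this historical experience first.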
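
The three tiers and the end-of-task distillation step can be sketched as below. This is a simplified model under stated assumptions: semantic memory is a tag-to-lessons dictionary rather than a vector store, and the names `Episode`, `Memory`, `finish_task`, and `recall` are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    target: str
    steps: list
    succeeded: bool
    lesson: str          # what the Reflector extracted from this run


@dataclass
class Memory:
    working: dict = field(default_factory=dict)    # cleared per task
    episodic: list = field(default_factory=list)   # persisted history
    semantic: dict = field(default_factory=dict)   # tag -> lessons (vector store stand-in)

    def finish_task(self, episode, tags):
        """Distill the finished task: record it, index its lesson, reset working memory."""
        self.episodic.append(episode)
        for tag in tags:
            self.semantic.setdefault(tag, []).append(episode.lesson)
        self.working.clear()

    def recall(self, tag):
        """Return earlier lessons for a similar scenario."""
        return list(self.semantic.get(tag, []))


memory = Memory()
memory.working["hypothesis"] = "MySQL may allow empty passwords"
memory.finish_task(
    Episode(
        target="db.example.test",
        steps=["port scan", "banner grab", "login attempt"],
        succeeded=True,
        lesson="Check for empty root password before brute forcing",
    ),
    tags=["mysql"],
)
print(memory.recall("mysql"))
```

In a fuller version, `recall` would be called by the Planner at the start of a task so that earlier lessons shape the first plan.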

III. Projects I Referenced

During the design process I studied four open-source projects in depth. Each one taught me something different:

3.1 PentAGI: The Benchmark for Enterprise-Grade Architecture

Project: https://github.com/vxcontrol/pentagi

PentAGI is the most complete AI penetration testing platform I have seen, and it uses a complex microservice architecture.

Architecture highlights:

Core Services         Knowledge Graph       Monitoring
┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│ Frontend UI │       │  Graphiti   │       │   Grafana   │
│ Backend API │       │   Neo4j     │       │  Victoria   │
│ Vector DB   │       │             │       │   Jaeger    │
│ Task Queue  │       │             │       │    Loki     │
│ AI Agents   │       │             │       │             │
└─────────────┘       └─────────────┘       └─────────────┘

What I learned:

The value of a knowledge graph: PentAGI builds a knowledge graph with Graphiti + Neo4j that tracks semantic relationships between entities, which is more powerful than vector retrieval alone
Multi-agent collaboration: supports several specialist roles such as Researcher, Developer, and Executor
Chained summarization: intelligently compresses long conversation context to avoid token overflow
Full observability: integrates Prometheus, Grafana, and Jaeger for end-to-end monitoring

Problems:

Complex deployment: more than a dozen containers must be started, configuration is involved, and operational demands are high
Heavy resource use: PostgreSQL + Neo4j + ClickHouse + Redis + …, requiring at least 8 GB of memory
Slow startup: microservice initialization takes several minutes, which does not suit quick tests
Steep learning curve: a large codebase (Go backend) that is costly to understand

What it taught me: PentAGI shows "how powerful" an AI penetration testing platform can be, but it also made me realize that most scenarios do not need an architecture this heavy. The better approach is to design an extensible core and integrate these enterprise-grade components only when needed.

3.2 CyberStrikeAI: A Model of Lightweight Implementation

Project: https://github.com/Ed1s0nZ/CyberStrikeAI

CyberStrikeAI is a monolith written in Go that focuses on working "out of the box".

Architecture highlights:

┌─────────────────────────────────────────┐
│           CyberStrikeAI (Go)            │
│  ┌─────────┐  ┌─────────┐  ┌──────────┐ │
│  │ Web UI  │  │REST API │  │MCP Server│ │
│  └─────────┘  └─────────┘  └──────────┘ │
│  ┌─────────────────────────────────────┐│
│  │        Tool Executor (100+)         ││
│  │  nmap, sqlmap, nuclei, httpx...     ││
│  └─────────────────────────────────────┘│
│  ┌─────────────────────────────────────┐│
│  │          SQLite Database            ││
│  └─────────────────────────────────────┘│
└─────────────────────────────────────────┘

What I learned:

Elegant tool definitions: tools are defined in YAML files with a clear format that is easy to extend

name: "nmap"
command: "nmap"
args: ["-sT", "-sV", "-sC"]
parameters:
  - name: "target"
    type: "string"
    description: "IP or domain"
    required: true
    position: 0

Roles system: predefined roles such as penetration testing, CTF, and web scanning, each bound to a specific set of tools and skills
Skills system: domain knowledge is packaged into separate Skill directories that can be loaded dynamically
Multi-mode MCP support: supports the HTTP, stdio, and SSE transports at the same time
Knowledge base integration: a hybrid of vector retrieval and keyword retrieval
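
As a rough illustration of why the YAML tool definition shown in this list is convenient, the sketch below turns such a definition into an argument vector. The definition is written as a Python dictionary mirroring the YAML so that no YAML parser is required. How CyberStrikeAI actually interprets `position` is not stated in the article; treating it as the order of positional arguments after the fixed `args` is an assumption, as is the `build_argv` helper itself.

```python
import shlex

# Mirrors the YAML tool definition shown above.
NMAP_TOOL = {
    "name": "nmap",
    "command": "nmap",
    "args": ["-sT", "-sV", "-sC"],
    "parameters": [
        {
            "name": "target",
            "type": "string",
            "description": "IP or domain",
            "required": True,
            "position": 0,
        },
    ],
}


def build_argv(tool, values):
    """Build an argv list (no shell involved) from a tool definition and user values."""
    argv = [tool["command"], *tool.get("args", [])]
    # Assumption: 'position' orders the positional parameters.
    ordered = sorted(tool.get("parameters", []), key=lambda p: p.get("position", 0))
    for param in ordered:
        name = param["name"]
        if name not in values:
            if param.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            continue
        value = values[name]
        if not isinstance(value, str):
            raise TypeError(f"parameter {name} must be a string")
        if value.startswith("-"):
            # Refuse values that would be parsed as extra command-line options.
            raise ValueError(f"parameter {name} looks like an option: {value!r}")
        argv.append(value)
    return argv


argv = build_argv(NMAP_TOOL, {"target": "scanme.example"})
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv)` rather than a shell string avoids shell injection through LLM-generated parameter values, which matters when the caller is an agent.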

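Problems:

Single-agent limitation: there is no multi-agent collaboration like P-E-R, so planning ability for complex tasks is limited
No causal reasoning: without a causal graph mechanism, it relies on the LLM itself to judge whether attack steps are reasonable
Weak memory system: no long-term memory or mechanism for accumulating experience

What it taught me: CyberStrikeAI proves that "a simple architecture can still deliver powerful features". Its tool definition format and Skills system are elegantly designed, and I borrowed both directly. I also recognized that truly intelligent decision-making requires a more sophisticated Agent architecture.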

3.3 Strix: A Developer-Friendly CLI Tool

Project: https://github.com/usestrix/strix

Strix is a project with a completely different style, focused on being "CLI-first and CI/CD-friendly".

Architecture highlights:

┌─────────────────────────────────────────┐
│                Strix CLI                │
│  ┌─────────────────────────────────────┐│
│  │            Agent Engine             ││
│  │  ┌─────────┐  ┌─────────────────┐   ││
│  │  │ Planner │  │ Tool Executor   │   ││
│  │  └─────────┘  └─────────────────┘   ││
│  └─────────────────────────────────────┘│
│  ┌─────────────────────────────────────┐│
│  │           Docker Sandbox            ││
│  │  • HTTP Proxy                       ││
│  │  • Browser Automation               ││
│  │  • Python Runtime                   ││
│  └─────────────────────────────────────┘│
└─────────────────────────────────────────┘

What I learned:

Headless mode: supports non-interactive runs with parseable output, suitable for CI/CD integration
Multi-target testing: can test a code repository and the deployed application at the same time (gray-box testing)
Custom instructions: expert guidance can be injected through the --instruction parameter
Attack chain visualization: automatically generates attack path diagrams
Browser automation: built-in Playwright, suited to detecting client-side vulnerabilities such as XSS and CSRF

Problems:

Limited black-box capability: it mainly targets scenarios where source code is available, and support for pure black-box testing is weak
Closed tool ecosystem: there is no open tool definition interface, so it is hard to extend
Single-agent architecture: lacks multi-agent collaboration and a reflection mechanism

What it taught me: Strix made me appreciate the importance of "developer experience". A good security platform must be easy to use as well as powerful. Its CLI design and its approach to CI/CD integration are worth borrowing.

3.4 鸾鸟 (LuaN1aoAgent): The Innovator in Cognitive Architecture

Project: https://github.com/SanMuzZzZz/LuaN1aoAgent

LuaN1ao is the most innovative of the four projects. It introduced the P-E-R framework and causal graph reasoning.

Architecture highlights:

┌─────────────────────────────────────────────────────┐
│                   P-E-R Framework                   │
│                                                     │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │ Planner  │───▶│ Executor │───▶│ Reflector│       │
│  └──────────┘    └──────────┘    └──────────┘       │
│       │               │                │            │
│       └───────────────┴────────────────┘            │
│                       │                             │
│              ┌────────▼────────┐                    │
│              │  Causal Graph   │                    │
│              │     Engine      │                    │
│              └─────────────────┘                    │
└─────────────────────────────────────────────────────┘

What I learned:

P-E-R framework: this was the most important insight. Separating the Planner, Executor, and Reflector roles avoids the role conflict of a single agent
Causal graph reasoning: the traceable chain of Evidence → Hypothesis → Vulnerability → Exploit effectively guards against hallucination
Plan-on-Graph: models task planning as a dynamic DAG, supporting topological sorting and parallel execution
Failure attribution: a four-level (L1-L4) analysis of failure modes, so the system learns from failures
HITL (human-in-the-loop): lets an expert step in at key decision points
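
The Plan-on-Graph idea in this list (a task DAG with topological ordering and parallel execution) can be illustrated with Python's standard `graphlib` module. The task names and dependencies below are invented for the example, and the graph is static, whereas the article describes a DAG that changes as the test proceeds.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
PLAN = {
    "port_scan": set(),
    "subdomain_enum": set(),
    "service_fingerprint": {"port_scan"},
    "web_crawl": {"service_fingerprint", "subdomain_enum"},
    "sqli_check": {"web_crawl"},
    "xss_check": {"web_crawl"},
}


def execution_batches(graph):
    """Group tasks into batches; tasks within one batch can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                 # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # every task whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)                  # mark the batch finished to unlock successors
    return batches


for number, batch in enumerate(execution_batches(PLAN), start=1):
    print(number, batch)
```

Here the two reconnaissance tasks form the first batch and the two vulnerability checks form the last, so independent work is never serialized unnecessarily. A dynamic version would rebuild the sorter whenever the Planner adds or removes tasks.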

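Problems:

High implementation complexity: a large Python codebase that is costly to understand
Heavy RAG dependency: the knowledge base index must be built in advance, so the first run takes a long time
Performance overhead: maintaining and querying the causal graph adds extra cost
Insufficient documentation: documentation for some advanced features is brief

What it taught me: LuaN1ao is the project I drew on the most. I adopted its P-E-R framework and causal graph design directly. It also showed me that an advanced cognitive architecture comes at the price of complexity, and that a balance has to be found between "intelligent" and "simple".

3.5 Summary of References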

IV. Comparing the Strengths and Weaknesses of Each Platform

4.1 Architectural Complexity vs. Capability

Capability
    ▲
    │                          ● PentAGI
    │
    │            LuaN1ao ●
    │
    │     CyberStrikeAI ●
    │
    │          Strix ●
    │
    └───────────────────────────▶ Architectural complexity
       Simple                Complex

4.2 Detailed Comparison Table

| Dimension | PentAGI | CyberStrikeAI | Strix | LuaN1ao |
| --- | --- | --- | --- | --- |
| Architecture style | Microservices | Monolith | Monolith | Monolith |
| Agent architecture | Multi-agent | Single agent | Single agent | P-E-R three agents |
| Language | Go + React | Go | Python | Python |
| Number of tools | 20+ | 100+ | Built-in | Extensible |
| Knowledge augmentation | Knowledge graph + vector | Vector | None | Vector |
| Causal reasoning | No | No | No | Yes |
| Reflective learning | Limited | No | No | Yes |
| Deployment difficulty | High | Low | Low | Medium |
| Memory requirement | 8 GB+ | 2 GB | 2 GB | 4 GB |
| CI/CD friendliness | Medium | Medium | High | Low |
| Black-box testing | Strong | Medium | Weak | Strong |
| White-box testing | Medium | Medium | Strong | Medium |
| Best suited for | Enterprise deployment | Rapid deployment | CI/CD integration | Research and exploration |

4.3 My Trade-offs

Based on this comparison, I decided on a hybrid strategy:

Core architecture: LuaN1ao's P-E-R framework + causal graph reasoning
Tool system: CyberStrikeAI's YAML configuration approach + the MCP protocol
Skill system: a separate Skills layer that supports both static binding and dynamic loading
Deployment: monolith first, with optional microservice components (such as a Neo4j knowledge graph)
Developer experience: Strix's CLI design and headless mode

Goal: keep the architecture clear while providing flexible extensibility.

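V. Personal Reflections

5.1 AI Will Not Replace Security Engineers, but It Will Change the Profession

While designing this platform, I came to see more and more clearly that an AI Agent is not meant to replace security engineers. It is meant to be their "capability amplifier".

What AI is good at:

Handling large volumes of repetitive work (information gathering, port scanning, directory enumeration)
Quickly retrieving and applying known vulnerability patterns
Working around the clock without fatigue
Processing multiple targets in parallel

What humans are good at:

Understanding business logic and the value of an attack
Combining attack chains creatively
Communicating with people (social engineering)
Ethical judgment and compliance decisions

The ideal model is "AI executes, humans decide": the AI handles concrete test execution and vulnerability verification, while humans handle goal setting, prioritization, result review, and ethical oversight.

5.2 Evidence-Driven Design Is the Only Viable Path for AI Security Tools

While debugging the AI Agent, I ran into "hallucination" problems countless times: the AI confidently claimed to have found a vulnerability that turned out not to exist when actually verified.

This made it very clear to me that in security, an AI cannot merely "guess"; it must "prove".

The core value of the causal graph is that it forces the AI to follow the scientific method of "evidence → hypothesis → verification". Every step forward must be backed by evidence, and every conclusion must be reproducible. This slows execution, but it greatly increases the credibility of the results.

5.3 Memory and Experience Matter More Than Tools

In the early design I focused too much on "how many tools are integrated". Actual testing showed that the number of tools is not the deciding factor.

An AI equipped only with nmap and sqlmap, but with a rich library of attack experience, may outperform an AI equipped with 100+ tools and no memory.

This is because penetration testing is fundamentally "the application of knowledge", not "the piling up of tools". Knowing which tool to use and when matters more than having the tool.

My design focus has therefore shifted to:

How to distill attack experience efficiently
How to retrieve historical cases intelligently
How to learn from failures automatically

5.4 Simple Architectures Are Often More Reliable

When I studied PentAGI's microservice architecture, I was impressed by its "enterprise-grade" design. But when I actually tried to deploy it, it took me a full day to get all the services running.

This made me reflect: for most security teams, a system that deploys quickly and is easy to debug is far more valuable than one that is feature-complete but hard to maintain.

My final architectural choice is therefore a monolithic core plus optional extensions. Core functionality runs in a single process, and enterprise-grade components such as Neo4j and Prometheus are attached only when needed.

5.5 The Power of Open Source

This research gave me a deep appreciation of the power of the open-source community. All four reference projects have open-sourced their core code, which let me study their implementation details closely.

The LuaN1ao project in particular was a great help: its author documents the design rationale and architectural decisions in detail in the README. This spirit of "knowledge sharing" is the security community's most valuable asset.

If my design can in turn inspire those who come later, that will be the greatest reward.

5.6 Remaining Challenges

Although the design is fairly complete, I am well aware that many challenges remain:

Technical challenges:

Context length limits: a complex penetration test can produce a huge volume of intermediate results, and compressing and summarizing them remains difficult
Tool call overhead: every tool call requires LLM inference, so how can unnecessary calls be reduced?
Multi-target parallelism: how can multiple targets be tested in parallel without mixing up their contexts?
Real-time response: some scenarios need real-time interaction (for example, interacting with a phishing site), so how can that be supported?

Ethical challenges:

Risk of abuse: powerful automation can also be used maliciously, so how can that be prevented?
Blurred boundaries: where is the line between penetration testing and attack, and how does the AI judge it?
Accountability: if the AI finds a vulnerability but does not report it, or performs an operation beyond its authorization, who is responsible?

Engineering challenges:

Model cost: calling high-quality LLMs is still expensive, so how can it be optimized?
Stability: how should the non-determinism of LLM output be handled?
Debugging difficulty: an AI Agent's behavior is hard to predict, so how do you troubleshoot when something goes wrong?

These questions have no standard answers and need continued exploration in practice.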

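Reference projects:

PentAGI: https://github.com/vxcontrol/pentagi
CyberStrikeAI: https://github.com/Ed1s0nZ/CyberStrikeAI
Strix: https://github.com/usestrix/strix
鸾鸟 (LuaN1aoAgent): https://github.com/SanMuzZzZz/LuaN1aoAgent

Further reading:

MCP Protocol Specification: https://modelcontextprotocol.io
PayloadsAllTheThings: https://github.com/swisskyrepo/PayloadsAllTheThings

Disclaimer:

The programs and technical methods described in this article are intended only for lawful, compliant security research and teaching. Their purpose is to improve cybersecurity defenses, and they are clearly technical research in nature.

Any organization or individual who, without authorization, uses the content of this article for attacks, sabotage, or other illegal purposes bears sole responsibility for all resulting legal liability, civil compensation, and joint liability. This site assumes no joint liability.

All content on this site is published for technical exchange and knowledge sharing. If there is a copyright infringement or other objection, please get in touch by email; contact details are available through the "Contact Me" link at the top of the page.

This article is reposted from: SecurityPaper, whoami0002, "Design Study of an AI-Driven Automated Penetration Testing Platform" (《AI自动化渗透测试平台设计研究》)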