2026-06-18 05:13:34 · Cybersecurity articles · Source: ZONE.CI 全球网

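Article summary: This article presents an AI-based automated system for checking app privacy compliance. It turns the evaluation points derived from laws and regulations into explicit procedures, and combines a layered architecture with a second-validation mechanism to review privacy policy text. The system accepts file uploads and URL fetching, and includes automatic discovery of 28 evaluation points, multi-format parsing, batch LLM evaluation, and result validation. It offers an extensible detection framework and an open-source implementation. Overall score: 82. Categories: security program building, technical standards, application security, solutions, data security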


App Privacy Compliance: AI+ in Practice

Original

三呼呼

古月安全

June 15, 2026, 13:50, Sichuan


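Today, a compliant app privacy policy has to satisfy, all at once, the Personal Information Protection Law (《个人信息保护法》), the Methods for Identifying Illegal Collection and Use of Personal Information by Apps (《App违法违规收集使用个人信息行为认定方法》), GB/T 35273 Personal Information Security Specification (《个人信息安全规范》), and the detailed review rules of each app store. The single principle of "minimum necessary collection of personal information" alone gives rise to many common kinds of violation. Among the infringing apps named in MIIT notices in 2023, more than 60% involved "failure to publish the rules for collection and use" or "failure to state the purpose of collection".

So how can AI help? Handing the laws and regulations directly to a large model and letting it judge by its own understanding looks like the optimal solution. In practice, though, problems show up quickly: false positives and false negatives, conflicts between the different laws and regulations, and so on.

So the first thing to do is to get the detection approach straight.

Starting from what a privacy-compliance check needs to look for, this article builds an AI agent that automatically checks app privacy policies.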

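Implementation flow

First, every evaluation point has to be rewritten as an explicit, step-by-step procedure in a form the AI can read.

All of the points that need to be evaluated are converted into this same procedural format.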
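As an illustration of what a procedure-style evaluation point might look like, here is a minimal sketch. The `EvalPoint` class, its field names, and the sample wording are all assumptions made for illustration; they are not the project's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class EvalPoint:
    """One evaluation point written as an explicit procedure (hypothetical schema)."""
    point_id: int
    title: str
    steps: list = field(default_factory=list)
    verdict_rule: str = ""

    def to_prompt(self) -> str:
        """Render the point as numbered steps that a model can follow in order."""
        lines = [f"Evaluation point {self.point_id}: {self.title}"]
        lines += [f"Step {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines.append(f"Verdict: {self.verdict_rule}")
        return "\n".join(lines)


# Sample content is illustrative only, not taken from the project's files.
point = EvalPoint(
    point_id=1,
    title="Purpose of collection is stated",
    steps=[
        "Locate every passage that describes collecting personal information.",
        "For each passage, check whether a purpose is stated next to it.",
        "Quote the passages that have no stated purpose.",
    ],
    verdict_rule="non_compliant if any quoted passage lacks a purpose, else compliant.",
)
print(point.to_prompt())
```

Spelling the check out as ordered steps, rather than quoting the legal text, leaves the model less room to improvise its own reading of the regulation.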

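However, different large models may interpret the same wording differently, so we can add a gate: before each evaluation point is checked, the model is first shown a reference example for that point.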
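One way to picture this gate is as a prompt builder that places the reference example ahead of the evaluation point. The sketch below is only that: the function name `build_point_prompt`, the section labels, and the JSON answer shape are assumptions, not the project's actual prompt templates.

```python
from typing import Optional


def build_point_prompt(point_text: str, example_text: Optional[str], policy_text: str) -> str:
    """Assemble one prompt: reference example first (the gate), then the check, then the policy."""
    sections = []
    if example_text:
        # The example comes first so the model calibrates before it judges.
        sections.append(
            "Reference example (use it to calibrate your reading before judging):\n"
            + example_text.strip()
        )
    sections.append("Evaluation point:\n" + point_text.strip())
    sections.append("Privacy policy text:\n" + policy_text.strip())
    # Hypothetical answer format; a fixed shape makes the reply easier to parse.
    sections.append('Answer with JSON: {"compliant": true or false, "evidence": "..."}')
    return "\n\n".join(sections)


prompt = build_point_prompt(
    point_text="Check that each collected data type has a stated purpose.",
    example_text="Compliant: 'We collect your phone number to verify your account.'",
    policy_text="We collect your location.",
)
print(prompt)
```

When a point has no example, the builder simply omits that section, which fits the examples being optional.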

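Finally, re-checking the results is just as important. To prevent false positives, every non-compliant finding goes through a second validation once the evaluation has finished.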
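The second validation could be sketched roughly as follows. This is an assumption-laden illustration: `revalidate_non_compliant`, the CONFIRM/REJECT protocol, the result fields, and the choice to flip a rejected finding back to compliant are all hypothetical, and `fake_llm` merely stands in for a real model call.

```python
from typing import Callable


def revalidate_non_compliant(results: list, rules: dict, ask_llm: Callable[[str], str]) -> list:
    """Re-check only the non-compliant findings; overturn those the reviewer rejects."""
    final = []
    for item in results:
        rule = rules.get(item["id"])
        if item["compliant"] or rule is None:
            final.append(item)  # compliant results and points without a rule pass through
            continue
        prompt = (
            f"Validation rule:\n{rule}\n\n"
            f"Finding:\n{item['reason']}\n\n"
            "Reply with exactly CONFIRM or REJECT."
        )
        verdict = ask_llm(prompt).strip().upper()
        if verdict.startswith("REJECT"):
            # Assumed policy: a rejected finding is treated as a false positive.
            item = {**item, "compliant": True, "overturned": True}
        final.append(item)
    return final


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: rejects any finding that mentions 'SDK'."""
    return "REJECT" if "SDK" in prompt else "CONFIRM"


results = [
    {"id": 7, "compliant": False, "reason": "No SDK list found."},
    {"id": 8, "compliant": False, "reason": "Retention period missing."},
    {"id": 9, "compliant": True, "reason": "Purpose stated."},
]
rules = {
    7: "Check linked subpages before reporting a missing list.",
    8: "Confirm that no period appears anywhere in the text.",
}
checked = revalidate_non_compliant(results, rules, fake_llm)
print(checked)
```

Only non-compliant findings are sent for review, so the extra model calls grow with the number of findings rather than with the number of evaluation points.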

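The result is an AI-driven compliance-checking agent for plain-text privacy policies.

Our implementation architecture looks roughly like this:

Layered architecture: 4 layers with separated responsibilities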

┌─ Web layer (main.py)
│    Flask web service (port 8901)
│    GET  /          → home page (file upload / URL input form)
│    POST /evaluate  → run the check and return JSON
├─ Core logic layer (agent.py)
│    Evaluation point discovery   scans the file system, self-discovering
│    Prompt construction          batch / single-point, two templates
│    LLM call                     OpenAI-compatible interface
│    Result validation            second review to prevent false positives
│    URL fetching                 anti-crawler detection, HTML cleaning
│    Multi-format parsing         txt / docx / doc / pdf
│    Subpage discovery            collection list, SDK list
├─ Data layer (directory-based file system)
│    privacy_test/   ← evaluation point files (28)
│    examples/       ← reference examples (28, one-to-one)
│    validations/    ← validation rules (28, used for the second check)
│    fetched_docs/   ← saved original text of fetched privacy policies
└─ Front-end layer (templates/ + static/)
     Single-page HTML + CSS + JS, full-featured display
     Compliant / non-compliant filter, multi-document tab switching
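To make the HTML cleaning and subpage discovery boxes in the core logic layer more concrete, here is a small standard-library sketch. It is not the project's code (the dependency list suggests the real implementation uses beautifulsoup4): the class `PolicyPageParser`, the function `clean_and_discover`, and the keyword matching on link text are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed link-text keywords, based on the two list subpages the article names.
LIST_KEYWORDS = ("个人信息收集清单", "第三方SDK共享清单")


class PolicyPageParser(HTMLParser):
    """Collect visible text and the links whose anchor text names a list subpage."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.list_links = []
        self._skip_depth = 0    # > 0 while inside <script> or <style>
        self._href = None       # href of the <a> currently open, if any
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href:
            label = "".join(self._anchor_text).strip()
            if any(keyword in label for keyword in LIST_KEYWORDS):
                self.list_links.append((label, urljoin(self.base_url, self._href)))
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # drop script and style bodies
        if data.strip():
            self.text_parts.append(data.strip())
        if self._href:
            self._anchor_text.append(data)


def clean_and_discover(html, base_url):
    """Return (clean_text, list_links) for one fetched page."""
    parser = PolicyPageParser(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.list_links


html = """<html><head><style>p{color:red}</style></head><body>
<p>隐私政策</p><a href="/sdk.html">第三方SDK共享清单</a>
<script>track()</script></body></html>"""
text, links = clean_and_discover(html, "https://example.com/privacy")
print(text)
print(links)
```

Resolving each link against the page URL with `urljoin` means relative links to the list subpages can be fetched in a follow-up request.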

Core evaluation flow

Input (file / URL)
  │
  ├─ URL mode → fetch_url() fetches the main page
  │             ├─ extract_list_links() discovers subpages
  │             │   ├─ "个人信息收集清单" (personal information collection list) subpage
  │             │   └─ "第三方SDK共享清单" (third-party SDK sharing list) subpage
  │             └─ save_privacy_doc() saves the original text
  │
  └─ Evaluation pipeline:
      ① _scan_eval_point_ids()   → automatically discover all evaluation points (28)
      ② load_evaluation_points() → load each point's content and its matching example
      ③ _build_batch_prompt()    → build the batch prompt (one API call)
      ④ call_llm()               → LLM evaluation
      ⑤ _parse_batch_response()  → parse the JSON with layered fault tolerance
      ⑥ Backfill, _evaluate_single()         → evaluate one by one any point the batch missed
      ⑦ Second validation, _validate_result() → load the validation rules and review the results
      ⑧ Return { compliant, non_compliant, statistics }
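Steps ⑤ and ⑥ are where a batch reply from the model is most likely to go wrong, so a sketch of layered parsing plus missing-point detection may help. The functions below are hypothetical stand-ins, not the project's `_parse_batch_response()`; the three fallback layers and the assumed reply shape (a JSON array of objects with an `id` field) are illustrative choices.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_batch_response(raw):
    """Try progressively looser ways of pulling a JSON array out of a model reply."""
    candidates = [raw.strip()]              # layer 1: the reply is already pure JSON
    fenced = FENCED_BLOCK.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())   # layer 2: JSON inside a fence
    start, end = raw.find("["), raw.rfind("]")
    if 0 <= start < end:
        candidates.append(raw[start:end + 1])        # layer 3: widest [...] span
    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, list):
            return [d for d in data if isinstance(d, dict) and "id" in d]
    return []  # nothing parseable: every point will count as missing


def missing_point_ids(expected_ids, parsed):
    """IDs the batch reply skipped; these would be re-evaluated one at a time."""
    seen = {item["id"] for item in parsed}
    return sorted(set(expected_ids) - seen)


raw = (
    "Here is the result:\n" + FENCE + "json\n"
    '[{"id": 1, "compliant": true}, {"id": 3, "compliant": false}]\n' + FENCE
)
parsed = parse_batch_response(raw)
todo = missing_point_ids([1, 2, 3], parsed)
print(parsed, todo)
```

Returning an empty list on total failure, instead of raising, lets the backfill step treat an unreadable batch the same way as a batch that skipped points.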

Usage

pip install -r requirements.txt
# Dependencies: flask>=3.0, openai>=1.0, requests>=2.0, beautifulsoup4>=4.0, python-docx>=1.0
# PDF (optional): pip install PyPDF2 pdfplumber
# .doc (optional): install antiword (requires system support)

Configuration

# Set via environment variables (recommended)
# Windows
set LLM_API_KEY=sk-your-api-key
# Linux/Mac
export LLM_API_KEY=sk-your-api-key

# Optional settings (all have defaults)
LLM_BASE_URL      # default: https://api.deepseek.com
LLM_MODEL         # default: deepseek-v4-flash
LLM_MAX_TOKENS    # default: 8192
LLM_TEMPERATURE   # default: 0.1
SERVER_HOST       # default: 0.0.0.0
SERVER_PORT       # default: 8901
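Reading these variables with their defaults might look like the sketch below. The variable names and default values come from the configuration listed above; the `Settings` dataclass and `load_settings` function are assumptions and may not match how the project's config.py is actually written.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the configuration values."""
    api_key: str
    base_url: str
    model: str
    max_tokens: int
    temperature: float
    host: str
    port: int


def load_settings(env=None):
    """Read the environment variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        api_key=env.get("LLM_API_KEY", ""),
        base_url=env.get("LLM_BASE_URL", "https://api.deepseek.com"),
        model=env.get("LLM_MODEL", "deepseek-v4-flash"),
        max_tokens=int(env.get("LLM_MAX_TOKENS", "8192")),
        temperature=float(env.get("LLM_TEMPERATURE", "0.1")),
        host=env.get("SERVER_HOST", "0.0.0.0"),
        port=int(env.get("SERVER_PORT", "8901")),
    )


defaults = load_settings({})  # an empty mapping exercises every default
custom = load_settings({"SERVER_PORT": "9000", "LLM_TEMPERATURE": "0"})
print(defaults)
print(custom.port, custom.temperature)
```

Passing a plain dict instead of `os.environ` makes it possible to try the defaults without touching the real environment.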

Start the service

python main.py
# then open http://localhost:8901

Adding custom evaluation points

New evaluation point: create 评估点29.md under privacy_test/; it is discovered automatically after the service restarts.
New reference example: create 示例29.md under examples/ (optional).
New validation rule: create 验证29.md under validations/ (optional).
Subpage check scope: edit LIST_CHECK_POINT_IDS in agent.py (default {7,8,9,10}).
Switching models: edit LLM_MODEL and LLM_BASE_URL in config.py.
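File-name based discovery of this kind can be sketched in a few lines. The directory names and the 评估点N.md / 示例N.md / 验证N.md naming come from the article; the functions `scan_eval_point_ids` and `load_point` are a hypothetical reimplementation, not the code in agent.py.

```python
import re
import tempfile
from pathlib import Path

POINT_FILE = re.compile(r"^评估点(\d+)\.md$")


def scan_eval_point_ids(points_dir):
    """Return the sorted numeric IDs of all 评估点N.md files in a directory."""
    ids = []
    for path in Path(points_dir).iterdir():
        match = POINT_FILE.match(path.name)
        if match and path.is_file():
            ids.append(int(match.group(1)))
    return sorted(ids)


def load_point(base, point_id):
    """Load one point plus its optional example and validation rule."""
    base = Path(base)

    def read_optional(path):
        return path.read_text(encoding="utf-8") if path.is_file() else None

    return {
        "id": point_id,
        "point": (base / "privacy_test" / f"评估点{point_id}.md").read_text(encoding="utf-8"),
        "example": read_optional(base / "examples" / f"示例{point_id}.md"),
        "validation": read_optional(base / "validations" / f"验证{point_id}.md"),
    }


# Demo in a throwaway directory: point 29 has an example but no validation rule.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for sub in ("privacy_test", "examples", "validations"):
        (base / sub).mkdir()
    for n in (1, 2, 29):
        (base / "privacy_test" / f"评估点{n}.md").write_text(f"point {n}", encoding="utf-8")
    (base / "examples" / "示例29.md").write_text("example 29", encoding="utf-8")
    found = scan_eval_point_ids(base / "privacy_test")
    loaded = load_point(base, 29)

print(found)
print(loaded)
```

Because the IDs are read from the file names at scan time, adding a new point needs no code change, which matches the restart-and-discover behaviour described above.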

Source code:

https://github.com/AnotherN/privacy-text-te/tree/main/test


Reposted from: 古月安全, 三呼呼, 《app隐私合规AI+实践》 (App Privacy Compliance: AI+ in Practice)