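Give Your Web Page an AI Brain with One Line of Code: A Hands-On Guide to Alibaba's Page-Agent

In early 2026, Alibaba's open-source Page-Agent drew wide attention on GitHub. With more than 10K stars, the project quickly became a talking point among front-end developers thanks to its distinctive positioning as a GUI agent that lives inside the page.

Do any of these sound familiar? Your company's ERP system has a form with 30 fields to fill in; you have to switch back and forth between pages, copying and pasting data; or you want to add an AI assistant to your product without rebuilding the backend.

Page-Agent was built for exactly these pain points. It needs no Python, no headless browser, and no complex server deployment: a single <script> tag gives your page natural-language control.

This article covers Page-Agent end to end, from a quick start to production deployment and from basic usage to advanced configuration, showing step by step how to bring AI capabilities into your web application.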

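1. What Is Page-Agent

Page-Agent is an open-source GUI agent from Alibaba that runs inside the web page. Its core capability is turning a user's natural-language instructions into real actions on the page, such as clicking buttons, filling in forms, and scrolling.

1.1 Core positioning

Unlike traditional browser automation tools, Page-Agent runs entirely in the browser. It is not a Chrome extension that has to be installed separately, nor a heavyweight desktop RPA tool; it is JavaScript code embedded directly in the page.

Once that code is connected to a large language model, users only have to describe what they want in natural language, and Page-Agent analyzes the page structure, plans the steps, and carries out the actions.

1.2 Three core features

Zero-infrastructure integration

Page-Agent needs no backend service of its own; the only external dependency is the LLM API it calls. A single <script> tag completes the integration: no Python environment, no Docker container, no server to deploy. You can add it to any existing page without changing your backend.

Privacy first

Page parsing and masking happen inside the browser. Before the page structure is sent to the model, it is masked locally, so password fields, sensitive financial data, and similar content are hidden automatically. For enterprise applications this is critical.

Human in the loop

Page-Agent follows a human-in-the-loop design. For high-risk operations such as submitting a form or deleting data, the AI pauses and asks the user to confirm instead of running blindly to the end. This keeps the workflow efficient while keeping it safe.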
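The confirmation gate can be pictured as a small check that runs before each planned action. The following is a minimal sketch under stated assumptions, not Page-Agent's actual implementation: the names needsConfirmation, runPlan, and the default list of high-risk action types are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: decide whether an action must pause for user approval.
// The default list of high-risk types is an assumption made for this sketch.
function needsConfirmation(actionType, highRiskActions = ['submit', 'delete']) {
    return highRiskActions.includes(actionType)
}

// Run a planned list of actions, pausing before any high-risk step.
// confirmFn stands in for a real dialog (for example window.confirm in a browser).
function runPlan(actions, confirmFn) {
    const executed = []
    for (const action of actions) {
        if (needsConfirmation(action.type) && !confirmFn(action)) {
            break // the user declined, so stop instead of continuing blindly
        }
        executed.push(action.type)
    }
    return executed
}

const plan = [{ type: 'fill' }, { type: 'submit' }, { type: 'click' }]
console.log(runPlan(plan, () => false)) // stops before the submit step
console.log(runPlan(plan, () => true))  // runs all three steps
```

Passing the confirmation function in as a parameter keeps the gate testable without a browser dialog.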

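1.3 Architecture overview

Page-Agent's architecture has three core parts:

Agent Core (in-page executor): runs directly on the host page's JavaScript thread and interacts with the underlying DOM. It computes element coordinates precisely and simulates human clicks, typing, and scrolling.

DOM Parser (dehydration engine): this is Page-Agent's core innovation. Rather than sending the page's full HTML to the model, it first "dehydrates" the page, filtering out the <div> and <span> tags used purely for layout and keeping only the elements that are actually interactive (buttons, inputs, links, and so on). This greatly reduces token consumption and improves response time.

LLM Client (pluggable brain): connects to any model that exposes an OpenAI-compatible API, including OpenAI, Claude, Qwen, DeepSeek, or a locally deployed Ollama.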
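To make the dehydration idea concrete, here is a rough sketch that walks a simplified element tree and keeps only interactive nodes. It is not Page-Agent's real parser: the node shape ({ tag, text, attrs, children }), the INTERACTIVE_TAGS list, and the dehydrate function are all assumptions chosen for illustration.

```javascript
// Tags treated as interactive in this sketch (an assumption, not Page-Agent's real list).
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea'])

// Walk a simplified tree of { tag, text, attrs, children } nodes (a hypothetical
// shape used only here) and collect the interactive elements into a flat list.
function dehydrate(node, out = []) {
    if (INTERACTIVE_TAGS.has(node.tag)) {
        out.push({
            index: out.length, // a stable index the model could refer back to
            tag: node.tag,
            label: node.text || (node.attrs && node.attrs.placeholder) || '',
        })
    }
    for (const child of node.children || []) {
        dehydrate(child, out)
    }
    return out
}

const page = {
    tag: 'div',
    children: [
        { tag: 'span', text: 'Welcome' },
        { tag: 'input', attrs: { placeholder: 'Username' } },
        { tag: 'div', children: [{ tag: 'button', text: 'Log in' }] },
    ],
}

// Only the input and the button survive; the layout-only div and span are dropped.
console.log(dehydrate(page))
```

The flat, indexed list is much shorter than the original markup, which is where the token savings come from.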

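2. Quick Start: Up and Running in Ten Minutes

Page-Agent offers two ways to integrate: a CDN build for quick trials and an NPM package for production. We start with the simpler CDN route.

2.1 Quick trial via CDN

If you just want to try the features, no configuration is needed; include the script directly in your HTML: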

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Page-Agent 快速体验</title>
</head>
<body>
    <h1>欢迎体验 Page-Agent</h1>
    <p>请在右下角的输入框中输入自然语言指令</p>
    
    <!-- 引入 Page-Agent（使用官方 Demo 版本） -->
    <script src="https://cdn.jsdelivr.net/npm/page-agent@1.5.8/dist/iife/page-agent.demo.js" crossorigin="true"></script>
</body>
</html>

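Save the code above as index.html and open it in a browser. A floating dialog appears in the lower-right corner of the page; this is Page-Agent's interface.

Try typing 「帮我找到这个页面上的标题文字并高亮显示」 ("find the heading text on this page and highlight it") into the input box, and you will see Page-Agent analyze the page structure and carry out the action.

⚠️ Note: the demo build uses a free test API provided by Alibaba Cloud. It is intended for technical evaluation only and should not be used in production.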

2.2 Production deployment via NPM

For real projects, install via NPM to get better type hints and version control:

npm install page-agent

After installation, import and initialize it in your code:

import { PageAgent } from 'page-agent'

// 初始化 Page-Agent
const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY', // 替换为你的 API Key
    language: 'zh-CN',
})

// 执行自然语言指令
await agent.execute('点击页面上的登录按钮')

2.3 A complete example

Let's build a more complete example that includes basic interactive buttons:

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Page-Agent 完整示例</title>
    <style>
        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
            max-width: 800px;
            margin: 50px auto;
            padding: 20px;
        }
        .form-container {
            background: #f5f5f5;
            padding: 30px;
            border-radius: 8px;
        }
        .form-group {
            margin-bottom: 20px;
        }
        label {
            display: block;
            margin-bottom: 8px;
            font-weight: 500;
        }
        input[type="text"],
        input[type="email"],
        input[type="tel"] {
            width: 100%;
            padding: 10px;
            border: 1px solid #ddd;
            border-radius: 4px;
            box-sizing: border-box;
        }
        button {
            background: #007bff;
            color: white;
            padding: 12px 30px;
            border: none;
            border-radius: 4px;
            cursor: pointer;
            font-size: 16px;
        }
        button:hover {
            background: #0056b3;
        }
        #ai-trigger {
            position: fixed;
            bottom: 20px;
            right: 20px;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            width: 60px;
            height: 60px;
            border-radius: 50%;
            border: none;
            font-size: 24px;
            cursor: pointer;
            box-shadow: 0 4px 15px rgba(102, 126, 234, 0.4);
            transition: transform 0.2s;
        }
        #ai-trigger:hover {
            transform: scale(1.1);
        }
    </style>
</head>
<body>
    <h1>用户注册表单</h1>
    
    <div class="form-container">
        <form id="register-form">
            <div class="form-group">
                <label for="username">用户名</label>
                <input type="text" id="username" name="username" placeholder="请输入用户名">
            </div>
            
            <div class="form-group">
                <label for="email">邮箱</label>
                <input type="email" id="email" name="email" placeholder="请输入邮箱">
            </div>
            
            <div class="form-group">
                <label for="phone">手机号</label>
                <input type="tel" id="phone" name="phone" placeholder="请输入手机号">
            </div>
            
            <div class="form-group">
                <label for="company">公司名称</label>
                <input type="text" id="company" name="company" placeholder="请输入公司名称">
            </div>
            
            <button type="submit" id="submit-btn">提交注册</button>
        </form>
    </div>

    <!-- 触发 Page-Agent 的按钮 -->
    <button id="ai-trigger" title="AI 助手">🤖</button>

    <!-- 引入 Page-Agent -->
    <script type="module">
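        // The bare specifier 'page-agent' must be resolved by a bundler such as Vite
        // or by an import map; a browser cannot load it directly from a plain HTML file.
        import { PageAgent } from 'page-agent'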
        
        // 初始化 Agent
        const agent = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey: 'YOUR_API_KEY', // 替换为你的 API Key
            language: 'zh-CN',
            ui: {
                visible: false, // 隐藏默认 UI，使用自定义按钮触发
            }
        })
        
        // 点击 AI 按钮时触发
        document.getElementById('ai-trigger').addEventListener('click', async () => {
            const userInput = prompt('请描述你想要执行的操作：')
            if (userInput) {
                await agent.execute(userInput)
            }
        })
    </script>
</body>
</html>

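In this example we build a registration form and trigger Page-Agent from the 🤖 button in the lower-right corner. The user can type a natural-language instruction such as 「帮我填写用户名张三、邮箱 zhangsan@example.com、手机号 13800138000、公司名称阿里巴巴」 (fill in the username, email, phone number, and company name), and Page-Agent fills in the form automatically.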

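3. Configuration in Depth: Tailoring Page-Agent

Page-Agent offers a rich set of configuration options for different scenarios. This chapter explains what each option does and how to use it.

3.1 Basic configuration

The basic options are as follows: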

const agent = new PageAgent({
    // 必需：选择你要使用的大模型
    model: 'qwen3.5-plus',
    
    // 必需：API 端点地址
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    
    // Required: your API key (in a Vite project, client code reads env vars via import.meta.env)
    apiKey: import.meta.env.VITE_LLM_API_KEY,
    
    // 可选：界面语言，默认 'en-US'
    language: 'zh-CN',
    
    // 可选：系统提示词，用于定制 AI 的行为
    systemPrompt: '你是一个友好的网页助手，帮助用户完成表单填写等操作。',
})

3.2 UI configuration

Page-Agent ships with a floating chat UI whose behavior and appearance you can control through configuration:

const agent = new PageAgent({
    // ... 其他配置
    
    ui: {
        // 是否显示默认的 UI 面板
        visible: true,
        
        // UI 面板的位置：'bottom-right', 'bottom-left', 'top-right', 'top-left'
        position: 'bottom-right',
        
        // 主题颜色：'light', 'dark' 或自定义颜色
        theme: 'light',
        
        // 自定义样式类名
        className: 'my-custom-agent',
        
        // 是否显示操作过程的可视化（高亮当前操作的元素）
        showActions: true,
    }
})

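3.3 Security configuration (required reading for production)

Data security is a top priority for enterprise applications. Page-Agent provides a data-masking mechanism for this purpose: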

const agent = new PageAgent({
    // ... 其他配置
    
    security: {
        // 是否启用数据脱敏
        masking: {
            enabled: true,
            
            // 根据 CSS 选择器屏蔽特定元素
            maskedSelectors: [
                '.password-input',      // 密码框
                '.credit-card',        // 信用卡号
                '.bank-account',       // 银行账号
                '[id*="secret"]',      // any field whose id contains "secret"
                '[data-sensitive]',    // 自定义敏感属性
            ],
            
            // 正则表达式模式匹配
            regexPatterns: [
                // 匹配 11 位手机号
                { pattern: /\b1[3-9]\d{9}\b/g, replacement: '[手机号已屏蔽]' },
                // Match email addresses ("|" inside a character class is a literal, so use [A-Za-z])
                { pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: '[邮箱已屏蔽]' },
                // 匹配身份证号
                { pattern: /\b[1-9]\d{5}(19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])\d{3}[\dXx]\b/g, replacement: '[身份证号已屏蔽]' },
                // 匹配金额（以元为单位）
                { pattern: /¥\d+(,\d{3})*(\.\d{1,2})?/g, replacement: '[金额已屏蔽]' },
            ],
        },
        
        // 操作确认配置
        confirmation: {
            // 需要确认的操作类型
            requireConfirmationFor: ['submit', 'delete', 'confirm'],
            
            // 是否默认要求确认
            defaultRequireConfirm: true,
        }
    }
})
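To show how regex-based masking rules of this shape behave, here is a small standalone sketch that applies a list of { pattern, replacement } rules to a string. It illustrates the technique only; applyMasking is a hypothetical helper, and how Page-Agent applies the rules internally is not shown in this article.

```javascript
// Apply each { pattern, replacement } rule, in order, to a piece of page text.
function applyMasking(text, rules) {
    return rules.reduce(
        (masked, rule) => masked.replace(rule.pattern, rule.replacement),
        text
    )
}

const rules = [
    // 11-digit mainland China mobile numbers
    { pattern: /\b1[3-9]\d{9}\b/g, replacement: '[phone masked]' },
    // Email addresses
    { pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, replacement: '[email masked]' },
]

console.log(applyMasking('Call 13800138000 or write to zhangsan@example.com', rules))
// prints "Call [phone masked] or write to [email masked]"
```

Rules run in list order, so a broad pattern placed early can hide text that a later, narrower pattern was meant to match.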

3.4 Action configuration

You can control how, and how fast, Page-Agent executes actions:

const agent = new PageAgent({
    // ... 其他配置
    
    action: {
        // 每个动作之间的延迟（毫秒），模拟人类操作节奏
        delayBetweenActions: 500,
        
        // 动作超时时间（毫秒）
        actionTimeout: 30000,
        
        // 是否在执行动作前请求确认
        requireConfirmation: true,
        
        // 最大重试次数
        maxRetries: 3,
        
        // 是否在页面上显示操作步骤
        showStepIndicator: true,
    }
})

3.5 Full configuration example

Below is a complete configuration example for a production environment:

import { PageAgent } from 'page-agent'

// 创建 Page-Agent 实例
const createPageAgent = (apiKey) => {
    return new PageAgent({
        // 模型配置
        model: 'deepseek-chat',
        baseURL: 'https://api.deepseek.com/v1',
        apiKey: apiKey,
        language: 'zh-CN',
        
        // 系统提示词
        systemPrompt: `你是一个专业的网页助手，专门帮助用户填写表单。
请遵循以下规则：
1. 只操作页面上的表单元素，不要尝试访问外部链接
2. 填写前先确认字段的占位符或标签
3. 如果遇到不确定的情况，先询问用户
4. 完成后向用户确认操作结果`,
        
        // UI 配置
        ui: {
            visible: true,
            position: 'bottom-right',
            theme: 'light',
            showActions: true,
        },
        
        // 安全配置
        security: {
            masking: {
                enabled: true,
                maskedSelectors: [
                    'input[type="password"]',
                    '.sensitive-data',
                    '[id^="finance-"]', // ids starting with "finance-" ('#finance-*' is not valid CSS)
                ],
                regexPatterns: [
                    { pattern: /\b1[3-9]\d{9}\b/g, replacement: '[手机号]' },
                    { pattern: /¥\d+/g, replacement: '[金额]' },
                ],
            },
            confirmation: {
                requireConfirmationFor: ['submit', 'delete', 'remove'],
                defaultRequireConfirm: true,
            }
        },
        
        // 动作配置
        action: {
            delayBetweenActions: 300,
            actionTimeout: 30000,
            requireConfirmation: true,
            maxRetries: 3,
            showStepIndicator: true,
        }
    })
}

// Use it in the application (Vite exposes env vars to client code via import.meta.env)
const agent = createPageAgent(import.meta.env.VITE_LLM_API_KEY)

// 导出供其他地方使用
export { agent }

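4. Choosing a Model: Finding the Right "Brain"

Page-Agent is designed around separating the "hands and feet" from the "brain", so you can pick the model that best fits each scenario.

4.1 Alibaba Cloud Qwen series

Qwen (通义千问) is Alibaba Cloud's model family. It integrates most smoothly with Page-Agent and is the recommended first choice.

Qwen3.5-Plus (recommended)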

const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_DASHSCOPE_API_KEY',
})

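This is the officially recommended model. It strikes a good balance between response speed and tool-calling accuracy, and it suits most scenarios, especially form filling and page operations in Chinese.

Qwen-Max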

const agent = new PageAgent({
    model: 'qwen-max',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_DASHSCOPE_API_KEY',
})

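If you need to handle more complex page logic, choose the more capable Qwen-Max, which performs better at understanding and reasoning about complex tasks.

Get an API key: visit https://dashscope.console.aliyun.com/

4.2 DeepSeek series

DeepSeek is known for its low cost relative to performance, which makes it a good fit for large-scale production deployments.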

const agent = new PageAgent({
    model: 'deepseek-chat',
    baseURL: 'https://api.deepseek.com/v1',
    apiKey: 'YOUR_DEEPSEEK_API_KEY',
})

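DeepSeek-V3 is a solid option: it is inexpensive while maintaining good performance, which makes it attractive for teams on a tight budget.

Get an API key: visit https://platform.deepseek.com/

4.3 OpenAI and Claude

If you need to handle more complex English-language scenarios, you can use OpenAI or Claude: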

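// OpenAI
const openaiAgent = new PageAgent({
    model: 'gpt-4o-mini',
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'YOUR_OPENAI_API_KEY',
})

// Claude, via Anthropic's OpenAI-compatible endpoint
// (distinct variable names avoid redeclaring the same const in one scope)
const claudeAgent = new PageAgent({
    model: 'claude-3-haiku-20240307',
    baseURL: 'https://api.anthropic.com/v1',
    apiKey: 'YOUR_CLAUDE_API_KEY',
})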

4.4 Local deployment: Ollama

For enterprise scenarios with very strict data-security requirements, you can run an open-source model locally with Ollama:

const agent = new PageAgent({
    model: 'llama3',
    baseURL: 'http://localhost:11434/v1',
    apiKey: 'ollama', // Ollama 本地不需要真实的 API Key
})

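With this setup no data leaves the local network, which makes it well suited to sensitive financial data, medical records, and similar content.

4.5 Model selection guide

Scenario	Recommended model	Reason
Chinese form filling	Qwen3.5-Plus	Strong Chinese understanding; best integration with Page-Agent
Complex page logic	Qwen-Max / GPT-4o	Strong reasoning
Cost-sensitive bulk deployment	DeepSeek-V3	Low cost for the performance
Sensitive data	Ollama (Llama3)	Data stays local
Rapid prototyping	Demo build	No configuration; plug and play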
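The guidance above can be captured as a small preset table so that application code picks a configuration by scenario. This is only a sketch: the scenario keys and the configFor helper are hypothetical names introduced here, while the model names and base URLs are the ones used in this chapter.

```javascript
// Presets copied from the configurations in this chapter.
// The scenario keys are hypothetical labels chosen for this sketch.
const MODEL_PRESETS = {
    'chinese-forms': {
        model: 'qwen3.5-plus',
        baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    },
    'complex-logic': {
        model: 'qwen-max',
        baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    },
    'cost-sensitive': {
        model: 'deepseek-chat',
        baseURL: 'https://api.deepseek.com/v1',
    },
    'sensitive-data': {
        model: 'llama3',
        baseURL: 'http://localhost:11434/v1',
    },
}

// Build the options object for a scenario; fail loudly on unknown names.
function configFor(scenario, apiKey) {
    const preset = MODEL_PRESETS[scenario]
    if (!preset) {
        throw new Error(`Unknown scenario: ${scenario}`)
    }
    return { ...preset, apiKey }
}

console.log(configFor('cost-sensitive', 'YOUR_DEEPSEEK_API_KEY'))
```

The returned object can then be passed to new PageAgent(...) together with any other options.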

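5. Case Studies: Five Typical Scenarios

This chapter uses concrete code examples to show how Page-Agent is used in different scenarios.

5.1 Scenario 1: smart form filling

This is Page-Agent's most common use case. Suppose you have a customer information form: the user says one sentence, and the AI fills it in for them: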

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <title>智能客户信息录入</title>
    <style>
        .container { max-width: 600px; margin: 50px auto; padding: 20px; }
        .form-group { margin-bottom: 20px; }
        label { display: block; margin-bottom: 5px; font-weight: bold; }
        input, select, textarea { 
            width: 100%; padding: 10px; 
            border: 1px solid #ddd; border-radius: 4px;
        }
        button { 
            background: #28a745; color: white; 
            padding: 12px 24px; border: none; 
            border-radius: 4px; cursor: pointer; margin-right: 10px;
        }
        .ai-btn { background: #17a2b8; }
        .voice-btn { background: #ffc107; color: #000; }
    </style>
</head>
<body>
    <div class="container">
        <h1>客户信息录入</h1>
        
        <form id="customer-form">
            <div class="form-group">
                <label>姓名 *</label>
                <input type="text" id="name" required>
            </div>
            
            <div class="form-group">
                <label>公司</label>
                <input type="text" id="company">
            </div>
            
            <div class="form-group">
                <label>职位</label>
                <input type="text" id="title">
            </div>
            
            <div class="form-group">
                <label>手机号 *</label>
                <input type="tel" id="phone" required>
            </div>
            
            <div class="form-group">
                <label>邮箱</label>
                <input type="email" id="email">
            </div>
            
            <div class="form-group">
                <label>客户等级</label>
                <select id="level">
                    <option value="">请选择</option>
                    <option value="A">A级 - 高意向</option>
                    <option value="B">B级 - 中意向</option>
                    <option value="C">C级 - 低意向</option>
                </select>
            </div>
            
            <div class="form-group">
                <label>备注</label>
                <textarea id="remark" rows="3"></textarea>
            </div>
            
            <button type="submit">提交</button>
            <button type="button" class="ai-btn" id="ai-fill">AI 智能填写</button>
            <button type="button" class="voice-btn" id="voice-input">🎤 语音输入</button>
        </form>
    </div>

    <script type="module">
        // The bare specifier 'page-agent' requires a bundler (such as Vite) or an import map
        import { PageAgent } from 'page-agent'
        
        const agent = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey: 'YOUR_API_KEY',
            language: 'zh-CN',
            ui: { visible: false }
        })
        
        // AI 智能填写按钮
        document.getElementById('ai-fill').addEventListener('click', async () => {
            const info = prompt('请输入客户信息（可以用自然语言描述）：\n\n例如：张三，阿里巴巴技术总监，电话 13800138000，邮箱 zhangsan@alibaba.com，等级A级')
            
            if (info) {
                await agent.execute(`请帮我填写以下客户信息：${info}`)
            }
        })
        
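        // Voice input via the browser's Web Speech API (still vendor-prefixed in Chromium browsers)
        document.getElementById('voice-input').addEventListener('click', () => {
            const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
            if (!SpeechRecognition) {
                alert('当前浏览器不支持语音识别')
                return
            }
            const recognition = new SpeechRecognition()
            recognition.lang = 'zh-CN'
            recognition.onresult = async (event) => {
                const text = event.results[0][0].transcript
                alert(`识别到：${text}`)
                await agent.execute(`请帮我填写以下客户信息：${text}`)
            }
            recognition.start()
        })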
    </script>
</body>
</html>

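The user can click the 「AI 智能填写」 button and type something like 「张三，阿里巴巴技术总监，电话 13800138000，邮箱 zhangsan@alibaba.com，客户等级选 A 级」; Page-Agent parses the information and fills in the matching form fields.

5.2 Scenario 2: CRM assistant

In an enterprise CRM system, Page-Agent can substantially speed up routine operations. Below is a simplified helper class for a customer management page: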

import { PageAgent } from 'page-agent'

class CRMAgent {
    constructor(apiKey) {
        this.agent = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey: apiKey,
            language: 'zh-CN',
            systemPrompt: `你是一个 CRM 系统助手，擅长以下操作：
1. 查找客户：根据姓名、公司名、电话等条件搜索
2. 创建客户：填写新客户信息
3. 编辑客户：修改现有客户资料
4. 导出数据：将表格数据导出为 CSV
5. 批量操作：批量修改客户状态、标签等

请根据用户的自然语言指令执行相应操作。`
        })
    }
    
    // 查找客户
    async searchCustomer(keyword) {
        await this.agent.execute(`请在当前页面搜索包含"${keyword}"的客户`)
    }
    
    // 创建新客户
    async createCustomer(customerInfo) {
        await this.agent.execute(`请帮我创建新客户，信息如下：${customerInfo}`)
    }
    
    // 批量打标签
    async batchTag(condition, tag) {
        await this.agent.execute(`请找到所有${condition}的客户，给他们添加"${tag}"标签`)
    }
    
    // 导出客户列表
    async exportCustomers(condition = '全部') {
        await this.agent.execute(`请将${condition}的客户列表导出为 CSV 文件`)
    }
}

// 使用示例
const crm = new CRMAgent('YOUR_API_KEY')

// 在控制台中可以这样使用：
// crm.searchCustomer('阿里巴巴')
// crm.createCustomer('张三，公司：阿里云，职位：架构师')
// crm.batchTag('状态为潜在客户', '重点跟进')
// crm.exportCustomers('本月新增')

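5.3 Scenario 3: order processing in an e-commerce back office

E-commerce operators process large numbers of orders every day, and Page-Agent can take over much of the repetitive work: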

import { PageAgent } from 'page-agent'

class OrderProcessor {
    constructor(apiKey) {
        this.agent = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey: apiKey,
            language: 'zh-CN',
            security: {
                masking: {
                    enabled: true,
                    regexPatterns: [
                        { pattern: /\b1[3-9]\d{9}\b/g, replacement: '[手机号]' },
                    ]
                }
            }
        })
    }
    
    // 订单备注
    async addNote(orderId, note) {
        await this.agent.execute(
            `请找到订单号为 ${orderId} 的订单，在备注中添加：${note}`
        )
    }
    
    // 批量发货
    async batchShip(orderIds) {
        const orderList = orderIds.join('、')
        await this.agent.execute(
            `请对以下订单批量标记为已发货：${orderList}`
        )
    }
    
    // 退换货处理
    async handleRefund(orderId, reason) {
        await this.agent.execute(
            `请处理订单 ${orderId} 的退货申请，原因为：${reason}。完成后在备注中记录处理结果`
        )
    }
    
    // 订单查询
    async searchOrders(condition) {
        await this.agent.execute(`请根据以下条件筛选订单：${condition}`)
    }
}

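5.4 Scenario 4: accessibility support

Another important use case for Page-Agent is accessibility. Combined with speech recognition and natural-language processing, it can help visually impaired users navigate and operate web pages: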

import { PageAgent } from 'page-agent'

class AccessibilityHelper {
    constructor(apiKey) {
        this.agent = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey: apiKey,
            language: 'zh-CN',
            ui: {
                visible: true,
                theme: 'dark',
                position: 'top-right'
            },
            action: {
                requireConfirmation: true,
                showStepIndicator: true
            }
        })
        
        this.initVoiceControl()
    }
    
    // 初始化语音控制
    initVoiceControl() {
        if ('webkitSpeechRecognition' in window) {
            this.recognition = new webkitSpeechRecognition()
            this.recognition.lang = 'zh-CN'
            this.recognition.continuous = true
            this.recognition.interimResults = false
            
            this.recognition.onresult = async (event) => {
                const command = event.results[event.results.length - 1][0].transcript
                console.log('收到语音指令：', command)
                await this.handleCommand(command)
            }
            
            this.recognition.onerror = (event) => {
                console.error('语音识别错误：', event.error)
            }
        }
    }
    
    // 启动语音监听
    startListening() {
        if (this.recognition) {
            this.recognition.start()
            console.log('语音监听已启动')
        }
    }
    
    // 停止语音监听
    stopListening() {
        if (this.recognition) {
            this.recognition.stop()
            console.log('语音监听已停止')
        }
    }
    
    // 处理语音指令
    async handleCommand(command) {
        const lowerCommand = command.toLowerCase()
        
        if (lowerCommand.includes('点击') || lowerCommand.includes('按')) {
            await this.agent.execute(command)
        } else if (lowerCommand.includes('读') || lowerCommand.includes('说')) {
            this.readPageContent()
        } else if (lowerCommand.includes('上') || lowerCommand.includes('下')) {
            await this.agent.execute(command)
        } else {
            // 默认执行操作
            await this.agent.execute(command)
        }
    }
    
    // 读取页面内容（配合屏幕阅读器）
    readPageContent() {
        const heading = document.querySelector('h1')?.textContent || ''
        const mainContent = document.querySelector('main')?.textContent?.slice(0, 200) || ''
        
        alert(`页面标题：${heading}。页面主要内容：${mainContent}...`)
    }
}

// 使用示例
const a11y = new AccessibilityHelper('YOUR_API_KEY')

// Add a button that toggles voice control
const voiceBtn = document.createElement('button')
voiceBtn.textContent = '🎤 语音控制'
voiceBtn.style.cssText = 'position:fixed;top:20px;right:20px;z-index:9999;padding:10px;'
let listening = false // track state explicitly instead of parsing the button label
voiceBtn.onclick = () => {
    if (!listening) {
        a11y.startListening()
        voiceBtn.textContent = '🛑 停止语音'
    } else {
        a11y.stopListening()
        voiceBtn.textContent = '🎤 语音控制'
    }
    listening = !listening
}
document.body.appendChild(voiceBtn)

5.5 Scenario 5: moving data across pages

Combined with a Chrome extension, Page-Agent can move data from one page to another:

// 这是一个 Chrome 扩展的 content script 示例
import { PageAgent } from 'page-agent'

class CrossTabAgent {
    constructor(apiKey) {
        this.agent = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey: apiKey,
            language: 'zh-CN'
        })
        
        this.dataCache = {}
    }
    
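    // Extract data from the current page.
    // Assumes the result is an object keyed by field name, e.g. { '公司名称': '...' }.
    async extractData(fields) {
        const result = await this.agent.execute(
            `请提取当前页面中以下字段的数据：${fields}。提取完成后将结果缓存。`
        )

        // Keep a copy in memory and persist it so it survives page navigation
        this.dataCache = result
        await chrome.storage.local.set({ extractedData: result })

        return result
    }

    // Queue the fill command, then navigate. Navigation destroys this script's
    // context, so the fill itself runs on the target page in resumePendingFill().
    // Any remaining workflow steps would need to be persisted the same way.
    async navigateAndFill(targetUrl, fieldMapping) {
        let fillCommand = '请填写以下数据：'
        for (const [sourceField, targetField] of Object.entries(fieldMapping)) {
            const value = this.dataCache?.[sourceField]
            if (value) {
                fillCommand += `${targetField}为${value}；`
            }
        }

        await chrome.storage.local.set({ pendingFill: fillCommand })
        window.location.href = targetUrl
    }

    // Call this when the content script starts on each page load
    async resumePendingFill() {
        const { pendingFill } = await chrome.storage.local.get('pendingFill')
        if (!pendingFill) return

        await chrome.storage.local.remove('pendingFill')
        await this.agent.execute(pendingFill)
    }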
    
    // 跨标签页工作流
    async executeWorkflow(workflow) {
        for (const step of workflow.steps) {
            console.log(`执行步骤：${step.name}`)
            
            if (step.action === 'extract') {
                await this.extractData(step.fields)
            } else if (step.action === 'navigate') {
                await this.navigateAndFill(step.url, step.mapping)
            } else if (step.action === 'fill') {
                await this.agent.execute(step.command)
            }
            
            // 步骤之间稍作延迟
            await new Promise(resolve => setTimeout(resolve, 1000))
        }
    }
}

// 使用示例：从天眼查复制数据到 CRM
const workflow = {
    steps: [
        {
            name: '从天眼查提取公司信息',
            action: 'extract',
            fields: '公司名称、统一社会信用代码、法定代表人、注册资本'
        },
        {
            name: '跳转到 CRM',
            action: 'navigate',
            url: 'https://crm.yourcompany.com/customer/new',
            mapping: {
                '公司名称': 'companyName',
                '法定代表人': 'legalPerson',
                '注册资本': 'registeredCapital'
            }
        },
        {
            name: '补充填写',
            action: 'fill',
            command: '客户来源设置为"天眼查"，客户等级设置为"A级"'
        }
    ]
}

// const crossTabAgent = new CrossTabAgent('YOUR_API_KEY')
// await crossTabAgent.executeWorkflow(workflow)
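The field-mapping step in the workflow above is easy to get wrong, so it helps to isolate it as a pure function that can be tested without a browser. The sketch below assumes the extracted data is a plain object keyed by source field name; buildFillCommand is a hypothetical helper, not part of Page-Agent.

```javascript
// Build a single fill instruction from extracted data and a mapping of
// source field names to target field names. Missing values are skipped.
function buildFillCommand(data, mapping) {
    const parts = []
    for (const [sourceField, targetField] of Object.entries(mapping)) {
        const value = data[sourceField]
        if (value !== undefined && value !== '') {
            parts.push(`${targetField}: ${value}`)
        }
    }
    return parts.length > 0
        ? `Fill in the following fields. ${parts.join('; ')}`
        : ''
}

const extracted = { '公司名称': 'Example Co.', '法定代表人': '张三' }
const mapping = {
    '公司名称': 'companyName',
    '法定代表人': 'legalPerson',
    '注册资本': 'registeredCapital', // not present in the extracted data, so skipped
}

console.log(buildFillCommand(extracted, mapping))
// prints "Fill in the following fields. companyName: Example Co.; legalPerson: 张三"
```

Returning an empty string when nothing maps lets the caller skip the agent call entirely instead of sending an empty instruction.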

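6. Advanced Techniques and Best Practices

This chapter shares some advanced techniques for running Page-Agent in production.

6.1 API key security

Exposing an API key directly in front-end code is a serious security problem. Here are several ways to handle it:

Option 1: environment variables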

// .env 文件
VITE_LLM_API_KEY=sk-xxxxxx

// 代码中引用
const agent = new PageAgent({
    apiKey: import.meta.env.VITE_LLM_API_KEY,
})

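Note: with build tools such as Vite, an environment variable must start with VITE_ to be available in client code. Such variables are inlined into the built bundle, so anyone who opens the page can still read the key. This approach keeps the key out of source control, but it is suitable only for local development and internal trials.

Option 2: a backend proxy (recommended)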

// 前端代码
const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: '/api/llm-proxy', // 指向你自己的后端接口
    apiKey: 'user-auth-token', // 使用用户认证 token
})

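// Backend endpoint (Node.js/Express example). An OpenAI-compatible client appends
// /chat/completions to baseURL, so the route includes that suffix.
// Requires app.use(express.json()) so that req.body is parsed.
app.post('/api/llm-proxy/chat/completions', async (req, res) => {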
    const response = await fetch('https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${process.env.DASHSCOPE_API_KEY}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify(req.body)
    })

    const data = await response.json()
    res.json(data)
})

Option 3: temporary tokens

For large applications, you can implement a token-issuing mechanism:

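// Backend: issue a short-lived token (uses the jsonwebtoken package)
function generateTempToken(userId) {
    // expiresIn sets the standard exp claim, which is in seconds, not milliseconds
    return jwt.sign({ userId }, process.env.JWT_SECRET, { expiresIn: '1h' })
}

// Frontend: fetch the token; this assumes the endpoint responds with { token: "..." }
const { token } = await fetch('/api/get-token').then(r => r.json())
const agent = new PageAgent({
    apiKey: token,
    // ...
})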
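A common pitfall with temporary tokens is the unit of the expiry claim: the JWT exp claim is a NumericDate, meaning seconds since the Unix epoch, while Date.now() returns milliseconds. The sketch below shows the arithmetic in isolation; expiresAt and isExpired are hypothetical helper names, and a real backend would normally rely on its JWT library for these checks.

```javascript
// Compute an exp claim (in seconds) for a token issued at nowMs (in milliseconds).
function expiresAt(nowMs, ttlSeconds) {
    return Math.floor(nowMs / 1000) + ttlSeconds
}

// A token must be rejected on or after its exp time.
function isExpired(payload, nowMs) {
    return Math.floor(nowMs / 1000) >= payload.exp
}

const issuedAtMs = 1700000000000 // fixed timestamp so the example is deterministic
const payload = { userId: 'u1', exp: expiresAt(issuedAtMs, 3600) }

console.log(payload.exp)                                   // 1700003600
console.log(isExpired(payload, issuedAtMs))                // false
console.log(isExpired(payload, issuedAtMs + 3600 * 1000))  // true
```

Writing exp in milliseconds would push the expiry thousands of years into the future, which silently defeats the purpose of a short-lived token.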

6.2 Custom UI components

Page-Agent's default UI may not match your product design. In that case you can hide it and use your own components:

import { PageAgent } from 'page-agent'

// 创建隐藏 UI 的 Agent
const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
    ui: {
        visible: false // 隐藏默认 UI
    }
})

// 监听 Agent 事件
agent.on('ready', () => {
    console.log('Page-Agent 已就绪')
})

agent.on('action', (action) => {
    console.log('执行动作：', action)
})

agent.on('complete', (result) => {
    console.log('任务完成：', result)
})

agent.on('error', (error) => {
    console.error('出错了：', error)
})

// 创建你自己的 UI
function showMyCustomUI() {
    const modal = document.createElement('div')
    modal.className = 'ai-modal'
    modal.innerHTML = `
        <div class="ai-modal-content">
            <div class="ai-messages"></div>
            <div class="ai-input-area">
                <input type="text" placeholder="请描述你想要执行的操作..." />
                <button>发送</button>
            </div>
        </div>
    `
    document.body.appendChild(modal)
    
    // 绑定事件
    const input = modal.querySelector('input')
    const button = modal.querySelector('button')
    
    const handleSubmit = async () => {
        const command = input.value.trim()
        if (!command) return
        
        // 添加用户消息
        addMessage('user', command)
        input.value = ''
        
        // 执行命令
        try {
            addMessage('ai', '正在执行...')
            const result = await agent.execute(command)
            addMessage('ai', `已完成：${result}`)
        } catch (error) {
            addMessage('ai', `执行失败：${error.message}`)
        }
    }
    
    button.addEventListener('click', handleSubmit)
    input.addEventListener('keypress', (e) => {
        if (e.key === 'Enter') handleSubmit()
    })
}

function addMessage(role, content) {
    const messages = document.querySelector('.ai-messages')
    const msg = document.createElement('div')
    msg.className = `message ${role}`
    msg.textContent = content
    messages.appendChild(msg)
}

6.3 Performance optimization

At scale, a few performance points deserve attention:

// 1. 复用 Agent 实例
const agent = new PageAgent({ /* 配置 */ })

// 避免重复创建，每次都复用同一个实例
function handleUserCommand(command) {
    return agent.execute(command)  // 复用
}

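// 2. Reduce DOM dehydration cost
// (a separate name is used here only because `agent` is already declared above;
// in real code, set these options on the single shared instance)
const scopedAgent = new PageAgent({
    // Scan only a specific region
    scope: '#main-content', // analyze only the main content area

    // Exclude regions that do not need analysis
    exclude: ['.sidebar', '.advertisement', 'iframe']
})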

// 3. 批量操作优化
// 不好：连续发送多个单独命令
await agent.execute('点击第一个按钮')
await agent.execute('点击第二个按钮')
await agent.execute('点击第三个按钮')

// 好：合并为一个命令
await agent.execute('依次点击前三个按钮')

6.4 Error handling and retries

In production, sensible error handling is essential:

import { PageAgent } from 'page-agent'

class RobustAgent {
    constructor(config) {
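        this.agent = new PageAgent({
            ...config,
            // Defaults first, so action settings passed by the caller are merged, not discarded
            action: {
                maxRetries: 3,
                actionTimeout: 30000,
                ...config.action,
            }
        })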
    }
    
    async executeWithRetry(command, options = {}) {
        const { retries = 3, onRetry } = options
        let lastError
        
        for (let i = 0; i < retries; i++) {
            try {
                const result = await this.agent.execute(command)
                return { success: true, result }
            } catch (error) {
                lastError = error
                console.warn(`执行失败 (${i + 1}/${retries}):`, error.message)
                
                if (onRetry) {
                    const shouldRetry = await onRetry(error, i)
                    if (!shouldRetry) break
                }
                
                // 指数退避
                if (i < retries - 1) {
                    await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000))
                }
            }
        }
        
        return { success: false, error: lastError }
    }
}

// 使用示例
const robustAgent = new RobustAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
})

const result = await robustAgent.executeWithRetry(
    '帮我填写表单',
    {
        retries: 3,
        onRetry: async (error, attempt) => {
            if (error.message.includes('rate limit')) {
                console.log('触发限流，等待后重试...')
                return true
            }
            return false // 其他错误不重试
        }
    }
)

if (result.success) {
    console.log('执行成功：', result.result)
} else {
    console.error('最终失败：', result.error)
}
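The exponential backoff used in executeWithRetry waits 2^i seconds after the i-th failed attempt and does not wait after the final one. The sketch below computes that schedule as a pure function and adds an upper bound on the delay; backoffDelays and the maxMs cap are additions made for this illustration and are not part of the class above.

```javascript
// Return the wait times (in ms) between attempts for a given retry count.
// There is no wait after the last attempt, so the list has retries - 1 entries.
function backoffDelays(retries, baseMs = 1000, maxMs = 30000) {
    const delays = []
    for (let i = 0; i < retries - 1; i++) {
        delays.push(Math.min(baseMs * 2 ** i, maxMs))
    }
    return delays
}

console.log(backoffDelays(3))          // [ 1000, 2000 ]
console.log(backoffDelays(6, 1000, 8000)) // [ 1000, 2000, 4000, 8000, 8000 ]
```

Capping the delay keeps a long retry chain from stalling the UI for minutes; adding random jitter on top is a common refinement when many clients retry at once.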

6.5 Debugging tips

During development, the following techniques help with debugging:

const agent = new PageAgent({
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    apiKey: 'YOUR_API_KEY',
    
    // 开启调试模式
    debug: {
        // 打印发送到 LLM 的提示词
        logPrompt: true,
        
        // 打印 DOM 脱水结果
        logDOM: true,
        
        // 打印执行的动作
        logActions: true,
        
        // 在页面上显示调试信息
        overlay: true
    }
})

// 手动触发一个动作来观察行为
async function debugCommand(command) {
    console.log('=== 开始调试 ===')
    console.log('输入命令：', command)
    
    // 获取脱水后的 DOM
    const dom = await agent.getDOM()
    console.log('脱水 DOM：', dom)
    
    // 执行并观察
    const result = await agent.execute(command)
    console.log('执行结果：', result)
    console.log('=== 调试结束 ===')
}

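7. Frequently Asked Questions

This chapter collects common questions that come up when using Page-Agent.

7.1 Integration issues

Q: Nothing happens on the page after I include Page-Agent?

Check the following:

Confirm that the API key is correct and valid
Check the browser console for error messages
Make sure no CSP (Content Security Policy) is blocking script execution
Verify that the network can reach the LLM API

Q: The Page-Agent UI does not appear after the page loads?

Confirm that ui.visible is set to true in the configuration: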

const agent = new PageAgent({
    ui: {
        visible: true  // 确保显示
    }
})

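7.2 Functional issues

Q: The AI cannot find an element on the page?

Check whether the element has a unique identifier (id, name, or clear text)
Confirm that the element is visible (not display: none or visibility: hidden)
Try describing the element's position more specifically in the command, for example 「点击提交按钮，它在表单底部」 ("click the submit button at the bottom of the form")

Q: The form is filled in incorrectly or at random?

This is usually a model comprehension problem. A more precise system prompt can fix it: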

const agent = new PageAgent({
    systemPrompt: `你是一个表单填写助手。请遵循以下规则：
1. 仔细阅读每个输入框的 placeholder 和 label
2. 手机号必须是 11 位数字
3. 邮箱必须包含 @ 符号
4. 填写完成后向用户确认`,
})

Q: Page operations are too slow?

Adjust the action delay setting:

const agent = new PageAgent({
    action: {
        delayBetweenActions: 100, // 减少延迟
    }
})

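7.3 Security issues

Q: What should I do if my API key is leaked?

Rotate the API key on the corresponding platform immediately
Check the API call logs for abnormal usage
Consider migrating to the backend proxy approach

Q: Will sensitive data be sent to the model?

By default, Page-Agent masks the contents of password fields. You can also configure additional masking rules manually. If the data is extremely sensitive, a local Ollama deployment is recommended.

7.4 Compatibility issues

Q: Which browsers are supported?

Page-Agent supports the latest versions of modern browsers (Chrome, Firefox, Safari, Edge). ES Modules support is required.

Q: Can I use it with Vue or React?

Yes. Page-Agent is implemented in plain JavaScript and works with any front-end framework.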

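// React example
import { PageAgent } from 'page-agent'
import { useEffect, useState } from 'react'

function usePageAgent(apiKey) {
    // Use state rather than a ref: a ref would still be null on the first render,
    // and assigning to it later would not trigger a re-render.
    const [agent, setAgent] = useState(null)

    useEffect(() => {
        const instance = new PageAgent({
            model: 'qwen3.5-plus',
            baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
            apiKey,
        })
        setAgent(instance)

        return () => {
            // Cleanup: drop the reference when the key changes or the component unmounts
            setAgent(null)
        }
    }, [apiKey])

    return agent
}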