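# Auto Research Module

## 1. Goal

Automatically visit competitor sites, data sources, and procurement sources; crawl beyond the landing page to collect content and take screenshots; generate site tags/categories, competitor feature summaries, and buyer persona tags; and write the results to the knowledge base.

## 2. Feature Overview

1) Deep page visits with Playwright (not just the home page), with multi-page crawling and full-page screenshots
2) Automatic generation of site tags/categories and buyer persona tags
3) Feature summaries for competitor sites, written to the GraphRAG input
4) Automated registration/login (failures are recorded in a list for manual follow-up)
5) Reports and screenshots are produced for manual review

## 3. Key Files

- Script: `tools/auto_research/research.py`
- Target list: `automation/research_targets.json`
- Run status: `automation/research_status.json`
- Pending registrations: `automation/research_pending_registrations.json`
- Log directory: `automation/research_logs/`
- Run output: `automation/research_runs/`
- GraphRAG input: `/root/ca_v3/cps/input/auto_research.txt`

## 4. Environment Variables (see `.env`)

- `AUTO_RESEARCH_TARGETS_PATH`
- `AUTO_RESEARCH_STATE_PATH`
- `AUTO_RESEARCH_STATUS_PATH`
- `AUTO_RESEARCH_LOG_DIR`
- `AUTO_RESEARCH_RUN_DIR`
- `AUTO_RESEARCH_OUTPUT_PATH`
- `AUTO_RESEARCH_INDEX_CMD`
- `AUTO_RESEARCH_MAX_PAGES`
- `AUTO_RESEARCH_MAX_DEPTH`
- `AUTO_RESEARCH_MAX_LINKS_PER_PAGE`
- `AUTO_RESEARCH_SCREENSHOT`
- `AUTO_RESEARCH_CAPTURE_HTML`
- `AUTO_RESEARCH_SUMMARIZE`
- `AUTO_RESEARCH_HEADLESS`
- `AUTO_RESEARCH_SITE_TAGS` (optional, `|`-separated)
- `AUTO_RESEARCH_PERSONA_TAGS` (optional, `|`-separated)

## 5. Target Site Configuration (`automation/research_targets.json`)

Example structure:

```json
{
  "defaults": {
    "type": "competitor",
    "crawl": {
      "max_pages": 8,
      "max_depth": 2,
      "seed_paths": ["/", "/product", "/pricing"]
    }
  },
  "sites": [
    {
      "id": "example-competitor",
      "name": "Example Competitor",
      "type": "competitor",
      "base_url": "https://example.com",
      "tags": ["竞品"],
      "categories": ["招投标平台"],
      "login": {
        "enabled": false,
        "credentials": { "username": "", "password": "" },
        "steps": []
      },
      "register": {
        "enabled": false,
        "steps": []
      }
    }
  ]
}
```

In this example the tag `竞品` means "competitor" and the category `招投标平台` means "bidding and tendering platform".

### 5.1 Automated Registration/Login Steps

`login.steps` and `register.steps` support the following actions:

- `goto` (navigate to a page)
- `fill` (fill in a field)
- `click` (click a button)
- `wait_for_selector` (wait for an element)
- `wait_for_url` (wait for a redirect)
- `press` (keyboard input)
- `sleep` (pause, in milliseconds)

Available variables: `${USERNAME}` `${PASSWORD}` `${TIMESTAMP}` `${base_url}`.

## 6. Output and Review

Each run generates the following under `automation/research_runs/run-YYYYmmdd-HHMMSS/`:

- Screenshots (PNG, full page)
- Page HTML
- `report.json` (page list, tags, summaries)

## 7. Manual Run

> Install Playwright first: `python3 -m pip install playwright`, then `python3 -m playwright install chromium`.

```bash
python3 tools/auto_research/research.py
python3 tools/auto_research/research.py --site example-competitor
```

> If an account is missing or registration fails, the site is added to `automation/research_pending_registrations.json`. Once the account details are filled in, automated research can continue for that site.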
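
To illustrate the step format from section 5.1, here is a minimal sketch of how `login.steps` entries might be rendered and executed. It is not taken from `research.py`: the step field names (`action`, `url`, `selector`, `value`, `key`, `ms`), the helper names `render_steps` and `run_step`, and the sample selectors and credentials are all assumptions made for illustration. The `page` argument is expected to behave like a Playwright sync `Page`.

```python
import time
from string import Template


def render_steps(steps, variables):
    """Return a copy of `steps` with ${NAME} placeholders filled in string fields."""
    rendered = []
    for step in steps:
        rendered.append({
            key: Template(value).safe_substitute(variables)
            if isinstance(value, str) else value
            for key, value in step.items()
        })
    return rendered


def run_step(page, step):
    """Dispatch one rendered step to a Playwright-style sync `page` object."""
    action = step["action"]
    if action == "goto":
        page.goto(step["url"])
    elif action == "fill":
        page.fill(step["selector"], step["value"])
    elif action == "click":
        page.click(step["selector"])
    elif action == "wait_for_selector":
        page.wait_for_selector(step["selector"])
    elif action == "wait_for_url":
        page.wait_for_url(step["url"])
    elif action == "press":
        page.press(step["selector"], step["key"])
    elif action == "sleep":
        page.wait_for_timeout(step["ms"])  # duration in milliseconds
    else:
        raise ValueError(f"unsupported action: {action}")


# Hypothetical steps; the field names and selectors are assumptions.
login_steps = [
    {"action": "goto", "url": "${base_url}/login"},
    {"action": "fill", "selector": "#username", "value": "${USERNAME}"},
    {"action": "fill", "selector": "#password", "value": "${PASSWORD}"},
    {"action": "click", "selector": "button[type=submit]"},
    {"action": "wait_for_url", "url": "${base_url}/dashboard"},
]

# Placeholder values for the four variables listed in section 5.1.
variables = {
    "USERNAME": "demo-user",
    "PASSWORD": "demo-pass",
    "TIMESTAMP": str(int(time.time())),
    "base_url": "https://example.com",
}

rendered = render_steps(login_steps, variables)
```

With a real browser session, each entry in `rendered` would be passed to `run_step(page, step)` in order. `Template.safe_substitute` is used here so that an unknown placeholder is left untouched rather than raising an error; whether the actual script behaves that way is not stated in this document.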