22-auto-research.md

No commit record found · File updated: 2026-01-21 13:35:05 +08:00

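Auto Research Module

1. Goal

Automatically visit competitor sites, data sources, and procurement sources; crawl beyond the landing page to collect content and take screenshots; produce site tags/categories, competitor feature summaries, and buyer persona tags; and write the results to the knowledge base.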

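2. Feature Overview

  1. Deep page visits with Playwright (not just the home page), with multi-page crawling and full-page screenshots
  2. Automatic generation of site tags/categories and buyer persona tags
  3. Summaries of competitor sites' distinguishing features, written as GraphRAG input
  4. Automated registration/login (failures are recorded in a list pending manual handling)
  5. Reports and screenshots output for manual review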
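
The multi-page crawl in item 1 can be pictured as a bounded breadth-first walk. The following is a minimal sketch, not the module's actual code: the parameter names mirror `max_pages`, `max_depth`, and `seed_paths` from the target configuration, while `plan_crawl` and the `get_links` callback are hypothetical names introduced here. The sketch assumes that seed pages sit at depth 0 and that only same-host links are followed; the real tool may count depth differently.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def plan_crawl(base_url, seed_paths, get_links, max_pages=8, max_depth=2):
    """Return the URLs a bounded breadth-first crawl would visit, in order.

    get_links(url) is a hypothetical callback returning the hrefs found on a
    page; in a real run a browser automation layer would supply them.
    """
    host = urlparse(base_url).netloc
    queue = deque((urljoin(base_url, path), 0) for path in seed_paths)
    seen = set()
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # assumed semantics: do not expand links past max_depth
        for href in get_links(url):
            target = urljoin(url, href).split("#", 1)[0]  # drop fragments
            if urlparse(target).netloc == host and target not in seen:
                queue.append((target, depth + 1))
    return visited
```

Keeping link discovery behind a callback makes the traversal limits easy to check without launching a browser.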

3. Key Files

4. Environment Variables (see .env)

5. Target Site Configuration (automation/research_targets.json)

Example structure:

{
  "defaults": {
    "type": "competitor",
    "crawl": {
      "max_pages": 8,
      "max_depth": 2,
      "seed_paths": ["/", "/product", "/pricing"]
    }
  },
  "sites": [
    {
      "id": "example-competitor",
      "name": "Example Competitor",
      "type": "competitor",
      "base_url": "https://example.com",
      "tags": ["竞品"],
      "categories": ["招投标平台"],
      "login": {
        "enabled": false,
        "credentials": { "username": "", "password": "" },
        "steps": []
      },
      "register": {
        "enabled": false,
        "steps": []
      }
    }
  ]
}
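
One plausible way to read this structure is that each entry in `sites` inherits from `defaults` and overrides only the keys it sets. The sketch below illustrates that reading under stated assumptions: `merge_defaults` and `load_sites` are hypothetical helper names, and the recursive merge of nested objects such as `crawl` is an assumption, not confirmed behavior of the tool.

```python
import copy
import json


def merge_defaults(defaults, site):
    """Overlay a site entry on the shared defaults, merging nested objects."""
    merged = copy.deepcopy(defaults)
    for key, value in site.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            # Scalars and lists from the site entry replace the default.
            merged[key] = copy.deepcopy(value)
    return merged


def load_sites(config_text):
    """Parse the configuration JSON and return one resolved dict per site."""
    config = json.loads(config_text)
    defaults = config.get("defaults", {})
    return [merge_defaults(defaults, site) for site in config.get("sites", [])]
```

With this approach a site that sets only `"crawl": {"max_pages": 3}` would still pick up `max_depth` and `seed_paths` from `defaults`.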

5.1 Automated Registration/Login Steps

login.steps and register.steps support the following actions:

Available variables: ${USERNAME} ${PASSWORD} ${TIMESTAMP} ${base_url}
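
The variables above use the `${NAME}` form, which Python's standard `string.Template` can expand. The following sketch shows one way a step value might be rendered; it is an illustration only. `render_step_value` is a hypothetical name, reading the credentials from `login.credentials` follows the example structure, and treating `${TIMESTAMP}` as Unix epoch seconds is an assumption, since the actual format is not stated here.

```python
import time
from string import Template


def render_step_value(text, site, now=None):
    """Expand ${USERNAME}, ${PASSWORD}, ${TIMESTAMP} and ${base_url} in text."""
    creds = site.get("login", {}).get("credentials", {})
    timestamp = int(now if now is not None else time.time())
    values = {
        "USERNAME": creds.get("username", ""),
        "PASSWORD": creds.get("password", ""),
        "TIMESTAMP": str(timestamp),  # assumed format: epoch seconds
        "base_url": site.get("base_url", ""),
    }
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(text).safe_substitute(values)
```

Using `safe_substitute` means a typo in a placeholder shows up verbatim in the rendered value, which is easier to spot during manual review than a crash in the middle of a run.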

6. Output and Review

Each run generates the following under automation/research_runs/run-YYYYmmdd-HHMMSS/:
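
The run directory name encodes the start time as `run-YYYYmmdd-HHMMSS`. As a small illustration, the path could be derived as below; `run_dir_for` is a hypothetical helper, and using local time rather than UTC is left to the caller because the source does not say which the tool uses.

```python
from datetime import datetime
from pathlib import Path


def run_dir_for(started_at, root="automation/research_runs"):
    """Build the per-run output directory path from the run's start time."""
    return Path(root) / started_at.strftime("run-%Y%m%d-%H%M%S")


# Example: a run started on 2026-01-21 at 13:35:05
example = run_dir_for(datetime(2026, 1, 21, 13, 35, 5))
print(example.as_posix())  # automation/research_runs/run-20260121-133505
```

Because the timestamp is zero-padded and ordered from year to second, sorting the directory names alphabetically also sorts the runs chronologically.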

7. Manual Run

Install Playwright first: python3 -m pip install playwright, then python3 -m playwright install chromium

python3 tools/auto_research/research.py
python3 tools/auto_research/research.py --site example-competitor

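If an account is missing or registration fails, the site is added to automation/research_pending_registrations.json; once the account details are filled in, automated research can continue.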