ops eval
Validate each agent as a quantifiable decision module, not as a black box.
Agent scorecards

| Agent | Accuracy | Bias | Stability | Adoption | Samples | Date |
|---|---|---|---|---|---|---|
| Country Feasibility · vB | 76% | -0.05 | 78% | 73% | 38 | 2026-03-02 |
| Site Scout · vB | 74% | -0.07 | 75% | 72% | 51 | 2026-03-02 |
| StartUp Workflow · vB | 79% | -0.03 | 80% | 76% | 36 | 2026-03-02 |
| Recruitment Dynamics · vB | 78% | -0.06 | 77% | 75% | 48 | 2026-03-02 |
| Risk Officer · vB | 81% | -0.02 | 82% | 79% | 33 | 2026-03-02 |


| Agent | Accuracy | Bias | Stability | Adoption | Samples | Date |
|---|---|---|---|---|---|---|
| Country Feasibility · vB | 73% | -0.06 | 76% | 71% | 27 | 2026-03-02 |
| Site Scout · vB | 71% | -0.09 | 73% | 69% | 39 | 2026-03-02 |
| StartUp Workflow · vB | 76% | -0.04 | 79% | 75% | 29 | 2026-03-02 |
| Recruitment Dynamics · vB | 75% | -0.07 | 76% | 73% | 42 | 2026-03-02 |
| Risk Officer · vB | 79% | -0.03 | 81% | 77% | 26 | 2026-03-02 |


| Agent | Accuracy | Bias | Stability | Adoption | Samples | Date |
|---|---|---|---|---|---|---|
| Country Feasibility · vB | 72% | -0.08 | 74% | 69% | 23 | 2026-03-02 |
| Site Scout · vB | 69% | -0.11 | 70% | 66% | 34 | 2026-03-02 |
| StartUp Workflow · vB | 74% | -0.05 | 77% | 72% | 25 | 2026-03-02 |
| Recruitment Dynamics · vB | 71% | -0.09 | 73% | 68% | 36 | 2026-03-02 |
| Risk Officer · vB | 77% | -0.04 | 79% | 74% | 22 | 2026-03-02 |

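With sample counts between 22 and 51 per card, a single accuracy percentage carries a fair amount of sampling noise. The sketch below shows one way to attach an uncertainty range to a scorecard value, using the Wilson score interval for a proportion. It assumes that accuracy is a simple proportion of correct calls over the listed sample count, which the page does not state; the function name `wilson_interval` and the `cards` dictionary are illustrative, with values copied from the first group of vB scorecards above.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n samples.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# (accuracy, sample count) per agent; ASSUMPTION: accuracy is a plain
# proportion over the listed samples.
cards = {
    "Country Feasibility": (0.76, 38),
    "Site Scout": (0.74, 51),
    "StartUp Workflow": (0.79, 36),
    "Recruitment Dynamics": (0.78, 48),
    "Risk Officer": (0.81, 33),
}

for agent, (acc, n) in cards.items():
    lo, hi = wilson_interval(acc, n)
    print(f"{agent}: acc {acc:.0%}, approx. 95% interval {lo:.0%} to {hi:.0%} (n={n})")
```

If the assumption holds, intervals this wide suggest that differences of a few points between agents at these sample sizes should be read with caution.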
Version comparison (vA vs vB)
| Agent | vA | vB | Delta (vB - vA) |
|---|---|---|---|
| Country Feasibility | acc 68% · bias -0.12 · stab 71% · adopt 64% | acc 76% · bias -0.05 · stab 78% · adopt 73% | acc +8.0 pp · bias +0.07 · stab +7.0 pp · adopt +9.0 pp |
| Site Scout | acc 61% · bias -0.18 · stab 66% · adopt 59% | acc 74% · bias -0.07 · stab 75% · adopt 72% | acc +13.0 pp · bias +0.11 · stab +9.0 pp · adopt +13.0 pp |
| StartUp Workflow | acc 70% · bias -0.09 · stab 74% · adopt 67% | acc 79% · bias -0.03 · stab 80% · adopt 76% | acc +9.0 pp · bias +0.06 · stab +6.0 pp · adopt +9.0 pp |
| Recruitment Dynamics | acc 64% · bias -0.16 · stab 68% · adopt 60% | acc 78% · bias -0.06 · stab 77% · adopt 75% | acc +14.0 pp · bias +0.10 · stab +9.0 pp · adopt +15.0 pp |
| Risk Officer | acc 72% · bias -0.08 · stab 73% · adopt 66% | acc 81% · bias -0.02 · stab 82% · adopt 79% | acc +9.0 pp · bias +0.06 · stab +9.0 pp · adopt +13.0 pp |
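
The last column of the table is vB minus vA for each metric; for the percentage metrics that difference is in percentage points. A minimal sketch of the arithmetic follows. The `Metrics` dataclass, the `delta` helper, and the `table` dictionary are hypothetical names introduced here for illustration, and the numbers are copied from the table rows above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    acc: float    # accuracy, percent
    bias: float   # signed bias, unitless
    stab: float   # stability, percent
    adopt: float  # adoption, percent


def delta(va: Metrics, vb: Metrics) -> Metrics:
    """Return vB - vA; acc, stab and adopt come out in percentage points."""
    return Metrics(
        acc=round(vb.acc - va.acc, 1),
        bias=round(vb.bias - va.bias, 2),
        stab=round(vb.stab - va.stab, 1),
        adopt=round(vb.adopt - va.adopt, 1),
    )


# (vA, vB) pairs copied from the comparison table.
table = {
    "Country Feasibility": (Metrics(68, -0.12, 71, 64), Metrics(76, -0.05, 78, 73)),
    "Site Scout": (Metrics(61, -0.18, 66, 59), Metrics(74, -0.07, 75, 72)),
    "StartUp Workflow": (Metrics(70, -0.09, 74, 67), Metrics(79, -0.03, 80, 76)),
    "Recruitment Dynamics": (Metrics(64, -0.16, 68, 60), Metrics(78, -0.06, 77, 75)),
    "Risk Officer": (Metrics(72, -0.08, 73, 66), Metrics(81, -0.02, 82, 79)),
}

for agent, (va, vb) in table.items():
    d = delta(va, vb)
    print(
        f"{agent}: acc {d.acc:+.1f} pp · bias {d.bias:+.2f} · "
        f"stab {d.stab:+.1f} pp · adopt {d.adopt:+.1f} pp"
    )
```

Every bias value in the table is negative, so a positive bias delta means the value moved toward zero; whether zero is the intended target is an assumption, since the page does not define the bias metric.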
Human feedback loop
Site Scout · vB
Recommendation: Replace 2 low-conversion sites in France
Rationale: vB predicts 14% slower conversion at sites currently in the bottom quartile.
Recruitment Dynamics · vB
Recommendation: Increase pre-screening support in Germany
Rationale: Screen-fail burden accounts for most of the month-4 gap.
StartUp Workflow · vB
Recommendation: Parallelize the ethics and contract packages
Rationale: Critical-path lag is concentrated in the startup packet handoff.
Risk Officer · vB
Recommendation: Escalate monitoring cadence for 3 sites
Rationale: Risk scoring flags persistent quality variance.