StochStack

ops eval

Agent Evaluation Dashboard

Validate each agent as a measurable decision module, not a black box.

Agent Scorecards

Country Feasibility · vB

Accuracy: 76% · Bias: -0.05 · Stability: 78% · Adoption: 73%

Samples: 38 · 2026-03-02

Site Scout · vB

Accuracy: 74% · Bias: -0.07 · Stability: 75% · Adoption: 72%

Samples: 51 · 2026-03-02

StartUp Workflow · vB

Accuracy: 79% · Bias: -0.03 · Stability: 80% · Adoption: 76%

Samples: 36 · 2026-03-02

Recruitment Dynamics · vB

Accuracy: 78% · Bias: -0.06 · Stability: 77% · Adoption: 75%

Samples: 48 · 2026-03-02

Risk Officer · vB

Accuracy: 81% · Bias: -0.02 · Stability: 82% · Adoption: 79%

Samples: 33 · 2026-03-02

Country Feasibility · vB

Accuracy: 73% · Bias: -0.06 · Stability: 76% · Adoption: 71%

Samples: 27 · 2026-03-02

Site Scout · vB

Accuracy: 71% · Bias: -0.09 · Stability: 73% · Adoption: 69%

Samples: 39 · 2026-03-02

StartUp Workflow · vB

Accuracy: 76% · Bias: -0.04 · Stability: 79% · Adoption: 75%

Samples: 29 · 2026-03-02

Recruitment Dynamics · vB

Accuracy: 75% · Bias: -0.07 · Stability: 76% · Adoption: 73%

Samples: 42 · 2026-03-02

Risk Officer · vB

Accuracy: 79% · Bias: -0.03 · Stability: 81% · Adoption: 77%

Samples: 26 · 2026-03-02

Country Feasibility · vB

Accuracy: 72% · Bias: -0.08 · Stability: 74% · Adoption: 69%

Samples: 23 · 2026-03-02

Site Scout · vB

Accuracy: 69% · Bias: -0.11 · Stability: 70% · Adoption: 66%

Samples: 34 · 2026-03-02

StartUp Workflow · vB

Accuracy: 74% · Bias: -0.05 · Stability: 77% · Adoption: 72%

Samples: 25 · 2026-03-02

Recruitment Dynamics · vB

Accuracy: 71% · Bias: -0.09 · Stability: 73% · Adoption: 68%

Samples: 36 · 2026-03-02

Risk Officer · vB

Accuracy: 77% · Bias: -0.04 · Stability: 79% · Adoption: 74%

Samples: 22 · 2026-03-02
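
The scorecards above list each agent three times with different sample counts. The page does not label what distinguishes the repeats, so the following is only a sketch under the assumption that they are separate evaluation slices of the same agent. Under that assumption, one way to summarize an agent is a sample-weighted mean of a metric. The function name `pooled_metric` is hypothetical and not part of the dashboard.

```python
def pooled_metric(slices):
    """Sample-weighted mean of a metric.

    `slices` is a list of (metric_value, n_samples) pairs, one per scorecard.
    Assumes the scorecards are independent slices of the same agent.
    """
    total = sum(n for _, n in slices)
    if total == 0:
        raise ValueError("no samples to pool")
    return sum(value * n for value, n in slices) / total


# Risk Officer · vB accuracy and sample counts, copied from the three cards.
risk_officer_accuracy = [(81.0, 33), (79.0, 26), (77.0, 22)]
print(round(pooled_metric(risk_officer_accuracy), 1))  # 79.3
```

Weighting by sample count keeps a small slice (for example, 22 samples) from counting as much as a larger one; an unweighted mean of the three accuracies would give 79.0 instead.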

Version Compare (vA vs vB)

Agent · vA · vB · Delta (vB - vA)

Country Feasibility
vA: acc 68% · bias -0.12 · stab 71% · adopt 64%
vB: acc 76% · bias -0.05 · stab 78% · adopt 73%
Delta: acc 8.0% · bias 0.07 · stab 7.0% · adopt 9.0%

Site Scout
vA: acc 61% · bias -0.18 · stab 66% · adopt 59%
vB: acc 74% · bias -0.07 · stab 75% · adopt 72%
Delta: acc 13.0% · bias 0.11 · stab 9.0% · adopt 13.0%

StartUp Workflow
vA: acc 70% · bias -0.09 · stab 74% · adopt 67%
vB: acc 79% · bias -0.03 · stab 80% · adopt 76%
Delta: acc 9.0% · bias 0.06 · stab 6.0% · adopt 9.0%

Recruitment Dynamics
vA: acc 64% · bias -0.16 · stab 68% · adopt 60%
vB: acc 78% · bias -0.06 · stab 77% · adopt 75%
Delta: acc 14.0% · bias 0.10 · stab 9.0% · adopt 15.0%

Risk Officer
vA: acc 72% · bias -0.08 · stab 73% · adopt 66%
vB: acc 81% · bias -0.02 · stab 82% · adopt 79%
Delta: acc 9.0% · bias 0.06 · stab 9.0% · adopt 13.0%
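
The delta values in the comparison are consistent with a plain per-metric subtraction, vB minus vA. A minimal sketch of that arithmetic follows; the `Scorecard` type and `delta` function are hypothetical names introduced for illustration, not the dashboard's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    acc: float    # accuracy, in percent
    bias: float   # signed bias, unitless
    stab: float   # stability, in percent
    adopt: float  # adoption, in percent


def delta(va: Scorecard, vb: Scorecard) -> Scorecard:
    """Per-metric difference vB - vA, rounded to the precision shown on the page."""
    return Scorecard(
        acc=round(vb.acc - va.acc, 1),
        bias=round(vb.bias - va.bias, 2),
        stab=round(vb.stab - va.stab, 1),
        adopt=round(vb.adopt - va.adopt, 1),
    )


# Site Scout values copied from the comparison.
site_scout_va = Scorecard(acc=61.0, bias=-0.18, stab=66.0, adopt=59.0)
site_scout_vb = Scorecard(acc=74.0, bias=-0.07, stab=75.0, adopt=72.0)

d = delta(site_scout_va, site_scout_vb)
print(f"acc {d.acc:.1f}% · bias {d.bias:.2f} · stab {d.stab:.1f}% · adopt {d.adopt:.1f}%")
# acc 13.0% · bias 0.11 · stab 9.0% · adopt 13.0%
```

Two reading notes: the percentage deltas are differences between percentages (percentage points), not relative changes, and because every bias value here is negative, a positive bias delta means vB sits closer to zero than vA.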

Human Feedback Loop

Accepted: 0 · Rejected: 0 · Feedback records: 0

Site Scout · vB

Suggestion: Replace 2 low-conversion sites in France

Rationale: vB predicts 14% slower conversion in current bottom quartile sites.

Recruitment Dynamics · vB

Suggestion: Increase pre-screening support in Germany

Rationale: Screen-fail burden contributes most of month-4 gap.

StartUp Workflow · vB

Suggestion: Parallelize ethics and contract package

Rationale: Critical-path lag is concentrated in startup packet handoff.

Risk Officer · vB

Suggestion: Escalate monitoring cadence for 3 sites

Rationale: Risk scoring flags persistent quality variance.
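
The counters at the top of this section (Accepted, Rejected, Feedback records) suggest that each reviewer decision on a suggestion is stored as a record and then tallied. The sketch below shows one way such a log could work. All names (`FeedbackRecord`, `FeedbackLog`, `acceptance_rate`) are hypothetical, and whether the scorecards' Adoption metric is derived from these records is an assumption the page does not confirm.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class FeedbackRecord:
    agent: str        # e.g. "Site Scout"
    suggestion: str   # the suggestion text shown to the reviewer
    accepted: bool    # True if the reviewer accepted it


@dataclass
class FeedbackLog:
    records: List[FeedbackRecord] = field(default_factory=list)

    def record(self, agent: str, suggestion: str, accepted: bool) -> None:
        self.records.append(FeedbackRecord(agent, suggestion, accepted))

    def summary(self) -> dict:
        """Totals matching the three counters shown on the page."""
        accepted = sum(r.accepted for r in self.records)
        return {
            "accepted": accepted,
            "rejected": len(self.records) - accepted,
            "records": len(self.records),
        }

    def acceptance_rate(self, agent: str) -> Optional[float]:
        """Share of an agent's suggestions that were accepted, or None if it has none."""
        own = [r for r in self.records if r.agent == agent]
        if not own:
            return None
        return sum(r.accepted for r in own) / len(own)


log = FeedbackLog()
print(log.summary())  # {'accepted': 0, 'rejected': 0, 'records': 0}

# Illustrative decisions only; the page itself shows no recorded feedback yet.
log.record("Site Scout", "Replace 2 low-conversion sites in France", accepted=True)
log.record("Risk Officer", "Escalate monitoring cadence for 3 sites", accepted=False)
print(log.summary())  # {'accepted': 1, 'rejected': 1, 'records': 2}
```

Returning `None` for an agent with no records keeps "no feedback yet" distinct from a genuine 0% acceptance rate.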