agent-lab · evaluation platform
GAUNTLET
Run every model and harness through the gauntlet — keep what survives, per task, per repo.
For any task in Omni or Tastymaestro, know which combination of model, effort, harness rules, and guardrails produces the best result, and prove it with evidence.
Phases: 0/7 complete
Tasks: 0/27 done
Open questions: 0 of 6
Updated: 2026-06-24