描述
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.
软件工程 / 诊断修复
eval-recipes-runner
描述
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.
安全审计