Description
QA harness for LLM agents: scenario suites, flake controls, tool sandboxing, LLM-as-judge scoring, and regression protocols.
Software Engineering / Diagnostics and Repair
qa-agent-testing
Security Audit