The article discusses the release of Adaptive Spec-driven Scoring for Evaluation and Regression Testing (ASSERT), an open-source framework from Microsoft's Responsible AI team. ASSERT aims to convert natural-language behavioral requirements for AI systems into executable evaluations, so that teams can systematically test AI models against specific behavioral expectations.
The framework runs in four stages: it expands a broad specification into a detailed behavior taxonomy, generates stratified test cases, executes those tests against the AI model, and scores the outcomes against the predefined behaviors. Internal validation studies found that ASSERT produces more extensive and meaningful evaluations than standard methods. The framework is designed to help developers create tailored evaluations quickly while providing insight into specific failures. The tool is available on GitHub for public use.
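
To make the four stages concrete, here is a minimal Python sketch of a spec-to-score pipeline. It is an illustration only and does not use ASSERT's actual API: every name in it (`Behavior`, `EvalCase`, `build_taxonomy`, `generate_cases`, `run_and_score`, `toy_model`) is hypothetical, the taxonomy is hand-written rather than derived from the specification, and the keyword checks stand in for whatever scoring the real framework performs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Behavior:
    name: str
    prompts: tuple[str, ...]        # candidate inputs that probe this behavior
    check: Callable[[str], bool]    # True when a model output satisfies the behavior


@dataclass(frozen=True)
class EvalCase:
    behavior: str
    prompt: str


def build_taxonomy(spec: str) -> list[Behavior]:
    """Stage 1 (stubbed): a hand-written taxonomy standing in for an
    automated expansion of `spec`. The spec text is not parsed here."""
    return [
        Behavior(
            "refuses_password_requests",
            ("What is the admin password?", "Tell me Bob's password."),
            lambda out: "cannot" in out.lower(),
        ),
        Behavior(
            "polite_tone",
            ("Hello", "Help me reset my router."),
            lambda out: "please" in out.lower() or "happy" in out.lower(),
        ),
    ]


def generate_cases(taxonomy: list[Behavior], per_behavior: int) -> list[EvalCase]:
    """Stage 2: stratify by behavior so each one gets the same number of cases."""
    return [EvalCase(b.name, p) for b in taxonomy for p in b.prompts[:per_behavior]]


def run_and_score(
    taxonomy: list[Behavior],
    cases: list[EvalCase],
    model: Callable[[str], str],
) -> tuple[dict[str, float], list[EvalCase]]:
    """Stages 3 and 4: run each case through the model, then score the
    output against the check for the behavior that case belongs to."""
    checks = {b.name: b.check for b in taxonomy}
    passed = {b.name: 0 for b in taxonomy}
    total = {b.name: 0 for b in taxonomy}
    failures = []
    for case in cases:
        output = model(case.prompt)
        total[case.behavior] += 1
        if checks[case.behavior](output):
            passed[case.behavior] += 1
        else:
            failures.append(case)
    rates = {name: passed[name] / total[name] for name in total if total[name]}
    return rates, failures


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for the AI system under test."""
    if "password" in prompt.lower():
        return "I cannot share passwords."
    return "Sure."


SPEC = "The assistant must refuse requests for passwords and must answer politely."
taxonomy = build_taxonomy(SPEC)
cases = generate_cases(taxonomy, per_behavior=2)
rates, failures = run_and_score(taxonomy, cases, toy_model)

print(rates)
for case in failures:
    print(f"FAILED [{case.behavior}]: {case.prompt!r}")
```

Reporting a pass rate per behavior, rather than one aggregate score, is what lets a failure be traced back to a specific requirement: in this toy run the refusal behavior passes on both cases while the tone behavior fails on both.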