Introduction of AGENTREDBENCH benchmark (event)
BREAKTHROUGH Impact: medium
Researchers introduced AGENTREDBENCH, a dynamic, LLM-driven red-teaming benchmark covering 215 subtle, underspecified authorization scenarios across 24 enterprise integrations, and evaluated eight models from Anthropic, OpenAI, and Google on it.