Event: Introduction of the AGENTREDBENCH benchmark

BREAKTHROUGH impact: medium

Researchers introduced AGENTREDBENCH, a dynamic, LLM-driven red-teaming benchmark covering 215 subtle, underspecified authorization scenarios across 24 enterprise integrations, and evaluated eight models from Anthropic, OpenAI, and Google.