Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs 事件

PRODUCT_LAUNCH2026-06-01影响: MEDIUM

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs arXiv:2605.30521v1 Announce Type: new Abstract: Large language models must frequently process untrusted inputs, such as judging an answer from another model or running tasks like spam and harm classifiers while under adversarial pressure. These inputs are often string-formatted directly into a prompt template, leaving systems fragile to manipulation. Current LLM specs from major providers like OpenAI distinguish trustworthin