QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.25955v1 · Announce Type: new

Abstract: Large language models (LLMs) face a dual challenge in creative capability evaluation: existing benchmarks (e.g., Story Cloze Test, HellaSwag) measure models' discriminative ability over narrative continuation using multiple-choice recognition paradigms, rather than directly measuring creative generation capability; rubric-based scoring and LLM-as-Judge metho
