benchmark · Related events
Valuation tops 200 billion: Kimi reportedly raises another 13.6 billion, Hong Kong IPO push accelerates
2026-06-08 · IPO · Impact: HIGH
Valuation tops 200 billion: Kimi reportedly raises another 13.6 billion, Hong Kong IPO push accelerates
2026-06-08 · PERSONNEL · Impact: LOW
Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training
2026-06-08 · BREAKTHROUGH · Impact: HIGH
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
SentinelBench: A Benchmark for Long-Running Monitoring Agents
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
SentinelBench: A Benchmark for Long-Running Monitoring Agents
2026-06-06 · OPEN_SOURCE · Impact: MEDIUM
Multi-ResNets for Subspace Preconditioning in Constrained Optimization
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Sagnac-Assisted Enhanced OTDR for Distributed Acoustic Sensing: A Standardized Benchmark and Engineering Evaluation Framework
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs
2026-06-06 · BREAKTHROUGH · Impact: HIGH
Query-efficient model evaluation using cached responses
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
When Attention Beats Fourier: Multi-Scale Transformers for PDE Solving on Irregular Domains
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Toto 2.0: Time Series Forecasting Enters the Scaling Era
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Toto 2.0: Time Series Forecasting Enters the Scaling Era
2026-06-06 · BREAKTHROUGH · Impact: HIGH
ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
CLFEC: A New Task for Unified Linguistic and Factual Error Correction in paragraph-level Chinese Professional Writing
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Retrieval-Augmented Generation Must Move Beyond Factual Grounding to Represent Diverse Opinions
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
TextWand: A Unified Framework for Scene Text Editing
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite
2026-06-04 · OPEN_SOURCE · Impact: MEDIUM
Can Generalist Agents Automate Data Curation?
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Characterizing initial human-AI proof formalization workflows
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite
2026-06-04 · SHUTDOWN · Impact: LOW
MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
'Your AI Text is not Mine': Redefining and Evaluating AI-generated Text Detection under Realistic Assumptions
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Reasoning or Fluency? Dissecting Probabilistic Confidence in Best-of-N Selection
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Emotion Entanglement and Bayesian Inference for Multi-Dimensional Emotion Understanding
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Caliper: Probing Lexical Anchors versus Causal Structure in LLMs
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
T$^\star$: Progressive Block Scaling for Masked Diffusion Language Models Through Trajectory Aware Reinforcement Learning
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Reflection Separation from a Single Image via Joint Latent Diffusion
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Reflection Separation from a Single Image via Joint Latent Diffusion
2026-06-04 · BREAKTHROUGH · Impact: HIGH
Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Scene-Centric Unsupervised Video Panoptic Segmentation
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM