benchmark · Product

Source: GitHub · Open source · Python · BSD-3-Clause · Published 2017-05-26

TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.

Stars: 1033

Forks: 337


benchmark · Related events


Valuation tops 200 billion: Kimi reportedly raises another 13.6 billion as its Hong Kong IPO accelerates

2026-06-08 · IPO · Impact: HIGH

Valuation tops 200 billion: Kimi reportedly raises another 13.6 billion as its Hong Kong IPO accelerates

2026-06-08 · PERSONNEL · Impact: LOW

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Synthetic Benchmarks Overstate Forward-Forward Scaling: Real-Data Limits of Layer-Local Training

2026-06-08 · BREAKTHROUGH · Impact: HIGH

The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

SentinelBench: A Benchmark for Long-Running Monitoring Agents

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

SentinelBench: A Benchmark for Long-Running Monitoring Agents

2026-06-06 · OPEN_SOURCE · Impact: MEDIUM

Multi-ResNets for Subspace Preconditioning in Constrained Optimization

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Sagnac-Assisted Enhanced OTDR for Distributed Acoustic Sensing: A Standardized Benchmark and Engineering Evaluation Framework

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Causal Scaffolding for Physical Reasoning: A Benchmark for Causally-Informed Physical World Understanding in VLMs

2026-06-06 · BREAKTHROUGH · Impact: HIGH

Query-efficient model evaluation using cached responses

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

When Attention Beats Fourier: Multi-Scale Transformers for PDE Solving on Irregular Domains

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Toto 2.0: Time Series Forecasting Enters the Scaling Era

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Toto 2.0: Time Series Forecasting Enters the Scaling Era

2026-06-06 · BREAKTHROUGH · Impact: HIGH

ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

CLFEC: A New Task for Unified Linguistic and Factual Error Correction in paragraph-level Chinese Professional Writing

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Retrieval-Augmented Generation Must Move Beyond Factual Grounding to Represent Diverse Opinions

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

TextWand: A Unified Framework for Scene Text Editing

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite

2026-06-04 · OPEN_SOURCE · Impact: MEDIUM

Can Generalist Agents Automate Data Curation?