evaluate (product)

Source: GitHub · OPEN_SOURCE (open source) · Python · Apache-2.0 · Released 2022-03-30

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Stars: 2448

Forks: 320

Tech stack

Alternatives

evaluate · Related events

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Evaluating Stochastic Collapse and Implicit Bias in Multimodal Large Language Models

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

The Granularity Gap: A Multi-Dimensional Longitudinal Audit of Sycophancy in Gemini Models

2026-06-05 · REGULATION · Impact: MEDIUM

LoRi: Low-Rank Distillation for Implicit Reasoning

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text summarization

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Using Large Language Models to Support High Volume Application Review for an Undergraduate Research Program

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Representing Research Attention as Contextually Structured Flows

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

CHALIS: A Challenge Dataset for Language Identification in Difficult Scenarios

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Emergent Language as an Approach to Conscious AI

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

"Chi nas dal soch el sent de legn" -- Auditing Text Corpora for Lombard

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Agents' Last Exam

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Less is MoE: Trimming Experts in Domain-Specialist Language Models

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

LongSpace: Exploring Long-Horizon Spatial Memory from Perception to Recall in Video

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

2026-06-05 · ACQUISITION · Impact: HIGH

Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

2026-06-05 · SHUTDOWN · Impact: LOW

RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Seeing is Believing? Evaluating Vision-Language Model Susceptibility in Agent-to-Agent Multimodal Persuasion

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Leveraging Large Language Models for Generating Research Topic Ontologies: A Multi-Disciplinary Study

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Deep Learning-assisted AMD Staging based on OCT and OCT Angiography

2026-06-05 · ACQUISITION · Impact: HIGH

Deep Learning-assisted AMD Staging based on OCT and OCT Angiography

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Horse Eye Blink Detection and Classification for Equine Affective State Assessment

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

ORACLE-CT: Anatomy-Aware Support Pooling for CT Classification

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation

2026-06-05 · OPEN_SOURCE · Impact: MEDIUM

V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Next-Generation Parallel Decoder for LPDR: Architectural Optimization and Class-Balanced GAN-Augmentation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Faithful, Enriched, and Precise: Benchmarking Natural-Science Illustration Generation by T2I models

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models

2026-06-05 · ACQUISITION · Impact: HIGH

MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

RQUL-UIE: Revitalizing Quality-Unstable Labels for Underwater Image Enhancement via In-Dataset Self-Supervision

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

SC-MFJ: A Simple Haptic Quality Metric for Medical Image Segmentation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Symb-xMIL: Symbolic Explanations for Multiple Instance Learning in Digital Pathology

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

Comparison of Deep Learning Frameworks For Rice Disease Mapping From UAV Multispectral Imaging

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM