BenGER Platform: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks

arXiv cs.CL | 2026-05-28 | Authors: Sebastian Nagl, Matthias Grabmair

Abstract

arXiv:2604.13583v3 (announce type: replace)

Evaluating large language models (LLMs) for legal reasoning requires workflows that span task design, expert annotation, model execution, and metric-based evaluation. In practice, these steps are split across platforms and scripts, limiting transparency, reproducibility, and participation by non-technical legal experts. We present the BenGER (Benchmark for German Law) framework, an open-source web platform that integrates task creation, collaborative annotation, configurable LLM runs, and evaluation with lexical, semantic, factual, and judge-based metrics. BenGER supports multi-organization projects with tenant isolation and role-based access control, and can optionally provide formative, reference-grounded feedback to annotators. We will demonstrate a live deployment showing end-to-end benchmark creation and analysis.
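To make the "lexical" metric family concrete, here is a minimal Python sketch of one common lexical measure, token-overlap F1 between a model answer and an expert reference. The abstract does not say which lexical metrics BenGER implements or how, so the function name `token_f1` and the whitespace tokenization are illustrative assumptions, not the platform's API.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer.

    Hypothetical helper for illustration; not taken from BenGER.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token is counted at most as
    # often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Der Anspruch ist verjährt"
    prediction = "Der Anspruch ist nicht verjährt"
    print(round(token_f1(prediction, reference), 2))
```

In this made-up example the prediction negates the reference yet still scores about 0.89, which illustrates why a lexical score alone can mislead on legal answers and why the abstract lists semantic, factual, and judge-based metrics alongside it.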
