Evaluating LLMs' Effectiveness on Real-World Consumer Device Repair Questions 文章

ArXiv CS.CL2026-06-03NEWSen作者: Atm Mizanur Rahman (University of Illinois Urbana-Champaign), Md Arid Hasan (University of Toronto), Syed Ishtiaque Ahmed (University of Toronto), Sharifa Sultana (University of Illinois Urbana-Champaign)

摘要

arXiv:2606.03331v1 Announce Type: new Abstract: Consumer device repair is an important but underexplored testbed for large language models (LLMs). Repair tasks require reasoning over incomplete problem descriptions, hardware-specific diagnostics, actionable troubleshooting, and safety-critical decisions, where incorrect advice can cause device damage, battery hazards, or permanent data loss. We introduce a benchmark of 991 real-world repair questions from Reddit spanning phone repair, computer repair, and data recovery, each paired with technician-written reference solutions, and provide Bangla translations to evaluate cross-lingual performance. We evaluate six state-of-the-art LLMs in English and Bangla using four repair-specific criteria: correctness, completeness, practicality, and safety. Our results show that while LLMs can provide useful repair assistance, they remain unreliable for high-risk real-world repair tasks without rigorous evaluation and explicit safety safeguards.

相关事件

暂无数据

相关人物

暂无数据

相关产品

暂无数据