E2LLM: Towards Efficient LLM Serving in Heterogeneous Edge/Fog Environments

ArXiv CS.AI | 2026-06-03 | Authors: Truong-Thanh Le, Amir Taherkordi, Hoang-Loc La, Frank Eliassen, Phuong Hoai Ha, Peiyuan Guan

Abstract

arXiv:2606.03770v1 (announce type: cross)

Large Language Models (LLMs) have become integral to modern applications, yet their deployment remains challenging. Beyond executing the models themselves, practical deployment must address cost efficiency, low latency, and optimal resource utilization. Conventional approaches typically assume that an entire model can be hosted on a single device, an assumption that does not hold in many real-world scenarios, particularly in Edge and Fog environments where device resources are constrained. In this paper, we introduce E2LLM, a framework designed to enable efficient LLM deployment in such resource-limited settings. Rather than simply partitioning a single model across all available devices, E2LLM replicates the full model across multiple groups of devices (replicas) and applies model parallelism within each replica. Each replica is assigned a specialized role, PREFILL or DECODER, based on its efficiency in handling input and output tokens.
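The role assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not say how E2LLM measures efficiency or decides roles, so everything here is an assumption made for illustration: the `Replica` type, the `prefill_tps` and `decode_tps` throughput fields, the `assign_roles` function, and the ranking heuristic (replicas with the highest ratio of input-token to output-token throughput take the PREFILL role) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Replica:
    """A group of devices hosting one full model copy (hypothetical type)."""
    name: str
    prefill_tps: float  # assumed metric: input tokens processed per second
    decode_tps: float   # assumed metric: output tokens generated per second


def assign_roles(replicas: List[Replica],
                 n_prefill: Optional[int] = None) -> Dict[str, str]:
    """Assign PREFILL or DECODER to each replica.

    Illustrative heuristic only: rank replicas by their relative advantage
    on input tokens (prefill_tps / decode_tps) and give the top n_prefill
    replicas the PREFILL role. Defaults to half of the replicas.
    """
    if len(replicas) < 2:
        raise ValueError("need at least two replicas to fill both roles")
    if n_prefill is None:
        n_prefill = len(replicas) // 2
    if not 1 <= n_prefill < len(replicas):
        raise ValueError("n_prefill must leave at least one replica per role")

    # Highest prefill-to-decode ratio first.
    ranked = sorted(replicas,
                    key=lambda r: r.prefill_tps / r.decode_tps,
                    reverse=True)
    return {r.name: ("PREFILL" if i < n_prefill else "DECODER")
            for i, r in enumerate(ranked)}


if __name__ == "__main__":
    # Made-up throughput numbers, for demonstration only.
    fleet = [
        Replica("fog-gpu", prefill_tps=900.0, decode_tps=30.0),
        Replica("edge-a", prefill_tps=200.0, decode_tps=20.0),
        Replica("edge-b", prefill_tps=150.0, decode_tps=25.0),
    ]
    for name, role in assign_roles(fleet).items():
        print(f"{name}: {role}")
```

Ranking by a ratio rather than by raw throughput keeps a uniformly fast replica from being chosen for both roles by default; a real scheduler would likely also weigh memory limits, network cost between replicas, and the workload's input/output token mix.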
