Efficient Memory Management for Large Language Model Serving with PagedAttention (Paper)

2023 · Cited by 980
Topic Modeling · Natural Language Processing Techniques · Caching and Content Delivery