Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

news.ycombinator.com | 2026-05-29 | Author: NicoConstant