RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing (Paper)

Year: 2020; citations: 220

Stochastic Gradient Optimization Techniques; Caching and Content Delivery; Recommender Systems and Techniques

Abstract

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator-, and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP, which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques, such as memory-side caching, table-aware packet scheduling, and hot entry profiling, are studied, providing up to 9.8× memory latency speedup over a highly optimized baseline. Overall, RecNMP offers 4.2× throughput improvement and 45.8% memory energy savings.
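To make the "sparse embedding operations with irregular memory access patterns" concrete, here is a minimal sketch of a gather-and-sum pooling over an embedding table, one common form of such an operation. The function name `sparse_lengths_sum`, the table size, and the index values are illustrative assumptions and are not taken from the abstract; the sketch models only the access pattern, not RecNMP's hardware.

```python
def sparse_lengths_sum(table, indices, lengths):
    """Gather rows of `table` selected by `indices` and sum them per group.

    `lengths[i]` says how many consecutive entries of `indices` belong to
    pooling group i. Each lookup touches an arbitrary row, so the memory
    accesses are scattered and little arithmetic is done per byte fetched.
    """
    dim = len(table[0])
    pooled = []
    pos = 0
    for n in lengths:
        acc = [0.0] * dim
        for row_id in indices[pos:pos + n]:
            row = table[row_id]          # irregular, data-dependent access
            for d in range(dim):
                acc[d] += row[d]         # one add per element fetched
        pooled.append(acc)
        pos += n
    return pooled


# Hypothetical toy table: row r holds the value r in every column.
NUM_ROWS, DIM = 1000, 4
table = [[float(r)] * DIM for r in range(NUM_ROWS)]

# Two pooling groups: rows (7, 912, 44) and rows (500, 3).
indices = [7, 912, 44, 500, 3]
lengths = [3, 2]
pooled = sparse_lengths_sum(table, indices, lengths)
```

Because each output element needs only one addition per element read, throughput in this kind of loop is bounded by how fast rows can be fetched from memory, which is consistent with the memory-bandwidth saturation the abstract describes.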

Authors (21 in total)

Xuan Zhang

Mark Hempstead

Carole-Jean Wu

Brandon Reagen
