Bandana: Using Non-Volatile Memory for Storing Deep Learning Models

Conference on Systems and Machine Learning (SysML)


Typical large-scale recommender systems use deep learning models that are stored in large amounts of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings by using Non-Volatile Memory (NVM) as the primary storage medium, with a small amount of DRAM serving as a cache. The main challenge in storing embeddings on NVM is its limited read bandwidth compared to DRAM. Bandana uses two primary techniques to address this limitation: first, it stores embedding vectors that are likely to be read together in the same physical location, using hypergraph partitioning; second, it decides how many embedding vectors to cache in DRAM by simulating dozens of small caches. These techniques allow Bandana to increase the effective read bandwidth of NVM by 2–3× and thereby significantly reduce the total cost of ownership.
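The second technique mentioned above, sizing the DRAM cache by simulating many small caches on a trace of embedding accesses, can be sketched roughly as follows. This is an illustrative reconstruction, not Bandana's actual implementation: the function names `lru_hit_rate` and `pick_cache_size`, the LRU eviction policy, and the `tolerance` plateau threshold are all assumptions made for the sketch.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Simulate a small LRU cache of `capacity` embedding vectors over an
    access trace (a sequence of vector ids) and return its hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)   # mark as most recently used
            hits += 1
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

def pick_cache_size(trace, candidate_sizes, tolerance=0.01):
    """Run dozens of small simulated caches at different sizes and return
    the smallest size whose hit rate is within `tolerance` of the best,
    i.e. the point where adding more DRAM stops paying off."""
    sizes = sorted(candidate_sizes)
    rates = {s: lru_hit_rate(trace, s) for s in sizes}
    best = max(rates.values())
    for s in sizes:
        if best - rates[s] <= tolerance:
            return s
```

On a cyclic trace over three vectors, for example, `pick_cache_size(trace, [1, 2, 3, 4])` settles on a capacity of 3, since hit rate jumps once the working set fits and stays flat beyond it. The real system would feed sampled production traces through such simulators to choose the per-table DRAM budget.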

