As the application of deep learning continues to grow, so does the amount of data used to make predictions. Traditionally, big-data deep learning was constrained by computing performance and off-chip memory bandwidth, but a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on the client’s encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow in current systems. This paper introduces Cheetah, a set of algorithmic and hardware optimizations that bring server-side HE DNN inference close to real-time speeds. Cheetah proposes HE-parameter tuning and operator scheduling optimizations, which together deliver a 79× speedup over the state of the art. However, this still falls short of real-time inference speeds by almost four orders of magnitude. Cheetah therefore also proposes an accelerator architecture that, combined with the algorithmic optimizations, bridges the remaining performance gap. We evaluate several DNNs and show that privacy-preserving HE inference for ResNet50 can be done at near real-time performance with an accelerator dissipating 30 W and occupying 545 mm² in 5 nm technology.