Performance optimizations of large scale services can lead to significant wins on service efficiency and performance. CPU resource is one of the most common performance bottlenecks, hence improving CPU performance has been the focus of many performance optimization efforts. In particular, reducing iTLB (instruction TLB) miss rates can greatly improve CPU performance and speed up service running.
At Facebook, we have achieved CPU reduction by applying a solution that firstly identifies hot-text of the (software) binary and then places the binary on huge pages (i.e., 2MB+ memory pages). The solution is wrapped into an automated framework, enabling service owners to effortlessly adopt it. Our framework has been applied to many services at Facebook, and this paper shares our experiences and findings.