December 9, 2020

Sample-efficient exploration of trade-offs with parallel expected hypervolume improvement

By: Sam Daulton, Max Balandat, Eytan Bakshy

What the research is:

q-Expected Hypervolume Improvement (qEHVI) is a new sample-efficient method for optimizing multiple competing expensive-to-evaluate black-box functions. Traditional methods for multiobjective black-box optimization include evolutionary strategies, which are robust and can efficiently generate a large batch of candidate designs to evaluate on the true functions in parallel, but which require many evaluations to converge to the set of Pareto optimal trade-offs.

However, when the objectives are expensive to evaluate, sample efficiency is critical. In this case, Bayesian optimization is commonly used to select the designs to evaluate. Typically, candidates are generated sequentially (e.g., using Expected Hypervolume Improvement). In addition, candidate generation usually involves numerically optimizing an acquisition function, which in existing approaches often does not provide gradients.

In this work, we propose a new acquisition function for multiobjective Bayesian optimization that 1) enables generating multiple candidates in parallel or asynchronously with proper uncertainty propagation over the pending candidate points, 2) generates candidates quickly using exact gradients, 3) yields state-of-the-art optimization performance, and 4) has desirable theoretical convergence guarantees.

qEHVI has several use cases across Facebook. For example, it is being used to tune parameters in Instagram’s recommendation systems, where it enables product teams to understand the optimal trade-offs between user engagement and CPU utilization, and it has identified policies that yielded simultaneous improvements in both objectives. qEHVI has also been used to optimize the reward functions for the contextual bandit algorithms that determine video compression rates at upload time for Facebook and Instagram. This allows us to identify the set of optimal trade-offs between video upload quality and reliability, which has led to improved quality of service.

How it works:

In multiobjective optimization, there typically is no single best solution; rather, the goal is to identify the set of Pareto optimal solutions, such that improving any one objective means deteriorating another. A natural measure of the quality of a Pareto frontier in the outcome space is the hypervolume that is dominated by the Pareto frontier and bounded from below by a reference point. Without loss of generality, we assume that the goal is to maximize all objectives. The utility of a new candidate is its hypervolume improvement: the volume that is exclusively dominated by the new point in the outcome space corresponding to the candidate (and not by the preexisting Pareto frontier). The region of hypervolume improvement is typically nonrectangular, but its volume can be computed efficiently by partitioning the nondominated space into disjoint hyperrectangles.
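
To make the geometry concrete, here is a small illustrative sketch (not the paper's implementation) that computes the dominated hypervolume and the hypervolume improvement of a single new point for two objectives under maximization; the function and variable names are our own.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Hypervolume dominated by `points` and bounded from below by `ref` (two objectives, maximization)."""
    pts = np.asarray([p for p in points if p[0] > ref[0] and p[1] > ref[1]])
    if pts.size == 0:
        return 0.0
    # Sweep from the largest first objective downward; each point contributes a slab
    # above the best second-objective value seen so far.
    pts = pts[np.argsort(-pts[:, 0])]
    hv, best_y1 = 0.0, ref[1]
    for y0, y1 in pts:
        if y1 > best_y1:
            hv += (y0 - ref[0]) * (y1 - best_y1)
            best_y1 = y1
    return hv

def hypervolume_improvement(new_point, front, ref):
    """Volume exclusively dominated by `new_point` and not by the existing Pareto frontier."""
    return hypervolume_2d(np.vstack([front, new_point]), ref) - hypervolume_2d(front, ref)

front = np.array([[3.0, 1.0], [1.0, 3.0]])  # current Pareto frontier
ref = np.array([0.0, 0.0])                  # reference point
print(hypervolume_improvement(np.array([2.0, 2.0]), front, ref))  # 1.0
```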

To generate candidates in parallel, we compute the joint hypervolume improvement across multiple new points by using the inclusion-exclusion principle to compute the volume of the union of the overlapping hyperrectangles. Since we do not know the objective values for a new candidate point a priori, we integrate over the uncertainty in the unobserved objective values provided by our probabilistic surrogate model (typically a Gaussian process) and use the expected hypervolume improvement over the new candidate points as our acquisition function.
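
Below is a minimal Monte Carlo sketch of this idea, reusing hypervolume_2d, front, and ref from the snippet above. For readability it computes the union volume directly as a difference of hypervolumes rather than via inclusion-exclusion over disjoint hyperrectangles, and it assumes independent Gaussian posteriors at the q candidate points; a real implementation draws joint samples from the surrogate's posterior so that uncertainty is properly propagated across the batch.

```python
def mc_qehvi(means, stds, front, ref, num_samples=512, seed=0):
    """Monte Carlo estimate of the joint expected hypervolume improvement of q candidates.

    `means` and `stds` (shape (q, 2)) stand in for the surrogate posterior at the q
    candidate points; independent Gaussians are assumed purely for illustration.
    """
    rng = np.random.default_rng(seed)
    base_hv = hypervolume_2d(front, ref)
    total = 0.0
    for _ in range(num_samples):
        outcomes = rng.normal(means, stds)              # one draw of the q unobserved outcomes
        joint = np.vstack([front, outcomes])            # union of the frontier and the new points
        total += hypervolume_2d(joint, ref) - base_hv   # joint hypervolume improvement
    return total / num_samples

means = np.array([[2.0, 2.0], [3.5, 0.5]])  # posterior means at q = 2 candidate points
stds = np.full_like(means, 0.25)            # posterior standard deviations
print(mc_qehvi(means, stds, front, ref))
```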

Why it matters:

Generating and evaluating designs in parallel is important for fast end-to-end optimization time. For example, when tuning the hyperparameters of machine learning models, one can often evaluate many hyperparameter settings in parallel by distributing evaluations across a cluster of machines. In addition, due to the high evaluation costs, generating high-quality candidates is critical. In many existing methods, the numerical optimization used to find the maximizers of the acquisition function is very slow due to the lack of gradient information. Our acquisition function is differentiable, enabling gradient-based optimization and thus faster convergence and better candidates. Moreover, its computation is highly parallelizable: The acquisition function has constant time complexity given infinite cores and can be computed efficiently in many practical scenarios by exploiting GPU acceleration. We empirically show that our acquisition function achieves state-of-the-art optimization performance on a variety of benchmark problems.
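
As an illustration of gradient-based candidate generation, the sketch below jointly optimizes a batch of q candidates against a toy differentiable acquisition function using PyTorch autograd. It is not qEHVI itself (in practice this step is handled by BoTorch's acquisition optimization utilities), but it shows how exact gradients make joint batch optimization straightforward.

```python
import torch

def toy_acquisition(X):
    """Toy differentiable stand-in for an acquisition value over a joint batch of candidates."""
    targets = torch.tensor([[0.25, 0.75], [0.75, 0.25]])
    return -torch.cdist(X, targets).min(dim=-1).values.sum()

q, d = 4, 2
X = torch.rand(q, d, requires_grad=True)  # joint batch of q candidate designs in [0, 1]^d
optimizer = torch.optim.Adam([X], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    loss = -toy_acquisition(X)            # maximize the acquisition value
    loss.backward()                       # exact gradients via automatic differentiation
    optimizer.step()
    with torch.no_grad():
        X.clamp_(0.0, 1.0)                # project candidates back into the design bounds
```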

In addition, we provide theoretical convergence guarantees on optimizing the acquisition function. Improving sample efficiency is important for speeding up current initiatives spanning ranking systems, AutoML, materials design, and robotics, and it opens the door to new optimization problems that require expensive and/or time-consuming evaluations of black-box functions.

Read the full paper:

Differentiable expected hypervolume improvement for parallel multi-objective Bayesian optimization

Check out our open source implementations:

qEHVI is available as part of Ax, our open source library for adaptive experimentation. The underlying algorithm is implemented in BoTorch, and researchers in the area of Bayesian optimization can find implementation details there.
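
For reference, here is a rough usage sketch loosely following BoTorch's multiobjective Bayesian optimization tutorial. Module paths and constructor signatures have changed across BoTorch releases (for example, the box decomposition and sampler arguments), so treat the exact imports and arguments below as assumptions and consult the current BoTorch documentation.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition.multi_objective.monte_carlo import qExpectedHypervolumeImprovement
from botorch.utils.multi_objective.box_decompositions.non_dominated import NondominatedPartitioning
from botorch.sampling.samplers import SobolQMCNormalSampler
from botorch.optim import optimize_acqf

# Toy training data: 10 designs in [0, 1]^2 with two objectives to maximize.
train_x = torch.rand(10, 2, dtype=torch.double)
train_y = torch.stack([-(train_x ** 2).sum(-1), -((train_x - 1.0) ** 2).sum(-1)], dim=-1)

# Fit a multi-output GP surrogate with independent outputs.
model = SingleTaskGP(train_x, train_y)
fit_gpytorch_model(ExactMarginalLogLikelihood(model.likelihood, model))

ref_point = [-2.0, -2.0]  # lower bound on the region whose hypervolume is measured

# Box decomposition of the non-dominated space defined by the observed outcomes.
partitioning = NondominatedPartitioning(
    ref_point=torch.tensor(ref_point, dtype=torch.double), Y=train_y
)

acq_func = qExpectedHypervolumeImprovement(
    model=model,
    ref_point=ref_point,
    partitioning=partitioning,
    sampler=SobolQMCNormalSampler(num_samples=128),
)

# Jointly optimize a batch of q = 4 candidates using exact sample-level gradients.
candidates, _ = optimize_acqf(
    acq_function=acq_func,
    bounds=torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double),
    q=4,
    num_restarts=10,
    raw_samples=512,
)
print(candidates)
```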