August 12, 2021

Facebook Fellow Spotlight: Striving for provable guarantees in the theoretical foundations of machine learning

By: Meta Research

Each year, PhD students from around the world apply for the Facebook Fellowship, a program designed to encourage and support doctoral students engaged in innovative and relevant research in areas related to computer science and engineering.

As a continuation of our Fellowship spotlight series, we’re highlighting 2020 Facebook Fellow in applied statistics Lydia Zakynthinou.

Lydia is a PhD candidate at the Khoury College of Computer Sciences at Northeastern University, where she is advised by Jonathan Ullman and Huy Lê Nguyễn. Her research focuses on the theoretical foundations of machine learning and data privacy.

During her studies at the National Technical University of Athens in Greece, Lydia developed an interest in the theoretical foundations of machine learning and algorithms. Algorithms in particular fascinated her, as they have a direct application in solving real-world problems, especially in a world that values big data.

“Algorithms are everywhere,” Lydia says. “But there is a challenge in determining the trade-offs between the resources they consume, such as computational speed, accuracy, privacy loss, and amount of data, so that we, as researchers, can make informed choices about the algorithms we use.” She points to a simple example of such a trade-off: “Sometimes training a whole deep neural network is really slow, but it is the best we have in terms of accuracy.” That is what encouraged Lydia to study the theoretical foundations of machine learning more deeply.

Lydia’s research seeks to answer two main questions:

  • How can one ensure that an algorithm generalizes well and doesn’t overfit the data set?
  • How can one guarantee the privacy of the individuals whose data an algorithm uses?

The effectiveness of an algorithm hinges on its ability to learn about the population it applies to. But algorithms are trained to be accurate on the specific data set they see, which leads to two undesirable phenomena: overfitting (that is, performing misleadingly well on the data set but poorly on the population) and privacy leakage. This is where generalization and differential privacy come in, respectively.
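A minimal sketch of the overfitting phenomenon, assuming NumPy and an illustrative setup where the true relationship between `x` and `y` is linear with noise: a degree-9 polynomial fit to ten training points matches them almost perfectly, yet performs far worse on fresh samples from the same population.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n points from the population: y is linear in x plus noise."""
    x = rng.uniform(-1, 1, n)
    y = x + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = sample(10)     # small training set
x_test, y_test = sample(1000)     # stand-in for the population

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The degree-9 model interpolates the training data (near-zero training error) but its error on the held-out sample is much larger, which is exactly the gap between data-set performance and population performance described above.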

If an algorithm generalizes well, then its performance on the data set is guaranteed to be close to its performance on the population. Currently, there are many frameworks that seek to achieve this, but they are often incompatible with one another. Lydia’s work proposes a new framework that unifies current theories aiming to understand the properties that an algorithm needs to have to guarantee generalization.

Differential privacy deals with the second side effect, privacy leakage. It is a mathematically rigorous technique that essentially guarantees that no attacker, regardless of their additional knowledge, can infer much more about any individual than they could have if that individual’s data had never been included in the data set. It has become the standard criterion for ensuring privacy in machine learning models and has been adopted in several real-world applications. “By design, differential privacy also ensures generalization,” Lydia stresses.
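To make the guarantee concrete, here is a minimal sketch of one classic differentially private algorithm, the Laplace mechanism, for releasing the mean of bounded values. The function name and data are illustrative, not from Lydia's work: noise is calibrated to the most any single individual's value can shift the mean, so the released statistic reveals little about any one person.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded here only for reproducibility

def private_mean(data, epsilon, lower=0.0, upper=1.0):
    """Release the mean of values in [lower, upper] with epsilon-differential
    privacy via the Laplace mechanism."""
    data = np.clip(np.asarray(data), lower, upper)
    # Changing one individual's value moves the mean by at most this much.
    sensitivity = (upper - lower) / len(data)
    # Laplace noise scaled to sensitivity / epsilon gives the DP guarantee.
    return data.mean() + rng.laplace(scale=sensitivity / epsilon)

values = np.array([0.23, 0.41, 0.35, 0.52, 0.29, 0.60, 0.44, 0.38])
print(private_mean(values, epsilon=1.0))
```

Smaller `epsilon` means stronger privacy but more noise, which is precisely the kind of accuracy-versus-privacy trade-off Lydia describes studying.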

Lydia’s work analyzes core statistical problems and proposes a theoretical framework that unifies current theories, making it possible to create new algorithms that achieve differential privacy and generalize well to the population they apply to. “In general, we should strive toward provable guarantees,” Lydia says, and especially when it comes to data privacy. “Because machine learning is so applied, I feel the need to make sure [an algorithm] behaves as we think it does.”

To learn more about Lydia Zakynthinou and her research, visit her website.