Machine Learning Academy

The Field Guide to Machine Learning, Lesson 4: Features

The Facebook Field Guide to Machine Learning is a six-part video series developed by the Facebook ads machine learning team. The series shares real-world best practices and offers practical tips for applying machine learning capabilities to concrete problems.

If you’re interested in using machine learning to enhance your product, it’s important to understand how the entire development process works. That means not only what happens while your models train, but everything that comes before and after, and how each step can either set you up for success or doom you to failure.

The Facebook ads machine learning team has developed a series of videos to help engineers and new researchers learn to apply their machine learning skills to real-world problems. The Facebook Field Guide to Machine Learning series breaks down the machine learning process into six steps:

1. Problem definition
2. Data
3. Evaluation
4. Features
5. Model
6. Experimentation

This video series covers each of these steps, explaining how the decisions you make along the way can help you successfully apply machine learning to your product or use case. Each lesson highlights examples and stories of non-obvious things that can be important in an applied setting.

Lesson 4: Features

We now have a training data set, a baseline model, and a fast offline method for evaluating model performance. In this fourth lesson of the Facebook Field Guide to Machine Learning, we focus on features.

Facebook AI research director Léon Bottou shared his thoughts on features in his 2015 ICML talk: “Building better features is the second most important way to impact machine learning performance (after data), and better features are more important than model improvements.” The whole process is part of an iterative cycle.

Although changing the dataset is a powerful way to improve model performance, it also requires more engineering effort and time. Data can be thought of as a big, slow iteration cycle, while features and models form a smaller, faster iteration cycle run on a fixed dataset.

In this lesson, we walk through examples of categorical, continuous, and derived features, and explain how to choose the right features for the right model. We also cover pitfalls to watch for, such as changing features, feature breakage, leakage, and coverage.
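The lesson itself does not include code. The following is a minimal, hypothetical Python sketch showing one common way to turn raw fields into the three feature kinds mentioned above, plus a simple coverage check. The record fields (`country`, `impressions`, `clicks`), the fixed country vocabulary, and the specific transforms (one-hot encoding, `log1p` scaling, a click-through-rate ratio) are illustrative assumptions, not the method presented in the video.

```python
import math
from typing import Optional

# Hypothetical raw records; None marks a missing value.
RAW_EXAMPLES = [
    {"country": "US", "impressions": 1200, "clicks": 30},
    {"country": "BR", "impressions": 0, "clicks": 0},
    {"country": "IN", "impressions": 450, "clicks": None},
    {"country": "US", "impressions": None, "clicks": 5},
]

# Assumed fixed vocabulary for the categorical feature.
COUNTRY_VOCAB = ["BR", "IN", "US"]


def one_hot(value: Optional[str], vocab: list) -> list:
    """Categorical feature: one indicator per vocabulary entry.

    Unknown or missing values map to all zeros.
    """
    return [1 if value == v else 0 for v in vocab]


def log_scale(value: Optional[float]) -> float:
    """Continuous feature: log1p compresses heavy-tailed counts; missing -> 0.0."""
    return math.log1p(value) if value is not None else 0.0


def click_through_rate(clicks, impressions) -> Optional[float]:
    """Derived feature: ratio of two raw fields.

    Returns None when either input is missing or impressions is zero.
    """
    if clicks is None or not impressions:
        return None
    return clicks / impressions


def coverage(rows: list, key: str) -> float:
    """Fraction of rows in which a raw field is present (not None)."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(key) is not None)
    return present / len(rows)


def build_features(row: dict) -> list:
    """Concatenate categorical, continuous, and derived features into one vector.

    A separate 0/1 indicator records whether the derived ratio was undefined,
    so the model can tell "CTR is 0" apart from "CTR is unknown".
    """
    ctr = click_through_rate(row.get("clicks"), row.get("impressions"))
    return (
        one_hot(row.get("country"), COUNTRY_VOCAB)
        + [log_scale(row.get("impressions"))]
        + [ctr if ctr is not None else 0.0, 1.0 if ctr is None else 0.0]
    )


if __name__ == "__main__":
    for row in RAW_EXAMPLES:
        print(build_features(row))
    print("clicks coverage:", coverage(RAW_EXAMPLES, "clicks"))
```

Running the script prints one six-element vector per record and a coverage of 0.75 for `clicks`, since one of the four records lacks that field. In a real pipeline, a derived feature such as click-through rate would also need to be computed only from data available before prediction time; computing it from the same events the model is trying to predict would leak the label.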
