Machine Learning Academy
Field Guide to Machine Learning, Lesson 2: Data
The Facebook Field Guide to Machine Learning is a six-part video series developed by the Facebook ads machine learning team. The series shares best practices and practical tips for applying machine learning to real-world problems.
If you are interested in using machine learning to enhance your product in the real world, it’s important to understand how the entire development process works. It’s not only what happens during the training of your models, but everything that comes before and after, and how each step can either set you up for success or doom you to fail.
The Facebook ads machine learning team has developed a series of videos to help engineers and new researchers learn to apply their machine learning skills to real-world problems. The Facebook Field Guide to Machine Learning series breaks down the machine learning process into six steps:
1. Problem definition
In lesson one, you learned how important it is to define the right problem to solve. Once you have your task clearly defined, you’re ready to dig into lesson two: preparing the data.
Lesson 2: Data. In this lesson, you will learn how preparing the training data is a core part of a machine learning engineer’s job. It’s an active, not passive, part of machine learning research and is one of the most powerful levers for building high-quality machine learning systems.
The lesson covers how to build robust data sets to solve real-world problems, and how your data preparation ultimately impacts the effectiveness of your models. The choices you make in creating training data are intricately tied to the kinds of models you will use and can directly impact the scalability and reliability of the end-to-end system.
The lesson covers three key areas of data preparation:
- Data recency and real-time training
- Train-Predict consistency
- Records and sampling
It’s important to verify that your data preparation is doing what you think it’s doing. The choices you make in creating training data will impact the success of your entire machine learning system.
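One way to check that data preparation is doing what you expect is to compare the feature values used for training against the values the model saw at prediction time for the same records. The sketch below is only an illustration of that idea, not the method used in the videos. The function name `find_feature_mismatches`, the record IDs, and the feature names `ctr_7d` and `country` are all hypothetical.

```python
import math


def find_feature_mismatches(training_features, serving_features, rel_tol=1e-6):
    """Return (record_id, feature, training_value, serving_value) tuples that disagree.

    Both arguments are assumed to be dicts mapping a record ID to a dict of
    feature name -> value. This layout is an assumption made for the example.
    """
    mismatches = []
    for record_id, train_row in training_features.items():
        serve_row = serving_features.get(record_id)
        if serve_row is None:
            continue  # record has no prediction-time log; nothing to compare
        # Compare every feature that appears on either side.
        for name in sorted(set(train_row) | set(serve_row)):
            t = train_row.get(name)
            s = serve_row.get(name)
            both_numeric = isinstance(t, (int, float)) and isinstance(s, (int, float))
            if both_numeric:
                same = math.isclose(t, s, rel_tol=rel_tol)
            else:
                same = t == s
            if not same:
                mismatches.append((record_id, name, t, s))
    return mismatches


# Hypothetical data: the second record's feature drifted between
# prediction time and training time.
training = {
    "ad_1": {"ctr_7d": 0.031, "country": "US"},
    "ad_2": {"ctr_7d": 0.012, "country": "BR"},
}
serving = {
    "ad_1": {"ctr_7d": 0.031, "country": "US"},
    "ad_2": {"ctr_7d": 0.020, "country": "BR"},
}

mismatches = find_feature_mismatches(training, serving)
for record_id, name, t, s in mismatches:
    print(f"{record_id}: {name} is {t} in training but {s} at prediction time")
```

An empty result does not prove the pipeline is correct, but a non-empty one is a concrete signal that training and prediction are not seeing the same data.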