This year, Facebook’s contributions to the NeurIPS Expo were two workshops featuring PyTorch: “Responsible and Reproducible AI with PyTorch and Facebook” and “Multi-modal Research to Production with PyTorch and Facebook.” The workshops took place on Sunday, December 8, and featured presentations from Facebook AI researchers, all of which are available to download below.

Read our blog for more information about Facebook at NeurIPS, including papers, workshops, tutorials, and other activities. Visit our NeurIPS 2019 event page for booth information, Facebook attendees at NeurIPS, related blog posts, and more.

Session 1: Responsible and Reproducible AI

Abstract: Responsible and reproducible AI are among the most active topics in the field today. This half-day workshop dove into some of the important areas shaping how we interpret models, reproduce research, and build AI with privacy in mind. The team covered the major challenges, walked through some solutions, and finished each talk with a hands-on tutorial.

1. Reproducibility

Reproducibility is a crucial requirement for many fields of research, including those based on machine learning techniques. As the number of research papers submitted to arXiv and conferences skyrockets into the tens of thousands, scaling reproducibility becomes difficult. In machine learning research, authors release model weights and code with their publications to help others build on their work. However, to scale reproducibility we must also address the following three challenges: 1) aid extensibility by standardizing code bases, 2) democratize paper implementation by writing hardware-agnostic code, and 3) facilitate results validation by documenting the “tricks” authors use to make their complex systems function. Without addressing these issues, researchers can spend weeks re-implementing a single research paper, and reviewers have a hard time validating a paper's empirical results.

To offer solutions toward reproducibility, we dove into two tools, PyTorch Hub and PyTorch Lightning, which are used by some of the top researchers in the world to reproduce the state of the art.

Click to download slide presentations:

2. Interpretability

With the increase in model complexity and the resulting lack of transparency, model interpretability methods have become increasingly important. Model understanding is both an active area of research and an area of focus for practical applications across industries using machine learning. To get hands-on with model interpretability, the team used the recently released Captum library.

Captum provides state-of-the-art algorithms, including Integrated Gradients, Conductance, SmoothGrad/VarGrad, DeepLift, and others, giving researchers and developers an easy way to understand the importance of neurons and layers and the predictions made by their models.
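To make one of these algorithms concrete, here is a minimal sketch of the idea behind Integrated Gradients, written in plain Python rather than with Captum. It averages gradients along a straight path from a baseline input to the actual input and scales them by the input difference. The toy model, the numerical gradient helper, and all function names are assumptions made for illustration. They are not part of Captum's API.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, eps: float = 1e-6) -> Vector:
    """Central-difference estimate of the gradient of f at x."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(
    f: Callable[[Vector], float], x: Vector, baseline: Vector, steps: int = 50
) -> Vector:
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_gradient(f, point)):
            totals[i] += g
    # Scale the average gradient by the distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def toy_model(x: Vector) -> float:
    """Hypothetical model used only for this example: 3*x0 + x1*x2."""
    return 3.0 * x[0] + x[1] * x[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)

# Completeness: attributions should sum to f(x) - f(baseline).
gap = abs(sum(attributions) - (toy_model(x) - toy_model(baseline)))
print(attributions, gap)
```

The final check illustrates the completeness property of Integrated Gradients: the attributions add up to the difference between the model output at the input and at the baseline. In practice a library such as Captum computes the gradients with autograd instead of finite differences.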

Click to download slide presentation:

3. Private AI

Practical applications of machine learning via cloud-based or machine-learning-as-a-service (MLaaS) platforms pose a range of security and privacy challenges. In particular, users of these platforms may not be willing or allowed to share their data with platform providers, which prevents them from taking full advantage of the value of the platforms. To address these challenges, a number of technical approaches are being studied at various levels of maturity, including (1) homomorphic encryption, (2) secure multi-party computation, (3) trusted execution environments, (4) on-device computation, and (5) differential privacy. To provide a more immersive understanding of how some of these technologies are applied, the team used the newly released CrypTen project, which provides a community-based research platform to take the field of Private AI forward.
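As a rough illustration of approach (2), secure multi-party computation, the sketch below uses additive secret sharing in plain Python. Each private value is split into random shares that individually reveal nothing, yet the parties can add their shares locally and reconstruct only the sum. This is a teaching sketch under simplifying assumptions (honest parties, integers only). The modulus, the function names, and the salary figures are illustrative and do not reflect CrypTen's API.

```python
import secrets
from typing import List

# Assumed modulus for the example; all arithmetic is done modulo this value.
MODULUS = 2**61 - 1


def share(secret: int, n_parties: int) -> List[int]:
    """Split `secret` into n_parties additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> int:
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares: List[int], b_shares: List[int]) -> List[int]:
    """Each party adds the two shares it holds; no party sees either input."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Two hypothetical private inputs, shared among three parties.
alice_value = 52000
bob_value = 61000
a_shares = share(alice_value, 3)
b_shares = share(bob_value, 3)

sum_shares = add_shared(a_shares, b_shares)
total = reconstruct(sum_shares)
print(total)
```

Addition needs no communication between parties, which is why it is the usual starting point. Multiplying shared values requires additional protocol steps that this sketch leaves out.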

Click to download slide presentation:

Session 2: Multi-modal Research to Production

Abstract: The content at Facebook and more broadly continues to increase in diversity and is made up of a number of modalities (text, audio, video, etc.). For example, an ad may contain multiple components including image, body text, title, video, and landing pages. Even an individual component may bear multimodal traits; for instance, a video contains visual and audio signals, and a landing page is composed of images, texts, HTML sources, and so on. This workshop dove into a number of modalities such as computer vision (large-scale image classification and instance segmentation) and NLP and speech (seq-to-seq Transformers) through the lens of taking cutting-edge research to production. The team also walked through how to use the latest APIs in PyTorch to take models developed in eager mode into graph mode via TorchScript and quantize them for production deployment at scale on servers or mobile devices.

1. Optimized Research to Production

To allow researchers to fully explore the design space and, when ready, ship models into production, PyTorch provides a seamless path from Pythonic eager-mode development to graph mode. For further optimization and efficiency, and because deep neural nets are generally insensitive to reduced precision, models can then be quantized.

TorchScript. TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. PyTorch provides tools to incrementally transition a model from a pure Python program to a TorchScript program that can be run independently from Python, such as in a standalone C++ program. This makes it possible to train models in PyTorch using familiar tools in Python and then export the model via TorchScript to a production environment where Python programs may be disadvantageous for performance and multi-threading reasons.

Quantization. PyTorch supports quantization in both eager and graph modes, allowing for full freedom and control. Users can target lower-precision backends such as FBGEMM and QNNPACK to accelerate performance on servers and mobile devices, respectively, while also reducing memory bandwidth usage. The team walked through how to use techniques such as dynamic quantization, post-training quantization, and quantization-aware training to reduce computation costs and memory usage and to improve performance, all with minimal accuracy loss, for architectures such as residual nets and transformers.
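To show why quantization costs so little accuracy, the following plain-Python sketch maps floating-point weights to 8-bit integers with a generic affine scheme (a scale and a zero point) and then maps them back. It is not PyTorch code and does not reproduce what FBGEMM or QNNPACK do internally. The function names and sample weights are assumptions chosen for the example.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit integer range


def choose_qparams(values: List[float]) -> Tuple[float, int]:
    """Pick a scale and zero point so that the value range (including 0.0) fits in int8."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN)
    if scale == 0.0:
        scale = 1.0  # all values are zero; any positive scale works
    zero_point = round(QMIN - lo / scale)
    zero_point = max(QMIN, min(QMAX, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.5, -0.2, 0.0, 0.7, 2.3]  # hypothetical float weights
scale, zero_point = choose_qparams(weights)
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now fits in one byte instead of the four bytes of a 32-bit float, and the round-trip error stays on the order of the scale. Zero is represented exactly, which matters for padding and for activations such as ReLU outputs.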

Click to download slide presentations:

2. Large Scale Production CV

Image and video classification are at the core of many of Facebook’s content understanding algorithms. Facebook has held the state-of-the-art result for image classification on ImageNet since May 2018. That result was achieved in a weakly supervised setting, training models on billions of images with noisy labels. In this talk, the team introduced Classy Vision, a PyTorch framework developed by Facebook AI for research on large-scale image and video classification. Classy Vision allows researchers to quickly prototype and iterate on large distributed training jobs. Models built on Classy Vision can be seamlessly deployed to production, and Classy Vision powers the next generation of classification models in production at Facebook.

Click to download slide presentation:

3. State of the art detection and image segmentation

Object detection and segmentation are used across a number of tasks, from autonomous driving to content understanding for platform integrity. This talk dove into Detectron2, the recently released object detection library built by the FAIR computer vision team. The team described the improvements over the previous version, including 1) support for the latest models and new tasks; 2) increased flexibility, to enable new computer vision research; and 3) better maintainability and scalability, to support production use cases. They finished by getting hands-on with the platform to illustrate how users can get started and build on top of Detectron2 for their own research.

Click to view slide presentation:

4. State of the art production Transformers for Translation and Audio

Language translation and audio processing are critical components of many systems and applications, such as search, translation, speech, and assistants. We have recently seen tremendous progress in these fields thanks to the development of new architectures like the Transformer as well as large-scale pretraining methods. This talk showcased translation and speech research conducted at FAIR, our artificial intelligence research lab, and demonstrated how Fairseq, a general-purpose sequence-to-sequence library, can be used in many applications, including (unsupervised) translation, summarization, dialog, and speech recognition.

Click to download slide presentations: