August 26, 2020

Leveraging online social interactions for enhancing integrity at Facebook

By: Nima Noorshams, Saurabh Verma, Aude Hofleitner

Nima Noorshams, Saurabh Verma, and Aude Hofleitner are Research Scientists at Facebook working within Core Data Science, a research and development team focused on improving Facebook’s processes, infrastructure, and products.

What we did: Sequence modeling for integrity

Billions of people rely on Facebook products and services to connect with family and friends, build new communities, share experiences, and run their businesses. However, the rise of inauthentic accounts and activities as well as disparaging and threatening content on social media has introduced several integrity challenges. Needless to say, maintaining the integrity of such a large and growing network in a fast and scalable manner is of utmost importance for the safety and security of the online community.

Entities on the platform, such as accounts, posts, pages, and groups, are not static. They interact with one another over time, which can reveal a lot about their nature. For instance, fake accounts and misinformation posts elicit different types of reactions from other accounts than do normal/benign accounts and posts (see Figure 1). In the paper “TIES: Temporal Interaction Embeddings for enhancing social media integrity at Facebook,” we focus on the problem of leveraging these interactions in order to enhance the integrity of the platform.

In short, TIES is a deep learning, application-agnostic, scalable framework for embedding sequences of entity interactions. It encodes not only the sequence of actions but also various features of sources and targets of the interactions. The embedding vectors can then be used for various integrity applications, such as detecting fake accounts, identifying misinformation or hate speech, detecting high-risk ad accounts, and many others.

Figure 1, at left: Account-account interaction used to detect fake accounts. At right: Post-account interactions used to identify misinformation.

How we did it: Combining graph representations and sequence learning

Past studies have mainly focused on either the static or the dynamic behavior of networks, but not both at the same time. In contrast, the core of TIES consists of two embedding components:

  1. Graph-based embedding, which captures the static (or slow-changing) information encoded in the large social graph.
  2. Sequence-based embedding, which captures the more dynamic actions.

Prior knowledge, such as friending and group or page memberships, is captured in the social graph. Large-scale embedding algorithms, such as PyTorch-BigGraph, can be used to encode this graph information. These graph-based embeddings are then used to initialize the sequence-encoder component of the framework. Figure 2 illustrates the model architecture.
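To make this concrete, pretrained node vectors (for example, those produced by PyTorch-BigGraph) can be loaded into a frozen embedding table that the sequence encoder indexes by source and target ID. The file name, tensor shapes, and IDs below are illustrative assumptions, not the production setup.

```python
import torch
import torch.nn as nn

# Hypothetical file of pretrained node vectors exported from a graph-embedding
# job such as PyTorch-BigGraph, shaped [num_entities, graph_dim].
pretrained = torch.load("graph_embeddings.pt")

# Freeze the graph-based vectors; only the sequence model on top is trained.
entity_embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Look up source (or target) vectors for a batch of interaction sequences.
source_ids = torch.tensor([[12, 12, 980], [4, 7, 7]])  # [batch, seq_len] (toy IDs)
source_vecs = entity_embedding(source_ids)              # [batch, seq_len, graph_dim]
```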

We first convert the sequence of (source, target, action) triplets into feature vectors. These vectors consist of trainable action embeddings; pretrained source and target embeddings (produced by PyTorch-BigGraph); and miscellaneous features, such as the time gap between actions. The features are then fed into a sequence encoder, which consists of a seq2seq encoding layer, self-attention, and a pooling layer. The model parameters are trained by minimizing a loss function over a labeled data set, thus producing supervised embeddings.
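A minimal sketch of such an encoder in PyTorch is shown below. The module names, dimensions, the choice of a GRU as the seq2seq layer, and mean pooling are illustrative assumptions rather than the exact TIES architecture; the pooled vector plays the role of the supervised embedding.

```python
import torch
import torch.nn as nn

class TIESStyleEncoder(nn.Module):
    """Illustrative sketch only, not the production TIES implementation."""

    def __init__(self, num_actions, graph_dim, misc_dim, hidden_dim, num_classes):
        super().__init__()
        # Trainable action embeddings.
        self.action_emb = nn.Embedding(num_actions, hidden_dim)
        # Project [action | source | target | misc features] into the encoder input space.
        self.input_proj = nn.Linear(hidden_dim + 2 * graph_dim + misc_dim, hidden_dim)
        # Seq2seq encoding layer (a GRU is assumed here for simplicity).
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Self-attention over the encoded sequence.
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # Classification head trained with a supervised loss (e.g., cross-entropy).
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, actions, source_vecs, target_vecs, misc_feats):
        # actions: [batch, seq_len]; source_vecs/target_vecs: [batch, seq_len, graph_dim]
        # misc_feats: [batch, seq_len, misc_dim], e.g., time gaps between actions.
        x = torch.cat(
            [self.action_emb(actions), source_vecs, target_vecs, misc_feats], dim=-1
        )
        x = torch.relu(self.input_proj(x))
        h, _ = self.encoder(x)
        h, _ = self.attention(h, h, h)
        pooled = h.mean(dim=1)  # pooling layer: mean over the sequence
        return self.classifier(pooled), pooled  # logits and the embedding vector
```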

TIES applications at Facebook

We have tested this framework on several applications, including detecting misinformation, detecting fake accounts and engagements, and identifying high-risk ad accounts. Different types of actions and features were used for each application. For instance, to detect misinformation, we used sequences of user actions on posts, such as likes, comments, and shares.
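For intuition, the input to such an application might be assembled from a raw interaction log roughly as follows; the action vocabulary, IDs, and record format here are hypothetical simplifications.

```python
# Hypothetical action vocabulary and interaction log for a single post.
ACTION_IDS = {"like": 0, "comment": 1, "share": 2, "angry_react": 3, "report": 4}

raw_log = [
    ("user_17", "post_42", "like",   1598400000),
    ("user_93", "post_42", "share",  1598400120),
    ("user_08", "post_42", "report", 1598400500),
]

# Convert to (source, target, action) triplets plus time gaps between consecutive actions.
actions = [ACTION_IDS[action] for _, _, action, _ in raw_log]
time_gaps = [0] + [
    t2 - t1 for (_, _, _, t1), (_, _, _, t2) in zip(raw_log, raw_log[1:])
]
```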

In all the aforementioned applications, we used a portion of the training samples (up to millions) to train the TIES model and then passed the resulting embeddings as additional features to baseline models (generally sophisticated production models consisting of several hundred carefully engineered features and/or deep learning components). In all instances, we observed uniform and statistically significant gains over the existing baselines, which can contribute to enhancing the integrity of our platform, and TIES features have since been deployed into production.
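As a sketch of how a TIES embedding can be appended to an existing model's inputs, assuming a hypothetical baseline that consumes hand-engineered features:

```python
import torch
import torch.nn as nn

class BaselineWithTIES(nn.Module):
    """Illustrative: concatenate hand-engineered features with a TIES embedding."""

    def __init__(self, engineered_dim, ties_dim, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(engineered_dim + ties_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # e.g., probability of abuse after a sigmoid
        )

    def forward(self, engineered_feats, ties_embedding):
        return self.mlp(torch.cat([engineered_feats, ties_embedding], dim=-1))
```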