Representing Documents Through Their Readers

ACM Conference on Knowledge Discovery and Data Mining (KDD)


From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers?

To address this question, we model the content of news articles and blog posts by attributes of the people who are likely to share them. For example, many Twitter users describe themselves in a short profile, labeling themselves with phrases such as “vegetarian” or “liberal.” By assuming that a user’s labels correspond to topics in the articles he shares, we can learn a labeled dictionary from a training corpus of articles shared on Twitter. Thereafter, we can code any new document as a sparse non-negative linear combination of user labels, where we encourage correlated labels to appear together in the output via a structured sparsity penalty.

Finally, we show that our approach yields a novel document representation that can be effectively used in many problem settings, from recommendation to modeling news dynamics. For example, while the top politics stories will change drastically from one month to the next, the “politics” label will still be there to describe them. We evaluate our model on millions of tweeted news articles and blog posts collected between September 2010 and September 2012, demonstrating that our approach is effective.

Related Publications

All Publications

A Scalable Approach to Control Diverse Behaviors for Physically Simulated Characters

Jungdam Won, Deepak Gopinath, Jessica Hodgins

ACM SIGGRAPH - July 19, 2020

ARCH: Animatable Reconstruction of Clothed Humans

Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, Tony Tung

CVPR - June 15, 2020

FroDO: From Detections to 3D Objects

Martin Rünz, Kejie Li, Meng Tang, Lingni Ma, Chen Kong, Tanner Schmidt, Ian Reid, Lourdes Agapito, Julian Straub, Steven Lovegrove, Richard Newcombe

CVPR - June 13, 2020

Plan2vec: Unsupervised Representation Learning by Latent Plans

Ge Yang, Amy Zhang, Ari Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra

Learning for Dynamics & Control (L4DC) - June 10, 2020

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy