September 4, 2014

#TagSpace: Semantic Embeddings from Hashtags

Empirical Methods in Natural Language Processing

We describe a convolutional neural network that learns feature representations for short textual posts using hashtags as a supervised signal. The proposed approach is trained on up to 5.5 billion words predicting 100,000 possible hashtags.

Jason Weston, Sumit Chopra, Keith Adams
September 1, 2014

Optimal Crowd-Powered Rating and Filtering Algorithms

VLDB 2014

We focus on crowd-powered filtering, i.e., filtering a large set of items using humans. Filtering is one of the most commonly used building blocks in crowdsourcing applications and systems. While solu…

Aditya Parameswaran, Stephen Boyd, Hector Garcia-Molina, Ashish Gupta, Neoklis Polyzotis, Jennifer Widom
August 24, 2014

Streamed Approximate Counting of Distinct Elements

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Counting the number of distinct elements in a large dataset is a common task in web applications and databases. This problem is difficult in limited memory settings where storing a large hash table ta…

Daniel Ting
August 24, 2014

Practical Lessons from Predicting Clicks on Ads at Facebook

International Workshop on Data Mining for Online Advertising (ADKDD)

Online advertising allows advertisers to only bid and pay for measurable user responses, such as clicks on ads. As a consequence, click prediction systems are central to most online advertising system…

Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, Joaquin Quinonero Candela
August 7, 2014

Perceiving, Learning, and Exploiting Object Affordances for Autonomous Pile Manipulation

Autonomous Robots

Autonomous manipulation in unstructured environments will enable a large variety of exciting and important applications. Despite its promise, autonomous manipulation remains largely unsolved. Even the…

Anthony Stentz, Arun Venkatraman, Dubi Katz, J. Andrew Bagnell, Moslem Kazemi
June 24, 2014

Collaborative Hashing

Conference on Computer Vision and Pattern Recognition (CVPR)

Hashing has become a promising approach for fast similarity search. Most existing hashing research pursues binary codes for the same type of entities by preserving their similarities…

Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
June 24, 2014

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Conference on Computer Vision and Pattern Recognition (CVPR)

In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify. We revisit both the alignment step and the representation step by employing exp…

Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf
June 24, 2014

PANDA: Pose Aligned Networks for Deep Attribute Modeling

Conference on Computer Vision and Pattern Recognition (CVPR)

We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulat…

Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev
April 11, 2014

Designing and Deploying Online Field Experiments

International World Wide Web Conference (WWW)

Online experiments are widely used to compare specific design alternatives, but they can also be used to produce generalizable knowledge and inform strategic decision making. Doing so often requires sophisticated experimental designs, iterative refinement, and careful logging and analysis.

Eytan Bakshy, Dean Eckles, Michael Bernstein
April 7, 2014

Personalized Collaborative Clustering

International World Wide Web Conference (WWW)

We study the problem of learning personalized user models from rich user interactions. In particular, we focus on learning from clustering feedback (i.e., grouping recommended items into clusters), wh…

Yisong Yue, Chong Wang, Khalid El-Arini, Carlos Guestrin
February 18, 2014

Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook

ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW)

A crucial task in the analysis of on-line social-networking systems is to identify important people — those linked by strong social ties — within an individual’s network neighborhood. Here we investig…

Lars Backstrom, Jon Kleinberg
December 16, 2013

Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising

Journal of Machine Learning Research (JMLR)

This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such p…

Denis Charles, Dipankar Ray, Ed Snelson, Elon Portugaly, Joaquin Quinonero Candela, Leon Bottou, Max Chickering, Patrice Simard
December 8, 2013

Using Web Text to Improve Keyword Spotting in Speech

Automatic Speech Recognition and Understanding Workshop (ASRU)

For low resource languages, collecting sufficient training data to build acoustic and language models is time consuming and often expensive. In this paper, we investigate the use of online text resour…

Ankur Gandhe, Long Qin, Florian Metze, Alexander Rudnicky, Ian Lane, Matthias Eck
August 22, 2013

Reciprocal Hash Tables for Nearest Neighbor Search

AAAI Conference on Artificial Intelligence (AAAI)

Recent years have witnessed the success of hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually employed to retrieve more of the desired results from the hit buckets of each table. However, few works have studied a unified approach to constructing multiple informative hash tables beyond the widely used random approach.

Xianglong Liu, Junfeng He, Bo Lang
August 22, 2013

Weighted Hashing for Fast Large Scale Similarity Search

ACM International Conference on Information and Knowledge Management (CIKM)

Similarity search, or finding approximate nearest neighbors, is an important technique for many applications. Much recent research demonstrates that hashing methods can achieve promising results for large scale similarity search due to their computational and memory efficiency.

Qifan Wang, Dan Zhang, Luo Si
August 11, 2013

MI2LS: Multi-Instance Learning from Multiple Information Sources

ACM Conference on Knowledge Discovery and Data Mining (KDD)

In Multiple Instance Learning (MIL), each entity is normally expressed as a set of instances. Most of the current MIL methods only deal with the case when each instance is represented by one type of f…

Dan Zhang, Jingrui He, Richard Lawrence
August 9, 2013

Uncertainty in Online Experiments with Dependent Data: An Evaluation of Bootstrap Methods

ACM Conference on Knowledge Discovery and Data Mining (KDD)

Many online experiments exhibit dependence between users and items. For example, in online advertising, observations that have a user or an ad in common are likely to be associated. Because of this, even in experiments involving millions of subjects, the difference in mean outcomes between control and treatment conditions can have substantial variance. Previous theoretical and simulation results demonstrate that not accounting for this kind of dependence structure can result in confidence intervals that are too narrow, leading to inaccurate hypothesis tests.

Eytan Bakshy, Dean Eckles
June 17, 2013

MILEAGE: Multiple Instance LEArning with Global Embedding

International Conference on Machine Learning (ICML)

Multiple Instance Learning (MIL) generally represents each example as a collection of instances such that the features for local objects can be better captured, whereas traditional methods typically extract a global feature vector for each example as an integral part. However, there is limited research work on investigating which of the two learning scenarios performs better.

Dan Zhang, Jingrui He, Luo Si, Richard Lawrence
June 7, 2013

Representing Documents Through Their Readers

ACM Conference on Knowledge Discovery and Data Mining (KDD)

From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers?

Khalid El-Arini, Min Xu, Emily Fox, Carlos Guestrin
May 20, 2013

Machine Learning Paradigms for Speech Recognition: An Overview

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, Bay…

Li Deng, Xiao Li