September 4, 2014

#TagSpace: Semantic Embeddings from Hashtags

Empirical Methods in Natural Language Processing

By: Jason Weston, Sumit Chopra, Keith Adams


We describe a convolutional neural network that learns feature representations for short textual posts using hashtags as a supervised signal. The proposed approach is trained on up to 5.5 billion words predicting 100,000 possible hashtags. As well as strong performance on the hashtag prediction task itself, we show that its learned representation of text (ignoring the hashtag labels) is useful for other tasks as well.

To that end, we present results on a document recommendation task, where it also outperforms a number of baselines.