Computational linguistics experts from around the world are gathering at the 56th meeting of the Association for Computational Linguistics (ACL) in Melbourne, Australia to present and learn about the latest advances in computational linguistics. Research from Facebook will be presented in oral spotlight presentations and poster sessions. Facebook researchers and engineers will also be organizing and participating in workshops.
Facebook research being presented at ACL 2018
A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling
Ying Lin, Shengqi Yang, Veselin Stoyanov and Heng Ji
We show that by pre-training a system for sequence tagging, it can learn to perform a new task with less training data. Sequence labeling is a popular class of NLP problems that includes part-of-speech tagging and named entity tagging. By pre-training our system on data from multiple languages and multiple other tasks, we achieve substantial improvements when training data is limited. For example, on the name tagging task we improve F-score by 4.8%-39.3% absolute compared to traditional approaches.
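Name tagging is typically framed as sequence labeling over BIO tags. As a minimal illustration of that target format (the helper below is ours, not part of the paper's system):

```python
def spans_to_bio(tokens, spans):
    """Encode entity spans as BIO tags, the standard target format
    when name tagging is cast as a sequence-labeling task.

    spans: list of (start, end, label) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label          # Begin the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # Inside the entity
    return tags
```

For example, `spans_to_bio(["Jim", "lives", "in", "Paris"], [(0, 1, "PER"), (3, 4, "LOC")])` yields `["B-PER", "O", "O", "B-LOC"]`; a tagger is then trained to predict one such tag per token.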
We learn a joint multilingual sentence embedding and use the distance between sentences in different languages to filter noisy parallel data and to mine for parallel data in large news collections. We are able to improve a competitive baseline on the WMT’14 English to German task by 0.3 BLEU by filtering out 25% of the training data. The same approach is used to mine additional bitexts for the WMT’14 system and to obtain competitive results on the BUCC shared task of identifying parallel sentences in comparable corpora. The approach is generic: it can be applied to many language pairs and is independent of the architecture of the machine translation system.
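The filtering step can be sketched as follows: score each candidate sentence pair by the similarity of its two embeddings and discard the lowest-scoring fraction as likely noise. This is a simplified sketch, assuming generic embedding functions; it is not the paper's actual pipeline:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def filter_parallel(pairs, embed_src, embed_tgt, drop_fraction=0.25):
    """Score each (src, tgt) pair by the similarity of their joint-space
    embeddings and drop the lowest-scoring fraction as likely noise."""
    scored = sorted(pairs,
                    key=lambda p: cosine(embed_src(p[0]), embed_tgt(p[1])))
    keep_from = int(len(scored) * drop_fraction)
    return scored[keep_from:]
```

Mining works in the reverse direction: instead of discarding low-similarity pairs, one searches a large monolingual collection for sentence pairs whose embeddings are close.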
We perform a direct comparison of two lines of research into hypernymy (is-a) relationship detection: those based on distributional vectors and those based on lexico-syntactic features (Hearst patterns). We find that a simple embedding of off-the-shelf Hearst patterns significantly outperforms several unsupervised distributional approaches. We find that the current best unsupervised distributional approaches do not yet exploit these known strong feature sets.
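For readers unfamiliar with Hearst patterns, they are lexico-syntactic templates like "X such as Y" that signal an is-a relation in raw text. A toy extractor over three classic patterns (the paper uses a much larger pattern set over a web-scale corpus and then embeds the resulting counts):

```python
import re

# Each entry: (pattern, hyponym group index, hypernym group index).
HEARST_PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"), 2, 1),    # "animals such as dogs"
    (re.compile(r"(\w+) including (\w+)"), 2, 1),  # "fruits including apples"
    (re.compile(r"(\w+) and other (\w+)"), 1, 2),  # "cats and other animals"
]

def extract_isa(text):
    """Return (hyponym, hypernym) pairs matched by simple Hearst patterns."""
    pairs = []
    for pat, hypo, hyper in HEARST_PATTERNS:
        for m in pat.finditer(text.lower()):
            pairs.append((m.group(hypo), m.group(hyper)))
    return pairs
```

For example, `extract_isa("Animals such as dogs bark.")` returns `[("dogs", "animals")]`, i.e. dogs is-a animal.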
In this work, we explore the problem of multi-sentence text generation through the lens of story writing. We scraped a dataset of stories and writing prompts from an online forum and tackle two main challenges: writing multiple sentences that are coherent with one another and writing a story that follows the provided prompt. By allowing the model to refer back to the prompt and improving its mechanism for remembering which sentences it has already written, we significantly improve the quality of the written stories.
Personalizing Dialogue Agents: I have a dog, do you have pets too?
Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela and Jason Weston
We show that giving chatbots a unique persona — a profile of interests and characteristics — while having them try to learn about their dialogue partner’s interests makes for more engaging and consistent dialogues. We have released a new dataset, Persona-Chat, along with models that work on the task. We are currently running a competition to find even better models and look forward to your entries!
“Downstream” tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. However, the complexity of these tasks makes it difficult to infer what kind of information is present in the representations. In this work, we introduce 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both the encoders and the training methods.
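A probing task pairs each sentence with a label for a single surface or syntactic property; a small classifier is then trained to predict that label from the sentence embedding alone. As an illustrative sketch of one such property, sentence length (bucket boundaries here are our own choice, not the paper's):

```python
def sentlen_probing_data(sentences, bins=(5, 8, 12)):
    """Build a toy sentence-length probing dataset: label each sentence
    with a length bucket that a probe classifier must later predict
    from the sentence embedding alone."""
    data = []
    for s in sentences:
        n = len(s.split())
        label = sum(n > b for b in bins)   # bucket index 0..len(bins)
        data.append((s, label))
    return data
```

If a simple classifier can recover the bucket from the embedding, the embedding demonstrably encodes sentence length; repeating this for ten properties gives a fine-grained profile of what each encoder captures.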
Other activities at ACL 2018
In this work, we focus on the problem of document summarization: reading a newspaper article and distilling it to the important points. In contrast to standard summarization methods, our work focuses on the generation of personalized summaries. For example, some readers may want to focus on a specific person on a sports team, want a shorter summary because they are busy, or want a summary of only the remaining portion of an article they did not have time to finish. We describe a simple method for achieving controllable summarization and show that our model achieves state-of-the-art results on a large-scale news summarization dataset.
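One common recipe for this kind of control is to prepend special control tokens to the input so the model learns to condition on them. A minimal sketch of that idea (the token names below are illustrative, not the paper's exact vocabulary):

```python
def add_control_tokens(article, length="short", entities=()):
    """Prepend control tokens steering a summarizer's output: a desired
    length marker plus markers for entities the summary should cover."""
    controls = ["<len:%s>" % length] + ["<ent:%s>" % e for e in entities]
    return " ".join(controls) + " " + article
```

At training time the control tokens are derived from the reference summary (its length, the entities it mentions); at inference time a user sets them to steer the output.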
3rd Workshop on Computational Approaches to Linguistic Code-Switching (CALCS)
Paper: Code-Switched Named Entity Recognition with Embedding Attention
Changhan Wang, Kyunghyun Cho and Douwe Kiela
We describe our work for the CALCS 2018 shared task on named entity recognition over code-switched data. Code-switching, i.e. the tendency of multilingual speakers to switch between languages, poses important problems for NLP systems. We designed and implemented a neural system that performs attention over word- and character-level embeddings from different languages and feeds them into a shortcut-stacked BiLSTM-CRF. Our system took first place for MSA-Egyptian Arabic named entity recognition and third place for English-Spanish.
3rd Workshop on Representation Learning for NLP (RepL4NLP)
Paper: Multilingual Seq2seq Training with Similarity Loss for Cross-Lingual Document Classification
Katherin Yu, Haoran Li and Barlas Oguz
This paper continues work on training sentence encoders using neural machine translation to do zero-shot cross-lingual transfer. We introduce an explicit penalty to bring sentence embeddings in different languages closer together. We present results on cross-lingual similarity search and cross-lingual document classification.
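One simple form such an explicit penalty can take is the mean squared distance between the embeddings of parallel sentence pairs, added to the translation training loss. This is a sketch under that assumption; the paper's exact formulation may differ:

```python
import numpy as np

def similarity_penalty(src_emb, tgt_emb):
    """Mean squared Euclidean distance between source and target sentence
    embeddings of parallel pairs (one row per pair). Adding this term to
    the training loss pulls the two languages' embeddings together."""
    diff = src_emb - tgt_emb
    return float(np.mean(np.sum(diff * diff, axis=1)))
```

Driving this term toward zero makes embeddings of translations nearly interchangeable, which is what enables zero-shot transfer: a classifier trained on one language's embeddings can be applied directly to another's.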