TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

Knowledge Discovery in Databases (KDD)


Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
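The recursive construction described in the abstract can be illustrated with a minimal sketch. This is not the TaxoGen implementation: it assumes pre-computed, non-zero term embeddings supplied as a NumPy array, uses plain spherical k-means (cosine similarity on unit vectors), and leaves out both the adaptive step that allocates general terms to the parent level and the local embedding step that retrains embeddings per subtopic. The names `spherical_kmeans`, `build_taxonomy`, `k`, `max_depth`, and `min_size` are hypothetical and chosen for illustration only.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Plain spherical k-means: cluster rows of X by cosine similarity.

    Assumes every row of X is a non-zero vector.
    """
    rng = np.random.default_rng(seed)
    # Project all vectors onto the unit sphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centers with k distinct data points (fancy indexing copies).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each term to the center with the highest cosine similarity.
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                c = members.sum(axis=0)
                norm = np.linalg.norm(c)
                if norm > 0:
                    # New center: normalized sum of the member vectors.
                    centers[j] = c / norm
    # Final assignment against the last set of centers.
    return np.argmax(X @ centers.T, axis=1)


def build_taxonomy(terms, vectors, k=2, max_depth=2, min_size=4, depth=0):
    """Recursively split a topic (a list of terms) into k subtopics.

    Simplification: every term is passed down to exactly one child and the
    same global embeddings are reused at every level.
    """
    node = {"terms": list(terms), "children": []}
    if depth >= max_depth or len(terms) < max(min_size, k):
        return node
    labels = spherical_kmeans(vectors, k)
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        # Skip empty clusters and degenerate splits that keep every term.
        if len(idx) == 0 or len(idx) == len(terms):
            continue
        child = build_taxonomy(
            [terms[i] for i in idx], vectors[idx],
            k=k, max_depth=max_depth, min_size=min_size, depth=depth + 1,
        )
        node["children"].append(child)
    return node
```

Calling `build_taxonomy(terms, vectors)` with a list of terms and a matching `(n_terms, dim)` array returns a nested dictionary in which each node holds its terms and its child topics. Because the same embeddings are reused at every level, this sketch can be expected to lose discriminative power in deeper splits, which is the problem the local embedding module in the abstract is meant to address.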

