RNNLM and n-gram statistics
This project explores how explicitly stated n-gram distributions can be used as a set of soft constraints to direct a language model's behavior during generation, without sacrificing its accuracy in assigning probabilities to sequences of words. We apply the technique to reduce word-level repetition, a common problematic behavior. It also improves model generalizability when the statistical constraints are n-gram statistics taken from a large corpus.
[ Paper: AAAI’18 | Links: code]
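To make the two notions in the description above concrete, here is a minimal Python sketch of an n-gram distribution and a simple word-level repetition measure. It is only an illustration and not the method from the paper: the function names `ngram_distribution` and `repetition_rate` and the toy sentence are assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n=1):
    """List every contiguous n-gram in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_distribution(tokens, n=1):
    """Relative frequency of each n-gram (hypothetical helper)."""
    grams = ngrams(tokens, n)
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


def repetition_rate(tokens, n=1):
    """Fraction of n-gram occurrences that repeat an earlier n-gram."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


# Toy "generated" text with obvious word-level repetition.
generated = "the cat sat on the the the mat".split()
unigram_dist = ngram_distribution(generated, n=1)
unigram_repetition = repetition_rate(generated, n=1)
```

A distribution like `unigram_dist`, computed on generated text, could be compared against the same statistics from a reference corpus; how that comparison is turned into a training constraint is described in the paper, not here.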
This project aims to build generative models of definitions in dictionaries and encyclopedias. The model learns from pairs of a word embedding and its definition. The model is then tested on how well it generates definitions for a given set of word embeddings. We are exploring deep learning algorithms and other language models.
[ Paper: AAAI’17 or arXiv | Links: demo, code, preprocessing ]
This is a collaboration between WebSAIL and the Allen Institute for Artificial Intelligence (AI2). The project focuses on building a system that extracts key phrases from scholarly articles. These key phrases serve as facet values for filtering search results. Chandra is maintaining the system at AI2. semanticscholar.org is using a version of the system.
[ Links: SemanticScholar]
WebSAIL Wikifier and TabEL
WebSAIL Wikifier is an entity linking system that identifies phrases in text and links them to the corresponding Wikipedia pages. The project participated in TAC 2013 and ERD 2014. We also attempted to “wikify” English Wikipedia to increase the structured information on Wikipedia. Furthermore, Chandra has extended WebSAIL Wikifier for his TabEL project to extract entities from tables on websites. The algorithm used in TabEL is an improved version of the one in the original project.
TextJoiner and WikiTables
These projects aim to improve how users explore data from Wikipedia. TextJoiner is a system that allows users to interactively query and perform joins on facts expressed in Wikipedia using text patterns like "cities such as $x". The project uses n-gram language models and word embeddings. WikiTables, on the other hand, extracts Wikipedia tables and uses machine learning to enable table exploration via “search” and “join”. WikiTables allows users to find and view columns from different tables side by side, and potentially discover interesting correlations. (I was only involved in discussions and in developing the UI.)
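As a rough illustration of what a text pattern with a variable such as `$x` can look like in practice, the Python sketch below turns a pattern into a regular expression and collects the variable bindings from a few sentences. This is not TextJoiner's implementation, which relies on n-gram language models and word embeddings. The helper names `pattern_to_regex` and `extract_bindings`, the example sentences, and the assumption that a variable matches a run of capitalized words are all made up for this example.

```python
import re


def pattern_to_regex(pattern):
    """Compile a text pattern like 'cities such as $x' into a regex.

    Assumption for this sketch: each $variable matches one or more
    capitalized words (a crude stand-in for an entity mention).
    """
    pieces = re.split(r"(\$\w+)", pattern)
    regex = ""
    for piece in pieces:
        if re.fullmatch(r"\$\w+", piece):
            name = piece[1:]  # drop the leading '$'
            regex += r"(?P<%s>[A-Z]\w*(?: [A-Z]\w*)*)" % name
        else:
            regex += re.escape(piece)  # literal text of the pattern
    return re.compile(regex)


def extract_bindings(pattern, sentences):
    """Return one {variable: text} dict per match of the pattern."""
    compiled = pattern_to_regex(pattern)
    return [m.groupdict() for s in sentences for m in compiled.finditer(s)]


sentences = [
    "She visited cities such as New York and then left.",
    "We like cities such as Chicago.",
    "No pattern appears in this sentence.",
]
bindings = extract_bindings("cities such as $x", sentences)
```

Bindings collected from two patterns that share a variable could then be joined on that variable, which is the kind of query the description above refers to.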