Thanapon Noraset

computer science graduate student


Nor is a computer science PhD student in EECS at Northwestern University. He works in WebSAIL, a research group advised by Prof. Doug Downey. His research interests include natural language processing and machine learning, and his work includes information extraction from Wikipedia and scholarly articles. He currently focuses on statistical language modeling, exploring how explicit knowledge sources such as dictionaries and encyclopedias can improve the natural language capabilities of language models.


Definition Models

This project aims to build generative models of definitions in dictionaries and encyclopedias. The model learns from pairs of a word embedding and its definition, and is then evaluated on how well it generates definitions for a given set of word embeddings. We are exploring deep learning algorithms and other language models.

[ Paper: AAAI’17 or arXiv | Links: demo, code, preprocessing ]

Key Phrase Extraction

This is a collaboration between WebSAIL and the Allen Institute for Artificial Intelligence (AI2). The project focuses on building a system that extracts key phrases from scholarly articles. These key phrases serve as facet values for filtering search results. Chandra maintains the system at AI2. Semantic Scholar is using a version of the system.

[ Links: Semantic Scholar ]

WebSAIL Wikifier and TabEL

WebSAIL Wikifier is an entity linking system that identifies phrases in text and links them to their Wikipedia pages. The project participated in TAC 2013 and ERD 2014. We also attempted to “wikify” English Wikipedia to increase the amount of structured information on Wikipedia. Furthermore, Chandra has extended WebSAIL Wikifier for his TabEL project to extract entities from tables on websites. The algorithm here is an improved version of the one in the original project.

[ Papers: 3W, TabEL, ERD, TAC | Links: 3W, TabEL ]

TextJoiner and WikiTables

These projects aim to improve how users explore data from Wikipedia. TextJoiner is a system that lets users interactively query and join facts expressed in Wikipedia using text patterns such as “cities such as $x”. The project uses n-gram language models and word embeddings. WikiTables, on the other hand, extracts Wikipedia tables and uses machine learning to enable table exploration via “search” and “join”. WikiTables allows users to find and view columns from different tables side by side, and potentially discover interesting correlations. (Only involved in discussions and UI development.)

[ Papers: TextJoiner, WikiTables | Links: TextJoiner, WikiTables Join, WikiTables Search ]


[ Semantic Scholar | Google Scholar ]

  1. Thanapon Noraset, Chen Liang, Larry Birnbaum, and Doug Downey. 2017. Definition Modeling: Learning to define word embeddings in natural language. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
    [BibTex] [Article] [Project]
  2. Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey. 2015. TabEL: Entity Linking in Web Tables. In The Semantic Web - ISWC 2015: 14th International Semantic Web Conference Proceedings, pages 425–441. Springer International Publishing.
    [BibTex] [Article] [Project]
  3. Thanapon Noraset, Chandra Sekhar Bhagavatula, and Doug Downey. 2014. Adding High-Precision Links to Wikipedia. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 651–656. ACL.
    [BibTex] [Article] [Project]
  4. Thanapon Noraset, Chandra Sekhar Bhagavatula, and Doug Downey. 2014. WebSAIL Wikifier at ERD 2014. In Proceedings of the First International Workshop on Entity Recognition & Disambiguation, pages 119–124. ACM.
    [BibTex] [Article]
  5. Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey. 2014. TextJoiner: On-demand Information Extraction with Multi-Pattern Queries. In 2014 Workshop on Automated Knowledge Base Construction (AKBC).
    [BibTex] [Article] [Project]
  6. Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey. 2013. Methods for Exploring and Mining Tables on Wikipedia. In Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics, pages 18–26. ACM.
    [BibTex] [Article] [Project]


The easiest way to reach Nor is to send him an email: