Thanapon Noraset

NOR
phd, computer science

About

Nor is a faculty member in the Faculty of Information and Communication Technology, Mahidol University. He completed his Ph.D. at Northwestern University in 2018 under the supervision of Doug Downey. His research interests include natural language processing and machine learning, and he currently focuses on statistical language modeling using neural networks. He is exploring how to use explicit knowledge, such as dictionary definitions and statistics of language usage, to improve the natural language capabilities of language models.

Projects

RNNLM and n-gram statistics

This project explores how an explicitly stated n-gram distribution can be used as a set of soft constraints to direct a language model's behavior during generation without sacrificing its accuracy in assigning probabilities to sequences of words. We apply the technique to reduce word-level repetition, a common problematic behavior. It also improves model generalizability when the constraints are n-gram statistics taken from a large corpus.

[ Paper: AAAI’18 | Links: code ]
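
As a rough illustration of the kind of statistics mentioned in the RNNLM project above, the sketch below computes an n-gram distribution, a word-level repetition rate, and a simple distance between two distributions. It is not the method from the paper: the function names, the toy sentences, and the choice of total variation distance are all assumptions made for this example.

```python
from collections import Counter


def ngram_distribution(tokens, n=2):
    """Relative frequency of each n-gram in a list of tokens."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def repetition_rate(tokens):
    """Fraction of adjacent token pairs in which the word repeats itself."""
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return repeats / (len(tokens) - 1)


def total_variation(p, q):
    """Total variation distance between two n-gram distributions
    (an illustrative choice of discrepancy, assumed for this sketch)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data (hypothetical): a reference text and a repetitive "generated" text.
reference = "the cat sat on the mat".split()
generated = "the the cat sat sat on the mat".split()

print("repetition rate (reference):", repetition_rate(reference))
print("repetition rate (generated):", repetition_rate(generated))
print("bigram distance:", total_variation(
    ngram_distribution(reference), ngram_distribution(generated)))
```

A discrepancy of this sort between generated text and a reference corpus is one way to picture what a soft constraint on n-gram statistics would try to keep small.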

Definition Modeling

This project aims to build generative models of definitions in dictionaries and encyclopedias. The model learns from pairs of a word embedding and its definition, and is then tested on how well it generates definitions for a given set of word embeddings. We are exploring deep learning algorithms and other language models.

[ Paper: AAAI’17 or arXiv | Links: demo, code, preprocessing ]

Key Phrase Extraction

This is a collaboration between WebSAIL and the Allen Institute for Artificial Intelligence (AI2). The project focuses on building a system that extracts key phrases from scholarly articles. These key phrases serve as facet values for filtering search results. Chandra maintains the system at AI2, and semanticscholar.org uses a version of it.

[ Links: SemanticScholar ]

WebSAIL Wikifier and TabEL

WebSAIL Wikifier is an entity linking system that identifies phrases and links them to Wikipedia pages. The project participated in TAC 2013 and ERD 2014. We also attempted to “wikify” English Wikipedia to increase the amount of structured information on Wikipedia. Furthermore, Chandra has extended WebSAIL Wikifier for his TabEL project to extract entities from tables on websites. The algorithm here is an improved version of the original project.

[ Papers: 3W, TabEL, ERD, TAC | Links: 3W, TabEL ]

TextJoiner and WikiTables

These projects aim to improve how users explore data from Wikipedia. TextJoiner is a system that allows users to interactively query and perform joins on facts expressed in Wikipedia using text patterns like “cities such as $x”. The project uses n-gram language models and word embeddings. WikiTables, on the other hand, extracts Wikipedia tables and uses machine learning to enable table exploration via “search” and “join”. WikiTables allows users to find and view columns from different tables side by side, and potentially discover interesting correlations. (Involvement was limited to discussions and UI development.)

[ Papers: TextJoiner, WikiTables | Links: TextJoiner, WikiTables Join, WikiTables Search ]

Publications

[ Semantic Scholar | Google Scholar ]

Controlling global statistics in recurrent neural network text generation
Thanapon Noraset, David Demeter, and Doug Downey
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018)

Definition modeling: Learning to define word embeddings in natural language
Thanapon Noraset, Chen Liang, Larry Birnbaum, and Doug Downey
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017)

TabEL: Entity Linking in Web Tables
Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey
14th International Semantic Web Conference Proceedings (ISWC 2015)

TextJoiner: On-demand information extraction with multi-pattern queries
Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey
2014 Workshop on Automated Knowledge Base Construction (AKBC 2014)

Adding high-precision links to Wikipedia
Thanapon Noraset, Chandra Sekhar Bhagavatula, and Doug Downey
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)

WebSAIL wikifier at ERD 2014
Thanapon Noraset, Chandra Sekhar Bhagavatula, and Doug Downey
Proceedings of the First International Workshop on Entity Recognition and Disambiguation (ERD 2014)

Contact

The best way to reach Nor is to send him an email: