Analysis of question type classification and disambiguation

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

This thesis presents research on question context analysis using Natural Language Processing (NLP) techniques. In the past, Question & Answer interactions were handled manually. Recent advances in NLP and Machine Learning (ML) have made it possible to implement a wide range of Question Answering Systems (QAS). These systems often use a classifier to assign questions to types before answering them. Historically, the two most prominent classification methods are rule-based models and Machine Learning models. Rule-based models use grammar rules and a dictionary to classify questions into categories and map these to the correct answer. Machine Learning models, such as neural networks, fit mathematical functions to a Question & Answer data set, with the aim of minimizing classification errors when matching question and answer pairs. This research investigates how popular classification techniques perform in a restricted-domain environment with regard to question type recognition and question enrichment. Both tasks are performed on a large structured Dutch data set and a public English data set. Classification power on these two tasks is measured with two metrics: F1 score and Area Under the ROC Curve (AUC). Results suggest that question ambiguity can be recognized with an F1 score upwards of 90%. ML techniques featuring deep learning perform best across both question type detection and question enrichment.
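To make the rule-based approach and the F1 metric concrete, the following is a minimal Python sketch. It is not the thesis's implementation: the keyword rules, the category labels (`PERSON`, `LOCATION`, `TIME`, `NUMBER`, `OTHER`), and the function names `classify` and `f1_score` are all assumptions chosen for illustration.

```python
# Hypothetical keyword rules; the thesis's actual rules and categories are not given here.
RULES = [
    ("how many", "NUMBER"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "TIME"),
]


def classify(question: str) -> str:
    """Return the label of the first rule whose keyword starts the question."""
    text = question.strip().lower()
    for keyword, label in RULES:
        if text.startswith(keyword):
            return label
    return "OTHER"


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A call such as `classify("Who wrote Hamlet?")` applies the first matching rule, and `f1_score` can then compare the predicted labels against gold labels for one question type at a time.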

Keywords

machine learning, NLP, natural language processing, question classification, question disambiguation, context analysis
