Efficiently and reliably evaluating text classification in data sampled via active learning

Text classification helps structure data such as medical documents, but it requires many labeled examples, which are costly to obtain. Active learning reduces this cost by selecting only the most informative examples for labeling. Because the labeled examples are selected rather than drawn at random, evaluating a model on them can give a biased assessment of its performance. This study explored to what extent active learning causes such bias and whether the bias can be reduced by a technique called importance sampling. The findings show that importance sampling reduced the bias but did not eliminate it. More research is required before this method can be used in practice.
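
The idea behind importance sampling can be illustrated with a minimal sketch. It is not the thesis's implementation: the pool, the accuracy rates, and the selection rule below are hypothetical, and the sketch assumes the selection probability `q` of every item is known exactly and that items are drawn with replacement. Under those assumptions, weighting each labeled item by the ratio of the uniform probability `1/N` to its selection probability corrects the over-representation of hard examples.

```python
import random


def naive_accuracy(correct, sampled):
    """Plain mean of correctness over the sampled items."""
    return sum(correct[i] for i in sampled) / len(sampled)


def importance_weighted_accuracy(correct, sampled, q):
    """Importance-sampling estimate of accuracy over the whole pool.

    Each sampled item i is weighted by p_i / q_i, where p_i = 1/N is the
    uniform target distribution and q_i is the probability with which the
    item was drawn. Assumes sampling with replacement and q_i > 0.
    """
    n_pool = len(correct)
    return sum(correct[i] / (n_pool * q[i]) for i in sampled) / len(sampled)


def demo(seed=0, n_pool=10_000, n_labeled=5_000):
    rng = random.Random(seed)
    # Hypothetical pool: about 20% "hard" items the model gets right half
    # the time, and about 80% "easy" items it gets right 95% of the time.
    hard = [rng.random() < 0.2 for _ in range(n_pool)]
    correct = [int(rng.random() < (0.5 if h else 0.95)) for h in hard]
    # Stand-in for an active-learning query strategy (an assumption):
    # hard items are four times as likely to be selected as easy ones.
    raw = [4.0 if h else 1.0 for h in hard]
    total = sum(raw)
    q = [r / total for r in raw]
    sampled = rng.choices(range(n_pool), weights=q, k=n_labeled)
    true_acc = sum(correct) / n_pool
    naive = naive_accuracy(correct, sampled)
    weighted = importance_weighted_accuracy(correct, sampled, q)
    return true_acc, naive, weighted


if __name__ == "__main__":
    true_acc, naive, weighted = demo()
    print(f"true={true_acc:.3f} naive={naive:.3f} weighted={weighted:.3f}")
```

In this idealized setting the naive estimate falls well below the true pool accuracy, while the weighted estimate lands close to it. In a real active learning loop the selection probabilities change as the model is retrained and may not be known exactly, which is one reason a correction of this kind may remove only part of the bias.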

Keywords

Active learning, Importance sampling, Bias

URI

https://studenttheses.uu.nl/handle/20.500.12932/49017
