Introducing MENCOD: Multi-modal ENsemble Citation Outlier Detector

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

This thesis introduces MENCOD (Multi-modal ENsemble Citation Outlier Detector), a novel approach for identifying outliers in academic literature screening. In this context, an outlier is a relevant paper that was not retrieved before the stopping rule of an active learning pipeline was triggered, typically because it was ranked much lower than the other relevant papers. MENCOD addresses this with a two-phase process: after stopping, a new model is trained on additional information not exploited in the first phase, such as citation networks and metadata. The method combines multiple Local Outlier Factor (LOF)-based models and an isolation forest, drawing on both structural and semantic features. Semantic similarity is computed as the cosine similarity between SPECTER2 embeddings. On three datasets from the Synergy project (Hall, Jeyaraman, and Appenzeller), MENCOD consistently reprioritized missed relevant papers more effectively than the baseline active learning approach, with improvements of 86.5%, 29.8%, and 75.7%, respectively. These gains amount to thousands of documents that no longer require manual screening. Although still a conceptual prototype, MENCOD shows strong potential for improving the recall of relevant literature in large-scale screening tasks.
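The combination of semantic similarity and an LOF/isolation-forest ensemble can be illustrated with a short sketch. This is a minimal example under stated assumptions, not MENCOD's actual implementation. It assumes SPECTER2 embeddings have already been computed (random vectors stand in for them here) and uses scikit-learn's `LocalOutlierFactor` and `IsolationForest`. The helper names `cosine_to_centroid`, `rank_normalize`, and `ensemble_outlier_scores`, the neighborhood sizes, the centroid-similarity feature, and the rank-averaging combination rule are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def cosine_to_centroid(candidates: np.ndarray, relevant: np.ndarray) -> np.ndarray:
    """Cosine similarity of each candidate embedding to the mean relevant embedding.

    Hypothetical feature: embeddings (e.g. SPECTER2 vectors) are assumed precomputed.
    """
    centroid = relevant.mean(axis=0)
    dots = candidates @ centroid
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(centroid)
    return dots / np.maximum(norms, 1e-12)  # guard against zero-length vectors


def rank_normalize(scores: np.ndarray) -> np.ndarray:
    """Map scores to [0, 1] by rank so detectors on different scales can be averaged."""
    ranks = scores.argsort().argsort()
    return ranks / max(len(scores) - 1, 1)


def ensemble_outlier_scores(features: np.ndarray,
                            neighbor_sizes=(5, 10, 20),
                            seed: int = 0) -> np.ndarray:
    """Average rank-normalized scores of several LOF models and one isolation forest.

    Higher output means more anomalous. The combination rule is an assumption.
    """
    per_model = []
    for k in neighbor_sizes:
        lof = LocalOutlierFactor(n_neighbors=k)
        lof.fit(features)
        # negative_outlier_factor_ is lower for outliers, so negate it.
        per_model.append(rank_normalize(-lof.negative_outlier_factor_))
    forest = IsolationForest(random_state=seed).fit(features)
    # score_samples is lower for anomalies, so negate it as well.
    per_model.append(rank_normalize(-forest.score_samples(features)))
    return np.mean(per_model, axis=0)


# Usage with synthetic stand-ins for real embeddings.
rng = np.random.default_rng(0)
relevant = rng.normal(size=(10, 32))     # papers found relevant in phase one
candidates = rng.normal(size=(200, 32))  # papers left unscreened at stopping
candidates[0] += 8.0                     # plant one obvious outlier

semantic = cosine_to_centroid(candidates, relevant)
features = np.column_stack([candidates, semantic])  # embedding + similarity column
scores = ensemble_outlier_scores(features)
order = np.argsort(-scores)  # candidates to re-screen first
print(order[:5])
```

Rank normalization is only one simple way to put LOF and isolation-forest scores on a common scale before averaging; the thesis may weight or combine its models differently, and real structural features (for example, citation-network statistics) would be appended as extra columns of `features`.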
