Machine-Learning-Based Dimensionality Assessment for Cognitive Diagnosis Models

Publication date

DOI

Document Type

Master Thesis

Collections

License

CC-BY-NC-ND

Abstract

This thesis examines how supervised machine learning algorithms can be used to predict the number of latent attributes in simulated data generated from cognitive diagnosis model (CDM) structures. Accurate dimensionality estimation is essential for CDMs, but current methods often rely on expert judgement or stringent assumptions. To address this problem, a dataset of more than 600,000 simulations was generated, each with a different set of psychometric conditions and summarised by a set of structural and statistical features. Several machine learning models, including Random Forest, XGBoost, and a Multi-Layer Perceptron (MLP), were trained to identify the true number of attributes used to generate the data. The experimental analysis involved feature engineering, hyperparameter optimisation, ensemble learning, and cost-sensitive training. Evaluation is based on macro-averaged F1 scores, ROC AUC, and error distance metrics. The results show that all models outperformed a simple baseline. The MLP achieved the best performance when combined with a distance-aware loss function, yielding a macro-averaged F1 score of 0.59 and an AUC of 0.93. Cost-sensitive learning helped reduce the average size of misclassification errors. These results demonstrate that supervised machine learning can aid dimensionality estimation in cognitive diagnosis modelling. The method offers a scalable, data-driven alternative to traditional psychometric methods and shows considerable promise for use in educational and psychological assessment settings.
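The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the toy labels, and the reading of "error distance" as the absolute difference between the predicted and true number of attributes are all assumptions made for illustration. The last function shows one possible distance-aware penalty (the expected distance under the predicted class probabilities); the loss actually used in the thesis may be defined differently.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_error_distance(y_true, y_pred):
    """Average |predicted - true| number of attributes (assumed definition)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def expected_distance_penalty(probs, true_k, classes):
    """Hypothetical distance-aware term: probability mass weighted by how far
    each candidate attribute count lies from the true count."""
    return sum(p * abs(k - true_k) for p, k in zip(probs, classes))


# Toy example (made-up labels): true vs. predicted number of attributes.
y_true = [2, 2, 3, 3, 4, 4]
y_pred = [2, 3, 3, 3, 4, 2]

f1 = macro_f1(y_true, y_pred)
dist = mean_error_distance(y_true, y_pred)
penalty = expected_distance_penalty([0.1, 0.6, 0.3], true_k=2, classes=[2, 3, 4])
print(f"macro F1 = {f1:.3f}, mean error distance = {dist:.2f}, penalty = {penalty:.2f}")
```

A penalty of this kind grows when probability mass sits on attribute counts far from the true one, which is the general idea behind treating a prediction that is off by one as less costly than a prediction that is off by two.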