Identifying Caring Communities Within Dutch Chamber Of Commerce Data: A Classifier Comparison

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

This paper aimed to determine the most effective classifier for identifying registered 'caring communities' in data from the Dutch Chamber of Commerce. I optimized and assessed four classifiers: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT). LR outperformed the other models on every evaluation metric across both the 2022 and 2023 test sets. GBDT was competitive, while SVM and RF were less effective. Despite LR's strengths, improvements in recall and data quality are essential for better identification of caring communities. Without them, the algorithm may underestimate the total number of caring communities, giving an incomplete picture of their prevalence.
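
The recall concern raised in the abstract can be made concrete with a small worked example. The sketch below is illustrative only: the confusion-matrix counts are hypothetical and are not results from the thesis, and the function names are assumptions introduced here. It shows that when recall is low, the number of organizations flagged by a classifier falls short of the true number of caring communities.

```python
def recall(tp: int, fn: int) -> float:
    """Share of true caring communities that the classifier finds."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of flagged organizations that really are caring communities."""
    return tp / (tp + fp)


def flagged_total(tp: int, fp: int) -> int:
    """Number of organizations the classifier labels as caring communities."""
    return tp + fp


# Hypothetical counts, chosen only for illustration (not from the thesis).
tp, fp, fn = 60, 5, 40

actual_total = tp + fn                  # true number of caring communities
estimated_total = flagged_total(tp, fp)  # what the classifier would report

print(f"recall:    {recall(tp, fn):.2f}")
print(f"precision: {precision(tp, fp):.2f}")
print(f"actual caring communities:  {actual_total}")
print(f"flagged by the classifier:  {estimated_total}")
```

With these assumed counts, precision is high but 40 of the 100 true caring communities are missed, so a count based on the classifier's output alone understates their prevalence.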

Keywords

Machine learning, Classification, Caring Communities
