Predictive machine learning for a housing corporation

Publication date

Authors

DOI

Document Type

Master Thesis

Collections

License

CC-BY-NC-ND

Abstract

When houses are rented out, some tenants fall into payment arrears. Normally, a housing corporation can only act after the problems have occurred. In this thesis, several machine learning and subgroup discovery algorithms are used to identify in advance the tenants who are more likely to cause payment problems. The chosen machine learning algorithms are logistic regression, random forests, k-nearest neighbors, naive Bayes, and neural networks using model averaging, while the PRIM algorithm is selected for subgroup discovery. Because the class distribution in the datasets is skewed, we use the synthetic minority over-sampling technique (SMOTE) to obtain more reasonable results. Additionally, feature selection and several ensemble methods, such as averaging, majority voting and stacking, are used to improve model performance. With these approaches, we obtain a few models that are significantly better than the preliminary one. However, since the available data is limited and incomplete, and important time-based information is missing, we cannot obtain a model that is good enough.
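
As an illustration of two techniques named in the abstract, the following is a minimal sketch of the core SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours) and of majority voting across models. It is not the thesis implementation: the function names `smote` and `majority_vote`, the parameters `n_new`, `k` and `seed`, and the toy data are all assumptions made for this example, and a real study would more likely rely on an established library.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (simplified SMOTE sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def majority_vote(predictions):
    """Combine label lists (one list per model) by majority vote.

    Assumes an odd number of models with binary labels, so ties cannot occur.
    """
    return [max(set(votes), key=votes.count) for votes in zip(*predictions)]


# Toy, made-up data: each tuple is a hypothetical (feature_1, feature_2) pair
# for a tenant in the minority class (payment arrears).
arrears = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(arrears, n_new=6)

# Hypothetical predictions from three models on three tenants.
combined = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Because each synthetic point is an interpolation between two existing minority samples, the over-sampled class stays within the region already covered by the observed arrears cases, which is the main difference from simply duplicating rows.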

Keywords

machine learning, subgroup discovery, SMOTE, ensemble, payment problems, housing corporation

Citation