Optimal choice of sampling location for mapping with machine learning on a fixed budget.

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

This study explores optimal sampling strategies for soil mapping on a constrained budget, focusing on the prediction of soil clay content with Digital Soil Mapping (DSM) techniques. Soil mapping underpins sustainable land management, with applications in agriculture, environmental monitoring, and land-use planning. Advances in remote sensing, GIS, and machine learning (ML) have improved the efficiency and accuracy of soil mapping. This research uses Random Forest (RF) models to compare the efficacy of Simple Random Sampling (SRS) and Conditioned Latin Hypercube Sampling (cLHS). Using a dataset of 3,670 geo-referenced soil samples from Ebergötzen, Germany, the RF models were trained and validated, and the key predictors were identified. The results indicate that SRS generally yields lower Root Mean Square Error (RMSE) values, and hence higher predictive accuracy, than cLHS. The study also evaluates the impact of measurement errors and of different sampling strategies. A key finding is that a mixed-method approach, combining 25% high-cost, high-accuracy sampling (Method A) with 75% low-cost, lower-accuracy sampling (Method B), provides the best balance between accuracy and cost-efficiency: it achieved the lowest median RMSE, and therefore the highest accuracy, among the tested scenarios. The findings suggest that integrating diverse sampling methods can enhance the reliability and cost-effectiveness of soil property predictions, offering practical guidelines for improving DSM and land management practices.
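The budget trade-off behind the mixed-method result can be illustrated with a minimal, standard-library-only sketch. It assumes that "25% Method A" means 25% of the sampling points (not 25% of the budget), and the helper names, per-sample costs, and measurement-noise levels below are hypothetical, chosen only for illustration. It shows how many points a fixed budget buys under a given A/B mix, how an SRS draw of that many locations could be taken from a pool of 3,670 candidates, how Gaussian measurement error might be simulated, and how RMSE is computed. It does not reproduce the thesis's pipeline: there, an RF model would be fitted on environmental covariates at the sampled locations, and a cLHS draw would replace the SRS draw for the comparison (neither is implemented here).

```python
import math
import random


def allocate_samples(budget, cost_a, cost_b, share_a=0.25):
    """Return (n_a, n_b): the most sampling points affordable on `budget` when a
    fraction `share_a` of them use Method A (cost_a each) and the rest use
    Method B (cost_b each). Hypothetical helper; costs are assumptions."""
    mean_cost = share_a * cost_a + (1 - share_a) * cost_b
    n_total = int(budget // mean_cost)
    n_a = round(share_a * n_total)
    n_b = n_total - n_a
    # Rounding n_a up can push the total over budget; drop B points until it fits.
    while n_a * cost_a + n_b * cost_b > budget and n_b > 0:
        n_b -= 1
    return n_a, n_b


def simple_random_sample(pool_size, n, seed=0):
    """Simple Random Sampling without replacement: n distinct location indices."""
    return random.Random(seed).sample(range(pool_size), n)


def add_measurement_error(values, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (assumed model)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical budget and per-sample costs (not taken from the thesis).
    budget, cost_a, cost_b = 10_000, 100, 20
    mixed = allocate_samples(budget, cost_a, cost_b, share_a=0.25)
    only_a = allocate_samples(budget, cost_a, cost_b, share_a=1.0)
    print("25% A / 75% B:", mixed, "| 100% A:", only_a)

    # SRS draw of the affordable number of locations from 3,670 candidates.
    sites = simple_random_sample(3670, sum(mixed), seed=42)
    print("first sampled site indices:", sites[:5])

    # Noisy lab values for a toy "true" clay content (values are made up).
    rng = random.Random(1)
    true_clay = [20.0] * 4
    noisy_b = add_measurement_error(true_clay, sigma=3.0, rng=rng)
    print("RMSE of Method-B-like noise:", round(rmse(true_clay, noisy_b), 2))
```

With these assumed costs, the 25/75 mix buys 250 points (62 via Method A, 188 via Method B) instead of 100 points with Method A alone, which is the kind of spatial-coverage gain the mixed strategy trades against noisier Method B measurements.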

Keywords

Digital Soil Mapping (DSM), Random Forest (RF), Soil Clay Content, Conditioned Latin Hypercube Sampling (cLHS), Simple Random Sampling (SRS), Root Mean Square Error (RMSE), Geometric Transformations, Predictive Modeling, Environmental Covariates, Sampling Strategies
