Investigating the Use of Active Learning for Classification of Ship Waste Dumping in the North Sea

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

Detecting ships that discharge waste into the sea is important for reducing marine pollution, but it is difficult because of data and resource limitations. Inspecting whether a ship has discharged waste is expensive, and true occurrences are expected to be rare. This makes it difficult to collect enough labels for classification by supervised machine learning. This thesis investigated whether several active learning approaches (uncertainty sampling, density-weighted sampling, query-by-committee (QBC) sampling and xPAL sampling) can speed up learning, so that fewer labeled training instances are needed to classify looping behavior (a proxy variable for waste discharging). Trajectories were summarized into single instances so that established active learning methods could select them for querying. Experiments were performed with different selection/learning pipelines to classify both complete trajectories (post hoc) and partial trajectories (an initial step toward real-time classification). Almost all active learning methods significantly improved learning for complete trajectory classification, on average reaching a macro F1 performance plateau of at least ∼90% within 50 queried instances, compared to ∼78% for random sampling after 100 instances. For partial trajectory classification, separate models were trained for different points in elapsed time along the trajectories. Most active learning approaches either matched or outperformed random sampling for partial trajectory classification, depending on the evaluated time point. At the best time point, the well-performing methods reached, on average, a macro F1 performance plateau of at least ∼60% within 100 queried instances, compared to ∼50% for random sampling after the same number of instances.
These results suggest that active learning methods are a suitable approach to decreasing labeling efforts for the problem of looping detection for both complete and partial trajectories, and possibly for similar problems involving trajectories and/or high class imbalance.
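
To make two ideas from the abstract concrete, the following is a minimal sketch, not the thesis implementation. It assumes entropy as the uncertainty measure (only one common variant of uncertainty sampling; the abstract does not say which variant was used), and the names `entropy`, `select_query`, `macro_f1` and all data values are hypothetical. It shows how a query is chosen from an unlabeled pool, and why macro F1 is a stricter score than accuracy when one class is rare.

```python
from math import log


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * log(p) for p in probs if p > 0)


def select_query(pool_probs, labeled):
    """Index of the not-yet-labeled instance with the highest entropy.

    pool_probs: per-instance predicted class probabilities (hypothetical).
    labeled: set of indices that have already been queried.
    """
    candidates = [i for i in range(len(pool_probs)) if i not in labeled]
    return max(candidates, key=lambda i: entropy(pool_probs[i]))


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative values only: three summarized trajectories, two classes.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
first = select_query(pool_probs, labeled=set())
second = select_query(pool_probs, labeled={first})

# Imbalanced toy labels: predicting "no looping" everywhere is 90%
# accurate, yet the rare class gets F1 = 0, which halves the macro score.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
score = macro_f1(y_true, y_pred)
```

In this toy pool the instance with probabilities closest to 50/50 is queried first. In a full loop the classifier would be retrained after each new label and the pool probabilities recomputed before the next selection.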

Keywords

Supervised learning, active learning, trajectories, time series, anomaly detection, high class imbalance, waste discharge detection

Citation