Identification and on-line incremental clustering of spam campaigns

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

The ever-growing spread of spam emails, although adequately countered by spam filters, can be addressed more effectively by understanding how spammers act. Grouping spam emails into spam campaigns provides valuable information on many aspects, such as how spammers apply obfuscation and how seemingly different spam campaigns are correlated, as well as many descriptive statistics. In this thesis, we focus on identifying spam campaigns over a 7.5-month period by clustering the web pages referenced by the URLs inside the spam emails, based on the content of those pages. We then apply Latent Dirichlet Allocation (LDA) to assign a topic to every cluster. Finally, we present a mechanism that incrementally clusters incoming spam emails into spam campaigns in an automatic, on-line environment. We argue that our method for spam campaign identification is quick and efficient and represents the identified spam campaigns compactly. Moreover, it can contribute to a better understanding of the spam domain and its applications.
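The abstract does not state which features, similarity measure or assignment rule the on-line mechanism uses. As a rough illustration only, the sketch below shows one common way to cluster items incrementally: single-pass "leader" clustering. Each incoming landing page joins the most similar existing campaign if the cosine similarity of their bag-of-words vectors reaches a threshold, and otherwise it starts a new campaign. The tokenizer, the cosine measure, the `threshold` value and the `IncrementalClusterer` class are hypothetical choices made for this example. They are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Hypothetical feature extraction: lower-cased bag-of-words counts of a page's text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Single-pass (leader-style) clustering; a hypothetical stand-in, not the thesis's algorithm."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # hypothetical similarity cut-off for joining a campaign
        self.centroids: list[Counter] = []  # summed term counts per campaign
        self.members: list[list[str]] = []  # email ids assigned to each campaign

    def add(self, email_id: str, page_text: str) -> int:
        """Assign one incoming email (via its landing-page text) and return its campaign index."""
        vec = tokenize(page_text)
        best, best_sim = -1, 0.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= self.threshold:
            # Summed counts point in the same direction as the mean vector,
            # and cosine similarity ignores scale, so no division is needed.
            self.centroids[best].update(vec)
            self.members[best].append(email_id)
            return best
        # No campaign is similar enough: open a new one seeded by this page.
        self.centroids.append(vec)
        self.members.append([email_id])
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = IncrementalClusterer(threshold=0.5)
    print(clusterer.add("m1", "cheap watches replica rolex buy now"))    # 0 (new campaign)
    print(clusterer.add("m2", "buy replica rolex watches cheap today"))  # 0 (similarity 5/6)
    print(clusterer.add("m3", "online pharmacy viagra discount pills"))  # 1 (no shared terms)
    print(clusterer.members)  # [['m1', 'm2'], ['m3']]
```

Because each email is processed once and never revisited, this style of clustering suits a streaming setting. The linear scan over all campaign centroids, however, grows with the number of campaigns, so a larger deployment would likely need an index or a cap on active campaigns.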

Keywords

Spam campaigns, spam emails, clustering, topic modelling, incremental, online, automatic