Synthetic network generation for financial data

Publication date

DOI

Document Type

Master Thesis

Collections

License

CC-BY-NC-ND

Abstract

This thesis presents a novel approach to generating synthetic transaction networks. The research focuses on developing a graph-based generative model capable of replicating characteristics observed in real-world financial networks. Motivated by the need to preserve data privacy, the model generates networks that exhibit power-law degree distributions, neither assortativity nor disassortativity, exponential weight distributions, and community structures similar to those found in actual financial transaction data.

The methodology involves a clustering analysis of a real transaction dataset to identify node types, which are then integrated into the generative model. Parameters for node generation and edge densification, together with a probability matrix governing type-based connections, are established to control the network’s structural properties. The model is validated against the same real network dataset, provided by Rabobank, by comparing network metrics and structural properties.

Experimental results show that the model can produce stable synthetic networks over 200,000 iterations, with generated networks exhibiting degree distributions, edge densities, and community structures comparable to those of the real dataset. However, the validation relies on a sampled and aggregated dataset, which restricts the model’s ability to capture the full complexity of real financial networks, and the model’s exponential weight distribution diverges from the real dataset’s power-law weight distribution.

This research contributes a publicly available tool that can serve as a starting point for generating synthetic financial transaction networks, facilitating applications in training machine learning models to detect criminal financial activity. Future research directions include improving weight distribution modeling, exploring algorithms for power-law distributions, and extending the model to include interbank networks and temporal dynamics.
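The ingredients named in the abstract (typed nodes, a node-generation versus edge-densification step, a type-by-type probability matrix, and exponentially distributed weights) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function name, parameter names, and default values are hypothetical, and the degree-proportional choice of the receiving node is an assumption introduced here because it is a common way to obtain heavy-tailed degree distributions; the abstract does not state which mechanism the model uses.

```python
import random


def generate_network(n_steps, type_probs, conn_matrix,
                     p_new_node=0.3, mean_weight=100.0, seed=0):
    """Grow a small directed, weighted, typed network (illustrative only).

    type_probs:  {type: probability} used when a new node is created.
    conn_matrix: conn_matrix[a][b] scales how readily type a sends to type b.
    p_new_node:  chance that a step adds a node instead of densifying.
    """
    rng = random.Random(seed)
    types = list(type_probs)
    type_weights = [type_probs[t] for t in types]
    node_type = {}  # node id -> node type
    degree = {}     # node id -> number of distinct incident edges
    edges = {}      # (source, target) -> accumulated weight

    def add_node():
        nid = len(node_type)
        node_type[nid] = rng.choices(types, weights=type_weights)[0]
        degree[nid] = 0
        return nid

    add_node()
    add_node()  # start from two nodes so an edge is always possible

    for _ in range(n_steps):
        if rng.random() < p_new_node:
            src = add_node()                    # node generation
        else:
            src = rng.choice(list(node_type))   # edge densification

        # Assumption: targets are drawn proportionally to (degree + 1),
        # scaled by the type-based connection probability.
        candidates = [v for v in node_type if v != src]
        scores = [(degree[v] + 1) * conn_matrix[node_type[src]][node_type[v]]
                  for v in candidates]
        if sum(scores) == 0:
            continue  # this source type connects to no available type
        dst = rng.choices(candidates, weights=scores)[0]

        # Exponentially distributed transaction weight.
        weight = rng.expovariate(1.0 / mean_weight)
        if (src, dst) not in edges:
            degree[src] += 1
            degree[dst] += 1
        edges[(src, dst)] = edges.get((src, dst), 0.0) + weight

    return node_type, edges


# Hypothetical two-type configuration, for illustration only.
type_probs = {"retail": 0.8, "business": 0.2}
conn_matrix = {
    "retail": {"retail": 0.2, "business": 0.8},
    "business": {"retail": 0.5, "business": 0.5},
}
node_type, edges = generate_network(500, type_probs, conn_matrix)
```

The sketch recomputes all candidate scores at every step, which is quadratic in the number of nodes; it is meant to show the roles of the parameters, not to scale to the 200,000 iterations reported above.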

Keywords

Citation