2.3. Anti-Money Laundering
This is a synthetic dataset generated to study anti-money laundering (AML) techniques.
The dataset and its generation techniques are described here.
It was originally published under the Community Data License Agreement Sharing 1.0 (CDLA-Sharing-1.0). This alternative format of the same data is published here under the terms of the same license.
2.3.1. Dataset Modification from the Original
This format of the original data splits the single table (CSV file) into two components compatible with property graph data structures.
The vertex (node) CSV file contains bank account information. Each vertex has a new property, acct_id, formed by concatenating the bank number and the account number, separated by a vertical bar (|).
The edge CSV file contains the transactions.
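The split described above can be sketched roughly as follows. This is a minimal illustration, not the script used to prepare the published files: the column names (from_bank, from_account, to_bank, to_account, amount), the edge field names (src, dst), and the sample rows are all hypothetical placeholders, so adjust them to match the headers in the actual CSV files.

```python
import csv
import io


def make_acct_id(bank: str, account: str) -> str:
    """Join the bank number and account number with a vertical bar."""
    return f"{bank}|{account}"


def split_table(rows):
    """Split flat transaction rows into (vertices, edges).

    Column names here are assumptions made for illustration only;
    they are not taken from the published dataset.
    """
    vertices = {}  # acct_id -> vertex record; dict preserves first-seen order
    edges = []
    for row in rows:
        src = make_acct_id(row["from_bank"], row["from_account"])
        dst = make_acct_id(row["to_bank"], row["to_account"])
        # Each distinct account becomes exactly one vertex.
        vertices.setdefault(
            src,
            {"acct_id": src, "bank": row["from_bank"], "account": row["from_account"]},
        )
        vertices.setdefault(
            dst,
            {"acct_id": dst, "bank": row["to_bank"], "account": row["to_account"]},
        )
        # Each transaction becomes one edge between two acct_id values.
        edges.append({"src": src, "dst": dst, "amount": row["amount"]})
    return list(vertices.values()), edges


# Made-up sample rows standing in for the original single-table CSV.
SAMPLE = """from_bank,from_account,to_bank,to_account,amount
10,A1,10,A1,100.00
10,A1,20,B7,250.50
20,B7,10,A1,75.25
"""

if __name__ == "__main__":
    vertices, edges = split_table(csv.DictReader(io.StringIO(SAMPLE)))
    print(len(vertices), "vertices;", len(edges), "edges")
```

Keying vertices on acct_id rather than on the account number alone keeps accounts distinct across banks in case two banks reuse the same account number.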
2.3.1.1. Downloadable Datasets
We have prepared the following versions of the data in this alternative format. The first is the full dataset, with more than 45 million transactions. Each smaller dataset contains only the first N transactions of the full dataset.
full: Full AML: 45,403,506 edges, 9,914,140 vertices (1.1G TGZ)
1M: 1 million transaction AML: 1,000,000 edges, 1,101,709 vertices (39M TGZ)
Accounts: 1M AML Accounts (44M CSV)
Transactions: 1M AML Transactions (100M CSV)
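A subset of this kind could be derived roughly as shown below. This is only a sketch under stated assumptions: it assumes that a subset's vertex file holds just the accounts referenced by the selected transactions, which the text above does not confirm, and the edge field names src and dst are hypothetical.

```python
from itertools import islice


def first_n_subset(edges, n):
    """Return the first n edges and the acct_ids they reference.

    Assumption: the subset's vertices are exactly the accounts that
    appear as an endpoint of one of the selected edges.
    """
    subset = list(islice(edges, n))
    acct_ids = {}  # used as an ordered set of acct_id values
    for edge in subset:
        acct_ids.setdefault(edge["src"], None)
        acct_ids.setdefault(edge["dst"], None)
    return subset, list(acct_ids)


if __name__ == "__main__":
    # Made-up edges for illustration only.
    demo_edges = [
        {"src": "10|A1", "dst": "20|B7"},
        {"src": "20|B7", "dst": "30|C3"},
        {"src": "30|C3", "dst": "40|D9"},
    ]
    subset, accounts = first_n_subset(demo_edges, 2)
    print(len(subset), "edges;", len(accounts), "accounts")
```

Because islice consumes the input lazily, the same function works on a streaming CSV reader without loading the full transaction file into memory.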