Fintech · v1.0 · available now

Real fraud patterns.
Zero real cardholders.

Name: Fraud Transactions
Creator: NexusMock
License: https://nexusmock.com/datasets/fraud-transactions#license

Synthetic credit-card transactions with five fraud typologies pre-labelled. Train your detection models today, without GDPR, HIPAA, or PCI scope, and without eighteen months of legal review.

Get the dataset (from 29€)

Download 100 rows free →

7-day full refund · commercial licence · download in 30 seconds

100k labelled transactions · 30 engineered columns · 5 fraud typologies · 0.92 baseline ROC AUC

The problem

Real cardholder data is a bureaucratic dead end.

6–18 months of legal review.

DPO sign-off, vendor due diligence, processor agreements, ethics committee. Your PoC dies in committee.

Kaggle datasets are exhausted.

The 2013 Credit Card Fraud dataset has 492 fraud rows and PCA-anonymised features. Every job candidate has memorised it.

Up to 20M€ fine on a single leak.

GDPR Art. 83 doesn't care how good your anonymisation is. Re-identification is everyone's job until it's yours.

Your competitors already ship.

While your legal team negotiates a DPA, the next fintech in YC is training on synthetic data and shipping models monthly.

The fix

A clean, labelled, internally-consistent sandbox. Today.

Statistically realistic correlations.

Amount follows Benford's law (MAD 0.0012). Hour-of-day, MCC, 3DS status and geography correlate with fraud the way they do in real portfolios.
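A figure like the Benford MAD above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the first-significant-digit variant of the test (the page does not say which variant the quality report uses). The helper names `first_digit` and `benford_mad` are hypothetical, and the log-uniform sample is generated only to have something Benford-like to measure.

```python
import math
from collections import Counter


def first_digit(amount: float) -> int:
    """Return the first significant digit (1-9) of a positive amount."""
    # Scientific notation always starts with the first significant digit,
    # e.g. 44.74 -> '4.474000000000000e+01'.
    return int(f"{amount:.15e}"[0])


def benford_mad(amounts) -> float:
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [first_digit(a) for a in amounts if a > 0]
    if not digits:
        raise ValueError("need at least one positive amount")
    counts = Counter(digits)
    total = len(digits)
    deviations = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)  # Benford share for leading digit d
        observed = counts.get(d, 0) / total
        deviations.append(abs(observed - expected))
    return sum(deviations) / 9


# Illustrative input only: values spread log-uniformly over one decade,
# which is the textbook case of a Benford-conforming sample.
log_uniform = [10 ** (i / 1000) for i in range(1000)]
print(round(benford_mad(log_uniform), 4))
```

To run it on the dataset, pass the `amount_usd` column instead of the generated sample. A lower value means closer conformity.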

Five fraud typologies, pre-labelled.

card_testing, account_takeover, synthetic_identity, friendly_fraud, merchant_collusion. Each with a distinct, documented signature.
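The five labels lend themselves to a quick consistency check before training. The sketch below assumes, based on the preview rows further down the page, that legitimate transactions carry `is_fraud` = 0 together with `fraud_type` = `none`. The `check_labels` helper is a hypothetical name, not part of the package.

```python
FRAUD_TYPES = {
    "card_testing",
    "account_takeover",
    "synthetic_identity",
    "friendly_fraud",
    "merchant_collusion",
}


def check_labels(rows):
    """Return (tx_id, problem) pairs for rows whose two label columns disagree."""
    problems = []
    for row in rows:
        is_fraud = int(row["is_fraud"])
        fraud_type = row["fraud_type"]
        if is_fraud not in (0, 1):
            problems.append((row["tx_id"], f"is_fraud is not 0/1: {is_fraud}"))
        elif is_fraud == 1 and fraud_type not in FRAUD_TYPES:
            problems.append((row["tx_id"], f"unknown typology: {fraud_type}"))
        elif is_fraud == 0 and fraud_type != "none":
            problems.append((row["tx_id"], f"legitimate row labelled {fraud_type}"))
    return problems


# Two rows copied from the preview table; both should pass.
preview = [
    {"tx_id": "TX_2026_000001410", "is_fraud": "0", "fraud_type": "none"},
    {"tx_id": "TX_2026_000009273", "is_fraud": "1", "fraud_type": "friendly_fraud"},
]
print(check_labels(preview))
```

An empty list means every positive row names one of the five typologies and every negative row is labelled `none`.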

Ready for pandas.read_csv().

UTF-8, no nulls, no dirty types, monotonic IDs. Load it and an untuned Random Forest reaches a test AUC of 0.92 on 12 features.
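The claims above (no nulls, no duplicates, clean types) are easy to verify on the free sample. The sketch below uses only the standard library so it runs without extra packages. It assumes a comma-delimited file with the column names shown in the preview table, and `integrity_report` is a hypothetical helper. In pandas the same checks would typically be `df.isna().sum()` and `df["tx_id"].duplicated().sum()` after `pandas.read_csv()`.

```python
import csv
import io
from datetime import datetime

# Five rows copied from the preview table, written as comma-separated text
# (the delimiter is an assumption; the page renders them as a table).
SAMPLE_CSV = """tx_id,timestamp,amount_usd,channel,is_fraud,fraud_type
TX_2026_000001410,2025-07-20T03:16:24Z,44.74,online,0,none
TX_2026_000008879,2026-04-19T04:22:59Z,15.89,online,0,none
TX_2026_000000178,2025-06-04T10:43:50Z,15.41,pos,0,none
TX_2026_000009273,2026-05-03T16:49:01Z,15.57,online,1,friendly_fraud
TX_2026_000000782,2025-06-28T10:58:28Z,1692.25,online,1,account_takeover
"""


def integrity_report(csv_text):
    """Count missing cells, duplicate IDs, and rows with unparseable types."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing_cells = sum(
        1
        for row in rows
        for value in row.values()
        if not isinstance(value, str) or not value.strip()
    )
    ids = [row["tx_id"] for row in rows]
    duplicate_ids = len(ids) - len(set(ids))
    bad_types = 0
    for row in rows:
        try:
            float(row["amount_usd"])
            datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        except (TypeError, ValueError):
            bad_types += 1
    return {
        "rows": len(rows),
        "missing_cells": missing_cells,
        "duplicate_ids": duplicate_ids,
        "bad_types": bad_types,
    }


print(integrity_report(SAMPLE_CSV))
```

For the real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to the same function.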

Zero legal scope.

Synthetic data is not personal data (GDPR Recital 26). No DPO, no consent forms, no ethics review, no PCI environment.

Preview

Real rows from the dataset.

These are actual rows from nexusmock_fraud_transactions_sample_100.csv. Rows with is_fraud = 1 are the labelled positive class.

tx_id	timestamp	amount_usd	channel	is_fraud	fraud_type
TX_2026_000001410	2025-07-20T03:16:24Z	44.74	online	0	none
TX_2026_000008879	2026-04-19T04:22:59Z	15.89	online	0	none
TX_2026_000000178	2025-06-04T10:43:50Z	15.41	pos	0	none
TX_2026_000009273	2026-05-03T16:49:01Z	15.57	online	1	friendly_fraud
TX_2026_000000782	2025-06-28T10:58:28Z	1692.25	online	1	account_takeover


30 columns total · 100,000 rows in the Pro tier · velocity, distance, device age, BIN, MCC, 3DS, currency, and more.

Quality, audited

Every release ships with a quality report.

Inside the ZIP: QUALITY_REPORT.md with integrity checks, statistical conformity tests, multi-model benchmarks, and per-typology detectability. A data scientist can audit it in 2 minutes.

PASS · Integrity (no nulls, no dupes) · 4 / 4 criteria

0.0012 · Benford MAD on amounts · close conformity

0.92 · Test AUC (Random Forest) · 12 features, no tuning

5 · Fraud typologies · labelled & documented
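For readers auditing the AUC figure, it helps to recall what the metric measures: the probability that a randomly chosen fraud row receives a higher score than a randomly chosen legitimate row. The sketch below computes that directly by comparing pairs. It is an illustration of the metric only, not the benchmark code from the quality report, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (fraud, legitimate) pairs ranked correctly.

    Ties between a fraud score and a legitimate score count as half a win.
    """
    fraud = [s for y, s in zip(labels, scores) if y == 1]
    legit = [s for y, s in zip(labels, scores) if y == 0]
    if not fraud or not legit:
        raise ValueError("need at least one row of each class")
    wins = 0.0
    for f in fraud:
        for g in legit:
            if f > g:
                wins += 1.0
            elif f == g:
                wins += 0.5
    return wins / (len(fraud) * len(legit))


# Made-up scores: one fraud row (0.35) is ranked below one legitimate row (0.4).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic, so it suits a spot check on a sample. On the full file a rank-based implementation from an ML library is the practical choice.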

Pricing — launch week

Buy it on a corporate card. No procurement, no PO, no NDA.

Launch pricing for the first 100 customers. After that, each plan returns to its regular, higher price.

Starter

−81%

10,000 rows

29€ (regular price 149€)

10,000 labelled transactions
30 columns + 5 fraud typologies
Starter Jupyter notebook included
Data dictionary + quality report
Commercial licence, 7-day refund

Buy Starter →

Chosen by 80%

Pro

−84%

100,000 rows

79€ (regular price 499€)

100,000 labelled transactions
Everything in Starter
Statistically robust for ML training
Production-ready CSV (UTF-8, clean)
Email support within 1 business day

Buy Pro →

Enterprise

−88%

1,000,000 rows

249€ (regular price 1,999€)

1,000,000 labelled transactions
Edge cases pack bundled
Everything in Pro
Custom verticals on request
Priority email support

Buy Enterprise →

Want to evaluate before you buy? Download a 100-row sample free →

FAQ

The questions everyone asks.

How is this different from the Kaggle Credit Card Fraud dataset?

The Kaggle dataset has 492 fraud cases anonymised through PCA — you can't engineer features, you can't extend it, and every job candidate has memorised it. NexusMock ships full feature columns (MCC, BIN, 3DS, device, geo, velocity) so you can engineer, extend, and benchmark against meaningful baselines.

Is this legal to use commercially?

Yes. The data is fully synthetic: no row corresponds to a real person, card, or merchant. Synthetic data is not personal data under GDPR Recital 26. The licence grants perpetual commercial use, including for training models you sell. Read LICENSE.md in the package.

Will a model trained on this work in production?

Models trained exclusively on synthetic data won't exactly replicate your portfolio's score distribution — that's true of every synthetic dataset. What you get is a clean labelled sandbox to iterate on features, models, rules and pipelines without legal risk. When you accumulate real labelled fraud, fine-tune on top.

Can I get a custom vertical (insurance, lending, crypto, telco)?

Yes. Write to info@nexusmock.com with the vertical, the fraud typologies you care about, and the volume. Custom datasets are typically delivered in 5–10 business days.

Your model is waiting.

Skip the legal review. Skip the ethics committee. Skip the public dataset everyone has already overfit. Train on something that actually carries signal.

Get Fraud Transactions — from 29€
