Real fraud patterns.
Zero real cardholders.
Synthetic credit-card transactions with five fraud typologies pre-labelled. Train your detection models today, without GDPR, HIPAA, PCI scope, or eighteen months of legal review.
7-day full refund · commercial licence · download in 30 seconds
The problem
Real cardholder data is a bureaucratic dead end.
6–18 months of legal review.
DPO sign-off, vendor due diligence, processor agreements, ethics committee. Your PoC dies in committee.
Kaggle datasets are exhausted.
The 2013 Credit Card Fraud dataset has 492 fraud rows and PCA-anonymised features. Every job candidate has memorised it.
Up to 20M€ fine on a single leak.
GDPR Art. 83 doesn't care how good your anonymisation is. Re-identification is everyone's job until it's yours.
Your competitors already ship.
While your legal team negotiates a DPA, the next fintech in YC is training on synthetic data and shipping models monthly.
The fix
A clean, labelled, internally-consistent sandbox. Today.
Realistic correlations.
Amount follows Benford's law (MAD 0.0012). Hour-of-day, MCC, 3DS status and geography correlate with fraud the way they do in real portfolios.
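If you want to reproduce a Benford check yourself, here is a minimal sketch. It assumes the MAD quoted above is the mean absolute deviation between observed and expected first-digit frequencies, averaged over the digits 1 to 9; the quality report may define it differently. The function name `benford_mad` is ours, not part of the package.

```python
import math
from collections import Counter

def benford_mad(amounts):
    """Mean absolute deviation between observed first-digit frequencies
    and Benford's law, averaged over digits 1..9 (assumed definition)."""
    digits = []
    for a in amounts:
        if a > 0:
            # Scientific notation puts the leading significant digit first,
            # e.g. 44.74 -> "4.474000000000000e+01".
            digits.append(int(f"{a:.15e}"[0]))
    if not digits:
        raise ValueError("need at least one positive amount")
    counts = Counter(digits)
    n = len(digits)
    # Benford's expected frequency for leading digit d is log10(1 + 1/d).
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9

# The five amounts visible in the preview table; far too few rows for a meaningful MAD.
print(benford_mad([44.74, 15.89, 15.41, 15.57, 1692.25]))
```

Run it on the full `amount_usd` column rather than a handful of rows; with small samples the statistic is dominated by noise.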
Five fraud typologies, pre-labelled.
card_testing, account_takeover, synthetic_identity, friendly_fraud, merchant_collusion. Each with a distinct, documented signature.
Ready for pandas.read_csv().
UTF-8, no nulls, no dirty types, monotonic IDs. Load it and your gradient boosting hits AUC 0.92 on the first try.
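A quick way to verify those claims after download is sketched below. It uses only columns visible in the preview (`tx_id`, `is_fraud`, `fraud_type`) and an inline copy of five preview rows, sorted by `tx_id`, so the snippet runs without the file. The helper name `integrity_checks` and the rule that `fraud_type` is `none` exactly when `is_fraud` is 0 are assumptions drawn from the preview, not documented guarantees.

```python
import io
import pandas as pd

# Five rows from the preview, sorted by tx_id, limited to the columns shown there.
SAMPLE_CSV = """tx_id,timestamp,amount_usd,channel,is_fraud,fraud_type
TX_2026_000000178,2025-06-04T10:43:50Z,15.41,pos,0,none
TX_2026_000000782,2025-06-28T10:58:28Z,1692.25,online,1,account_takeover
TX_2026_000001410,2025-07-20T03:16:24Z,44.74,online,0,none
TX_2026_000008879,2026-04-19T04:22:59Z,15.89,online,0,none
TX_2026_000009273,2026-05-03T16:49:01Z,15.57,online,1,friendly_fraud
"""

def integrity_checks(df):
    """Return a small report on nulls, ID ordering, and label consistency."""
    labels_match = (df["fraud_type"] == "none") == (df["is_fraud"] == 0)
    return {
        "null_cells": int(df.isna().sum().sum()),
        "ids_unique": bool(df["tx_id"].is_unique),
        # Zero-padded IDs sort the same way as strings and as numbers.
        "ids_monotonic": bool(df["tx_id"].is_monotonic_increasing),
        "labels_binary": bool(df["is_fraud"].isin([0, 1]).all()),
        # Assumption: legitimate rows carry fraud_type "none".
        "labels_consistent": bool(labels_match.all()),
    }

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(integrity_checks(df))
```

To check the real file, swap the `io.StringIO(...)` argument for the path to `nexusmock_fraud_transactions_sample_100.csv`.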
Zero legal scope.
Synthetic data is not personal data (GDPR Recital 26). No DPO, no consent forms, no ethics review, no PCI environment.
Preview
Real rows from the dataset.
These are actual rows from nexusmock_fraud_transactions_sample_100.csv. Rows with is_fraud = 1 are the labelled positive class.
| tx_id | timestamp | amount_usd | card→ip | mcc | channel | 3ds | dev_age | is_fraud | fraud_type |
|---|---|---|---|---|---|---|---|---|---|
| TX_2026_000001410 | 2025-07-20T03:16:24Z | 44.74 | | | online | | | 0 | none |
| TX_2026_000008879 | 2026-04-19T04:22:59Z | 15.89 | | | online | | | 0 | none |
| TX_2026_000000178 | 2025-06-04T10:43:50Z | 15.41 | | | pos | | | 0 | none |
| TX_2026_000009273 | 2026-05-03T16:49:01Z | 15.57 | | | online | | | 1 | friendly_fraud |
| TX_2026_000000782 | 2025-06-28T10:58:28Z | 1692.25 | | | online | | | 1 | account_takeover |
30 columns total · 100,000 rows in the Pro tier · velocity, distance, device age, BIN, MCC, 3DS, currency, and more.
Quality, audited
Every release ships with a quality report.
Inside the ZIP: QUALITY_REPORT.md with integrity checks, statistical conformity tests, multi-model benchmarks, and per-typology detectability. Your data scientist can audit it in two minutes.
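To cross-check per-typology detectability against your own model, one option is recall per fraud type at a fixed score threshold. The sketch below assumes that reading of "detectability"; the report may use another metric. The function name, the 0.5 threshold, and the scores in the usage lines are made up for illustration.

```python
from collections import defaultdict

def recall_by_typology(fraud_types, scores, threshold=0.5):
    """Share of fraud rows of each type whose model score reaches the threshold."""
    caught = defaultdict(int)
    total = defaultdict(int)
    for ftype, score in zip(fraud_types, scores):
        if ftype == "none":
            continue  # legitimate rows do not count towards recall
        total[ftype] += 1
        if score >= threshold:
            caught[ftype] += 1
    return {ftype: caught[ftype] / total[ftype] for ftype in total}

# Illustrative labels and model scores, not taken from the dataset.
labels = ["none", "card_testing", "card_testing", "account_takeover"]
scores = [0.9, 0.8, 0.2, 0.7]
print(recall_by_typology(labels, scores))
```

A typology with low recall here is one your features do not yet separate well, which is a useful place to start feature engineering.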
Pricing — launch week
Buy it on a corporate card. No procurement, no PO, no NDA.
Launch pricing for the first 100 customers. After that, prices return to the crossed-out numbers below.
- 10,000 labelled transactions
- 30 columns + 5 fraud typologies
- Starter Jupyter notebook included
- Data dictionary + quality report
- Commercial licence, 7-day refund
- 100,000 labelled transactions
- Everything in Starter
- Statistically robust for ML training
- Production-ready CSV (UTF-8, clean)
- Email support within 1 business day
- 1,000,000 labelled transactions
- Edge cases pack bundled
- Everything in Pro
- Custom verticals on request
- Priority email support
Want to evaluate before you buy? Download a 100-row sample free →
FAQ
The questions everyone asks.
How is this different from the Kaggle Credit Card Fraud dataset?
The Kaggle dataset has 492 fraud cases anonymised through PCA — you can't engineer features, you can't extend it, and every job candidate has memorised it. NexusMock ships full feature columns (MCC, BIN, 3DS, device, geo, velocity) so you can engineer, extend, and benchmark against meaningful baselines.
Is this legal to use commercially?
Yes. The data is fully synthetic: no row corresponds to a real person, card, or merchant. Synthetic data is not personal data under GDPR Recital 26. The licence grants perpetual commercial use, including for training models you sell. Read LICENSE.md in the package.
Will a model trained on this work in production?
Models trained exclusively on synthetic data won't exactly replicate your portfolio's score distribution — that's true of every synthetic dataset. What you get is a clean labelled sandbox to iterate on features, models, rules and pipelines without legal risk. When you accumulate real labelled fraud, fine-tune on top.
Can I get a custom vertical (insurance, lending, crypto, telco)?
Yes. Write to info@nexusmock.com with the vertical, the fraud typologies you care about, and the volume. Custom datasets are typically delivered in 5–10 business days.
Your model is waiting.
Skip the legal review. Skip the ethics committee. Skip the public dataset everyone has already overfit. Train on something that actually carries signal.
Get Fraud Transactions — from 29€