Comparison
NexusMock Fraud vs the Kaggle 2013 dataset.
One is from 2013 and PCA-anonymised. One is from 2026 and engineered.
Every data scientist who has trained a fraud model has touched the Kaggle Credit Card Fraud dataset from 2013. It is one of the most-cited fraud detection benchmarks on the public internet: 492 fraud rows out of 284,807, 28 features anonymised through PCA, and no room for feature engineering beyond Time and Amount. It was groundbreaking when it dropped. In 2026, it is exhausted: every job candidate has memorised it, every notebook has overfit it, and not one of the features in V1–V28 corresponds to anything you have in your production card stream.
TL;DR
Use Kaggle 2013 for a teaching slide. Use NexusMock for a production-bound model. Beyond Time and Amount, the 2013 dataset has no engineerable features: no 3DS, no MCC, no velocity, no labelled typologies. NexusMock ships 30 plaintext columns, 5 labelled fraud typologies, and a baseline AUC of 0.92, with friendly_fraud held at 0.50 as a deliberate noise floor.
Feature-by-feature
What you actually get.
Data period
- NexusMock✓
- 2026 (refreshed quarterly)
- Kaggle 2013
- Two days of transactions from September 2013
Feature columns
- NexusMock✓
- 30 plaintext (MCC, BIN, 3DS, device, geo, velocity, …)
- Kaggle 2013
- 28 PCA-anonymised + Time + Amount
Feature engineering possible
- NexusMock✓
- Yes — full schema documented
- Kaggle 2013
- Minimal: PCA hides everything except Time and Amount
Number of rows
- NexusMock✓
- Up to 1,000,000 (Enterprise tier)
- Kaggle 2013
- 284,807
Fraud rows
- NexusMock✓
- ~22,000 in 1M tier (2.2%)
- Kaggle 2013
- 492 (0.17%)
Labelled fraud typologies
- NexusMock✓
- 5 — card_testing, account_takeover, synthetic_identity, friendly_fraud, merchant_collusion
- Kaggle 2013
- None (binary fraud label only)
Realistic noise (hard cases)
- NexusMock✓
- friendly_fraud at AUC ~0.50 by design — irreducible noise floor
- Kaggle 2013
- Sub-population structure unknown
Benford's law conformity
- NexusMock✓
- MAD 0.0012 (close conformity, documented)
- Kaggle 2013
- Not documented
Commercial use licence
- NexusMock✓
- Perpetual commercial licence included with purchase
- Kaggle 2013
- Database Contents Licence — research-friendly, commercial unclear
Price
- NexusMock
- 29 / 79 / 249 € for 10k / 100k / 1M rows
- Kaggle 2013✓
- Free
Overfitting risk
- NexusMock✓
- New release, fresh schema
- Kaggle 2013
- Every public notebook has overfit it
Support
- NexusMock✓
- Founder-answered email within 1 business day
- Kaggle 2013
- Kaggle discussions
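The "Feature engineering possible" row is where the two datasets part ways. As a minimal sketch, assuming hypothetical field names `card_id` and `timestamp` (in seconds) rather than the documented NexusMock schema, a velocity feature counts each card's earlier transactions inside a trailing window. Kaggle 2013 exposes no card identifier, only Time, V1–V28 and Amount, so this feature cannot be rebuilt there.

```python
from collections import defaultdict, deque


def velocity_counts(transactions, window_seconds=3600):
    """For each transaction, count earlier transactions on the same card whose
    timestamp is at most `window_seconds` before it. Input must be sorted by
    timestamp. `card_id` and `timestamp` are hypothetical field names."""
    recent = defaultdict(deque)  # card_id -> timestamps still inside the window
    counts = []
    for tx in transactions:
        window = recent[tx["card_id"]]
        # Evict timestamps that have fallen out of the trailing window.
        while window and tx["timestamp"] - window[0] > window_seconds:
            window.popleft()
        counts.append(len(window))  # prior transactions only, not this one
        window.append(tx["timestamp"])
    return counts


txs = [
    {"card_id": "A", "timestamp": 0},
    {"card_id": "A", "timestamp": 60},
    {"card_id": "B", "timestamp": 120},
    {"card_id": "A", "timestamp": 180},
    {"card_id": "A", "timestamp": 4000},
]
print(velocity_counts(txs))  # [0, 1, 0, 2, 0]
```

A deque per card keeps the pass linear in the number of transactions, since each timestamp is appended once and evicted at most once, provided rows arrive sorted by timestamp.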
When each one is right
Pick the one that matches your situation.
Pick NexusMock when…
- You are training a model that will actually be deployed.
- You want to do feature engineering on MCC, BIN, 3DS status, velocity.
- You need labelled typologies to evaluate per-typology recall.
- You want a fresh dataset that hasn't been memorised by every job candidate.
- You want commercial-use clarity for a fraud detection product you sell.
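Per-typology recall is the evaluation a single binary label cannot support. A minimal sketch, assuming each scored row is reduced to a hypothetical `(typology, score)` pair with `None` for legitimate transactions, and an arbitrary 0.5 threshold; the sample scores are invented for illustration, not NexusMock results.

```python
from collections import Counter


def recall_by_typology(rows, threshold=0.5):
    """Recall per fraud typology at a fixed score threshold.
    Each row is (typology, score); typology is None for legitimate rows.
    This row layout is a hypothetical stand-in for the real schema."""
    total = Counter()
    caught = Counter()
    for typology, score in rows:
        if typology is None:
            continue  # legitimate rows do not enter recall
        total[typology] += 1
        if score >= threshold:
            caught[typology] += 1
    return {t: caught[t] / total[t] for t in total}


rows = [
    ("card_testing", 0.91), ("card_testing", 0.84), ("card_testing", 0.40),
    ("friendly_fraud", 0.55), ("friendly_fraud", 0.30),
    (None, 0.10), (None, 0.62),
]
print(recall_by_typology(rows))
# {'card_testing': 0.6666666666666666, 'friendly_fraud': 0.5}
```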
Pick Kaggle 2013 when…
- You are writing a tutorial blog post and need a free, well-known dataset.
- You are a student learning how to call sklearn for the first time.
- You are benchmarking against a paper that uses the Kaggle 2013 dataset specifically.
Bottom line
The honest verdict.
The Kaggle 2013 dataset will always have a home in teaching. For shipping a model, train on NexusMock and validate on real labelled fraud from your own portfolio. Kaggle 2013 and NexusMock are complements, not substitutes.
Frequent objections
What buyers ask before deciding.
Why is the NexusMock baseline AUC lower than the Kaggle 2013 baseline?
Because we ship the friendly_fraud typology with an intentional AUC of ~0.50, the irreducible noise floor a real fraud team faces. A pooled binary AUC of 0.92 is realistic; an AUC of 0.99+ with every typology cleanly separable would be a synthetic toy.
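Why one hard typology lowers the headline number: when fraud rows are split by typology and each typology is scored against the same legitimate rows, the pooled ROC AUC equals the average of the per-typology AUCs weighted by fraud count. A minimal pure-Python sketch using the rank-based (Mann-Whitney) definition of AUC; the scores are toy values, not NexusMock output.

```python
def auc(pos_scores, neg_scores):
    """Rank-based ROC AUC: probability that a random positive outscores a random
    negative, counting ties as half. O(P * N), so only suitable for small demos."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores: one separable typology, one that mirrors the legitimate scores exactly.
legit = [0.1, 0.2, 0.3, 0.4]
fraud = {
    "card_testing": [0.7, 0.8, 0.9],
    "friendly_fraud": [0.1, 0.2, 0.3, 0.4],
}

# Each typology scored against the same legitimate rows.
per_typology = {t: auc(scores, legit) for t, scores in fraud.items()}

# Pool all fraud rows into one binary positive class.
all_fraud = [s for scores in fraud.values() for s in scores]
pooled = auc(all_fraud, legit)

# Fraud-count-weighted average of the per-typology AUCs.
weighted = sum(len(fraud[t]) * a for t, a in per_typology.items()) / len(all_fraud)

print(per_typology)                          # {'card_testing': 1.0, 'friendly_fraud': 0.5}
print(round(pooled, 4), round(weighted, 4))  # 0.7143 0.7143
```

Because the weights are fraud counts, how far a 0.50 typology pulls down the pooled figure depends on its share of the fraud rows.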
Can I use both datasets together?
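Yes. Train on NexusMock for feature richness, and run the same modelling pipeline on Kaggle 2013 for comparison with the published literature; the feature spaces differ, so a NexusMock-trained model cannot score Kaggle rows directly. Or fine-tune a model trained on NexusMock with your own labelled production fraud.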
Why pay for synthetic data when Kaggle is free?
You pay for the schema (30 engineered columns vs 28 PCA-blind), the labelled typologies (5 vs 1), the freshness (2026 vs 2013), the documented noise (Benford MAD 0.0012, friendly_fraud at 0.50), and the commercial licence. If the Kaggle 2013 dataset still meets your needs, you don't have a problem to solve.
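The Benford figure is a mean absolute deviation (MAD) between observed and expected leading-digit frequencies. A minimal sketch of a first-digit MAD over transaction amounts; this page does not say whether NexusMock's 0.0012 comes from a first-digit or a first-two-digits test, so treat this as an illustration of the metric, not a reproduction of that number.

```python
import math
from collections import Counter


def leading_digit(x):
    """First significant digit of a non-zero number, e.g. 0.047 -> 4, 1234 -> 1."""
    # Scientific notation puts the leading significant digit first: '4.7...e-02'.
    return int(f"{abs(x):.10e}"[0])


def benford_first_digit_mad(amounts):
    """Mean absolute deviation between observed first-digit shares and
    Benford's expected shares log10(1 + 1/d), averaged over d = 1..9."""
    digits = [leading_digit(a) for a in amounts if a > 0]  # Benford applies to positive values
    if not digits:
        raise ValueError("need at least one positive amount")
    counts = Counter(digits)
    n = len(digits)
    total = 0.0
    for d in range(1, 10):
        observed = counts[d] / n
        expected = math.log10(1 + 1 / d)
        total += abs(observed - expected)
    return total / 9


# Leading digits of powers of two are known to approach Benford's distribution.
print(benford_first_digit_mad([2 ** k for k in range(1000)]))
# One amount per leading digit is far from Benford: roughly 0.0597.
print(benford_first_digit_mad(list(range(1, 10))))
```

Conformity bands for MAD differ between the first-digit and first-two-digits tests, so a MAD value is only interpretable alongside the test that produced it.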
Want to test NexusMock before buying?
Download a 100-row sample of any vertical, free, no email required.
Browse the catalog →