Comparison

NexusMock Fraud vs the Kaggle 2013 dataset.
One is from 2013 and PCA-anonymised. One is from 2026 and engineered.

Every data scientist who has trained a fraud model has touched the Kaggle Credit Card Fraud dataset, built from September 2013 transactions. It is one of the most-cited fraud detection benchmarks on the public internet: 492 fraud rows out of 284,807 (0.17%), 28 features anonymised through PCA, and almost no room for feature engineering, since only Time and Amount survive in raw form. It was groundbreaking when it was published. In 2026 it is exhausted: every job candidate has memorised it, public notebooks have overfit it for years, and none of the features V1–V28 corresponds to anything in your production card stream.

TL;DR

Use Kaggle 2013 for a teaching slide. Use NexusMock for a production-bound model. The 2013 dataset leaves almost nothing to engineer: no 3DS, no MCC, no velocity, no labelled typologies, only Time, Amount, and 28 PCA components. NexusMock ships 30 plaintext columns, 5 labelled fraud typologies, and a baseline AUC of 0.92, with friendly_fraud held at ~0.50 as a deliberate noise floor.

Feature-by-feature

What you actually get.

Data vintage
  NexusMock: 2026 (refreshed quarterly)
  Kaggle 2013: Transactions from September 2013

Feature columns
  NexusMock: 30 plaintext (MCC, BIN, 3DS, device, geo, velocity, …)
  Kaggle 2013: 28 PCA-anonymised (V1–V28) + Time + Amount

Feature engineering possible
  NexusMock: Yes, full schema documented
  Kaggle 2013: Very limited, PCA hides everything except Time and Amount

Number of rows
  NexusMock: Up to 1,000,000 (Enterprise tier)
  Kaggle 2013: 284,807

Fraud rows
  NexusMock: ~22,000 in the 1M tier (2.2%)
  Kaggle 2013: 492 (0.17%)

Labelled fraud typologies
  NexusMock: 5 (card_testing, account_takeover, synthetic_identity, friendly_fraud, merchant_collusion)
  Kaggle 2013: 1 binary label only

Realistic noise (hard cases)
  NexusMock: friendly_fraud at AUC ~0.50 by design, an irreducible noise floor
  Kaggle 2013: Sub-population structure unknown

Benford's law conformity
  NexusMock: MAD 0.0012 (close conformity, documented)
  Kaggle 2013: Not documented

Commercial use licence
  NexusMock: Perpetual commercial licence included with purchase
  Kaggle 2013: Database Contents Licence; research-friendly, commercial use unclear

Price
  NexusMock: €29 / €79 / €249 for 10k / 100k / 1M rows
  Kaggle 2013: Free

Overfitting risk
  NexusMock: New release, fresh schema
  Kaggle 2013: Heavily overfit by years of public notebooks

Support
  NexusMock: Founder-answered email within 1 business day
  Kaggle 2013: Kaggle discussions
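
The Benford figure quoted above is a mean absolute deviation (MAD) between observed and expected digit frequencies. The page does not say which digit test produced 0.0012, so the sketch below assumes the common first-digit test; `first_digit` and `benford_mad` are illustrative names, not part of any NexusMock tooling. By Nigrini's widely used first-digit cut-offs, a MAD below 0.006 is usually read as close conformity.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Return the leading non-zero digit of a non-zero amount."""
    # Scientific notation puts the leading digit first: 0.042 -> '4.2...e-02'.
    return int(f"{abs(x):.15e}"[0])


def benford_mad(amounts) -> float:
    """First-digit Benford MAD: mean |observed share - expected share| over digits 1-9."""
    digits = [first_digit(a) for a in amounts if a and math.isfinite(a)]
    if not digits:
        raise ValueError("no non-zero finite amounts to test")
    counts = Counter(digits)
    n = len(digits)
    total_deviation = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)  # Benford's first-digit probability
        observed = counts.get(d, 0) / n
        total_deviation += abs(observed - expected)
    return total_deviation / 9


# Stand-in data, NOT NexusMock amounts: powers of two are a classic
# Benford-conforming sequence, evenly spread digits are not.
conforming = [2.0 ** k for k in range(1000)]
uniform_digits = list(range(1, 10)) * 100

print(f"powers of two : MAD = {benford_mad(conforming):.4f}")
print(f"uniform digits: MAD = {benford_mad(uniform_digits):.4f}")
```

To check the documented value yourself, pass the transaction amount column of the file you downloaded to `benford_mad`; the column name depends on the schema and is not assumed here.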

When each one is right

Pick the one that matches your situation.

Pick NexusMock when…

  • You are training a model that will actually be deployed.
  • You want to do feature engineering on MCC, BIN, 3DS status, velocity.
  • You need labelled typologies to evaluate per-typology recall.
  • You want a fresh dataset that hasn't been memorised by every job candidate.
  • You want commercial-use clarity for a fraud detection product you sell.
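
Per-typology recall, mentioned in the list above, needs only the typology label on each fraud row and a model score. A minimal sketch, assuming fraud rows arrive as (typology, score) pairs; the scores and the 0.5 threshold are invented for illustration and are not NexusMock output, and `recall_by_typology` is a hypothetical helper name.

```python
from collections import defaultdict


def recall_by_typology(fraud_rows, threshold=0.5):
    """Share of fraud rows flagged (score >= threshold), split by typology.

    fraud_rows: iterable of (typology, score) pairs, fraud rows only.
    """
    caught = defaultdict(int)
    total = defaultdict(int)
    for typology, score in fraud_rows:
        total[typology] += 1
        if score >= threshold:
            caught[typology] += 1
    return {t: caught[t] / total[t] for t in total}


# Made-up scores for three of the five typologies named in the comparison.
fraud_rows = [
    ("card_testing", 0.91), ("card_testing", 0.77),
    ("card_testing", 0.64), ("card_testing", 0.32),
    ("account_takeover", 0.88), ("account_takeover", 0.41),
    ("friendly_fraud", 0.55), ("friendly_fraud", 0.12),
    ("friendly_fraud", 0.08), ("friendly_fraud", 0.47),
]

recalls = recall_by_typology(fraud_rows, threshold=0.5)
pooled = sum(score >= 0.5 for _, score in fraud_rows) / len(fraud_rows)

for typology, value in recalls.items():
    print(f"{typology:<18} recall = {value:.2f}")
print(f"{'all fraud pooled':<18} recall = {pooled:.2f}")
```

The pooled figure hides the spread between typologies, which is the case for evaluating each one separately. With a single binary label, as in Kaggle 2013, only the pooled number can be computed.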

Pick Kaggle 2013 when…

  • You are writing a tutorial blog post and need a free, well-known dataset.
  • You are a student learning how to call sklearn for the first time.
  • You are benchmarking against a paper that uses the Kaggle 2013 dataset specifically.

Bottom line

The honest verdict.

The Kaggle 2013 dataset will always have a home in teaching. To ship a model, train on NexusMock and validate on real labelled fraud from your own portfolio. The two are complements, not substitutes.

Frequent objections

What buyers ask before deciding.

Why is the NexusMock baseline AUC lower than the Kaggle 2013 baseline?

Because we ship the friendly_fraud typology with an intentional AUC of ~0.50, the irreducible noise floor a real fraud team faces. One typology at chance level pulls the pooled score down, so a binary AUC of 0.92 is realistic; an AUC of 0.99+ with every typology cleanly separable would be a synthetic toy.
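
The effect of one chance-level typology on the pooled AUC can be seen on a toy example. This is a sketch with hand-picked scores, not NexusMock data, and it does not reproduce the 0.92 figure; `auc` is a plain pairwise implementation written here for illustration.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hand-picked illustrative model scores.
legit = [0.1, 0.2, 0.3, 0.4]
card_testing = [0.8, 0.9]    # cleanly separable from legitimate traffic
friendly_fraud = [0.1, 0.4]  # scores look just like legitimate traffic

auc_card_testing = auc(card_testing, legit)
auc_friendly = auc(friendly_fraud, legit)
auc_pooled = auc(card_testing + friendly_fraud, legit)

print(f"card_testing vs legit  : {auc_card_testing:.2f}")
print(f"friendly_fraud vs legit: {auc_friendly:.2f}")
print(f"all fraud vs legit     : {auc_pooled:.2f}")
```

The pooled AUC lands between the separable typology and the chance-level one, weighted by how many fraud rows each contributes.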

Can I use both datasets together?

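Yes, with one caveat: the schemas do not overlap (Kaggle's V1–V28 are PCA components with no plaintext equivalent), so a model trained on NexusMock cannot be scored directly on Kaggle 2013. Run the same pipeline on each dataset separately: NexusMock for feature richness, Kaggle 2013 for comparison with the published literature. Or fine-tune a model trained on NexusMock with your own labelled production fraud.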

Why pay for synthetic data when Kaggle is free?

You pay for the schema (30 plaintext columns vs 28 PCA components), the labelled typologies (5 vs 1), the freshness (2026 vs 2013), the documented statistical properties (Benford MAD 0.0012, friendly_fraud at AUC ~0.50), and the commercial licence. If the Kaggle 2013 dataset still meets your needs, you don't have a problem to solve.

Want to test NexusMock before buying?

Download a 100-row sample of any vertical, free, no email required.

Browse the catalog →