Comparison

NexusMock Fraud vs the Kaggle 2013 dataset.
One is from 2013 and PCA-anonymised. One is from 2026 and engineered.

Every data scientist who has trained a fraud model has touched the Kaggle Credit Card Fraud dataset from 2013. It is probably the most-cited public fraud detection benchmark: 492 fraud rows out of 284,807 (0.17%), 28 features anonymised through PCA, and almost no room for feature engineering, because only Time and Amount are left untransformed. It was groundbreaking when it dropped. In 2026, it is exhausted: every job candidate has memorised it, every notebook has overfit it, and not one of the features in V1–V28 corresponds to anything you have in your production card stream.

TL;DR

Use Kaggle 2013 for a teaching slide. Use NexusMock for a production-bound model. The 2013 dataset has almost nothing to engineer (only Time and Amount are raw), and no 3DS, no MCC, no velocity, no labelled typologies. NexusMock ships 30 plaintext columns, 5 labelled fraud typologies, and a baseline binary AUC of 0.92, with friendly_fraud held at AUC ~0.50 as a deliberate noise floor.

Feature-by-feature

What you actually get.

Feature	NexusMock	Kaggle 2013
Year released	2026 (refreshed quarterly)✓	Transactions recorded in September 2013
Feature columns	30 plaintext (MCC, BIN, 3DS, device, geo, velocity, …)✓	28 PCA-anonymised + Time + Amount
Feature engineering possible	Yes, full schema documented✓	Very limited: only Time and Amount are untransformed
Number of rows	Up to 1,000,000 (Enterprise tier)✓	284,807
Fraud rows	~22,000 in 1M tier (2.2%)✓	492 (0.17%)
Labelled fraud typologies	5 — card_testing, account_takeover, synthetic_identity, friendly_fraud, merchant_collusion✓	1 binary label only
Realistic noise (hard cases)	friendly_fraud at AUC ~0.50 by design — irreducible noise floor✓	Sub-population structure unknown
Benford's law conformity	MAD 0.0012 (close conformity, documented)✓	Not documented
Commercial use licence	Perpetual commercial included with purchase✓	Database Contents Licence — research-friendly, commercial unclear
Price	29 / 79 / 249 € for 10k / 100k / 1M rows	Free✓
Overfitting risk	New release, fresh schema✓	Every public notebook has overfit it
Support	Founder-answered email within 1 business day✓	Kaggle discussions
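The Benford row refers to a mean absolute deviation (MAD) between observed and expected leading-digit frequencies. This page does not say which digit test produced the 0.0012 figure, so the Python sketch below assumes the plain first-digit test. The names `first_digit` and `benford_mad` and the input list are illustrative, not part of any NexusMock tooling.

```python
import math


def first_digit(x: float) -> int:
    """Return the leading non-zero digit of a non-zero number."""
    # Scientific notation always starts with the leading digit,
    # e.g. 0.042 -> '4.200000000000000e-02'.
    return int(f"{abs(x):.15e}"[0])


def benford_mad(amounts) -> float:
    """Mean absolute deviation from Benford's first-digit distribution."""
    digits = [first_digit(a) for a in amounts if a and math.isfinite(a)]
    if not digits:
        raise ValueError("need at least one non-zero, finite amount")
    n = len(digits)
    total = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)   # Benford share for digit d
        observed = digits.count(d) / n     # share seen in the data
        total += abs(observed - expected)
    return total / 9


# Powers of two are a classic Benford-like sequence; a flat spread
# of leading digits is not.
benford_like = [2 ** k for k in range(1, 200)]
flat = list(range(1, 10))
print(round(benford_mad(benford_like), 4), round(benford_mad(flat), 4))
```

To check a purchased file, you would pass its transaction amount column to `benford_mad`. A value near zero means the leading digits track Benford's distribution closely; it will only be comparable to a published figure if the same digit test was used.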

When each one is right

Pick the one that matches your situation.

Pick NexusMock when…

You are training a model that will actually be deployed.
You want to do feature engineering on MCC, BIN, 3DS status, velocity.
You need labelled typologies to evaluate per-typology recall.
You want a fresh dataset that hasn't been memorised by every job candidate.
You want commercial-use clarity for a fraud detection product you sell.
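Per-typology recall, mentioned in the list above, is only possible when each fraud row carries a typology label. A minimal Python sketch follows, assuming hypothetical field names `is_fraud` and `fraud_type` (check the real names against the documented schema) and a list of model decisions aligned with the rows.

```python
from collections import defaultdict


def recall_by_typology(rows, flagged):
    """Share of fraud rows the model flagged, grouped by typology.

    rows:    list of dicts with hypothetical keys 'is_fraud', 'fraud_type'
    flagged: list of bools, one model decision per row
    """
    caught = defaultdict(int)
    total = defaultdict(int)
    for row, hit in zip(rows, flagged):
        if not row["is_fraud"]:
            continue  # recall only looks at true fraud rows
        total[row["fraud_type"]] += 1
        caught[row["fraud_type"]] += int(hit)
    return {t: caught[t] / total[t] for t in total}


# Toy data, invented for illustration only.
rows = [
    {"is_fraud": 1, "fraud_type": "card_testing"},
    {"is_fraud": 1, "fraud_type": "card_testing"},
    {"is_fraud": 1, "fraud_type": "friendly_fraud"},
    {"is_fraud": 1, "fraud_type": "friendly_fraud"},
    {"is_fraud": 0, "fraud_type": None},
]
flagged = [True, True, True, False, False]

per_type = recall_by_typology(rows, flagged)
print(per_type)
```

In this toy case the overall recall is 3 out of 4, which hides the fact that every card_testing row was caught and only half of the friendly_fraud rows were. A single binary label cannot show that split.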

Pick Kaggle 2013 when…

You are writing a tutorial blog post and need a free, well-known dataset.
You are a student learning how to call sklearn for the first time.
You are benchmarking against a paper that uses the Kaggle 2013 dataset specifically.

Bottom line

The honest verdict.

The Kaggle 2013 dataset will always have a home in teaching. For shipping a model, train on NexusMock and validate on real labelled fraud from your own portfolio. The two are complements, not substitutes.

Frequent objections

What buyers ask before deciding.

Why is the NexusMock baseline AUC lower than the Kaggle 2013 baseline?

Because we ship the friendly_fraud typology with intentional AUC ~0.50 — the irreducible noise floor a real fraud team faces. A binary AUC of 0.92 is realistic; an AUC of 0.99+ with all typologies cleanly separable would be a synthetic toy.
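The arithmetic behind that answer: when every typology is scored against the same legitimate rows, the pooled binary AUC equals the average of the per-typology AUCs, weighted by each typology's share of fraud rows. A typology sitting at 0.50 therefore pulls the pooled figure down. The Python sketch below uses invented scores and a plain pairwise AUC; it illustrates the mechanism and does not reproduce the NexusMock baseline.

```python
def auc(pos_scores, neg_scores):
    """Chance that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney form of ROC AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented model scores, for illustration only.
legit = [0.10, 0.20, 0.30, 0.40]
card_testing = [0.80, 0.90]      # cleanly separable from legit
friendly_fraud = [0.15, 0.35]    # scored like legit rows

auc_card = auc(card_testing, legit)
auc_friendly = auc(friendly_fraud, legit)
auc_pooled = auc(card_testing + friendly_fraud, legit)
print(auc_card, auc_friendly, auc_pooled)
```

Here the separable typology scores 1.0, the noisy one 0.5, and the pooled AUC lands at 0.75, the equal-weight average of the two because each typology contributes two rows.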

Can I use both datasets together?

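Yes, with one caveat: the schemas do not overlap, so a model trained on NexusMock's plaintext columns cannot be scored directly on Kaggle's V1–V28. Run the same modelling pipeline on each dataset separately: NexusMock for feature richness, Kaggle 2013 for comparison with the published literature. Or fine-tune a model trained on NexusMock with your own labelled production fraud.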

Why pay for synthetic data when Kaggle is free?

You pay for the schema (30 engineered columns vs 28 PCA-blind), the labelled typologies (5 vs 1), the freshness (2026 vs 2013), the documented statistical properties (Benford MAD 0.0012, friendly_fraud at AUC ~0.50), and the commercial licence. If the Kaggle 2013 dataset still meets your needs, you don't have a problem to solve.

Want to test NexusMock before buying?

Download a 100-row sample of any vertical, free, no email required.

Browse the catalog →
