Example: Credit Card Fraud Detection
This example demonstrates binary classification on a highly imbalanced credit card fraud dataset: detecting 492 fraudulent transactions among 284,807.
Dataset
- Source: Kaggle Credit Card Fraud Detection via mlg-ulb/creditcardfraud
- Task: Detect fraudulent transactions (Class=1) vs legitimate (Class=0)
- Features: 30 (28 PCA components V1–V28, Time, Amount)
- Classes: 2 (0 = legitimate, 1 = fraud)
- Samples: 284,807 (single CSV, used for both training and evaluation)
- Class Distribution: 284,315 legitimate (99.83%) vs 492 fraud (0.17%)
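The class distribution above can be re-derived from the two counts. The following is a minimal sketch using only the figures quoted in this section; the function and variable names are illustrative and not part of pilz.

```python
# Class counts as stated in the dataset summary above.
LEGITIMATE = 284_315
FRAUD = 492


def class_shares(counts):
    """Return each class's share of the total as a percentage."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


counts = {"legitimate": LEGITIMATE, "fraud": FRAUD}
shares = class_shares(counts)
imbalance_ratio = LEGITIMATE / FRAUD  # legitimate rows per fraud row

print(f"total rows: {sum(counts.values()):,}")
for label, pct in shares.items():
    print(f"{label:>10}: {pct:.2f}%")
print(f"imbalance ratio: about {imbalance_ratio:.0f} to 1")
```

The ratio works out to roughly 578 legitimate transactions for every fraudulent one.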
Quick Start
The config files for this example are in examples/credit_card_fraud/:
# 1. Download data (requires kagglehub)
pip install kagglehub
python3 -c "
import kagglehub
path = kagglehub.dataset_download('mlg-ulb/creditcardfraud')
print(f'Downloaded to: {path}')
"
# 2. Point the datacard to your downloaded data
# Edit examples/credit_card_fraud/dc_creditcard.yaml and update:
#   train_files:
#     - <kagglehub_path>/creditcard.csv
#   test_files:
#     - <kagglehub_path>/creditcard.csv
# 3. Train
pilz train \
  --datacard examples/credit_card_fraud/dc_creditcard.yaml \
  --trainsettings examples/credit_card_fraud/train_settings.yaml
# 4. Evaluate
pilz eval \
  --datacard examples/credit_card_fraud/dc_creditcard.yaml \
  --evalsettings examples/credit_card_fraud/eval_settings.yaml
Or use the provided script — paths are self-contained:
DataCard Structure
All 30 features are numerical (the PCA components V1–V28 are continuous, Time and Amount are also numerical):
features:
  - name: Time
    statistical: numerical
    type: float
  - name: V1
    statistical: numerical
    type: float
  - name: V2
    statistical: numerical
    type: float
  ...
  - name: V28
    statistical: numerical
    type: float
  - name: Amount
    statistical: numerical
    type: float
  - name: Class
    statistical: categorial
    type: int
target:
  feature_name: Class
  values:
    - 0
    - 1
train_files:
  - /path/to/creditcard.csv
test_files:
  - /path/to/creditcard.csv
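Because the 30 numerical entries differ only by name, the features block can be generated instead of typed by hand. The sketch below is one way to do that with plain string formatting. It assumes the key names and values shown in the excerpt above (name, statistical, type) and does not validate the result against pilz itself; the helper names are hypothetical.

```python
def build_feature_entries():
    """Build the feature list: Time, V1..V28, Amount, then the Class target."""
    names = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
    entries = [(name, "numerical", "float") for name in names]
    # The target column reuses the statistical label shown in the excerpt above.
    entries.append(("Class", "categorial", "int"))
    return entries


def render_features_yaml(entries):
    """Render entries as a YAML `features:` block using plain string formatting."""
    lines = ["features:"]
    for name, statistical, dtype in entries:
        lines.append(f"  - name: {name}")
        lines.append(f"    statistical: {statistical}")
        lines.append(f"    type: {dtype}")
    return "\n".join(lines)


entries = build_feature_entries()
yaml_text = render_features_yaml(entries)
print(yaml_text)
```

The printed block can be pasted into the datacard in place of the hand-written features section; the target, train_files, and test_files sections still need to be added as shown above.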
Settings (Quick Start)
n: 2 # 2 trees per class (4 trees total)
out_folder: creditcard_model
max_depth: 8 # Moderate depth to capture fraud patterns
frac_eval_cat: 0.8
max_eval_fit: 2000 # Keep manageable despite 284K rows
min_eval_fit: 5
n_dims: 2 # Pairwise feature combinations
n_cat: 3 # 3 bins per numerical feature
calcs_per_dim: 200 # Limited calculations per dimension
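To get a feel for the search space these settings imply, the sketch below counts feature combinations and grid cells. It reads n_dims and n_cat literally from the inline comments (features combined n_dims at a time, n_cat bins per feature); how pilz actually enumerates or samples combinations is not documented here, so treat the numbers as an upper-bound intuition rather than a description of the algorithm.

```python
from math import comb


def search_space(n_features, n_dims, n_cat):
    """Count feature combinations and grid cells per combination."""
    combos = comb(n_features, n_dims)  # unordered feature subsets of size n_dims
    cells = n_cat ** n_dims            # bins multiply across dimensions
    return combos, cells


# Quick-start values: 30 features, pairwise combinations, 3 bins each.
combos, cells = search_space(n_features=30, n_dims=2, n_cat=3)
print(f"feature combinations: {combos}, cells per combination: {cells}")
```

With 30 features there are 435 possible pairs of 9 cells each. Raising n_dims to 3 and n_cat to 5 gives 4,060 triples of 125 cells each, which is one plausible reason larger settings take longer to train.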
Training Time
With quick-start settings on a modern laptop (Apple Silicon):
- Training: ~12 seconds
- Evaluation: ~1 second
Actual Results
Extreme Class Imbalance
This dataset has only 492 fraud cases (0.17%) out of 284,807 transactions. Pilz handles this naturally through:
- Independent sampling: Equal rows per class during training
- Per-node re-balancing: Each split sees balanced target/non-target data
- Three-way splits: Neutral branch handles ambiguous regions
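The first point, drawing equal rows per class, can be illustrated with a generic class-balanced sampler. This is a minimal sketch of the general idea on toy data, not pilz's internal implementation; per-node re-balancing and three-way splits are not modeled, and the function name and parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_of, per_class, seed=0):
    """Draw up to `per_class` rows from each class, independently per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for members in by_class.values():
        k = min(per_class, len(members))  # a rare class may have fewer rows
        sample.extend(rng.sample(members, k))
    rng.shuffle(sample)
    return sample


# Toy data with the same 0/1 labels: 10,000 legitimate rows and 17 fraud rows.
rows = [{"Class": 0}] * 10_000 + [{"Class": 1}] * 17
sample = balanced_sample(rows, lambda r: r["Class"], per_class=17)
fraud_in_sample = sum(r["Class"] for r in sample)
print(len(sample), fraud_in_sample)
```

Although fraud makes up well under 1% of the toy data, it makes up half of the sample, so a learner fitted on the sample sees both classes equally often.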
ROC Curve
Output Files
creditcard_model/
├── 0/
│ ├── 0.json # Tree 0 for class 0 (legitimate)
│ └── 1.json # Tree 1 for class 0
└── 1/
    ├── 0.json # Tree 0 for class 1 (fraud)
    └── 1.json # Tree 1 for class 1
eval/
├── 0_roc.html
├── 1_roc.html
├── all_roc.html
└── multi_class_result.html
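A quick way to confirm that training produced the expected number of trees is to walk the model folder. The sketch below assumes only the layout shown above (one subdirectory per class, one JSON file per tree) and does not parse the tree files, whose schema is not described here; list_trees is a hypothetical helper.

```python
from pathlib import Path


def list_trees(model_dir):
    """Map each class directory name to its sorted list of tree JSON files."""
    trees = {}
    for class_dir in sorted(Path(model_dir).iterdir()):
        if class_dir.is_dir():
            trees[class_dir.name] = sorted(p.name for p in class_dir.glob("*.json"))
    return trees


# Only runs when the model folder from the quick start exists.
if Path("creditcard_model").is_dir():
    for label, files in list_trees("creditcard_model").items():
        print(f"class {label}: {len(files)} trees -> {files}")
```

With the quick-start setting n: 2, each class should list two files.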
Key Findings
- Extreme imbalance is handled natively
  - No SMOTE or class weights needed
  - Per-node re-balancing ensures both classes contribute equally at each split
- PCA features provide strong signal
  - V1–V28 are PCA-transformed, making them orthogonal and well-suited for decision tree splits
  - The model learns non-linear decision boundaries from these components
- Amount and Time add context
  - Transaction amount and elapsed time provide additional signal beyond the PCA components
Tips
Quick Start Settings (current)
These are reduced for fast iteration. Training takes ~12 seconds and demonstrates the pipeline end-to-end.
For Better Fraud Detection
Increase these settings in train_settings.yaml:
n: 10 # More trees for ensemble stability
max_depth: 12 # Deeper trees to capture rare patterns
n_dims: 3 # Triple feature combinations
n_cat: 5 # Finer bins
calcs_per_dim: 1000 # More thorough search
max_eval_fit: 5000 # More training samples
With these settings, expect:
- Training time: 1-3 minutes
- Fraud recall: significantly improved
- Overall accuracy: similar (majority class saturation)
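The reason overall accuracy barely moves is that the majority class dominates it. The sketch below makes this concrete with a baseline that labels every transaction as legitimate; its confusion-matrix counts follow directly from the class distribution in the Dataset section and are not results from pilz.

```python
def metrics(tp, fp, tn, fn):
    """Return accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Baseline that labels every transaction as legitimate (Class=0):
# all 284,315 legitimate rows are correct, all 492 fraud rows are missed.
accuracy, recall, precision = metrics(tp=0, fp=0, tn=284_315, fn=492)
print(f"accuracy={accuracy:.4f} recall={recall:.2f} precision={precision:.2f}")
```

This baseline reaches about 99.83% accuracy while catching no fraud at all, so any real improvement has to show up in fraud recall and precision rather than in accuracy.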
Dealing with Extreme Imbalance
The dataset has 99.83% legitimate vs 0.17% fraud:
- Start with quick-start settings to verify the pipeline works
- Increase max_depth and n_dims to capture subtle fraud patterns
- Add more trees (n=10 or n=20) for ensemble stability
- Monitor fraud recall specifically, not overall accuracy
- Consider using --datacard with separate test data to measure generalization
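For the last point, one way to obtain separate test data from the single CSV is a stratified split, which keeps the fraud share the same in both parts. The following is a minimal standard-library sketch on toy rows. The function name, the 20% test fraction, and the seed are illustrative choices, not pilz defaults.

```python
import random


def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows into train/test while keeping each class's share in both."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is untouched
        rng.shuffle(members)
        # Keep at least one row per class in the test set.
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy rows standing in for creditcard.csv records (labels only).
rows = [{"Class": 0}] * 1_000 + [{"Class": 1}] * 10
train, test = stratified_split(rows, lambda r: r["Class"])
print(len(train), len(test), sum(r["Class"] for r in test))
```

Writing the two parts to separate CSV files and listing them under train_files and test_files in the datacard means evaluation no longer reuses rows the model was trained on.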