Example: Credit Card Fraud Detection

This example demonstrates binary classification on a highly imbalanced credit card fraud dataset: detecting 492 fraudulent transactions among 284,807.

Dataset

Source: Kaggle Credit Card Fraud Detection via mlg-ulb/creditcardfraud
Task: Detect fraudulent transactions (Class=1) vs legitimate (Class=0)
Features: 30 (28 PCA components V1–V28, Time, Amount)
Classes: 2 (0 = legitimate, 1 = fraud)
Samples: 284,807 (single CSV, used for both training and evaluation)
Class Distribution: 284,315 legitimate (99.83%) vs 492 fraud (0.17%)
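To confirm the class counts above on your own download, here is a small stdlib-only sketch. The helper names class_distribution and load_labels are illustrative and not part of Pilz; the only assumption about the file is that creditcard.csv has a header row with a Class column.

```python
import csv
import sys
from collections import Counter
from typing import Iterable


def class_distribution(labels: Iterable[int]) -> dict[int, tuple[int, float]]:
    """Return {class: (count, percent)} for an iterable of integer labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: (n, 100.0 * n / total) for cls, n in sorted(counts.items())}


def load_labels(csv_path: str, column: str = "Class") -> list[int]:
    """Read the target column from the Kaggle CSV (header row assumed)."""
    with open(csv_path, newline="") as fh:
        return [int(row[column]) for row in csv.DictReader(fh)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 class_dist.py <kagglehub_path>/creditcard.csv
    for cls, (n, pct) in class_distribution(load_labels(sys.argv[1])).items():
        print(f"Class {cls}: {n:,} ({pct:.2f}%)")
```

On the full dataset this should report the same 284,315 / 492 split listed above.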

Quick Start

The config files for this example are in examples/credit_card_fraud/:

# 1. Download data (requires kagglehub)
pip install kagglehub
python3 -c "
import kagglehub
path = kagglehub.dataset_download('mlg-ulb/creditcardfraud')
print(f'Downloaded to: {path}')
"

# 2. Point the datacard to your downloaded data
#    Edit examples/credit_card_fraud/dc_creditcard.yaml and update:
#      train_files:
#        - <kagglehub_path>/creditcard.csv
#      test_files:
#        - <kagglehub_path>/creditcard.csv

# 3. Train
pilz train \
  --datacard examples/credit_card_fraud/dc_creditcard.yaml \
  --trainsettings examples/credit_card_fraud/train_settings.yaml

# 4. Evaluate
pilz eval \
  --datacard examples/credit_card_fraud/dc_creditcard.yaml \
  --evalsettings examples/credit_card_fraud/eval_settings.yaml

Alternatively, use the provided script, whose paths are self-contained:

cd examples/credit_card_fraud
bash run.sh
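Step 2 of the Quick Start can also be scripted. The sketch below uses hypothetical helpers (find_creditcard_csv, datacard_file_lines; neither is part of Pilz) that search the folder returned by kagglehub.dataset_download for creditcard.csv and build the train_files/test_files lines to paste into dc_creditcard.yaml. It does not edit the YAML file itself.

```python
from pathlib import Path


def find_creditcard_csv(download_root: str, filename: str = "creditcard.csv") -> Path:
    """Return the first match for `filename` under the kagglehub download folder."""
    matches = sorted(Path(download_root).rglob(filename))
    if not matches:
        raise FileNotFoundError(f"{filename} not found under {download_root}")
    return matches[0].resolve()


def datacard_file_lines(csv_path: Path) -> str:
    """Build the train_files/test_files YAML lines for dc_creditcard.yaml."""
    # The same CSV is used for training and evaluation, as in this example.
    return (
        "train_files:\n"
        f"  - {csv_path}\n"
        "test_files:\n"
        f"  - {csv_path}\n"
    )
```

For example, with path being the folder printed in step 1: print(datacard_file_lines(find_creditcard_csv(path))).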

DataCard Structure

All 30 input features are numerical (the PCA components V1–V28 as well as Time and Amount); the target column Class is declared with statistical: categorial:

features:
  - name: Time
    statistical: numerical
    type: float
  - name: V1
    statistical: numerical
    type: float
  - name: V2
    statistical: numerical
    type: float
  ...
  - name: V28
    statistical: numerical
    type: float
  - name: Amount
    statistical: numerical
    type: float
  - name: Class
    statistical: categorial
    type: int

target:
  feature_name: Class
  values:
    - 0
    - 1

train_files:
  - /path/to/creditcard.csv
test_files:
  - /path/to/creditcard.csv
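Writing out all 31 feature entries by hand is tedious. Assuming every elided entry follows the same pattern as V1 and V2 above (an assumption; only those entries are shown), a sketch like this generates the features block, copying the keys and the categorial spelling exactly from the example datacard:

```python
def datacard_features_yaml() -> str:
    """Emit the features block: Time, V1..V28, Amount as numerical floats, then Class."""
    names = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]
    lines = ["features:"]
    for name in names:
        lines += [f"  - name: {name}", "    statistical: numerical", "    type: float"]
    # Target column, declared exactly as in the example datacard.
    lines += ["  - name: Class", "    statistical: categorial", "    type: int"]
    return "\n".join(lines) + "\n"


print(datacard_features_yaml())
```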

Settings (Quick Start)

train_settings.yaml:

n: 2                # 2 trees per class (4 trees total)
out_folder: creditcard_model
max_depth: 8        # Moderate depth to capture fraud patterns
frac_eval_cat: 0.8
max_eval_fit: 2000  # Keep manageable despite 284K rows
min_eval_fit: 5
n_dims: 2           # Pairwise feature combinations
n_cat: 3            # 3 bins per numerical feature
calcs_per_dim: 200  # Limited calculations per dimension

eval_settings.yaml:

in_folders:
  - creditcard_model
out_folder: eval
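To get a feel for how n and n_dims scale the work, the arithmetic below counts ensemble trees (n is per class, as the comment above states) and the number of distinct feature subsets of size n_dims over the 30 input features. Whether Pilz enumerates all such subsets or only samples calcs_per_dim of them is not stated here, so treat the combination count as an upper bound on the search space under that assumption, not as Pilz's actual cost.

```python
from math import comb

N_FEATURES = 30  # Time, V1–V28, Amount (target Class excluded)
N_CLASSES = 2


def n_combinations(n_dims: int, n_features: int = N_FEATURES) -> int:
    """Number of distinct feature subsets of size n_dims."""
    return comb(n_features, n_dims)


def total_trees(n_per_class: int, n_classes: int = N_CLASSES) -> int:
    """n is trees per class, so the ensemble holds n * classes trees."""
    return n_per_class * n_classes


for label, n, dims in [("quick start", 2, 2), ("better detection", 10, 3)]:
    print(f"{label}: {total_trees(n)} trees, "
          f"{n_combinations(dims)} feature combinations of size {dims}")
```

This prints 4 trees and 435 pairs for the quick-start settings, versus 20 trees and 4,060 triples for the settings suggested under Tips.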

Training Time

With quick-start settings on a modern laptop (Apple Silicon):

Training: ~12 seconds
Evaluation: ~1 second

Actual Results

Extreme Class Imbalance

This dataset has only 492 fraud cases (0.17%) among 284,807 transactions. Pilz handles this natively through three mechanisms:

Independent sampling: Equal rows per class during training
Per-node re-balancing: Each split sees balanced target/non-target data
Three-way splits: Neutral branch handles ambiguous regions
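The idea behind independent sampling can be illustrated with a generic balanced sampler. This is a sketch of the general technique, not Pilz's implementation: balanced_sample is a hypothetical helper, and drawing with replacement when a class has fewer rows than requested is an assumption made here (Pilz may cap or handle that case differently).

```python
import random
from collections import defaultdict
from typing import Sequence


def balanced_sample(labels: Sequence[int], rows_per_class: int, seed: int = 0) -> list[int]:
    """Return row indices with the same number of rows drawn from every class."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    picked: list[int] = []
    for cls in sorted(by_class):
        pool = by_class[cls]
        if len(pool) >= rows_per_class:
            picked += rng.sample(pool, rows_per_class)  # without replacement
        else:
            # Minority class too small: sample with replacement (assumption).
            picked += rng.choices(pool, k=rows_per_class)
    return picked


labels = [0] * 284_315 + [1] * 492
idx = balanced_sample(labels, rows_per_class=1000)
print(sum(labels[i] == 1 for i in idx), sum(labels[i] == 0 for i in idx))  # 1000 1000
```

Per-node re-balancing applies the same idea one level down: the rows that reach a node are re-balanced between target and non-target before that node's split is chosen.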

ROC Curve

The ROC plots are written to the eval/ folder listed under Output Files.
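The area under a ROC curve equals the probability that a randomly chosen fraud case scores higher than a randomly chosen legitimate one. If you have a per-transaction fraud score (an assumption; this page does not show how Pilz exposes scores), a sketch using the rank-sum (Mann-Whitney U) identity computes it in O(n log n), which matters at 284K rows:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == positive)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```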

Output Files

creditcard_model/
├── 0/
│   ├── 0.json    # Tree 0 for class 0 (legitimate)
│   └── 1.json    # Tree 1 for class 0
└── 1/
    ├── 0.json    # Tree 0 for class 1 (fraud)
    └── 1.json    # Tree 1 for class 1

eval/
├── 0_roc.html
├── 1_roc.html
├── all_roc.html
└── multi_class_result.html
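To check that training wrote every tree, the layout above (one folder per class, one <tree index>.json per tree) can be verified programmatically. The helpers below are hypothetical and rely only on that file naming; they do not read the JSON contents, whose format is not described here.

```python
from pathlib import Path


def expected_tree_files(n_per_class: int, classes: list[int]) -> list[str]:
    """Relative paths implied by the layout: <class>/<tree index>.json."""
    return [f"{c}/{i}.json" for c in classes for i in range(n_per_class)]


def missing_trees(model_dir: str, n_per_class: int, classes: list[int]) -> list[str]:
    """Return the expected tree files that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in expected_tree_files(n_per_class, classes)
            if not (root / rel).is_file()]


# Quick-start settings: n: 2, classes 0 and 1.
print(missing_trees("creditcard_model", 2, [0, 1]))  # empty list if all four trees exist
```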

Key Findings

Extreme imbalance is handled natively
No SMOTE or class weights needed
Per-node re-balancing ensures both classes contribute equally at each split
PCA features provide strong signal
V1–V28 are PCA components and therefore mutually uncorrelated, so splits on different components carry little redundant information
The model learns non-linear decision boundaries from these components
Amount and Time add context
Transaction amount and elapsed time provide additional signal beyond the PCA components

Tips

Quick Start Settings (current)

These values are deliberately small for fast iteration: training takes about 12 seconds and exercises the pipeline end to end.

For Better Fraud Detection

Increase these settings in train_settings.yaml:

n: 10               # 10 trees per class (20 total) for ensemble stability
max_depth: 12       # Deeper trees to capture rare patterns
n_dims: 3           # Triple feature combinations
n_cat: 5            # Finer bins
calcs_per_dim: 1000 # More thorough search
max_eval_fit: 5000  # More training samples

With these settings, expect:

Training time: 1–3 minutes
Fraud recall: significantly improved
Overall accuracy: similar (already saturated by the majority class)

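Dealing with Extreme Imbalance

With 99.83% legitimate and only 0.17% fraudulent transactions, a practical workflow is:

Start with quick-start settings to verify the pipeline works
Increase max_depth and n_dims to capture subtle fraud patterns
Add more trees (n=10 or n=20) for ensemble stability
Monitor fraud recall specifically, not overall accuracy: a model that labels every transaction as legitimate already scores 99.83% accuracy while catching no fraud
Consider passing a --datacard whose test_files point at separate hold-out data; this example evaluates on the same CSV it trains on, so its results do not measure generalization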
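The accuracy-versus-recall gap is easy to check numerically. This sketch (fraud_metrics is a hypothetical helper, not a Pilz function) scores a trivial always-legitimate baseline against the real class counts of this dataset:

```python
from typing import Sequence


def fraud_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> dict[str, float]:
    """Accuracy plus recall and precision for the positive (fraud) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical baseline: label every transaction in the dataset as legitimate.
y_true = [0] * 284_315 + [1] * 492
always_legit = [0] * len(y_true)
m = fraud_metrics(y_true, always_legit)
print(f"accuracy={m['accuracy']:.4f} recall={m['recall']:.2f}")  # accuracy=0.9983 recall=0.00
```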