
Example: Credit Card Fraud Detection

This example demonstrates binary classification with a highly imbalanced credit card fraud dataset — detecting fraudulent transactions among 284K transactions.

Dataset

  • Source: Kaggle Credit Card Fraud Detection via mlg-ulb/creditcardfraud
  • Task: Detect fraudulent transactions (Class=1) vs legitimate (Class=0)
  • Features: 30 (28 PCA components V1–V28, Time, Amount)
  • Classes: 2 (0 = legitimate, 1 = fraud)
  • Samples: 284,807 (single CSV, used for both training and evaluation)
  • Class Distribution: 284,315 legitimate (99.83%) vs 492 fraud (0.17%)
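The class counts and percentages above can be checked directly against the downloaded CSV. Below is a minimal sketch that uses only the Python standard library. It assumes the label column is named Class, as in the datacard, and class_distribution is a hypothetical helper rather than part of Pilz:

```python
import csv
from collections import Counter


def class_distribution(csv_path: str, label_column: str = "Class") -> dict:
    """Count rows per class label and return {label: (count, percent)}."""
    counts: Counter = Counter()
    with open(csv_path, newline="") as f:
        # DictReader strips the quotes around values such as "0" and "1"
        for row in csv.DictReader(f):
            counts[row[label_column]] += 1
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in sorted(counts.items())}
```

Calling class_distribution("<kagglehub_path>/creditcard.csv") should report 284,315 rows for "0" and 492 rows for "1", with percentages that round to 99.83% and 0.17%.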

Quick Start

The config files for this example are in examples/credit_card_fraud/:

# 1. Download data (requires kagglehub)
pip install kagglehub
python3 -c "
import kagglehub
path = kagglehub.dataset_download('mlg-ulb/creditcardfraud')
print(f'Downloaded to: {path}')
"

# 2. Point the datacard to your downloaded data
#    Edit examples/credit_card_fraud/dc_creditcard.yaml and update:
#      train_files:
#        - <kagglehub_path>/creditcard.csv
#      test_files:
#        - <kagglehub_path>/creditcard.csv

# 3. Train
pilz train \
  --datacard examples/credit_card_fraud/dc_creditcard.yaml \
  --trainsettings examples/credit_card_fraud/train_settings.yaml

# 4. Evaluate
pilz eval \
  --datacard examples/credit_card_fraud/dc_creditcard.yaml \
  --evalsettings examples/credit_card_fraud/eval_settings.yaml

Or use the provided script, whose paths are self-contained:

cd examples/credit_card_fraud
bash run.sh
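Step 2 of the Quick Start can also be scripted. The sketch below assumes the datacard lists its CSV paths as YAML list items ending in creditcard.csv (as in the DataCard Structure section). It edits the file as plain text so that comments and key order survive. point_datacard_to is a hypothetical helper, not part of Pilz:

```python
import re
from pathlib import Path

# Matches YAML list items whose value ends in creditcard.csv, e.g.
#   "  - /path/to/creditcard.csv" or "  - <kagglehub_path>/creditcard.csv"
CSV_ITEM = re.compile(r"^([ \t]*-[ \t]*).*creditcard\.csv[ \t]*$", re.MULTILINE)


def point_datacard_to(datacard: str, data_dir: str) -> int:
    """Rewrite every creditcard.csv list entry in the datacard to data_dir.

    Returns the number of entries replaced. The file is edited as text,
    not re-serialized, so comments and key order are preserved.
    """
    path = Path(datacard)
    csv_path = str(Path(data_dir) / "creditcard.csv")
    # A function replacement avoids backslash escapes in csv_path being interpreted
    text, n = CSV_ITEM.subn(lambda m: m.group(1) + csv_path, path.read_text())
    path.write_text(text)
    return n


# Usage, with data_dir taken from kagglehub.dataset_download(...) in step 1:
#   point_datacard_to("examples/credit_card_fraud/dc_creditcard.yaml", data_dir)
```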

DataCard Structure

All 30 input features are numerical: the PCA components V1–V28 are continuous, as are Time and Amount. The target column Class is declared categorical:

features:
  - name: Time
    statistical: numerical
    type: float
  - name: V1
    statistical: numerical
    type: float
  - name: V2
    statistical: numerical
    type: float
  ...
  - name: V28
    statistical: numerical
    type: float
  - name: Amount
    statistical: numerical
    type: float
  - name: Class
    statistical: categorial
    type: int

target:
  feature_name: Class
  values:
    - 0
    - 1

train_files:
  - /path/to/creditcard.csv
test_files:
  - /path/to/creditcard.csv
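To avoid typing the 28 V columns by hand, a small generator can emit the features list in the same shape as the datacard above. This is a convenience sketch: the helper name features_yaml is hypothetical, and the statistical: categorial value for Class is copied verbatim from the datacard.

```python
def features_yaml(n_pca: int = 28) -> str:
    """Emit the datacard `features:` list for Time, V1..V{n_pca}, Amount, Class."""
    # Input columns, in the same order as creditcard.csv
    numeric = ["Time"] + [f"V{i}" for i in range(1, n_pca + 1)] + ["Amount"]
    lines = ["features:"]
    for name in numeric:
        lines += [f"  - name: {name}", "    statistical: numerical", "    type: float"]
    # Target column, with the statistical value spelled as in the datacard above
    lines += ["  - name: Class", "    statistical: categorial", "    type: int"]
    return "\n".join(lines) + "\n"
```

Printing features_yaml() gives a block that can be pasted over the features section of dc_creditcard.yaml.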

Settings (Quick Start)

train_settings.yaml:

n: 2                # 2 trees per class (4 trees total)
out_folder: creditcard_model
max_depth: 8        # Moderate depth to capture fraud patterns
frac_eval_cat: 0.8
max_eval_fit: 2000  # Keep manageable despite 284K rows
min_eval_fit: 5
n_dims: 2           # Pairwise feature combinations
n_cat: 3            # 3 bins per numerical feature
calcs_per_dim: 200  # Limited calculations per dimension

eval_settings.yaml:

in_folders:
  - creditcard_model  # Model folder written by training
out_folder: eval
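To get a feel for how these settings scale, the sketch below counts the distinct feature subsets of size n_dims and the total number of trees (n per class, two classes). It assumes that every one of the 30 input features is a candidate for combination. How Pilz actually samples among candidates (for example under calcs_per_dim) is not described here. search_space is a hypothetical helper, and the second call uses the larger settings suggested under Tips.

```python
from math import comb


def search_space(n_features: int, n_dims: int, n_trees_per_class: int, n_classes: int) -> dict:
    """Rough size figures for a settings file (assumes all features are candidates)."""
    return {
        "feature_subsets": comb(n_features, n_dims),  # distinct n_dims-sized feature combinations
        "total_trees": n_trees_per_class * n_classes,  # n trees for each class
    }


quick = search_space(n_features=30, n_dims=2, n_trees_per_class=2, n_classes=2)
larger = search_space(n_features=30, n_dims=3, n_trees_per_class=10, n_classes=2)
print(quick)   # {'feature_subsets': 435, 'total_trees': 4}
print(larger)  # {'feature_subsets': 4060, 'total_trees': 20}
```

Moving from pairs to triples multiplies the number of candidate feature combinations by roughly nine, which is consistent with the longer training times quoted for the larger settings.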

Training Time

With quick-start settings on a modern laptop (Apple Silicon):

  • Training: ~12 seconds
  • Evaluation: ~1 second

Actual Results

Extreme Class Imbalance

This dataset has only 492 fraud cases (0.17%) among 284,807 transactions. Pilz handles this without resampling or class weights, through three mechanisms:

  • Independent sampling: each class contributes an equal number of rows during training
  • Per-node re-balancing: each split sees balanced target/non-target data
  • Three-way splits: a neutral branch handles ambiguous regions
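The first two mechanisms can be illustrated with a minimal sketch. This is not Pilz's implementation: it assumes simple random sampling without replacement for the equal-rows-per-class step and inverse-frequency weights for per-node re-balancing. The helper names balanced_sample and node_class_weights are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_of, n_per_class, seed=0):
    """Draw the same number of rows from every class, without replacement.

    Classes smaller than n_per_class contribute all of their rows.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        sample += rng.sample(members, min(n_per_class, len(members)))
    return sample


def node_class_weights(labels):
    """Inverse-frequency weights so each class carries equal total weight at a node."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Toy node with 998 legitimate rows and 2 fraud rows
labels = [0] * 998 + [1] * 2
weights = node_class_weights(labels)
# Total weight per class after re-balancing
print({c: round(weights[c] * n, 6) for c, n in Counter(labels).items()})  # {0: 500.0, 1: 500.0}
```

Even though fraud is 0.2% of the rows in this toy node, both classes end up with the same total weight, so neither dominates the split criterion.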

ROC Curve

[Figure: ROC curve]

Output Files

creditcard_model/
├── 0/
│   ├── 0.json    # Tree 0 for class 0 (legitimate)
│   └── 1.json    # Tree 1 for class 0
└── 1/
    ├── 0.json    # Tree 0 for class 1 (fraud)
    └── 1.json    # Tree 1 for class 1
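With n trees per class, the layout above presumably generalizes to one subfolder per class holding 0.json through (n-1).json. That generalization is an assumption inferred from the n: 2 example. A small standard-library sketch (hypothetical helper names) lists the expected files and checks that training wrote them all:

```python
from pathlib import Path


def expected_tree_files(out_folder: str, classes, n: int):
    """Tree files implied by the layout above: <out_folder>/<class>/<tree>.json."""
    return [Path(out_folder) / str(c) / f"{t}.json" for c in classes for t in range(n)]


def missing_tree_files(out_folder: str, classes, n: int):
    """Expected tree files that do not exist on disk (empty list means complete)."""
    return [p for p in expected_tree_files(out_folder, classes, n) if not p.is_file()]


# Quick-start settings: n: 2 trees per class, classes 0 and 1
for p in expected_tree_files("creditcard_model", classes=[0, 1], n=2):
    print(p)
```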

eval/
├── 0_roc.html
├── 1_roc.html
├── all_roc.html
└── multi_class_result.html

Key Findings

  1. Extreme imbalance is handled natively
     • No SMOTE or class weights needed
     • Per-node re-balancing ensures both classes contribute equally at each split
  2. PCA features provide strong signal
     • V1–V28 are PCA-transformed, making them orthogonal and well-suited for decision tree splits
     • The model learns non-linear decision boundaries from these components
  3. Amount and Time add context
     • Transaction amount and elapsed time provide additional signal beyond the PCA components

Tips

Quick Start Settings (current)

These settings are deliberately reduced for fast iteration: training takes about 12 seconds and demonstrates the pipeline end-to-end.

For Better Fraud Detection

Increase these settings in train_settings.yaml:

n: 10               # More trees for ensemble stability
max_depth: 12       # Deeper trees to capture rare patterns
n_dims: 3           # Triple feature combinations
n_cat: 5            # Finer bins
calcs_per_dim: 1000 # More thorough search
max_eval_fit: 5000  # More training samples

With these settings, expect:

  • Training time: 1–3 minutes
  • Fraud recall: significantly improved
  • Overall accuracy: similar (majority class saturation)
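"Majority class saturation" can be made concrete using only the class counts from the Dataset section. The sketch below is a worked example, not a Pilz result: a trivial model that labels every transaction as legitimate already scores 99.83% accuracy while catching no fraud at all. accuracy_and_recall is a hypothetical helper.

```python
def accuracy_and_recall(tp: int, fn: int, tn: int, fp: int):
    """Overall accuracy and recall of the positive (fraud) class from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Trivial model that predicts Class=0 for every transaction:
# all 492 frauds are missed (fn) and all 284,315 legitimate rows are correct (tn).
acc, rec = accuracy_and_recall(tp=0, fn=492, tn=284_315, fp=0)
print(f"accuracy={acc:.2%}  fraud recall={rec:.0%}")  # accuracy=99.83%  fraud recall=0%
```

Since accuracy starts this close to 100%, better fraud detection barely moves it, which is why fraud recall is the number to track.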

Dealing with Extreme Imbalance

With 99.83% legitimate and only 0.17% fraud transactions, tune in stages:

  1. Start with quick-start settings to verify the pipeline works
  2. Increase max_depth and n_dims to capture subtle fraud patterns
  3. Add more trees (n=10 or n=20) for ensemble stability
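  4. Monitor fraud recall specifically, not overall accuracy
  5. Evaluate on separate test data to measure generalization: the quick-start datacard lists the same creditcard.csv under both train_files and test_files, so its scores reflect training fit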
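One way to create separate test data is a stratified split, which keeps the 0.17% fraud share the same in both files. This standard-library sketch is an illustration under stated assumptions: stratified_split is a hypothetical helper, and the 80/20 ratio is arbitrary. After running it, list the train file under train_files and the test file under test_files in the datacard.

```python
import csv
import random
from collections import defaultdict


def stratified_split(src: str, train_out: str, test_out: str, test_frac: float = 0.2,
                     label_column: str = "Class", seed: int = 0) -> None:
    """Split a CSV into train/test files, keeping each class's share the same in both."""
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        by_class = defaultdict(list)
        for row in reader:
            by_class[row[label_column]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        rows = by_class[label]
        rng.shuffle(rows)
        n_test = round(len(rows) * test_frac)  # e.g. 98 of the 492 frauds at 20%
        test += rows[:n_test]
        train += rows[n_test:]
    rng.shuffle(train)  # avoid a training file sorted by class

    for path, rows in ((train_out, train), (test_out, test)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```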