Installation

This guide will get Pilz installed and running in under 5 minutes.

Prerequisites

Python 3.13 or higher
macOS (Apple Silicon) or Linux (x86)
(Windows support coming soon)

?> Note: Pilz requires Python 3.13+ because it depends on the mi-amore library.

Step 1: Check Your Python Version

```bash
python --version
```

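You should see Python 3.13.x or higher. If the command is missing or reports an older version, install Python 3.13 or newer. On some systems the interpreter is invoked as python3 instead.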
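The same check can be run from inside Python, which helps when several interpreters are installed and it is unclear which one the shell picks up. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of Pilz.

```python
import sys

MINIMUM = (3, 13)  # minimum version stated in the prerequisites


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor, ...) version satisfies the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for Pilz"
    print(f"Python {current}: {status}")
```

Run it with the interpreter you intend to use for the virtual environment, since that is the one Pilz will be installed into.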

Step 2: Install Pilz

Choose your preferred method:

=== "Using uv (Recommended)"

```bash
# Create a new project
uv init pilz-project
cd pilz-project

# Add Pilz
uv add pilz

# Activate the environment
source .venv/bin/activate
```

=== "Using pip"

```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate

# Install Pilz
pip install pilz
```

Step 3: Verify Installation

```bash
pilz --help
```

You should see:

```text
Usage: pilz [OPTIONS] COMMAND [OPTIONS]

Commands:
  train      Train decision tree models
  eval       Evaluate trained models
  infer      Run inference only
  create-dc  Generate DataCard from CSV
```
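If the help text does not appear, a common cause is that the virtual environment is not active in the current shell. The sketch below checks two things separately: whether an executable is on the PATH, and whether the package is installed in the running interpreter. It assumes that both the executable and the distribution are named `pilz`, as the install and help commands above suggest; the function names are illustrative.

```python
import shutil
from importlib import metadata


def find_cli(name="pilz"):
    """Return the full path of the executable on PATH, or None if it is missing."""
    return shutil.which(name)


def installed_version(distribution="pilz"):
    """Return the installed version of a distribution, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("executable:", find_cli() or "not found on PATH")
    print("version:", installed_version() or "package not installed")
```

A package that is installed while the executable is missing usually points to a PATH or activation problem rather than a failed install.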

Step 4: Download Sample Data

For quick testing, let's download the Iris dataset:

```bash
# Download Iris dataset
curl -o iris.csv "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Verify
head -5 iris.csv
```

Expected output:

```text
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
```

!> The downloaded file has no header row, so add the column names:

```bash
echo "sepal_length,sepal_width,petal_length,petal_width,species" > iris_header.csv
cat iris.csv >> iris_header.csv
mv iris_header.csv iris.csv
```
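The shell commands above add a second header row if they are run twice, and the downloaded file may end with an empty line. A hedged Python alternative that handles both cases is sketched here; `with_header` is an illustrative helper, not a Pilz function.

```python
from pathlib import Path

HEADER = "sepal_length,sepal_width,petal_length,petal_width,species"


def with_header(raw_text, header=HEADER):
    """Prepend the header row and drop empty lines; leave fixed text unchanged."""
    rows = [line.strip() for line in raw_text.splitlines() if line.strip()]
    if rows and rows[0] == header:
        # Header already present, only normalise the line endings.
        return "\n".join(rows) + "\n"
    return "\n".join([header, *rows]) + "\n"


if __name__ == "__main__":
    path = Path("iris.csv")
    if path.exists():
        path.write_text(with_header(path.read_text()))
```

Because the function returns already-fixed text unchanged, it is safe to rerun on the same file.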

Step 5: Create Your First DataCard

```bash
pilz create-dc --src iris.csv --out iris_dc.yaml
```

This creates iris_dc.yaml:

```yaml
features:
  - name: sepal_length
    statistical: numerical
    type: float
  - name: sepal_width
    statistical: numerical
    type: float
  - name: petal_length
    statistical: numerical
    type: float
  - name: petal_width
    statistical: numerical
    type: float
target:
  feature_name: species
  values:
    - Iris-setosa
    - Iris-versicolor
    - Iris-virginica
train_files:
  - iris.csv
test_files:
  - iris.csv
```
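To sanity-check a generated DataCard, it can help to derive the same facts independently from the CSV: which columns parse as floats and which target values occur. The sketch below uses only the standard library and is not how `pilz create-dc` works internally; `summarize_csv` is a hypothetical helper.

```python
import csv
import io


def summarize_csv(text, target="species"):
    """Return (numeric feature names, sorted target values) for headed CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return [], []

    def is_float(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    # A column counts as numeric only if every row parses as a float.
    features = [
        name for name in rows[0]
        if name != target and all(is_float(row[name]) for row in rows)
    ]
    targets = sorted({row[target] for row in rows})
    return features, targets
```

Calling `summarize_csv(open("iris.csv").read())` should list the four measurement columns and the three species named in the DataCard above. A mismatch usually means the header row is missing or duplicated.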

Step 6: Create Training Settings

Create train_settings.yaml:

```yaml
n: 1
out_folder: iris_model
max_depth: 10
n_dims: 2
n_cat: 3
```

Step 7: Train Your Model

```bash
pilz train --datacard iris_dc.yaml --trainsettings train_settings.yaml
```

Expected output:

```text
INFO: Training for target Iris-setosa, tree 0
INFO: Training for target Iris-versicolor, tree 0
INFO: Training for target Iris-virginica, tree 0
INFO: Models saved to iris_model/
```

Step 8: Evaluate

Create eval_settings.yaml:

```yaml
in_folders:
  - iris_model
out_folder: iris_eval
```

Then run the evaluation:

```bash
pilz eval --datacard iris_dc.yaml --evalsettings eval_settings.yaml
```
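The DataCard in this walkthrough lists iris.csv under both train_files and test_files, so the evaluation scores rows the model has already seen and will look optimistic. For a more honest estimate, the file can be split first. The sketch below is one possible approach, a per-class (stratified) split using only the standard library; it assumes the label is the last field of each row, and the function name is illustrative.

```python
import random
from collections import defaultdict


def stratified_split(lines, test_fraction=0.2, seed=0):
    """Split CSV lines (header first) into train and test lists, per class label.

    The label is assumed to be the last comma-separated field of each row.
    """
    header, *rows = [line for line in lines if line.strip()]
    by_label = defaultdict(list)
    for row in rows:
        by_label[row.rsplit(",", 1)[-1]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [header], [header]
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Write the two lists to separate files (for example the hypothetical names iris_train.csv and iris_test.csv), list them under train_files and test_files in the DataCard, then rerun the train and eval steps.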

What Just Happened?

```mermaid
flowchart LR
    A[iris.csv] --> B[DataCard]
    B --> C[Train]
    C --> D[iris_model/]
    D --> E[Evaluate]
    E --> F[ROC Curves]
    E --> G[Predictions]
```

The trained model used Feature Categorization to bin features into n_cat categories, Multi-Dimensional Splits to find feature correlations, SQL-native queries for all data access, and Three-Way Splits to build the tree. All four are explained in Core Concepts.
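Once the settings files exist, the DataCard, train, and eval steps can be scripted so the whole run is repeatable. The sketch below only wraps the three commands shown in this guide, with the same flags; `build_commands` and `run_pipeline` are illustrative names, and the default file names are the ones used above.

```python
import subprocess


def build_commands(csv_path="iris.csv", datacard="iris_dc.yaml",
                   train_settings="train_settings.yaml",
                   eval_settings="eval_settings.yaml"):
    """Return the Pilz invocations from Steps 5, 7 and 8 as argument lists."""
    return [
        ["pilz", "create-dc", "--src", csv_path, "--out", datacard],
        ["pilz", "train", "--datacard", datacard, "--trainsettings", train_settings],
        ["pilz", "eval", "--datacard", datacard, "--evalsettings", eval_settings],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)
```

Calling `run_pipeline(build_commands())` from the project directory runs the three steps in sequence. The `runner` parameter exists only so the function can be exercised without invoking the real CLI.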

Installation Summary

✅ Python 3.13+ installed
✅ Pilz installed via uv/pip
✅ Sample data downloaded
✅ DataCard created
✅ Model trained
✅ Results evaluated

Next Steps

How Pilz Works — Algorithm overview
Feature Categorization — How features are binned
Multi-Dimensional Splits — The core innovation
SQL-Native Architecture — How SQL powers everything
Three-Way Splits — Left / Neutral / Right branching
Downsampling — Per-node sampling strategy
Imbalanced Data — Handling skewed distributions
Iris Example — Quick walkthrough

Having issues? Check the Troubleshooting guide.