Welcome to Pilz

Pilz is an experimental machine learning library for classification. It is named after the German word for mushroom or fungus (organisms that are neither plant nor animal) and explores a different approach to ML: SQL-native data processing, automatic downsampling for large-scale data, built-in handling of imbalanced datasets, direct n-dimensional correlation handling, and simplified feature selection without loss functions.

Why Pilz?

Pilz takes a different approach to machine learning:

SQL-native — All data processing runs through SQL, deployable directly in your database
Automatic downsampling — Works on very large datasets by intelligently sampling target and non-target rows
Imbalanced data out of the box — No SMOTE, class weights, or manual resampling needed
Simplified feature selection — Instead of optimizing a loss function, Pilz directly evaluates feature combinations by their discrimination power (target rate separation)
Multi-dimensional cuts — Captures feature correlations directly by splitting on multiple dimensions simultaneously
Three-way splits — Left, Neutral, and Right branches for handling ambiguous cases
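To make "target rate separation" and the three-way split more concrete, here is a minimal one-dimensional sketch. It is not Pilz's actual implementation: the names `BranchStats`, `three_way_split`, and `separation` are hypothetical, and scoring a cut by the absolute difference between the Left and Right target rates is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    """Row and target counts for one branch of a split (hypothetical helper)."""
    rows: int = 0
    targets: int = 0

    @property
    def target_rate(self) -> float:
        # Share of rows in this branch that belong to the target class.
        return self.targets / self.rows if self.rows else 0.0


def three_way_split(values, labels, low, high):
    """Send each row Left (< low), Right (> high), or Neutral (in between)."""
    stats = {"left": BranchStats(), "neutral": BranchStats(), "right": BranchStats()}
    for value, label in zip(values, labels):
        if value < low:
            branch = "left"
        elif value > high:
            branch = "right"
        else:
            branch = "neutral"
        stats[branch].rows += 1
        stats[branch].targets += label
    return stats


def separation(stats):
    """Assumed score: gap between the Left and Right target rates."""
    return abs(stats["left"].target_rate - stats["right"].target_rate)


# Toy data: targets (label 1) cluster at high feature values.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

# Compare two candidate cuts directly by their separation, with no loss function.
for low, high in [(0.35, 0.75), (0.15, 0.55)]:
    stats = three_way_split(values, labels, low, high)
    print(f"cut ({low}, {high}): separation = {separation(stats):.2f}")
```

In this toy data the first cut leaves the one ambiguous target in the Neutral branch, so Left and Right end up cleanly separated. A multi-dimensional cut would apply the same idea to thresholds on several features at once.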

Quick Example

# Train a model
pilz train --datacard mydata.yaml --trainsettings settings.yaml

# Get predictions as SQL
pilz eval --datacard mydata.yaml --evalsettings settings.yaml

The output includes ROC curves, accuracy metrics, and deployable SQL rules.
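As a rough illustration of what a deployable SQL rule could look like, the sketch below renders a single three-way cut as a `CASE` expression and runs it against an in-memory SQLite table. Everything here is an assumption: the `rule_to_sql` helper, the `transactions` table, the `amount` column, and the thresholds are all made up, SQLite merely stands in for your own database, and the SQL that Pilz actually emits may look quite different.

```python
import sqlite3


def rule_to_sql(column: str, low: float, high: float) -> str:
    """Render a three-way cut on one column as a SQL CASE expression.

    Hypothetical helper; the column name is interpolated as-is, so it
    must come from a trusted source.
    """
    return (
        f"CASE WHEN {column} < {low} THEN 'left' "
        f"WHEN {column} > {high} THEN 'right' "
        f"ELSE 'neutral' END AS branch"
    )


# Stand-in database with a made-up table and three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 0.1), (2, 0.5), (3, 0.9)],
)

# The rule runs entirely inside the database; no model object is needed.
query = f"SELECT id, {rule_to_sql('amount', 0.35, 0.75)} FROM transactions ORDER BY id"
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'left'), (2, 'neutral'), (3, 'right')]
```

The point of the sketch is only that a rule expressed this way needs nothing beyond a SQL engine to score new rows.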

What You'll Learn

Getting Started - Install and run your first model in 5 minutes
Examples - Real-world examples with actual datasets
Core Concepts - How the algorithm actually works
Reference - Complete command and settings reference

Ready to Start?

→ Installation Guide

Project Notes

A big part of the motivation for this project was simply to explore and learn new tools. I got to work with Polars and DuckDB — two fantastic tools that I can highly recommend. Pydantic was a joy to use (maybe I got a little carried away with it). Typer made building the CLI effortless, and SymPy handled the symbolic math like a champ.

When I needed a logic minimizer and couldn't find one compatible with Python 3.13, I took it as an opportunity to build mi-amore, a port of the Espresso algorithm modeled after the older pyeda package and usable from modern Python.

The documentation you're reading now was 99.9% generated by OpenCode (without it, this would have taken forever). The unit tests were also written entirely by OpenCode — a huge time saver.

The code itself prioritizes clarity over speed — everything runs sequentially for now, so there's plenty of room for parallelization down the road.
