Welcome to Pilz
Pilz is an experimental machine learning library for classification. Named after the German word for mushroom/fungus — organisms that are neither plant nor animal — Pilz explores a different approach to ML: SQL-native data processing, automatic downsampling for large-scale data, built-in handling of imbalanced datasets, direct n-dimensional correlation handling, and simplified feature selection without loss functions.
Why Pilz?
Pilz takes a different approach to machine learning:
- SQL-native — All data processing runs through SQL, deployable directly in your database
- Automatic downsampling — Works on very large datasets by intelligently sampling target and non-target rows
- Imbalanced data out of the box — No SMOTE, class weights, or manual resampling needed
- Simplified feature selection — Instead of optimizing a loss function, Pilz directly evaluates feature combinations by their discrimination power (target rate separation)
- Multi-dimensional cuts — Captures feature correlations directly by splitting on multiple dimensions simultaneously
- Three-way splits — Left, Neutral, and Right branches for handling ambiguous cases
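To make the last three points concrete, here is a minimal sketch in plain Python. It is not Pilz's implementation: the `Row` type, the `branch` rule (Right when both features exceed their cut, Left when neither does, Neutral otherwise), and the `separation` score (absolute difference between the Right and Left target rates) are all hypothetical choices made for illustration. Pilz's actual branch definitions and scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Row:
    x: float
    y: float
    target: bool


def branch(row: Row, x_cut: float, y_cut: float) -> str:
    """Hypothetical two-dimensional, three-way cut (an assumption, not Pilz's rule)."""
    above_x = row.x > x_cut
    above_y = row.y > y_cut
    if above_x and above_y:
        return "right"
    if not above_x and not above_y:
        return "left"
    # The two features disagree, so the row is treated as ambiguous.
    return "neutral"


def target_rates(rows: list[Row], x_cut: float, y_cut: float) -> dict[str, float]:
    """Fraction of target rows in each branch (0.0 for an empty branch)."""
    counts = {"left": [0, 0], "neutral": [0, 0], "right": [0, 0]}  # [targets, total]
    for row in rows:
        name = branch(row, x_cut, y_cut)
        counts[name][0] += int(row.target)
        counts[name][1] += 1
    return {name: (t / n if n else 0.0) for name, (t, n) in counts.items()}


def separation(rows: list[Row], x_cut: float, y_cut: float) -> float:
    """Illustrative score: gap between the Right and Left target rates."""
    rates = target_rates(rows, x_cut, y_cut)
    return abs(rates["right"] - rates["left"])


rows = [
    Row(0.1, 0.2, False),
    Row(0.2, 0.1, False),
    Row(0.3, 0.4, False),
    Row(0.9, 0.8, True),
    Row(0.8, 0.9, True),
    Row(0.7, 0.7, False),
    Row(0.9, 0.1, True),
    Row(0.1, 0.9, False),
]

rates = target_rates(rows, x_cut=0.5, y_cut=0.5)
score = separation(rows, x_cut=0.5, y_cut=0.5)
print(rates, score)
```

Under these assumptions, a candidate pair of cuts is judged directly by how far apart the branch target rates land, with no loss function being minimized. Comparing `separation` across candidate cuts and feature pairs is one way to picture the kind of feature selection described above.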
Quick Example
# Train a model
pilz train --datacard mydata.yaml --trainsettings settings.yaml
# Get predictions as SQL
pilz eval --datacard mydata.yaml --evalsettings settings.yaml
The output includes ROC curves, accuracy metrics, and deployable SQL rules.
What You'll Learn
- Getting Started - Install and run your first model in 5 minutes
- Examples - Real-world examples with actual datasets
- Core Concepts - How the algorithm actually works
- Reference - Complete command and settings reference
Project Notes
A big part of the motivation for this project was simply to explore and learn new tools. I got to work with Polars and DuckDB — two fantastic tools that I can highly recommend. Pydantic was a joy to use (maybe I got a little carried away with it). Typer made building the CLI effortless, and SymPy handled the symbolic math like a champ.
When I needed a logical minimizer and couldn't find one compatible with Python 3.13, I took it as an opportunity to build mi-amore — a port of the espresso algorithm modeled after the old pyeda package, making it available for modern Python.
The documentation you're reading now was 99.9% generated by OpenCode (without it, this would have taken forever). The unit tests were also written entirely by OpenCode — a huge time saver.
The code itself prioritizes clarity over speed — everything runs sequentially for now, so there's plenty of room for parallelization down the road.
Pilz — An experimental approach to classification