
Welcome to Pilz

Pilz is an experimental machine learning library for classification. Its name is the German word for mushroom or fungus, organisms that are neither plant nor animal. In that spirit, Pilz explores a different approach to ML: SQL-native data processing, automatic downsampling for large datasets, built-in handling of imbalanced data, direct handling of n-dimensional correlations, and feature selection without a loss function.

Why Pilz?

Pilz takes a different approach to machine learning:

  • SQL-native — All data processing runs through SQL, deployable directly in your database
  • Automatic downsampling — Works on very large datasets by intelligently sampling target and non-target rows
  • Imbalanced data out of the box — No SMOTE, class weights, or manual resampling needed
  • Simplified feature selection — Instead of optimizing a loss function, Pilz directly evaluates feature combinations by their discrimination power (target rate separation)
  • Multi-dimensional cuts — Captures feature correlations directly by splitting on multiple dimensions simultaneously
  • Three-way splits — Left, Neutral, and Right branches for handling ambiguous cases
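The downsampling and separation ideas in the list above can be illustrated with a small stand-alone sketch. It assumes one plausible scheme, which is not necessarily the one Pilz uses. Every target row is kept, a random fraction of non-target rows is kept, and the kept non-target rows are re-weighted so that weighted target rates stay comparable to the full data. "Separation" is taken here to mean the absolute difference in target rate on the two sides of a single cut. The names `Row`, `downsample`, `target_rate`, and `separation` are hypothetical and are not part of Pilz's API.

```python
import random
from typing import NamedTuple


class Row(NamedTuple):
    x: float       # one feature value
    target: int    # 1 = target row, 0 = non-target row
    weight: float  # number of original rows this kept row stands for


def downsample(rows, non_target_fraction, seed=0):
    """Hypothetical scheme: keep all targets, sample non-targets, re-weight them."""
    rng = random.Random(seed)
    kept = []
    for x, target in rows:
        if target == 1:
            kept.append(Row(x, 1, 1.0))
        elif rng.random() < non_target_fraction:
            # Each kept non-target stands in for 1 / fraction original rows.
            kept.append(Row(x, 0, 1.0 / non_target_fraction))
    return kept


def target_rate(rows):
    """Weighted share of target rows; 0.0 for an empty side."""
    total = sum(r.weight for r in rows)
    hits = sum(r.weight for r in rows if r.target == 1)
    return hits / total if total else 0.0


def separation(rows, cut):
    """Absolute target-rate difference between rows below and at/above the cut."""
    left = [r for r in rows if r.x < cut]
    right = [r for r in rows if r.x >= cut]
    return abs(target_rate(left) - target_rate(right))


if __name__ == "__main__":
    # Toy data: 100 rows, feature values 0..9, targets only where x >= 8.
    data = [(float(x), 1 if x >= 8 else 0) for x in range(10) for _ in range(10)]
    sample = downsample(data, non_target_fraction=0.25, seed=1)
    # Rank candidate cuts by separation on the downsampled rows.
    for cut in (3.0, 5.0, 8.0):
        print(cut, round(separation(sample, cut), 3))
```

In this sketch the weights only undo the effect of sampling when target rates are estimated. They are not training-time class weights. With a fraction of 1.0 every row is kept, and on the toy data the cut at 8.0 reaches a separation of 1.0 because it isolates all the target rows.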

Quick Example

# Train a model
pilz train --datacard mydata.yaml --trainsettings settings.yaml

# Get predictions as SQL
pilz eval --datacard mydata.yaml --evalsettings settings.yaml

The output includes ROC curves, accuracy metrics, and deployable SQL rules.
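This page does not show the exact form of the SQL that Pilz emits. As an illustration only, the sketch below expresses a hypothetical two-feature cut with a three-way split as a standard SQL `CASE` expression. It then computes the target rate of each branch in SQL as well. The table, columns, thresholds, and the rule for the Neutral branch (exactly one of the two conditions holds) are all assumptions. The sketch uses Python's built-in `sqlite3` rather than DuckDB so that it runs with no extra dependencies.

```python
import sqlite3

# Hypothetical toy data: (id, age, income, target).
ROWS = [
    (1, 25, 20000, 0),
    (2, 62, 80000, 1),
    (3, 58, 30000, 0),
    (4, 30, 90000, 1),
    (5, 70, 95000, 1),
    (6, 22, 15000, 0),
]

# Hypothetical 2-D rule with three branches: both conditions -> Right,
# neither -> Left, exactly one -> Neutral (the ambiguous region).
BRANCH_SQL = """
    CASE
        WHEN age >= 50 AND income >= 50000 THEN 'Right'
        WHEN age < 50 AND income < 50000 THEN 'Left'
        ELSE 'Neutral'
    END
"""


def make_db():
    """Load the toy rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER, age INTEGER, income INTEGER, target INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", ROWS)
    return conn


def assign_branches(conn):
    """Route every row to Left, Neutral, or Right using only SQL."""
    query = f"SELECT id, {BRANCH_SQL} AS branch FROM customers ORDER BY id"
    return conn.execute(query).fetchall()


def branch_target_rates(conn):
    """Row count and target rate per branch, also computed in SQL."""
    query = f"""
        SELECT {BRANCH_SQL} AS branch, COUNT(*) AS n, AVG(target) AS target_rate
        FROM customers
        GROUP BY branch
        ORDER BY branch
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(assign_branches(conn))
    print(branch_target_rates(conn))
```

The rule is plain standard SQL and needs no Python at prediction time, so the same `CASE` expression could in principle run in another SQL engine. That is the kind of portability a SQL-native design aims for, although the rules Pilz generates may look quite different.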

What You'll Learn

  1. Getting Started - Install and run your first model in 5 minutes
  2. Examples - Real-world examples with actual datasets
  3. Core Concepts - How the algorithm actually works
  4. Reference - Complete command and settings reference

Ready to Start?

Installation Guide

Project Notes

A big part of the motivation for this project was simply to explore and learn new tools. I got to work with Polars and DuckDB — two fantastic tools that I can highly recommend. Pydantic was a joy to use (maybe I got a little carried away with it). Typer made building the CLI effortless, and SymPy handled the symbolic math like a champ.

When I needed a logic minimizer and couldn't find one compatible with Python 3.13, I took the opportunity to build mi-amore, a port of the Espresso algorithm modeled after the older pyeda package, which makes Espresso available on modern Python.

The documentation you're reading now was 99.9% generated by OpenCode (without it, this would have taken forever). The unit tests were also written entirely by OpenCode — a huge time saver.

The code itself prioritizes clarity over speed — everything runs sequentially for now, so there's plenty of room for parallelization down the road.


Pilz — An experimental approach to classification