Three-Way Splits
Pilz splits each node into three branches: Left (mostly non-target), Neutral (unclear — continue splitting), and Right (mostly target).
The Three Branches
Each bin combination is classified based on its diff value — the difference between target rate and non-target rate:
- diff > neutral_faktor → Right (target)
- diff < -neutral_faktor → Left (non-target)
- -neutral_faktor <= diff <= neutral_faktor → Neutral (uncertain)
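The threshold rule above can be sketched for a single diff value. This is a minimal illustration of the three inequalities as written in the list, not the actual Pilz implementation, which classifies combinations of bins; the function name `classify_diff` and the sample diff values are assumptions made for this example.

```python
def classify_diff(diff: float, neutral_faktor: float = 0.0) -> str:
    """Map a diff value to a branch name using the thresholds listed above."""
    if diff > neutral_faktor:
        return "right"    # clearly more target than non-target
    if diff < -neutral_faktor:
        return "left"     # clearly more non-target than target
    return "neutral"      # inside [-neutral_faktor, neutral_faktor]


# Hypothetical diff values for four bins.
diffs = [0.30, 0.05, -0.05, -0.40]

default_zone = [classify_diff(d) for d in diffs]
wide_zone = [classify_diff(d, neutral_faktor=0.1) for d in diffs]
```

With the default threshold only a diff of exactly 0 lands in Neutral; raising `neutral_faktor` to 0.1 moves the two weakly discriminating bins (0.05 and -0.05) into Neutral as well.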
Filter Generation
The get_left_right_filter() method classifies each combination and produces compact SQL conditions:

    # src/pilz/model/dataframes.py:186-270
    def get_left_right_filter(self):
        res_df = self.diff_df
        neutral_range = max(self.neutral_faktor, 0)
        target_lists = [
            res_df.filter(pl.col("diff") > neutral_range)[cut.feat_name].to_list()
            for cut in sorted_cut_list
        ]
        non_target_lists = [
            res_df.filter(pl.col("diff") <= -neutral_range)[cut.feat_name].to_list()
            for cut in sorted_cut_list
        ]
        # Classify every possible combination
        for possible_combs in itertools.product(*possible_labels):
            if possible_combs in target_list_tupels:
                target_cubes.append(bits)
            elif possible_combs in non_target_list_tupels or self.neutral_faktor < 0:
                non_target_cubes.append(bits)
            else:
                neutral_cubes.append(bits)
        # Boolean minimization for compact SQL
        min_target_cubes = minimize(cubes_on=target_cubes)
        min_neutral_cubes = minimize(cubes_on=neutral_cubes)
        min_non_target_cubes = minimize(cubes_on=non_target_cubes)
        return (
            self.get_sympy_combine_filter(min_cubes=min_non_target_cubes),  # Left
            self.get_sympy_combine_filter(min_cubes=min_neutral_cubes),  # Neutral
            self.get_sympy_combine_filter(min_cubes=min_target_cubes),  # Right
        )

The boolean minimization (using mi_amore) produces minimal SQL conditions instead of enumerating every winning combination.
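To give a feel for why minimization shrinks the conditions, here is a small pure-Python sketch of the merging step used in Quine-McCluskey-style minimizers. It is not the algorithm or API of mi_amore; the string encoding of cubes (`"0"`, `"1"`, and `"-"` for "don't care") and the names `merge_pair` and `merge_cubes` are assumptions chosen for illustration. The result is an equivalent, more compact cover, though not necessarily the smallest possible one.

```python
from itertools import combinations
from typing import Optional, Set


def merge_pair(a: str, b: str) -> Optional[str]:
    """Merge two cubes that differ in exactly one fixed bit, else return None."""
    diff_positions = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff_positions) != 1:
        return None
    i = diff_positions[0]
    if "-" in (a[i], b[i]):
        return None  # a don't-care against a fixed bit cannot be merged
    return a[:i] + "-" + a[i + 1:]


def merge_cubes(cubes: Set[str]) -> Set[str]:
    """Repeatedly merge cubes until no pair can be combined any further."""
    current = set(cubes)
    while True:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            m = merge_pair(a, b)
            if m is not None:
                merged.add(m)
                used.update((a, b))
        if not merged:
            return current
        # Keep the new, wider cubes plus any cube that found no partner.
        current = merged | (current - used)


# Two combinations that differ only in the second bit collapse into one cube.
compact = merge_cubes({"10", "11"})
```

Here `{"10", "11"}` collapses to `{"1-"}`: the second bit no longer matters, so a single condition on the first bit replaces two enumerated combinations.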
The neutral_faktor Parameter
The neutral zone width is configurable:

    # src/pilz/model/settings.py:41-51
    neutral_faktor: float = Field(
        description="Threshold for the neutral zone. Bins with diff "
        "within [-neutral_faktor, neutral_faktor] go to Neutral.",
        default=0.0, ge=0.0, le=1.0,
    )

- neutral_faktor = 0.0 (default): Only bins with exactly equal target/non-target proportions go to Neutral
- neutral_faktor > 0.0: Wider neutral zone — only clearly discriminating bins are classified as Left or Right
Recursive Tree Building
After determining the filters, Pilz recurses on each branch:

    # src/pilz/service/train.py:100-151
    def train_pilz(self, target_filter, path_filter, depth=""):
        train_df = self.darkwing.read_akt_train(
            targer_filter=target_filter,
            train_settings=self.settings,
            akt_filters=path_filter,
        )
        if train_df.is_final_size() or len(depth) >= self.settings.max_depth:
            return self.make_spore(path_filter=path_filter, depth=depth, train_df=train_df)
        self.cater(train_df=train_df)
        left_filter, neutral_filter, right_filter = self.counter(train_df=train_df)
        if left_filter is None and right_filter is None:
            return self.make_spore(path_filter=path_filter, depth=depth, train_df=train_df)
        left_spores = self.train_pilz(target_filter, path_filter + [left_filter], depth + "l") if left_filter else []
        neutral_spores = self.train_pilz(target_filter, path_filter + [neutral_filter], depth + "n") if neutral_filter else []
        right_spores = self.train_pilz(target_filter, path_filter + [right_filter], depth + "r") if right_filter else []
        return left_spores + neutral_spores + right_spores

The depth string encodes the path (l = left, n = neutral, r = right), allowing reconstruction of the decision path later.
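As a small illustration of that encoding, the sketch below turns a depth string back into a readable path. The helper name `decode_depth` and the `BRANCH_NAMES` mapping are hypothetical and not part of Pilz; only the letters `l`, `n`, and `r` come from the code above.

```python
from typing import List

# Letter-to-branch mapping taken from the depth suffixes used in train_pilz.
BRANCH_NAMES = {"l": "left", "n": "neutral", "r": "right"}


def decode_depth(depth: str) -> List[str]:
    """Expand a depth string such as "lnr" into the sequence of branches taken."""
    return [BRANCH_NAMES[step] for step in depth]


path = decode_depth("lnr")
level = len("lnr")  # the same length that train_pilz compares against max_depth
```

The root node has the empty string as its depth, so it decodes to an empty path, and the length of the string is the node's level in the tree.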
Leaf Creation
Recursion stops when either class has fewer than min_size rows, when the maximum depth is reached, or when a split produces neither a Left nor a Right filter:

    # src/pilz/model/dataframes.py:435-439
    def is_final_size(self) -> bool:
        return (
            self.target_df_size < self.min_size
            or self.non_target_df_size < self.min_size
        )

    # src/pilz/service/train.py:85-98
    def make_spore(self, path_filter, depth, train_df):
        score = train_df.score()
        return [Spore(
            cut=[fil.sql() for fil in path_filter],
            score=score,
            depth=depth,
        )]

The score is the normalized difference between the target and non-target row counts at this leaf, which ranges from -1.0 to 1.0:

    # src/pilz/model/dataframes.py:428-433
    def score(self) -> float:
        if self.non_target_df_size + self.target_df_size == 0:
            return 0.0
        return (self.target_df_size - self.non_target_df_size) / (
            self.non_target_df_size + self.target_df_size
        )

A score of 1.0 means all rows are target, -1.0 means all are non-target, and 0.0 means a perfect balance.
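A short worked example makes the scale concrete. The standalone function below mirrors the formula in `score()` with plain integers; the name `leaf_score` and the row counts are made up for illustration. It also shows how the score relates to the target rate p (the share of target rows): score = 2p - 1.

```python
def leaf_score(target_size: int, non_target_size: int) -> float:
    """Normalized difference between target and non-target counts, in [-1, 1]."""
    total = target_size + non_target_size
    if total == 0:
        return 0.0  # empty leaf: same fallback as in score()
    return (target_size - non_target_size) / total


# Hypothetical leaf with 30 target rows and 10 non-target rows.
score = leaf_score(30, 10)       # (30 - 10) / 40
target_rate = 30 / (30 + 10)     # share of target rows in the leaf
rescaled = 2 * target_rate - 1   # the same value expressed via the target rate
```

A leaf with 75% target rows therefore scores 0.5, halfway between a perfect balance (0.0) and a pure target leaf (1.0).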
Summary
| Concept | Description |
|---|---|
| Left branch | diff < -neutral_faktor — mostly non-target |
| Neutral branch | -neutral_faktor <= diff <= neutral_faktor — uncertain, continue splitting |
| Right branch | diff > neutral_faktor — mostly target |
| Boolean minimization | Produces compact SQL conditions |
| Recursion | Recurses into every non-empty branch (Left, Neutral, and Right) |
| Leaf creation | Stops when either class has fewer than min_size rows, max_depth is reached, or no Left or Right filter is found |
Next Steps
- Downsampling — How training data is sampled at each node
- Imbalanced Data — How Pilz handles skewed distributions
- Training Internals — Full algorithm