
Three-Way Splits

Pilz splits each node into three branches: Left (mostly non-target), Neutral (unclear), and Right (mostly target). Each branch is then split again recursively until a stopping condition is reached.

The Three Branches

flowchart TD
    A[Data] --> B{Does the feature combination discriminate clearly?}
    B -->|"Clearly target"| R[Right Branch: High target rate]
    B -->|"Unclear"| N[Neutral Branch: Continue splitting]
    B -->|"Clearly non-target"| L[Left Branch: Low target rate]
    R --> R2["Target Rate > 0.8"]
    N --> N2[Target Rate ~0.5]
    L --> L2["Target Rate < 0.2"]
    N2 --> R3[Split Again]
    N2 --> L3[Split Again]
    style N fill:#ffff99
    style N2 fill:#ffff99

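Each bin combination is classified by its diff value, the difference between its target rate and its non-target rate. The comparisons used in get_left_right_filter() are:

  • diff > neutral_faktor → Right (target)
  • diff <= -neutral_faktor → Left (non-target)
  • -neutral_faktor < diff <= neutral_faktor → Neutral (uncertain)

Combinations that end up in neither the Right nor the Left list are also assigned to Neutral.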
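As a minimal sketch, the function below applies the same comparisons as the filters in get_left_right_filter() (diff > neutral_range for Right, diff <= -neutral_range for Left, everything else Neutral). The name classify_bin is hypothetical, and it works on a single precomputed diff value rather than on Pilz's polars DataFrame.

```python
def classify_bin(diff: float, neutral_faktor: float = 0.0) -> str:
    """Assign one bin combination to a branch from its diff value (hypothetical helper)."""
    # Same clamp as get_left_right_filter(): negative widths are treated as 0
    neutral_range = max(neutral_faktor, 0.0)
    if diff > neutral_range:
        return "right"    # clearly more target than non-target
    if diff <= -neutral_range:
        return "left"     # clearly non-target; note the <=, so diff == 0 lands here when the width is 0
    return "neutral"      # inside the neutral zone


for d in (-0.4, -0.05, 0.0, 0.05, 0.4):
    print(d, classify_bin(d, neutral_faktor=0.1))
```

Because the Left comparison uses <=, a boundary value such as diff == -neutral_faktor goes to Left, while diff == +neutral_faktor stays Neutral.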

Filter Generation

The get_left_right_filter() method classifies each combination and produces compact SQL conditions:

# src/pilz/model/dataframes.py:186-270
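def get_left_right_filter(self):
    res_df = self.diff_df
    # Clamp to 0 (the settings validator already enforces ge=0.0)
    neutral_range = max(self.neutral_faktor, 0)
    # (elided in this excerpt: sorted_cut_list, possible_labels,
    #  target_list_tupels, non_target_list_tupels, the *_cubes lists and `bits`)

    # Per feature: bin labels of combinations whose diff is above +neutral_range
    target_lists = [
        res_df.filter(pl.col("diff") > neutral_range)[cut.feat_name].to_list()
        for cut in sorted_cut_list
    ]
    # Per feature: bin labels of combinations whose diff is at or below -neutral_range
    non_target_lists = [
        res_df.filter(pl.col("diff") <= -neutral_range)[cut.feat_name].to_list()
        for cut in sorted_cut_list
    ]

    # Classify every possible combination; `bits` is its cube encoding (built in elided code)
    for possible_combs in itertools.product(*possible_labels):
        if possible_combs in target_list_tupels:
            target_cubes.append(bits)
        elif possible_combs in non_target_list_tupels or self.neutral_faktor < 0:
            non_target_cubes.append(bits)
        else:
            # Neither list: inside the neutral zone, or not observed in diff_df at all
            neutral_cubes.append(bits)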

    # Boolean minimization for compact SQL
    min_target_cubes = minimize(cubes_on=target_cubes)
    min_neutral_cubes = minimize(cubes_on=neutral_cubes)
    min_non_target_cubes = minimize(cubes_on=non_target_cubes)

    return (
        self.get_sympy_combine_filter(min_cubes=min_non_target_cubes),   # Left
        self.get_sympy_combine_filter(min_cubes=min_neutral_cubes),     # Neutral
        self.get_sympy_combine_filter(min_cubes=min_target_cubes),      # Right
    )

Boolean minimization (using mi_amore) merges the cubes of each branch into a small set of terms, so each branch's SQL condition stays compact instead of enumerating every matching bin combination.
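mi_amore's own API is not reproduced here. As a plain-Python stand-in, the sketch below shows the core idea with the merging step of the Quine-McCluskey method: cubes are strings of '0', '1', and '-' (don't care), and two cubes that differ in exactly one fixed position merge into one. It only computes prime implicants and skips the minimum-cover selection a full minimizer performs. The one-bit-per-predicate encoding and the predicate strings are hypothetical, not Pilz's actual `bits` format.

```python
from itertools import combinations
from typing import Optional


def merge_cubes(a: str, b: str) -> Optional[str]:
    """Merge two cubes that differ in exactly one fixed position ('0' vs '1')."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) == 1 and "-" not in (a[diffs[0]], b[diffs[0]]):
        i = diffs[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(cubes: set) -> set:
    """Repeatedly merge cubes until nothing merges (stand-in for mi_amore's minimize)."""
    current, primes = set(cubes), set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            m = merge_cubes(a, b)
            if m is not None:
                merged.add(m)
                used.update((a, b))
        primes |= current - used  # cubes that could not be merged further
        current = merged
    return primes


def cubes_to_sql(cubes: set, predicates: list) -> str:
    """Render cubes as an OR of AND-terms; '-' positions are dropped."""
    clauses = []
    for cube in sorted(cubes):
        terms = [p if bit == "1" else f"NOT ({p})"
                 for bit, p in zip(cube, predicates) if bit != "-"]
        clauses.append("(" + (" AND ".join(terms) or "TRUE") + ")")
    return " OR ".join(clauses)


# Hypothetical encoding: one bit per predicate, '1' = predicate holds
predicates = ["age_bin = 'high'", "income_bin = 'high'"]
target_cubes = {"10", "11"}  # target whenever age is high, whatever the income
print(cubes_to_sql(prime_implicants(target_cubes), predicates))
# (age_bin = 'high')
```

Two enumerated combinations collapse into one condition; the saving grows quickly as more features and bins are involved.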

The neutral_faktor Parameter

The neutral zone width is configurable:

# src/pilz/model/settings.py:41-51
neutral_faktor: float = Field(
    description="Threshold for the neutral zone. Bins with diff "
    "within [-neutral_faktor, neutral_faktor] go to Neutral.",
    default=0.0, ge=0.0, le=1.0,
)
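  • neutral_faktor = 0.0 (default): every observed combination is classified as Left or Right (a diff of exactly 0 counts as Left, because the non-target filter uses <=); Neutral only receives combinations that appear in neither list
  • neutral_faktor > 0.0: wider neutral zone; only clearly discriminating combinations (diff > neutral_faktor or diff <= -neutral_faktor) are classified as Right or Left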
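The effect of widening the zone can be seen on a handful of made-up diff values. This sketch uses the same comparisons as the get_left_right_filter() excerpt (> for Right, <= for Left); zone_counts and the sample values are hypothetical, and it only covers observed combinations, whereas the real method also assigns combinations found in neither list to Neutral.

```python
def zone_counts(diffs, neutral_faktor):
    """Count observed diff values per branch, using the excerpt's comparisons."""
    width = max(neutral_faktor, 0.0)  # same clamp as neutral_range
    counts = {"left": 0, "neutral": 0, "right": 0}
    for d in diffs:
        if d > width:
            counts["right"] += 1
        elif d <= -width:
            counts["left"] += 1
        else:
            counts["neutral"] += 1
    return counts


sample_diffs = [-0.5, -0.1, 0.0, 0.05, 0.3]  # made-up values
for nf in (0.0, 0.2):
    print(nf, zone_counts(sample_diffs, nf))
# 0.0 {'left': 3, 'neutral': 0, 'right': 2}
# 0.2 {'left': 1, 'neutral': 3, 'right': 1}
```

With the default every observed value lands in Left or Right; at 0.2 the three values close to zero move to Neutral.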

Recursive Tree Building

After determining the filters, Pilz recurses on each branch:

# src/pilz/service/train.py:100-151
def train_pilz(self, target_filter, path_filter, depth=""):
    train_df = self.darkwing.read_akt_train(
        targer_filter=target_filter,
        train_settings=self.settings,
        akt_filters=path_filter,
    )

    if train_df.is_final_size() or len(depth) >= self.settings.max_depth:
        return self.make_spore(path_filter=path_filter, depth=depth, train_df=train_df)

    self.cater(train_df=train_df)
    left_filter, neutral_filter, right_filter = self.counter(train_df=train_df)

    if left_filter is None and right_filter is None:
        return self.make_spore(path_filter=path_filter, depth=depth, train_df=train_df)

    left_spores = self.train_pilz(target_filter, path_filter + [left_filter], depth + "l") if left_filter else []
    neutral_spores = self.train_pilz(target_filter, path_filter + [neutral_filter], depth + "n") if neutral_filter else []
    right_spores = self.train_pilz(target_filter, path_filter + [right_filter], depth + "r") if right_filter else []

    return left_spores + neutral_spores + right_spores

The depth string encodes the path (l = left, n = neutral, r = right), allowing reconstruction of the decision path later.
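To make the recursion and the depth strings concrete, here is a toy re-implementation on in-memory rows of (x, is_target) pairs. It mirrors the control flow of train_pilz() (stop when a class is too small or max depth is reached, otherwise recurse into l, n, r and concatenate the leaves), but the thirds-based toy_split is a hypothetical stand-in for Pilz's binning and filter generation, and ToySpore only imitates the Spore fields shown on this page.

```python
from dataclasses import dataclass


@dataclass
class ToySpore:
    """Leaf record loosely modeled on Spore(cut, score, depth)."""
    cut: list
    score: float
    depth: str


def score(rows):
    t = sum(1 for _, is_target in rows if is_target)
    n = len(rows) - t
    return 0.0 if t + n == 0 else (t - n) / (t + n)


def toy_split(rows):
    """Hypothetical split: cut the x range into thirds (stand-in for Pilz's filters)."""
    xs = [x for x, _ in rows]
    lo_x, hi_x = min(xs), max(xs)
    if lo_x == hi_x:
        return None  # nothing to separate, like getting no Left/Right filter
    lo = lo_x + (hi_x - lo_x) / 3
    hi = lo_x + 2 * (hi_x - lo_x) / 3
    return {
        "l": (f"x < {lo:g}", lambda x: x < lo),
        "n": (f"x BETWEEN {lo:g} AND {hi:g}", lambda x: lo <= x <= hi),
        "r": (f"x > {hi:g}", lambda x: x > hi),
    }


def train(rows, path, depth, min_size=2, max_depth=2):
    t = sum(1 for _, is_target in rows if is_target)
    n = len(rows) - t
    # Stop: either class below min_size, or the depth string is already max_depth long
    if t < min_size or n < min_size or len(depth) >= max_depth:
        return [ToySpore(cut=path, score=score(rows), depth=depth)]
    branches = toy_split(rows)
    if branches is None:
        return [ToySpore(cut=path, score=score(rows), depth=depth)]
    spores = []
    for code in ("l", "n", "r"):  # one character appended per level
        sql, keep = branches[code]
        subset = [(x, y) for x, y in rows if keep(x)]
        if subset:  # empty branches are skipped
            spores += train(subset, path + [sql], depth + code, min_size, max_depth)
    return spores


for s in train([(i, i >= 6) for i in range(9)], path=[], depth="", max_depth=1):
    print(s.depth, s.score, s.cut)
```

Each leaf's depth string ("l", "n", "r" here) spells out the branch taken at every level, and its cut list holds the matching conditions in the same order.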

Leaf Creation

Recursion stops when either class has fewer than min_size rows, when max_depth is reached, or when counter() returns neither a Left nor a Right filter:

# src/pilz/model/dataframes.py:435-439
def is_final_size(self) -> bool:
    return (
        self.target_df_size < self.min_size
        or self.non_target_df_size < self.min_size
    )

# src/pilz/service/train.py:85-98
def make_spore(self, path_filter, depth, train_df):
    score = train_df.score()
    return [Spore(
        cut=[fil.sql() for fil in path_filter],
        score=score,
        depth=depth,
    )]

The score is not the raw target rate but the normalized difference between the target and non-target counts at this leaf:

# src/pilz/model/dataframes.py:428-433
def score(self) -> float:
    if self.non_target_df_size + self.target_df_size == 0:
        return 0.0
    return (self.target_df_size - self.non_target_df_size) / (
        self.non_target_df_size + self.target_df_size
    )

A score of 1.0 means all rows are target, -1.0 means all are non-target, and 0.0 means equal numbers of target and non-target rows.
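Since score() divides the count difference by the total, it is an affine rescaling of the target rate p = t / (t + n): (t - n) / (t + n) = 2p - 1. The sketch below reproduces the formula outside the class; the function names leaf_score and target_rate are hypothetical.

```python
def leaf_score(target_size: int, non_target_size: int) -> float:
    """Same formula as score(): normalized difference of the class counts."""
    total = target_size + non_target_size
    if total == 0:
        return 0.0
    return (target_size - non_target_size) / total


def target_rate(target_size: int, non_target_size: int) -> float:
    """Plain share of target rows, for comparison."""
    total = target_size + non_target_size
    return 0.0 if total == 0 else target_size / total


# 30 target rows and 10 non-target rows: rate 0.75, score 0.5 (= 2 * 0.75 - 1)
print(leaf_score(30, 10), target_rate(30, 10))
```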

Summary

| Concept | Description |
| --- | --- |
| Left branch | diff <= -neutral_faktor: mostly non-target |
| Neutral branch | -neutral_faktor < diff <= neutral_faktor, or combination in neither list: uncertain |
| Right branch | diff > neutral_faktor: mostly target |
| Boolean minimization | Produces compact SQL conditions |
| Recursion | Continues splitting the Left, Neutral, and Right branches |
| Leaf creation | Stops when either class falls below min_size, max_depth is reached, or no Left/Right filter exists |

Next Steps