Multi-Dimensional Splits

The n_dims parameter controls how many features Pilz may combine when it searches for a split. This is Pilz's core idea: a single split can test several features at once rather than one at a time, which lets it pick up interactions between features.

What is n_dims?

n_dims sets the maximum number of features combined in a single split candidate. counter() scores single features first and then every combination size from 2 up to n_dims:

flowchart TB
    subgraph n_dims_1
        D1["n_dims=1"] --> F1[X alone]
    end
    subgraph n_dims_2
        D2["n_dims=2"] --> P1[X AND Y pairs]
    end
    subgraph n_dims_3
        D3["n_dims=3"] --> T1[X AND Y AND Z triplets]
    end

n_dims	Combinations Evaluated
1	Single features only
2	Single features and feature pairs
3	Single features, pairs, and triplets
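
To make the table concrete, the following sketch enumerates the candidate groups of each size with itertools.combinations, the same standard-library function that counter() calls. The feature names are made up for illustration.

```python
import itertools

# Hypothetical feature names, used only for illustration.
features = ["Contract", "TechSupport", "PaymentMethod"]


def candidates(features, size):
    """Return every group of exactly `size` features, in input order."""
    return list(itertools.combinations(features, r=size))


singles = candidates(features, 1)
pairs = candidates(features, 2)
triplets = candidates(features, 3)

print(pairs)
# [('Contract', 'TechSupport'), ('Contract', 'PaymentMethod'), ('TechSupport', 'PaymentMethod')]
```

With three features there are three singles, three pairs, and one triplet.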

Why Multi-Dimensional Matters

The Correlation Problem

When features interact, their combination can be far more predictive than either feature alone:

flowchart LR
    subgraph "Individual Features"
        I1["Contract=Monthly churn=42%"] --> R1[Not enough]
        I2["TechSupport=No churn=38%"] --> R2[Not enough]
    end
    subgraph "Combined (Correlation)"
        C["Contract=Monthly AND TechSupport=No churn=85%"] --> R3[Strong signal]
    end
    I1 --> C
    I2 --> C
    style C fill:#ccffcc
    style R3 fill:#ccffcc
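
The effect in the diagram can be reproduced with a few lines of Python. The records below are invented for illustration and do not come from a real churn dataset, so the rates differ from the percentages in the diagram; the point is only that the joint condition separates churners far better than either condition alone.

```python
# Toy customer records: (contract, tech_support, churned). Invented data.
rows = (
    [("Monthly", "No", True)] * 7 + [("Monthly", "No", False)] * 1
    + [("Monthly", "Yes", True)] * 1 + [("Monthly", "Yes", False)] * 7
    + [("Yearly", "No", True)] * 1 + [("Yearly", "No", False)] * 7
    + [("Yearly", "Yes", False)] * 8
)


def churn_rate(rows, predicate):
    """Share of churned customers among the rows matching `predicate`."""
    matching = [churned for contract, tech, churned in rows if predicate(contract, tech)]
    return sum(matching) / len(matching)


monthly = churn_rate(rows, lambda c, t: c == "Monthly")
no_support = churn_rate(rows, lambda c, t: t == "No")
both = churn_rate(rows, lambda c, t: c == "Monthly" and t == "No")

print(monthly, no_support, both)  # 0.5 0.5 0.875
```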

Traditional vs Pilz

A traditional decision tree needs multiple sequential splits to capture a pairwise correlation. Pilz captures it in a single multi-dimensional cut:

flowchart TD
    subgraph "Traditional Tree - Needs Multiple Splits"
        T1[Data] --> T2{Contract=Monthly?}
        T2 -->|Yes| T3{No TechSupport?}
        T3 -->|Yes| T4[High Churn]
        T3 -->|No| T5[Medium Churn]
        T2 -->|No| T6[Low Churn]
    end
    subgraph "Pilz - Single Multi-Dimensional Cut"
        P1[Data] --> P2{Contract=Monthly AND TechSupport=No?}
        P2 -->|Yes| P3[High Churn, Score: 0.85]
        P2 -->|No| P4[Low Churn, Score: 0.15]
    end
    style P2 fill:#ccffcc

How It Works

The counter() method finds the best split by evaluating single features and multi-dimensional combinations:

# src/pilz/service/train.py:153-190
def counter(self, train_df: TrainDataframes) -> tuple[Filter, Filter | None, Filter]:
    # Step 1: Score individual features (n_dims=1)
    sorted_train_feats = sorted(
        train_df.train_features,
        key=lambda x: x.calc_diff(),
        reverse=True,
    )
    best_feat = sorted_train_feats[0]
    best_feat_diff = best_feat.calc_diff()

    # Step 2: Try multi-feature combinations for higher n_dims
    for dim in range(2, self.settings.n_dims + 1):
        counter = 0
        for comb in itertools.combinations(sorted_train_feats, r=dim):
            counter += 1

            akt_feature = CombinedCategorizedFeature(
                comb,
                non_target_size=train_df.n_count_non_target,
                target_size=train_df.n_count_target,
                neutral_faktor=self.settings.neutral_faktor,
            )

            if akt_feature.calc_diff() > best_feat_diff:
                best_feat = akt_feature
                best_feat_diff = akt_feature.calc_diff()

            if (
                self.settings.calcs_per_dim
                and counter > self.settings.calcs_per_dim
            ):
                break

    return best_feat.get_left_right_filter()

Step 1: Score Individual Features

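Each feature is scored by calc_diff(). It sums the per-category diff values above neutral_faktor (categories over-represented in the target class) and, separately, those below -neutral_faktor (over-represented in the non-target class), then returns the larger of the two magnitudes. Categories whose diff falls inside the neutral band contribute nothing: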

# src/pilz/model/dataframes.py:153-167
def calc_diff(self) -> float:
    target_diff = self.diff_df.filter(
        pl.col("diff") > self.neutral_faktor
    )["diff"].sum()
    non_target_diff = abs(
        self.diff_df.filter(pl.col("diff") < -self.neutral_faktor)["diff"].sum()
    )
    return max(target_diff, non_target_diff)
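
The same arithmetic can be followed without Polars. The sketch below is a plain-Python rewrite for illustration, not the library's code; the function name calc_diff_plain and the sample values are assumptions.

```python
def calc_diff_plain(diffs, neutral_faktor):
    """Score a feature from its per-category diff values (illustrative rewrite)."""
    # Sum of diffs that lean clearly toward the target class.
    target_diff = sum(d for d in diffs if d > neutral_faktor)
    # Magnitude of the diffs that lean clearly toward the non-target class.
    non_target_diff = abs(sum(d for d in diffs if d < -neutral_faktor))
    return max(target_diff, non_target_diff)


# Hypothetical diffs for a four-category feature.
diffs = [0.5, 0.125, -0.25, -0.375]

# With a neutral band of 0.2 the 0.125 entry is ignored:
# target side = 0.5, non-target side = 0.625.
score = calc_diff_plain(diffs, neutral_faktor=0.2)
print(score)  # 0.625
```

A wider neutral band discards more weak categories, so the score can only stay the same or shrink as neutral_faktor grows.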

Feature Sorting

Before building combinations, counter() sorts all features by their calc_diff() score descending. The single best feature becomes sorted_train_feats[0]:

# src/pilz/service/train.py:156-161
sorted_train_feats = sorted(
    train_df.train_features,
    key=lambda x: x.calc_diff(),
    reverse=True,
)
best_feat = sorted_train_feats[0]

This ordering determines combination priority. itertools.combinations generates combinations in the order of the sorted list, so the strongest features always appear first:

graph LR
    F1["F1 (best)"] --> F2["F2"] --> F3["F3"] --> FD["..."] --> FN["FN (weakest)"]
    F1 --> C1["(F1,F2)"]
    F1 --> C2["(F1,F3)"]
    F1 --> C3["(F1,F4)"]
    F2 --> C4["(F2,F3)"]

When calcs_per_dim cuts the search short, the combinations that get skipped are the ones at the end of the iteration order, which are built from the weaker features. Key implications:

The best feature (sorted_train_feats[0]) leads the iteration order, so under a cap it appears in more of the evaluated combinations than any other feature
Combinations made up only of weaker features may never be evaluated when calcs_per_dim stops the loop early
Combination order is deterministic: it always follows the sorted calc_diff() order within each counter() call
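
A small sketch of this ordering, using made-up scores in place of real calc_diff() values. It sorts the features the same way counter() does and shows that the first pairs emitted by itertools.combinations all contain the strongest feature.

```python
import itertools

# Hypothetical calc_diff() scores per feature, for illustration only.
scores = {"F3": 0.2, "F1": 0.9, "F4": 0.1, "F2": 0.6}

# Same ordering idea as counter(): strongest feature first.
ranked = sorted(scores, key=scores.get, reverse=True)

# combinations() walks the sorted list in order, so early pairs use strong features.
all_pairs = list(itertools.combinations(ranked, r=2))

# Keeping only the first few pairs mimics an early stop.
first_three = all_pairs[:3]

print(ranked)       # ['F1', 'F2', 'F3', 'F4']
print(first_three)  # [('F1', 'F2'), ('F1', 'F3'), ('F1', 'F4')]
```

The last pair in the full order is ('F3', 'F4'), the two weakest features, which is the first candidate to be lost when the loop stops early.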

Step 2: Try Feature Combinations

For each dimension from 2 to n_dims, Pilz iterates over the combinations produced by itertools.combinations (stopping early if calcs_per_dim is reached) and wraps each one in CombinedCategorizedFeature, which builds a joint contingency table over the combined feature values. Since features are pre-sorted by calc_diff(), combinations follow the same priority: (best, second_best) is evaluated before (best, weakest):

# src/pilz/model/dataframes.py:334-362
class CombinedCategorizedFeature(CategorizedFeatureMixin):
    def __init__(self, train_features, non_target_size, target_size, neutral_faktor):
        group_by = [train.feature.name for train in train_features]

        # Build joint contingency table
        non_target_df = pl.DataFrame([train.non_target_sr for train in train_features])
        df_count_non_target = (
            non_target_df.group_by(group_by)
            .len(name="proportion")
            .with_columns((pl.col("proportion") / non_target_size))
        )

        target_df = pl.DataFrame([train.target_sr for train in train_features])
        df_count_target = (
            target_df.group_by(group_by)
            .len(name="proportion")
            .with_columns((pl.col("proportion") / target_size))
        )

        # Compute diff for each combination value
        self.set_diff_df(
            df_count_target=df_count_target,
            df_count_non_target=df_count_non_target,
            join_on=group_by,
        )

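The set_diff_df() method joins the two proportion tables and computes, for each combination value, diff = target proportion - non-target proportion. The outer join keeps values that occur in only one class, and fill_null(0) treats their missing proportion as zero: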

# src/pilz/model/dataframes.py:169-184
def set_diff_df(self, df_count_target, df_count_non_target, join_on):
    self.diff_df = (
        df_count_non_target.join(df_count_target, on=join_on, how="outer")
        .fill_null(0)
        .with_columns(
            (pl.col("proportion_right") - pl.col("proportion")).alias("diff"),
            pl.max_horizontal(
                pl.col("proportion"), pl.col("proportion_right")
            ).alias("max_proportion"),
        )
    )
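
The joint contingency table and the diff column can be illustrated without Polars. The sketch below is a simplified stand-in, not the library's implementation: the function name joint_diff and the sample rows are assumptions, and collections.Counter replaces the group_by and join steps.

```python
from collections import Counter


def joint_diff(target_rows, non_target_rows):
    """Per combination value: target proportion minus non-target proportion."""
    target_counts = Counter(target_rows)
    non_target_counts = Counter(non_target_rows)
    diff = {}
    # The union of keys plays the role of the outer join plus fill_null(0):
    # a Counter returns 0 for values it has never seen.
    for key in target_counts.keys() | non_target_counts.keys():
        target_share = target_counts[key] / len(target_rows)
        non_target_share = non_target_counts[key] / len(non_target_rows)
        diff[key] = target_share - non_target_share
    return diff


# Invented rows: one (Contract, TechSupport) tuple per customer.
target_rows = [("Monthly", "No")] * 3 + [("Yearly", "Yes")] * 1
non_target_rows = (
    [("Monthly", "No")] * 1 + [("Yearly", "Yes")] * 2 + [("Yearly", "No")] * 1
)

diff = joint_diff(target_rows, non_target_rows)
print(diff[("Monthly", "No")])  # 0.5
```

Here ("Yearly", "No") appears only among non-target rows, so its target proportion is 0 and its diff is -0.25.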

Step 3: Determine Split

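The get_left_right_filter() method classifies each combination value based on its diff and produces the result that counter() returns, typed as tuple[Filter, Filter | None, Filter]. The middle filter is optional.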
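
The body of get_left_right_filter() is not shown on this page. As a rough illustration only, and assuming it applies the same neutral_faktor band that calc_diff() uses, a three-way grouping by diff could look like the sketch below. The function name classify_by_diff, the group names, and the sample values are hypothetical, and the mapping of groups to the left, middle, and right filters is not specified here.

```python
def classify_by_diff(diff_by_value, neutral_faktor):
    """Hypothetical three-way grouping of combination values by their diff."""
    target_side, neutral, non_target_side = [], [], []
    for value, diff in diff_by_value.items():
        if diff > neutral_faktor:
            target_side.append(value)      # clearly over-represented in target
        elif diff < -neutral_faktor:
            non_target_side.append(value)  # clearly over-represented in non-target
        else:
            neutral.append(value)          # inside the neutral band
    return target_side, neutral, non_target_side


# Invented diff values for three combination values.
diff_by_value = {
    ("Monthly", "No"): 0.5,
    ("Yearly", "Yes"): -0.25,
    ("Yearly", "No"): 0.0625,
}

target_side, neutral, non_target_side = classify_by_diff(diff_by_value, 0.1)
print(target_side, neutral, non_target_side)
```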

The Combination Explosion

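The number of candidate combinations grows quickly with both the feature count and n_dims. For n features and combination size k it is the binomial coefficient C(n, k) = n! / (k! (n - k)!), which rises steeply while k is small relative to n: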

flowchart TB
    subgraph "Combination Explosion"
        A[4 features] --> B["n_dims=1: 4"]
        A --> C["n_dims=2: 6"]
        A --> D["n_dims=3: 4"]
        A --> E["n_dims=4: 1"]
    end
    A --> F[20 features]
    F --> G["n_dims=1: 20"]
    F --> H["n_dims=2: 190"]
    F --> I["n_dims=3: 1140"]
    F --> J["n_dims=4: 4845"]

Features	n_dims=1	n_dims=2	n_dims=3	n_dims=4
4	4	6	4	1
10	10	45	120	210
20	20	190	1,140	4,845
50	50	1,225	19,600	230,300
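
The figures in the table are binomial coefficients and can be reproduced with math.comb from the standard library (Python 3.8 or later). The helper name below is illustrative.

```python
import math


def combinations_per_dim(n_features, max_dims=4):
    """Number of feature groups of each size from 1 to max_dims."""
    return [math.comb(n_features, k) for k in range(1, max_dims + 1)]


for n in (4, 10, 20, 50):
    print(n, combinations_per_dim(n))
# 4 [4, 6, 4, 1]
# 10 [10, 45, 120, 210]
# 20 [20, 190, 1140, 4845]
# 50 [50, 1225, 19600, 230300]
```

Each column counts groups of exactly that size. A run with n_dims=3 scores the singles, the pairs, and the triplets, so its total is the sum of the first three columns.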

The calcs_per_dim Parameter

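To keep training time bounded, calcs_per_dim limits how many combinations are tried per dimension. The default is 5000. Because counter() checks counter > calcs_per_dim after scoring a combination, up to calcs_per_dim + 1 combinations are scored per dimension. A falsy value (None or 0) disables the cap: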

# src/pilz/model/settings.py:32-35
calcs_per_dim: int | None = Field(
    description="Maximum calculations per dimension",
    default=5000,
)

flowchart LR
    C[Start] --> L{counter <= calcs_per_dim?}
    L -->|Yes| E[Evaluate next combination]
    L -->|No| S[Stop early]
    style L fill:#e0f0ff
    style S fill:#ffff99
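
The early stop can be replayed in isolation. The sketch below copies the counting logic from the inner loop of counter() and records which combinations would be scored; the function name and the feature labels are illustrative, and appending to a list stands in for the actual scoring.

```python
import itertools


def evaluated_combinations(features, dim, calcs_per_dim):
    """Replay the loop's counting logic and return the groups it visits."""
    visited = []
    counter = 0
    for comb in itertools.combinations(features, r=dim):
        counter += 1
        visited.append(comb)  # stands in for scoring the combination
        # Same check as in counter(): a falsy cap (None or 0) never triggers.
        if calcs_per_dim and counter > calcs_per_dim:
            break
    return visited


features = ["F1", "F2", "F3", "F4", "F5"]
capped = evaluated_combinations(features, dim=2, calcs_per_dim=3)
uncapped = evaluated_combinations(features, dim=2, calcs_per_dim=None)

print(len(capped), len(uncapped))  # 4 10
```

With calcs_per_dim=3 the loop visits four pairs, all of which contain F1, because the check runs after the fourth pair has already been handled.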

Practical Guidelines

When to Use Higher n_dims

n_dims	Best For
1	Simple datasets, many features, baseline
2	Most cases — captures pairwise correlations
3	Complex interactions, fewer features

Recommendations

flowchart TD
    START[Start] --> Q1{Feature correlations?}
    Q1 -->|Yes| Q2{How many features?}
    Q1 -->|No| A1["n_dims=1"]
    Q2 -->|< 20| A2["n_dims=2"]
    Q2 -->|20-50| A3["n_dims=3"]
    Q2 -->|50+| A4["n_dims=2, then experiment"]
    style A2 fill:#ccffcc
    style A1 fill:#ffff99
    style A3 fill:#ffff99
    style A4 fill:#ffff99
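
When weighing these options, it can help to estimate how many candidates a single counter() call would score for a given feature count. The helper below is a back-of-the-envelope sketch derived from the loop structure shown earlier; its name is hypothetical, and it counts scored candidates only, not wall-clock time.

```python
import math


def estimated_evaluations(n_features, n_dims, calcs_per_dim=5000):
    """Rough count of candidates scored in one counter() call."""
    total = n_features  # Step 1 scores every single feature
    for dim in range(2, n_dims + 1):
        n_combos = math.comb(n_features, dim)
        if calcs_per_dim:
            # The loop stops once the counter exceeds the cap, so at most
            # calcs_per_dim + 1 combinations are scored per dimension.
            n_combos = min(n_combos, calcs_per_dim + 1)
        total += n_combos
    return total


print(estimated_evaluations(20, 2))        # 210
print(estimated_evaluations(20, 3))        # 1350
print(estimated_evaluations(50, 3))        # 6276
print(estimated_evaluations(50, 3, None))  # 20875
```

With 50 features and n_dims=3, the default cap trims the triplet search from 19,600 candidates to 5,001, so most triplets are never scored.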

Summary

Concept	Description
n_dims=1	Single feature splits — fast, no correlation capture
n_dims=2	Feature pairs — captures pairwise correlations
n_dims=3+	Higher-order combinations — for complex interactions
calcs_per_dim	Limits computation to prevent exhaustive search
`counter()` at `train.py:153`	Main split-finding method
Feature sorting at `train.py:156-160`	Features sorted by `calc_diff()` descending; strongest prioritized in combinations
Combination order	`itertools.combinations` follows sorted order; `calcs_per_dim` prunes weaker combos last
`CombinedCategorizedFeature` at `dataframes.py:334`	Builds joint contingency tables

Next Steps

Three-Way Splits — How the split is used in tree building
Feature Categorization — How features are binned first
Training Internals — Full algorithm reference