Cost Models#
Cost models define how segment homogeneity is measured. The choice of cost model determines what kind of distributional change the detector is sensitive to.
Overview#
| Cost model | API string | Description | Multivariate | Best for |
|---|---|---|---|---|
| L2 (mean) | `l2` | Sum of squared residuals from segment mean | Yes (additive) | Mean shifts in continuous data |
| L1 (median) | `l1_median` | Sum of absolute deviations from segment median | Yes (additive) | Robust mean estimation with outliers |
| Normal | `normal` | Gaussian negative log-likelihood (diagonal) | Yes (additive) | Mean + variance changes |
| Normal Full Cov | `normal_full_cov` | Multivariate Gaussian with full covariance | Yes (cross-dim) | Covariance structure changes |
| NIG | | Normal-Inverse-Gamma marginal likelihood | Yes (additive) | Bayesian mean/variance inference |
| AR | `ar` | Autoregressive residual likelihood | Yes (additive) | Changes in autocorrelated data |
| Bernoulli | `bernoulli` | Bernoulli log-likelihood | Yes (additive) | Binary event rate changes |
| Poisson | `poisson` | Poisson rate log-likelihood | Yes (additive) | Count data rate changes |
| Rank | `rank` | Rank-based non-parametric cost | Yes (additive) | Distribution-free change detection |
| Cosine | | Cosine similarity-based cost | Yes (additive) | Directional/angular changes |
Per-model details#
L2 (CostL2Mean)#
The default and most widely used cost model. Measures the sum of squared deviations from the segment mean.
$$C(y_{a:b}) = \sum_{i=a}^{b-1} (y_i - \bar{y}_{a:b})^2$$
Multivariate: Sum of per-dimension SSE (independent dimensions)
BIC/AIC params: 2 per dimension (mean + residual variance proxy)
Cache scaling: O(n * d) memory
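As a rough illustration of how an O(n * d) cache can answer L2 segment queries, the sketch below keeps prefix sums of $y$ and $y^2$. The helper names `l2_cache` and `l2_cost` are hypothetical and are not the library's API; segments are half-open, matching the sum from $a$ to $b-1$ in the formula above.

```python
import numpy as np

def l2_cache(y):
    """Prefix sums of y and y**2, one row per index (O(n * d) memory)."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # treat 1-D input as an n x 1 signal
    zero = np.zeros((1, y.shape[1]))
    s1 = np.vstack([zero, np.cumsum(y, axis=0)])
    s2 = np.vstack([zero, np.cumsum(y * y, axis=0)])
    return s1, s2

def l2_cost(cache, a, b):
    """Sum of squared deviations of y[a:b] from its mean, summed over dimensions."""
    s1, s2 = cache
    m = b - a
    seg_sum = s1[b] - s1[a]
    seg_sq = s2[b] - s2[a]
    # sum (y - mean)^2 == sum y^2 - (sum y)^2 / m
    return float(np.sum(seg_sq - seg_sum**2 / m))

y = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
cache = l2_cache(y)
whole = l2_cost(cache, 0, 6)                         # one segment across the mean shift
split = l2_cost(cache, 0, 3) + l2_cost(cache, 3, 6)  # split at the shift
```

Splitting at the mean shift gives a much smaller total cost than treating the signal as one segment, which is the signal a detector exploits.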
L1 Median (CostL1Median)#
Uses the segment median instead of the segment mean, which makes the cost robust to outliers.
Best for: Data with occasional extreme values where L2 cost would be distorted
Multivariate: Sum of per-dimension absolute deviations
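To make the robustness claim concrete, here is a minimal sketch that compares the two costs on a segment with and without a single extreme value. The function names are hypothetical, and the direct computation shown here is for illustration only; it says nothing about how the library caches or evaluates the median.

```python
import numpy as np

def l1_median_cost(seg):
    """Sum of absolute deviations from the per-dimension median."""
    seg = np.asarray(seg, dtype=float)
    return float(np.sum(np.abs(seg - np.median(seg, axis=0))))

def l2_mean_cost(seg):
    """Sum of squared deviations from the per-dimension mean."""
    seg = np.asarray(seg, dtype=float)
    return float(np.sum((seg - seg.mean(axis=0)) ** 2))

clean = [0.8, 0.9, 1.0, 1.1, 1.2]
spiked = clean + [100.0]  # same segment with one outlier appended

l1_clean, l1_spiked = l1_median_cost(clean), l1_median_cost(spiked)
l2_clean, l2_spiked = l2_mean_cost(clean), l2_mean_cost(spiked)
```

The L1 cost grows roughly linearly with the outlier's magnitude, while the L2 cost grows quadratically, so a single spike is far more likely to trigger a spurious changepoint under L2.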
Normal (CostNormalMeanVar)#
Gaussian negative log-likelihood modeling both mean and variance per segment. Detects both mean shifts and variance changes.
Multivariate: Sum of per-dimension terms (diagonal covariance)
BIC/AIC params: 3 per dimension (mean + variance + residual)
Cache scaling: O(n * d) memory
Segment query: O(d) per query
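The sketch below shows one way a diagonal Gaussian cost can be answered in O(d) per query from an O(n * d) cache of prefix sums. It assumes the cost is the full negative log-likelihood at the per-segment maximum-likelihood mean and variance; the variance floor, the constant terms, and the names `normal_cache` and `normal_cost` are assumptions made for illustration, not the library's implementation.

```python
import numpy as np

VAR_FLOOR = 1e-12  # assumed guard against log(0) on constant segments

def normal_cache(y):
    """Prefix sums of y and y**2: O(n * d) memory."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]
    zero = np.zeros((1, y.shape[1]))
    return (np.vstack([zero, np.cumsum(y, axis=0)]),
            np.vstack([zero, np.cumsum(y * y, axis=0)]))

def normal_cost(cache, a, b):
    """Gaussian NLL of y[a:b] at the per-dimension MLE mean and variance."""
    s1, s2 = cache
    m = b - a
    mean = (s1[b] - s1[a]) / m
    var = np.maximum((s2[b] - s2[a]) / m - mean**2, VAR_FLOOR)
    return float(np.sum(0.5 * m * (np.log(2.0 * np.pi * var) + 1.0)))

# Same mean throughout, but the standard deviation jumps from 0.1 to 3.0.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 0.1, 200), rng.normal(0.0, 3.0, 200)])
cache = normal_cache(y)
gain = (normal_cost(cache, 0, 400)
        - normal_cost(cache, 0, 200)
        - normal_cost(cache, 200, 400))
```

Here `gain` is the cost reduction from splitting at index 200. It is large even though the mean never moves, which is the kind of change an L2 cost has little power to detect.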
Normal Full Covariance (CostNormalFullCov)#
Multivariate Gaussian with full covariance estimation per segment. Detects changes in cross-dimensional correlations.
Multivariate: Full covariance-aware (detects correlation structure changes)
BIC/AIC params: Model-aware, 1 + d + d(d+1)/2
Cache scaling: O(n * d^2) memory
Segment query: O(d^2) covariance assembly + O(d^3) Cholesky
Regularization: Uses ridge regularization + jitter escalation in Cholesky for near-singular segments
Tip
Prefer normal (diagonal) when d is large, memory is constrained, or cross-dimension covariance is weak. Prefer normal_full_cov when covariance structure carries the change signal and d is moderate.
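The trade-off between the diagonal and full-covariance costs can be sketched with plain NumPy. The example assumes a Gaussian negative log-likelihood evaluated at the segment's sample covariance plus a small fixed ridge; jitter escalation is omitted, and `diag_cost`, `full_cov_cost`, and the `ridge` default are hypothetical names and values rather than the library's.

```python
import numpy as np

def diag_cost(seg):
    """Diagonal Gaussian NLL: per-dimension variances only."""
    m = seg.shape[0]
    var = np.maximum(seg.var(axis=0), 1e-12)
    return float(np.sum(0.5 * m * (np.log(2.0 * np.pi * var) + 1.0)))

def full_cov_cost(seg, ridge=1e-6):
    """Full-covariance Gaussian NLL with a fixed ridge (assumed value)."""
    m, d = seg.shape
    centered = seg - seg.mean(axis=0)
    cov = centered.T @ centered / m + ridge * np.eye(d)  # O(m * d^2) here
    chol = np.linalg.cholesky(cov)                       # O(d^3)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    # The "+ d" trace term is exact for the unregularized MLE covariance.
    return float(0.5 * m * (d * np.log(2.0 * np.pi) + logdet + d))

# Marginal variances stay at 1; only the correlation flips sign.
rng = np.random.default_rng(1)
pos = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], size=300)
neg = rng.multivariate_normal([0, 0], [[1, -0.9], [-0.9, 1]], size=300)
y = np.vstack([pos, neg])

gain_diag = diag_cost(y) - diag_cost(pos) - diag_cost(neg)
gain_full = full_cov_cost(y) - full_cov_cost(pos) - full_cov_cost(neg)
```

Because the per-dimension variances are unchanged, `gain_diag` stays small, while `gain_full` is large: the change lives entirely in the off-diagonal covariance.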
NIG (CostNIGMarginal)#
Normal-Inverse-Gamma marginal likelihood. A Bayesian cost that integrates out the mean and variance parameters.
Multivariate: Sum of per-dimension NIG marginal terms
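For reference, the standard closed form of the Normal-Inverse-Gamma marginal likelihood for a univariate segment can be sketched as follows. The prior hyperparameters `mu0`, `kappa0`, `alpha0`, and `beta0`, their default values, and the function name are assumptions for illustration; the library's parameterization and defaults may differ.

```python
import math
import numpy as np

def nig_cost(seg, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Negative log marginal likelihood under a conjugate NIG prior.

    Prior (assumed): mu | var ~ N(mu0, var / kappa0), var ~ InvGamma(alpha0, beta0).
    """
    seg = np.asarray(seg, dtype=float)
    n = seg.size
    mean = float(seg.mean())
    sse = float(np.sum((seg - mean) ** 2))
    kappa_n = kappa0 + n
    alpha_n = alpha0 + 0.5 * n
    beta_n = (beta0 + 0.5 * sse
              + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n))
    log_ml = (math.lgamma(alpha_n) - math.lgamma(alpha0)
              + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
              + 0.5 * (math.log(kappa0) - math.log(kappa_n))
              - 0.5 * n * math.log(2.0 * math.pi))
    return -log_ml

rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])
gain = nig_cost(y) - nig_cost(y[:100]) - nig_cost(y[100:])
```

Unlike the maximum-likelihood costs, the marginal likelihood already penalizes extra segments through the prior, so `gain` can be negative when a split is not supported by the data.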
AR (CostAR)#
Autoregressive residual likelihood for data with temporal correlation.
Best for: Time series where autocorrelation is the dominant feature
Cache scaling: O(n * d) for order 1; O(n * d) for higher orders
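A minimal sketch of an order-1 autoregressive cost is shown below. It assumes an AR(1) model with intercept fitted by ordinary least squares on each segment and scores the segment by the Gaussian negative log-likelihood of the residuals. The function names, the variance floor, and the minimum segment length are hypothetical choices, and no caching is attempted.

```python
import numpy as np

def ar1_cost(y, a, b):
    """Gaussian NLL of AR(1) residuals fitted by OLS on y[a:b]."""
    seg = np.asarray(y, dtype=float)[a:b]
    if seg.size < 4:
        raise ValueError("need at least 4 points to fit AR(1) with intercept")
    target = seg[1:]
    design = np.column_stack([np.ones(seg.size - 1), seg[:-1]])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    m = target.size
    var = max(float(resid @ resid) / m, 1e-12)  # assumed floor
    return float(0.5 * m * (np.log(2.0 * np.pi * var) + 1.0))

def simulate_ar1(phi, n, rng):
    """Generate n samples of y[t] = phi * y[t-1] + standard normal noise."""
    out = np.empty(n)
    prev = 0.0
    for i in range(n):
        prev = phi * prev + rng.standard_normal()
        out[i] = prev
    return out

# The AR coefficient flips from +0.9 to -0.9 at index 200.
rng = np.random.default_rng(2)
y = np.concatenate([simulate_ar1(0.9, 200, rng), simulate_ar1(-0.9, 200, rng)])
gain = ar1_cost(y, 0, 400) - ar1_cost(y, 0, 200) - ar1_cost(y, 200, 400)
```

The two halves have the same mean and the same marginal variance, so the positive `gain` comes from the change in temporal dependence alone.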
Bernoulli (CostBernoulli)#
Bernoulli log-likelihood for binary (0/1) event data.
Best for: Binary event rate changes (error rates, click-through rates)
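The Bernoulli cost can be sketched as the negative log-likelihood at the segment's empirical rate. This is an illustrative implementation with hypothetical function names; it uses the usual convention that 0 * log(0) = 0 for segments containing only zeros or only ones.

```python
import math

def _k_log_ratio(k, m):
    """k * log(k / m), with the convention 0 * log(0) = 0."""
    return 0.0 if k == 0 else k * math.log(k / m)

def bernoulli_cost(y, a, b):
    """Negative Bernoulli log-likelihood of y[a:b] at the MLE rate k / m."""
    seg = y[a:b]
    m = len(seg)
    ones = sum(seg)
    return -(_k_log_ratio(ones, m) + _k_log_ratio(m - ones, m))

# Event rate jumps from 0 to 1 halfway through.
y = [0] * 50 + [1] * 50
whole = bernoulli_cost(y, 0, 100)
split = bernoulli_cost(y, 0, 50) + bernoulli_cost(y, 50, 100)
```

Each half is perfectly homogeneous and costs nothing, while the unsplit segment pays the full entropy of a rate-0.5 sequence.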
Poisson (CostPoissonRate)#
Poisson rate log-likelihood for count data.
Best for: Count data rate changes (event counts per time period)
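As a sketch, the Poisson cost below is the negative log-likelihood at the segment's empirical rate, with the rate-independent $\sum \log y_i!$ term dropped because it cancels when comparing segmentations. The function name is hypothetical and the dropped constant is an assumption about convention, not a statement about the library.

```python
import math

def poisson_cost(y, a, b):
    """Negative Poisson log-likelihood of y[a:b] at rate = mean, up to a constant."""
    seg = y[a:b]
    m = len(seg)
    total = sum(seg)
    if total == 0:
        return 0.0  # rate 0 explains an all-zero segment exactly
    return total - total * math.log(total / m)

# Rate jumps from 2 to 8 events per period.
y = [2] * 10 + [8] * 10
gain = poisson_cost(y, 0, 20) - poisson_cost(y, 0, 10) - poisson_cost(y, 10, 20)
```

Because the dropped term is the same for every segmentation of the same data, `gain` is unaffected by it.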
Rank (CostRank)#
Rank-based non-parametric cost function.
Best for: Distribution-free change detection where parametric assumptions are unwanted
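One simple way to build a rank-based cost, shown here only as an assumption-laden sketch, is to replace each observation with its rank in the full signal and then apply a squared-deviation cost to the ranks. The library's actual construction may differ (for example in tie handling or normalization), and both function names are hypothetical.

```python
import numpy as np

def rank_transform(y):
    """Ranks 1..n of a 1-D signal (ties broken by position, not averaged)."""
    y = np.asarray(y, dtype=float)
    order = np.argsort(y, kind="stable")
    ranks = np.empty(y.size)
    ranks[order] = np.arange(1, y.size + 1)
    return ranks

def rank_cost(ranks, a, b):
    """Squared deviations of the segment's ranks from their mean."""
    seg = ranks[a:b]
    return float(np.sum((seg - seg.mean()) ** 2))

rng = np.random.default_rng(3)
y = np.concatenate([rng.standard_normal(100), rng.standard_normal(100) + 2.0])

ranks = rank_transform(y)
gain = rank_cost(ranks, 0, 200) - rank_cost(ranks, 0, 100) - rank_cost(ranks, 100, 200)

# A strictly increasing transform leaves the ranks, and hence the cost, unchanged.
cost_raw = rank_cost(ranks, 0, 100)
cost_exp = rank_cost(rank_transform(np.exp(y)), 0, 100)
```

The invariance under monotone transforms is what makes a rank cost distribution-free: heavy tails or a skewed scale change the values but not their ordering.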
Cosine (CostCosine)#
Cosine similarity-based cost for directional data.
Best for: Angular/directional changes in high-dimensional embeddings
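A common kernel-style formulation of a cosine cost, used here as an illustrative assumption, normalizes every sample to unit length and measures how far the segment's unit vectors spread around their resultant: $C = m - \lVert \sum_i u_i \rVert^2 / m$. The function name and the small norm floor are hypothetical.

```python
import numpy as np

def cosine_cost(x, a, b):
    """m minus the mean pairwise cosine similarity mass of rows x[a:b]."""
    seg = np.asarray(x, dtype=float)[a:b]
    norms = np.linalg.norm(seg, axis=1, keepdims=True)
    unit = seg / np.maximum(norms, 1e-12)  # assumed floor for zero vectors
    m = seg.shape[0]
    resultant = unit.sum(axis=0)
    return float(m - resultant @ resultant / m)

# Direction switches from the first axis to the second; magnitudes vary freely.
scales = np.linspace(1.0, 10.0, 50)[:, None]
x = np.vstack([scales * np.array([1.0, 0.0]),
               scales * np.array([0.0, 1.0])])

whole = cosine_cost(x, 0, 100)
split = cosine_cost(x, 0, 50) + cosine_cost(x, 50, 100)
```

The varying magnitudes have no effect on either value; only the change in direction contributes to the cost.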
Decision tree: choosing a cost model#
Is your data binary (0/1)?
→ Yes: Use "bernoulli"
Is your data count-valued (non-negative integers)?
→ Yes: Use "poisson"
Do you have outliers?
→ Yes: Use "l1_median" or "rank"
Do you want to detect variance changes (not just mean)?
→ Yes, univariate or independent dims: Use "normal"
→ Yes, cross-dimensional correlation: Use "normal_full_cov"
Is your data autocorrelated?
→ Yes: Use "ar"
Default:
→ Use "l2" (fastest, robust for mean shifts)
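The decision tree above can be written as a small helper that returns the API string. The function and its keyword arguments are hypothetical conveniences, not part of the library; the questions are checked in the same order as the tree, and `"l1_median"` is returned for outliers although `"rank"` is the listed alternative.

```python
def choose_cost_model(*, binary=False, counts=False, outliers=False,
                      variance_changes=False, cross_dim_correlation=False,
                      autocorrelated=False):
    """Return a cost-model API string by walking the decision tree in order."""
    if binary:
        return "bernoulli"
    if counts:
        return "poisson"
    if outliers:
        return "l1_median"  # "rank" is the other option listed in the tree
    if variance_changes:
        return "normal_full_cov" if cross_dim_correlation else "normal"
    if autocorrelated:
        return "ar"
    return "l2"  # default

choice = choose_cost_model(variance_changes=True, cross_dim_correlation=True)
```

Because the checks run top to bottom, earlier questions take precedence: count data with outliers still resolves to `"poisson"`.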
Availability in Python#
| Cost model | High-level classes | | Pipeline-only |
|---|---|---|---|
| | | Yes | – |
| | – | Yes | – |
| | | Yes | – |
| | | Yes | – |
| | – | – | Yes |
| | – | – | Yes |
| | – | – | Yes |
| | – | – | Yes |
| | – | – | Yes |
| | – | – | Yes |