Cost Models#

Cost models define how segment homogeneity is measured. The choice of cost model determines what kind of distributional change the detector is sensitive to.

Overview#

| Cost model | API string | Description | Multivariate | Best for |
|---|---|---|---|---|
| L2 (mean) | "l2" | Sum of squared residuals from segment mean | Yes (additive) | Mean shifts in continuous data |
| L1 (median) | "l1_median" | Sum of absolute deviations from segment median | Yes (additive) | Robust mean estimation with outliers |
| Normal | "normal" | Gaussian negative log-likelihood (diagonal) | Yes (additive) | Mean + variance changes |
| Normal Full Cov | "normal_full_cov" | Multivariate Gaussian with full covariance | Yes (cross-dim) | Covariance structure changes |
| NIG | "nig" | Normal-Inverse-Gamma marginal likelihood | Yes (additive) | Bayesian mean/variance inference |
| AR | "ar" | Autoregressive residual likelihood | Yes (additive) | Changes in autocorrelated data |
| Bernoulli | "bernoulli" | Bernoulli log-likelihood | Yes (additive) | Binary event rate changes |
| Poisson | "poisson" | Poisson rate log-likelihood | Yes (additive) | Count data rate changes |
| Rank | "rank" | Rank-based non-parametric cost | Yes (additive) | Distribution-free change detection |
| Cosine | "cosine" | Cosine similarity-based cost | Yes (additive) | Directional/angular changes |

Per-model details#

L2 (CostL2Mean)#

The default and most widely used cost model. It measures the sum of squared deviations from the segment mean $\bar{y}_{a:b}$ over the half-open segment $[a, b)$.

$$C(y_{a:b}) = \sum_{i=a}^{b-1} (y_i - \bar{y}_{a:b})^2$$

  • Multivariate: Sum of per-dimension SSE (independent dimensions)

  • BIC/AIC params: 2 per dimension (mean + residual variance proxy)

  • Cache scaling: O(n * d) memory
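
The formula and the O(n * d) cache can be illustrated with prefix sums. The following is a minimal univariate sketch in plain Python, not the library's implementation; `build_l2_cache` and `l2_cost` are hypothetical names. For multivariate data the same computation would be repeated per dimension and the results summed.

```python
from itertools import accumulate

def build_l2_cache(y):
    """Prefix sums of y and y**2 (hypothetical helper, O(n) memory per dimension)."""
    s1 = [0.0] + list(accumulate(float(v) for v in y))
    s2 = [0.0] + list(accumulate(float(v) ** 2 for v in y))
    return s1, s2

def l2_cost(cache, a, b):
    """Sum of squared deviations from the mean over the half-open segment y[a:b]."""
    s1, s2 = cache
    n = b - a
    seg_sum = s1[b] - s1[a]
    seg_sq = s2[b] - s2[a]
    # Identity: sum (y_i - mean)^2 = sum y_i^2 - (sum y_i)^2 / n
    # Clamp at zero to absorb floating-point cancellation.
    return max(seg_sq - seg_sum * seg_sum / n, 0.0)

y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
cache = build_l2_cache(y)
whole = l2_cost(cache, 0, 6)                         # one segment spanning the shift
split = l2_cost(cache, 0, 3) + l2_cost(cache, 3, 6)  # split at the shift
```

Once the cache is built, every segment query costs O(1) per dimension, which is what makes this cost cheap inside a search over many candidate segments.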

L1 Median (CostL1Median)#

Uses the segment median instead of the mean, which makes the cost robust to outliers.

  • Best for: Data with occasional extreme values where L2 cost would be distorted

  • Multivariate: Sum of per-dimension absolute deviations
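
As an illustration of why the median helps, here is a small univariate sketch using only the standard library. The function name `l1_median_cost` is an assumption made for this example and is not part of the documented API.

```python
from statistics import median

def l1_median_cost(y, a, b):
    """Sum of absolute deviations from the median over the half-open segment y[a:b]."""
    seg = y[a:b]
    m = median(seg)
    return sum(abs(v - m) for v in seg)

# One extreme value in an otherwise constant segment.
y = [1.0, 1.0, 1.0, 1.0, 100.0]
cost = l1_median_cost(y, 0, len(y))
```

The outlier contributes its distance to the median once, linearly, whereas under a squared-error cost it would also drag the segment mean away from the bulk of the data.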

Normal (CostNormalMeanVar)#

Gaussian negative log-likelihood that models both the mean and the variance of each segment, so it detects variance changes as well as mean shifts.

  • Multivariate: Sum of per-dimension terms (diagonal covariance)

  • BIC/AIC params: 3 per dimension (mean + variance + residual)

  • Cache scaling: O(n * d) memory

  • Segment query: O(d) per query
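
A univariate sketch of a Gaussian segment cost, evaluated at the maximum-likelihood mean and variance, may help show why this model reacts to variance changes. The name `normal_cost` and the `var_floor` guard are assumptions for this example; the library's constants and variance handling may differ.

```python
import math

def normal_cost(y, a, b, var_floor=1e-12):
    """Gaussian negative log-likelihood of y[a:b] at the MLE mean and variance."""
    seg = y[a:b]
    n = len(seg)
    mean = sum(seg) / n
    # var_floor (hypothetical) keeps log() finite on constant segments.
    var = max(sum((v - mean) ** 2 for v in seg) / n, var_floor)
    return 0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

# Same mean throughout, but the spread jumps halfway.
y = [-0.1, 0.1] * 10 + [-3.0, 3.0] * 10
whole = normal_cost(y, 0, 40)
split = normal_cost(y, 0, 20) + normal_cost(y, 20, 40)
```

Here `split` is clearly lower than `whole`, while a pure squared-error cost would see no gain from splitting because the mean never moves.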

Normal Full Covariance (CostNormalFullCov)#

Multivariate Gaussian with full covariance estimation per segment. Detects changes in cross-dimensional correlations.

  • Multivariate: Full covariance-aware (detects correlation structure changes)

  • BIC/AIC params: Model-aware 1 + d + d(d+1)/2

  • Cache scaling: O(n * d^2) memory

  • Segment query: O(d^2) covariance assembly + O(d^3) Cholesky

  • Regularization: Uses ridge regularization + jitter escalation in Cholesky for near-singular segments
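
The regularization idea can be sketched as follows: add a small ridge to the covariance diagonal, attempt a Cholesky factorization, and increase the jitter if the factorization fails. This is a pure-Python illustration under stated assumptions; `full_cov_cost`, `ridge`, and `max_tries` are hypothetical names and values, not the library's.

```python
import math

def cholesky(m):
    """Lower-triangular Cholesky factor; raises ValueError if m is not positive definite."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def full_cov_cost(rows, ridge=1e-9, max_tries=8):
    """Gaussian NLL of a segment (list of d-dim rows) with full MLE covariance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    jitter = ridge
    for _ in range(max_tries):
        reg = [[cov[i][j] + (jitter if i == j else 0.0) for j in range(d)]
               for i in range(d)]
        try:
            low = cholesky(reg)
            break
        except ValueError:
            jitter *= 10.0  # escalate the jitter and retry
    else:
        raise ValueError("covariance could not be regularized")
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return 0.5 * n * (d * math.log(2.0 * math.pi) + logdet + d)

# Correlation flips sign halfway; per-dimension means and variances stay the same.
first = [[1.0, 1.0], [-1.0, -1.0], [1.0, 0.5], [-1.0, -0.5]] * 5
second = [[1.0, -1.0], [-1.0, 1.0], [1.0, -0.5], [-1.0, 0.5]] * 5
whole = full_cov_cost(first + second)
split = full_cov_cost(first) + full_cov_cost(second)
```

Because only the off-diagonal covariance changes in this data, a diagonal model would see no benefit from splitting, whereas the full-covariance cost drops noticeably.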

Tip

Prefer "normal" (diagonal) when d is large, memory is constrained, or cross-dimension covariance is weak. Prefer "normal_full_cov" when the covariance structure carries the change signal and d is moderate.

NIG (CostNIGMarginal)#

Normal-Inverse-Gamma marginal likelihood. A Bayesian cost that integrates out the mean and variance parameters.

  • Multivariate: Sum of per-dimension NIG marginal terms
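
For readers who want to see what "integrating out the mean and variance" amounts to, the sketch below evaluates the standard closed-form NIG marginal likelihood for a univariate segment. The prior hyperparameters `mu0`, `kappa0`, `alpha0`, and `beta0` and their default values are assumptions for this example; the library's defaults are not stated here.

```python
import math

def nig_cost(y, a, b, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Negative log marginal likelihood of y[a:b] under a Normal-Inverse-Gamma prior."""
    seg = y[a:b]
    n = len(seg)
    mean = sum(seg) / n
    sse = sum((v - mean) ** 2 for v in seg)
    # Posterior hyperparameters after observing the segment.
    kappa_n = kappa0 + n
    alpha_n = alpha0 + 0.5 * n
    beta_n = beta0 + 0.5 * sse + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    log_ml = (math.lgamma(alpha_n) - math.lgamma(alpha0)
              + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
              + 0.5 * (math.log(kappa0) - math.log(kappa_n))
              - 0.5 * n * math.log(2.0 * math.pi))
    return -log_ml

y = [0.0] * 10 + [10.0] * 10
whole = nig_cost(y, 0, 20)
split = nig_cost(y, 0, 10) + nig_cost(y, 10, 20)
```

Unlike a maximum-likelihood cost, this quantity stays finite on constant segments because the prior supplies the missing variance information.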

AR (CostAR)#

Autoregressive residual likelihood for data with temporal correlation.

  • Best for: Time series where autocorrelation is the dominant feature

  • Cache scaling: O(n * d) for order 1; O(n * d) for higher orders
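
To make the idea concrete, the sketch below fits an order-1 autoregression by least squares inside a segment and uses the residual sum of squares as a stand-in for the residual likelihood. This is an illustrative simplification; `ar1_cost` is a hypothetical name and the library's exact likelihood form is not shown in this section.

```python
def ar1_cost(y, a, b):
    """Residual sum of squares of an AR(1) fit with intercept on y[a:b] (needs b - a >= 3)."""
    x = y[a:b - 1]      # lagged values
    z = y[a + 1:b]      # next-step values
    m = len(x)
    mx = sum(x) / m
    mz = sum(z) / m
    sxx = sum((u - mx) ** 2 for u in x)
    sxz = sum((u - mx) * (v - mz) for u, v in zip(x, z))
    phi = sxz / sxx if sxx > 0.0 else 0.0
    c = mz - phi * mx
    return sum((v - c - phi * u) ** 2 for u, v in zip(x, z))

# Two noise-free regimes with different dynamics.
y = [0.0]
for _ in range(19):
    y.append(1.0 + 0.8 * y[-1])   # regime 1: y_t = 1 + 0.8 * y_{t-1}
for _ in range(20):
    y.append(-0.8 * y[-1])        # regime 2: y_t = -0.8 * y_{t-1}

whole = ar1_cost(y, 0, 40)
split = ar1_cost(y, 0, 20) + ar1_cost(y, 20, 40)
```

Each regime is fitted almost perfectly on its own, so `split` is near zero, while a single AR(1) fit across both regimes leaves large residuals.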

Bernoulli (CostBernoulli)#

Bernoulli log-likelihood for binary (0/1) event data.

  • Best for: Binary event rate changes (error rates, click-through rates)
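
A minimal sketch of a Bernoulli segment cost, evaluated at the maximum-likelihood rate, is shown below. The function name `bernoulli_cost` is an assumption for this example.

```python
import math

def bernoulli_cost(y, a, b):
    """Negative Bernoulli log-likelihood of y[a:b] (values 0 or 1) at the MLE rate."""
    n = b - a
    k = sum(y[a:b])          # number of ones in the segment
    if k == 0 or k == n:
        return 0.0           # a pure segment is explained perfectly
    p = k / n
    return -(k * math.log(p) + (n - k) * math.log(1.0 - p))

# Event rate jumps from 0 to 1 halfway through.
y = [0] * 10 + [1] * 10
whole = bernoulli_cost(y, 0, 20)
split = bernoulli_cost(y, 0, 10) + bernoulli_cost(y, 10, 20)
```

Only the count of ones per segment is needed, so a single prefix sum over the data is enough to answer any segment query in constant time.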

Poisson (CostPoissonRate)#

Poisson rate log-likelihood for count data.

  • Best for: Count data rate changes (event counts per time period)
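
The Poisson cost can be sketched in the same way. The version below drops the log-factorial term, which does not depend on where the segment boundaries fall; `poisson_cost` is a hypothetical name, and whether the library keeps or drops that constant is not stated here.

```python
import math

def poisson_cost(y, a, b):
    """Negative Poisson log-likelihood of y[a:b] at the MLE rate, up to a data-only constant."""
    n = b - a
    s = sum(y[a:b])          # total count in the segment
    if s == 0:
        return 0.0           # rate 0 explains an all-zero segment exactly
    rate = s / n
    # -(s * log(rate) - n * rate) with n * rate == s
    return s - s * math.log(rate)

# Rate jumps from 1 to 10 events per period.
y = [1] * 10 + [10] * 10
whole = poisson_cost(y, 0, 20)
split = poisson_cost(y, 0, 10) + poisson_cost(y, 10, 20)
```

Because the constant is dropped, individual costs can be negative; only differences between segmentations are meaningful.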

Rank (CostRank)#

Rank-based non-parametric cost function.

  • Best for: Distribution-free change detection where parametric assumptions are unwanted
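
One common way to build a rank-based cost is to replace each observation by its rank in the full series and then measure squared deviations of those ranks within a segment. The sketch below follows that construction as an assumption; the library's exact definition (for example, any normalization of the ranks) is not given in this section, and both function names are hypothetical.

```python
def average_ranks(y):
    """1-based ranks of y, with ties receiving the average of their positions."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    ranks = [0.0] * len(y)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y[order[j + 1]] == y[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_cost(ranks, a, b):
    """Sum of squared deviations of the ranks from their segment mean."""
    seg = ranks[a:b]
    mean = sum(seg) / len(seg)
    return sum((r - mean) ** 2 for r in seg)

y = [10.0, 20.0, 20.0, 1000.0]
ranks = average_ranks(y)
cost = rank_cost(ranks, 0, len(y))
```

The value 1000.0 simply becomes the largest rank, so its magnitude has no further influence on the cost.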

Cosine (CostCosine)#

Cosine similarity-based cost for directional data.

  • Best for: Angular/directional changes in high-dimensional embeddings
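
A kernel-style formulation of a cosine cost normalizes every vector to unit length and then measures how far the segment's directions spread: the segment length minus the squared norm of the summed unit vectors divided by the length. The sketch below assumes that formulation; the library's exact definition is not stated in this section and `cosine_cost` is a hypothetical name.

```python
import math

def cosine_cost(rows):
    """n - ||sum of unit vectors||^2 / n for a segment given as a list of non-zero vectors."""
    n = len(rows)
    d = len(rows[0])
    total = [0.0] * d
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r))  # assumes no all-zero vectors
        for j in range(d):
            total[j] += r[j] / norm
    return n - sum(t * t for t in total) / n

same_direction = cosine_cost([[1.0, 0.0], [2.0, 0.0], [5.0, 0.0]])
orthogonal = cosine_cost([[1.0, 0.0], [0.0, 1.0]])
```

Vectors that differ only in magnitude give a cost of zero, which is why this kind of cost suits embeddings where direction, not scale, carries the signal.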

Decision tree: choosing a cost model#

Is your data binary (0/1)?
  → Yes: Use "bernoulli"
Is your data count-valued (non-negative integers)?
  → Yes: Use "poisson"
Do you have outliers?
  → Yes: Use "l1_median" or "rank"
Do you want to detect variance changes (not just mean)?
  → Yes, univariate or independent dims: Use "normal"
  → Yes, cross-dimensional correlation: Use "normal_full_cov"
Is your data autocorrelated?
  → Yes: Use "ar"
Default:
  → Use "l2" (fastest; a solid default for mean shifts)
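
The same sequence of questions can be written as a small helper that returns the API string. This is only an illustration of the decision order above; `choose_cost_model` and its keyword arguments are hypothetical and not part of the library.

```python
def choose_cost_model(binary=False, counts=False, outliers=False,
                      variance_changes=False, cross_dim_correlation=False,
                      autocorrelated=False):
    """Walk the questions in order and return the first matching API string."""
    if binary:
        return "bernoulli"
    if counts:
        return "poisson"
    if outliers:
        return "l1_median"   # "rank" is the listed alternative for this branch
    if variance_changes:
        return "normal_full_cov" if cross_dim_correlation else "normal"
    if autocorrelated:
        return "ar"
    return "l2"

choice = choose_cost_model(variance_changes=True, cross_dim_correlation=True)
```

Earlier questions take precedence, so data that is both binary and autocorrelated, for example, resolves to "bernoulli" under this ordering.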

Availability in Python#

Cost model

High-level classes

detect_offline()

Pipeline-only

l2

Pelt, Binseg, Fpop

Yes

l1_median

Yes

normal

Pelt, Binseg

Yes

normal_full_cov

Pelt, Binseg

Yes

nig

Yes

ar

Yes

rank

Yes

cosine

Yes

bernoulli

Yes

poisson

Yes