Doctor Recommendation Engine#

The doctor analyzes your signal’s statistical properties and recommends ranked detector/cost/stopping pipelines with calibrated confidence scores.

What the doctor does#

Diagnose – Computes signal statistics (distribution shape, autocorrelation, seasonality, missing data patterns, dimensionality)
Classify – Maps the signal to one or more calibration families
Recommend – Generates ranked pipeline recommendations scored by confidence and objective fit
Execute – Emits each recommendation in a form you can run directly via detect_offline(pipeline=...)

CLI workflow#

cpd doctor \
    --input /path/to/signal.csv \
    --objective balanced \
    --min-confidence 0.2 \
    --output doctor.json

The output is a JSON file with ranked recommendations, each containing:

Pipeline specification (detector, cost, stopping, constraints, preprocessing)
Confidence score and confidence interval
Resource estimates
Explanation and warnings
Objective fit scores

Python integration#

Execute a doctor recommendation directly:

import cpd
import json

# Load doctor output
with open("doctor.json") as f:
    recommendations = json.load(f)

# Recommendations are ranked, so index 0 is the top recommendation
pipeline = recommendations[0]["pipeline"]

# x is the signal you ran the doctor on, already loaded as an array
result = cpd.detect_offline(x, pipeline=pipeline)
print(result.breakpoints)
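If the top recommendation has low confidence, you may prefer to pick a pipeline only when it clears a threshold of your own. The following is a minimal sketch, not part of the library: `pick_pipeline` is a hypothetical helper, and the `"confidence"` key name is an assumption about the JSON layout (only the `"pipeline"` key appears in the example above). The sample list stands in for the loaded `doctor.json` and uses placeholder values.

```python
def pick_pipeline(recommendations, min_confidence=0.5):
    """Return the pipeline of the first (highest-ranked) recommendation whose
    confidence is at least min_confidence, or None if none qualifies.

    Assumes each entry is a dict with a "pipeline" key and a numeric
    "confidence" key; the latter name is an assumption, so check your own
    doctor.json before relying on it.
    """
    for rec in recommendations:
        if rec.get("confidence", 0.0) >= min_confidence:
            return rec["pipeline"]
    return None


# Hypothetical stand-in for json.load(...) on doctor.json; values are placeholders.
sample = [
    {"pipeline": {"detector": "pelt", "cost": "l2"}, "confidence": 0.42},
    {"pipeline": {"detector": "wbs", "cost": "rank"}, "confidence": 0.31},
]

chosen = pick_pipeline(sample, min_confidence=0.4)
print(chosen)
```

Returning `None` instead of raising lets the caller decide whether to fall back to a manually specified pipeline.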

Objectives#

The objective parameter controls the tradeoff between speed, accuracy, and robustness in pipeline ranking:

Objective	Description
`Balanced`	Default. Balances accuracy, speed, and generality
`Speed`	Favors fast algorithms (PELT, CUSUM) with simpler cost models
`Accuracy`	Favors algorithms with stronger optimality guarantees (FPOP, SegNeigh)
`Robustness`	Favors non-parametric or masking-resistant approaches (WBS, Rank cost)

Calibration families#

The doctor classifies signals into families for calibration-aware scoring:

Family	Characteristics
`Gaussian`	Near-normal distribution, light tails
`HeavyTailed`	Excess kurtosis, outlier-prone
`Autocorrelated`	Significant temporal dependence
`Seasonal`	Periodic patterns detected
`Multivariate`	d > 1 dimensions
`Binary`	Values near 0 or 1 (within tolerance)
`Count`	Non-negative integer-valued data
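To give a feel for how the characteristics in this table could be checked, here is a toy classifier in NumPy. It is only an illustration under stated assumptions: `toy_families`, its thresholds (`kurtosis_cut`, `acf_cut`, `tol`), and the order of the checks are all hypothetical and are not the doctor's actual rules. Seasonality detection is omitted for brevity.

```python
import numpy as np


def toy_families(x, kurtosis_cut=1.0, acf_cut=0.5, tol=1e-6):
    """Assign family names using simple, hypothetical rules of thumb."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 2 and x.shape[1] > 1:
        return ["Multivariate"]          # d > 1 dimensions
    x = x.ravel()
    x = x[~np.isnan(x)]                  # ignore missing values

    # Binary: every value within tol of 0 or 1
    if np.all(np.minimum(np.abs(x), np.abs(x - 1.0)) <= tol):
        return ["Binary"]
    # Count: non-negative and integer-valued
    if np.all(x >= 0) and np.all(x == np.round(x)):
        return ["Count"]

    centered = x - x.mean()
    var = centered.var()
    if var == 0.0:
        return ["Gaussian"]              # constant signal: nothing else to test

    excess_kurtosis = np.mean(centered**4) / var**2 - 3.0
    lag1_autocorr = np.dot(centered[:-1], centered[1:]) / (len(x) * var)

    families = []
    if excess_kurtosis > kurtosis_cut:
        families.append("HeavyTailed")
    if abs(lag1_autocorr) > acf_cut:
        families.append("Autocorrelated")
    return families or ["Gaussian"]


rng = np.random.default_rng(0)
gauss = rng.normal(size=2000)
spiky = gauss.copy()
spiky[::100] = 15.0                      # inject a few large outliers

print(toy_families(gauss))
print(toy_families(spiky))
print(toy_families(np.cumsum(gauss)))    # a random walk is strongly autocorrelated
print(toy_families(rng.poisson(3.0, 500)))
```

The function returns a list because, as noted above, a signal can belong to more than one family.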

Confidence formula#

Each recommendation includes a calibrated confidence score:

confidence = clamp(
    (intercept + slope * heuristic_confidence) * (1 - ood_penalty),
    0.01,
    0.99
)

Where:

intercept and slope are per-family calibration parameters
heuristic_confidence is the raw score from pipeline-data compatibility analysis
ood_penalty = clamp(1 - exp(-0.90 * diagnostic_divergence), 0.0, 0.80) penalizes out-of-distribution signals; because it is capped at 0.80, it scales the calibrated score down by at most a factor of 5
Final confidence is clamped to [0.01, 0.99]
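The formula can be transcribed into plain Python as follows. This is a direct sketch of the expressions above, not the library's source; the function names are illustrative, and the `intercept` and `slope` values in the usage lines are hypothetical placeholders rather than real per-family calibration parameters.

```python
import math


def clamp(value, lo, hi):
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def ood_penalty(diagnostic_divergence):
    """Out-of-distribution penalty: grows with divergence, capped at 0.80."""
    return clamp(1.0 - math.exp(-0.90 * diagnostic_divergence), 0.0, 0.80)


def calibrated_confidence(heuristic_confidence, intercept, slope,
                          diagnostic_divergence):
    """Apply the per-family linear calibration, then the OOD penalty."""
    calibrated = intercept + slope * heuristic_confidence
    penalized = calibrated * (1.0 - ood_penalty(diagnostic_divergence))
    return clamp(penalized, 0.01, 0.99)


# Hypothetical calibration parameters, for illustration only.
in_dist = calibrated_confidence(0.7, intercept=0.05, slope=0.9,
                                diagnostic_divergence=0.0)
far_out = calibrated_confidence(0.7, intercept=0.05, slope=0.9,
                                diagnostic_divergence=5.0)
print(in_dist, far_out)
```

With zero divergence the penalty vanishes and the score is just the linear calibration. The penalty reaches its 0.80 cap once `diagnostic_divergence` exceeds ln(5) / 0.90, roughly 1.79, so beyond that point further divergence no longer lowers the score.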

Preprocessing recommendations#

The doctor also recommends preprocessing based on signal diagnostics:

Signal property	Recommended preprocessing
Linear or polynomial trend	`detrend`
Seasonal pattern detected	`deseasonalize`
High outlier rate	`winsorize`
Scale instability across segments	`robust_scale`
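To illustrate why the first two rows matter, the sketch below removes a linear trend and a seasonal pattern by hand and then locates a mean shift with a simple CUSUM statistic. It uses only NumPy and is not the library's `detrend` or `deseasonalize` implementation: `detrend_linear` and `deseasonalize_phase_means` are hypothetical helpers, and subtracting per-phase means is a simplifying stand-in that assumes the period is known.

```python
import numpy as np


def detrend_linear(x):
    """Subtract a least-squares straight line fitted to the whole signal."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)


def deseasonalize_phase_means(x, period):
    """Subtract, from each sample, the mean of all samples at the same phase."""
    phase = np.arange(len(x)) % period
    phase_means = np.array([x[phase == p].mean() for p in range(period)])
    return x - phase_means[phase]


# Synthetic signal: seasonal (period 50) + linear trend + mean shift at 500.
t = np.arange(1000, dtype=np.float64)
rng = np.random.default_rng(42)
signal = (2.0 * np.sin(2 * np.pi * t / 50)
          + 0.005 * t
          + np.where(t >= 500, 3.0, 0.0)
          + rng.normal(0, 0.5, 1000))

cleaned = deseasonalize_phase_means(detrend_linear(signal), period=50)

# CUSUM: the cumulative sum of the centered signal is most extreme at the
# last sample before the mean shift, so the change starts one index later.
cusum = np.cumsum(cleaned - cleaned.mean())
estimate = int(np.argmax(np.abs(cusum))) + 1
print("Estimated change point:", estimate)
```

Note that fitting one line across the whole signal absorbs part of the step into the slope, so the cleaned signal keeps a sawtooth shape around the shift. The jump itself survives, which is what the CUSUM statistic picks up.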

Worked example#

Consider a seasonal signal with a trend and a change in mean at index 500:

import numpy as np
import cpd

# Seasonal + trend + change point
t = np.arange(1000, dtype=np.float64)
seasonal = 2.0 * np.sin(2 * np.pi * t / 50)
trend = 0.005 * t
shift = np.where(t >= 500, 3.0, 0.0)
noise = np.random.default_rng(42).normal(0, 0.5, 1000)
signal = seasonal + trend + shift + noise

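# For a signal like this, the doctor would recommend detrending and
# deseasonalizing before running PELT. The call below spells out such a
# pipeline by hand instead of loading it from the doctor's output: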
result = cpd.detect_offline(
    signal,
    detector="pelt",
    cost="l2",
    constraints={"min_segment_len": 10},
    stopping={"pen": "bic"},
    preprocess={
        "detrend": {"method": "linear"},
        "deseasonalize": {"method": "stl_like", "period": 50},
    },
)

print("Change points:", result.change_points)
# Expected: change point near index 500

Multivariate awareness#

Offline: Doctor emits multivariate-specific guidance for cost model selection (diagonal vs full covariance tradeoffs)
Online: Doctor rejects multivariate inputs (d > 1) with an explanatory error, because online detectors currently support only univariate data