Doctor Recommendation Engine#
The doctor analyzes your signal’s statistical properties and recommends ranked detector/cost/stopping pipelines with calibrated confidence scores.
What the doctor does#
Diagnose – Computes signal statistics (distribution shape, autocorrelation, seasonality, missing data patterns, dimensionality)
Classify – Maps the signal to one or more calibration families
Recommend – Generates ranked pipeline recommendations scored by confidence and objective fit
Execute – Recommendations can be executed directly via detect_offline(pipeline=...)
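To make the Diagnose step more concrete, here is a minimal sketch of the kind of statistics involved, assuming a univariate NumPy array. The helper name `basic_diagnostics` and the choice of statistics are illustrative assumptions, not the doctor's actual implementation.

```python
import numpy as np


def basic_diagnostics(x):
    """Illustrative signal statistics (hypothetical helper, not part of cpd)."""
    x = np.asarray(x, dtype=np.float64)
    missing_fraction = float(np.mean(np.isnan(x)))

    # Dropping NaNs is a simplification: it slightly distorts the lag structure.
    clean = x[~np.isnan(x)]
    centered = clean - clean.mean()
    variance = float(np.mean(centered**2))

    # Excess kurtosis: 0 for a normal distribution, > 0 for heavy tails.
    excess_kurtosis = float(np.mean(centered**4) / variance**2 - 3.0)

    # Lag-1 autocorrelation as a simple measure of temporal dependence.
    lag1_autocorr = float(np.sum(centered[1:] * centered[:-1]) / np.sum(centered**2))

    return {
        "missing_fraction": missing_fraction,
        "excess_kurtosis": excess_kurtosis,
        "lag1_autocorr": lag1_autocorr,
    }


rng = np.random.default_rng(0)
stats = basic_diagnostics(rng.normal(size=5000))
print(stats)
```

For white Gaussian noise, both the excess kurtosis and the lag-1 autocorrelation should come out close to zero.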
CLI workflow#
cpd doctor \
  --input /path/to/signal.csv \
  --objective balanced \
  --min-confidence 0.2 \
  --output doctor.json
The output is a JSON file with ranked recommendations, each containing:
Pipeline specification (detector, cost, stopping, constraints, preprocessing)
Confidence score and confidence interval
Resource estimates
Explanation and warnings
Objective fit scores
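As an illustration of working with this output, the sketch below filters and sorts recommendations by confidence. It assumes the file is a JSON list and that each entry has a `"confidence"` key; only the `"pipeline"` key is taken from this guide. The sample entries, their detector and cost strings, and the helper `filter_by_confidence` are made up for the example.

```python
import json

# Stand-in for the contents of doctor.json. The "confidence" key and the
# values inside "pipeline" are assumed names, used for illustration only.
doctor_output = json.loads("""
[
  {"pipeline": {"detector": "fpop", "cost": "l2"}, "confidence": 0.35},
  {"pipeline": {"detector": "pelt", "cost": "l2"}, "confidence": 0.82},
  {"pipeline": {"detector": "wbs", "cost": "rank"}, "confidence": 0.15}
]
""")


def filter_by_confidence(recommendations, min_confidence):
    """Keep entries at or above the threshold, highest confidence first."""
    kept = [r for r in recommendations if r["confidence"] >= min_confidence]
    return sorted(kept, key=lambda r: r["confidence"], reverse=True)


selected = filter_by_confidence(doctor_output, min_confidence=0.2)
print([r["pipeline"]["detector"] for r in selected])
```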
Python integration#
Execute a doctor recommendation directly:
import cpd
import json

# Load doctor output
with open("doctor.json") as f:
    recommendations = json.load(f)

# Use the top recommendation's pipeline
pipeline = recommendations[0]["pipeline"]

# x is the signal that was analyzed by the doctor, loaded as an array
result = cpd.detect_offline(x, pipeline=pipeline)
print(result.breakpoints)
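If the whole workflow should live in one script, the CLI step can also be driven from Python. This is a sketch that assumes the `cpd` executable is on `PATH` and uses only the flags shown in the CLI workflow; `build_doctor_command` and `run_doctor` are hypothetical helpers, and the defaults simply mirror the example values rather than any documented CLI defaults.

```python
import subprocess


def build_doctor_command(input_path, output_path, objective="balanced", min_confidence=0.2):
    """Assemble the `cpd doctor` invocation as an argument list."""
    return [
        "cpd", "doctor",
        "--input", str(input_path),
        "--objective", objective,
        "--min-confidence", str(min_confidence),
        "--output", str(output_path),
    ]


def run_doctor(input_path, output_path, **kwargs):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_doctor_command(input_path, output_path, **kwargs), check=True)


command = build_doctor_command("/path/to/signal.csv", "doctor.json")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.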
Objectives#
The objective parameter controls the tradeoff between speed, accuracy, and robustness in pipeline ranking:
| Objective | Description |
|---|---|
| | Default. Balances accuracy, speed, and generality |
| | Favors fast algorithms (PELT, CUSUM) with simpler cost models |
| | Favors algorithms with stronger optimality guarantees (FPOP, SegNeigh) |
| | Favors non-parametric or masking-resistant approaches (WBS, Rank cost) |
Calibration families#
The doctor classifies signals into families for calibration-aware scoring:
| Family | Characteristics |
|---|---|
| | Near-normal distribution, light tails |
| | Excess kurtosis, outlier-prone |
| | Significant temporal dependence |
| | Periodic patterns detected |
| | d > 1 dimensions |
| | Values near 0 or 1 (within tolerance) |
| | Non-negative integer-valued data |
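Three of these characteristics are purely structural and easy to check by hand. The sketch below is a rough illustration, not the doctor's classifier: the helper `structural_traits`, its trait names, the tolerance value, and the assumption that rows are time steps and columns are dimensions are all hypothetical.

```python
import numpy as np


def structural_traits(x, tol=1e-6):
    """Hypothetical checks for three table rows; `tol` is an assumed value."""
    x = np.asarray(x, dtype=np.float64)
    # Assumes shape (n_samples,) or (n_samples, d).
    n_dims = 1 if x.ndim == 1 else x.shape[1]
    finite = x[np.isfinite(x)]
    return {
        # d > 1 dimensions
        "multivariate": n_dims > 1,
        # Non-negative integer-valued data
        "count_like": bool(np.all(finite >= 0) and np.all(finite == np.round(finite))),
        # Values near 0 or 1 (within tolerance)
        "near_binary": bool(
            np.all((np.abs(finite) <= tol) | (np.abs(finite - 1.0) <= tol))
        ),
    }


counts = np.random.default_rng(1).poisson(4.0, size=200)
print(structural_traits(counts))
print(structural_traits(np.zeros((100, 3))))
```

The traits are not mutually exclusive, which matches the doctor mapping a signal to one or more families.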
Confidence formula#
Each recommendation includes a calibrated confidence score:
confidence = clamp(
    (intercept + slope * heuristic_confidence) * (1 - ood_penalty),
    0.01,
    0.99
)
Where:
intercept and slope are per-family calibration parameters
heuristic_confidence is the raw score from pipeline-data compatibility analysis
ood_penalty = clamp(1 - exp(-0.90 * diagnostic_divergence), 0.0, 0.80) penalizes out-of-distribution signals
Final confidence is clamped to [0.01, 0.99]
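The formula translates directly into a few lines of Python. The sketch below follows the definitions above term by term; the function names are chosen for this example, and the intercept and slope values are made up, since the real per-family calibration parameters are not listed here.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def ood_penalty(diagnostic_divergence):
    """Out-of-distribution penalty, capped at 0.80."""
    return clamp(1.0 - math.exp(-0.90 * diagnostic_divergence), 0.0, 0.80)


def calibrated_confidence(heuristic_confidence, intercept, slope, diagnostic_divergence):
    """Apply the per-family linear calibration, then the OOD penalty, then clamp."""
    raw = (intercept + slope * heuristic_confidence) * (1.0 - ood_penalty(diagnostic_divergence))
    return clamp(raw, 0.01, 0.99)


# Made-up calibration parameters, for illustration only.
in_dist = calibrated_confidence(0.8, intercept=0.05, slope=0.9, diagnostic_divergence=0.0)
far_ood = calibrated_confidence(0.8, intercept=0.05, slope=0.9, diagnostic_divergence=5.0)
print(in_dist, far_ood)
```

Because the penalty is capped at 0.80, even a strongly out-of-distribution signal keeps 20% of its calibrated score. The cap is reached once diagnostic_divergence exceeds ln(5) / 0.90, roughly 1.79.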
Preprocessing recommendations#
The doctor also recommends preprocessing based on signal diagnostics:
| Signal property | Recommended preprocessing |
|---|---|
| Linear or polynomial trend | |
| Seasonal pattern detected | |
| High outlier rate | |
| Scale instability across segments | |
Worked example#
Consider a seasonal signal with a trend and a change in mean at index 500:
import numpy as np
import cpd
# Seasonal + trend + change point
t = np.arange(1000, dtype=np.float64)
seasonal = 2.0 * np.sin(2 * np.pi * t / 50)
trend = 0.005 * t
shift = np.where(t >= 500, 3.0, 0.0)
noise = np.random.default_rng(42).normal(0, 0.5, 1000)
signal = seasonal + trend + shift + noise
# Doctor would recommend preprocessing + PELT
# After running doctor CLI or using the recommendation:
result = cpd.detect_offline(
    signal,
    detector="pelt",
    cost="l2",
    constraints={"min_segment_len": 10},
    stopping={"pen": "bic"},
    preprocess={
        "detrend": {"method": "linear"},
        "deseasonalize": {"method": "stl_like", "period": 50},
    },
)
print("Change points:", result.change_points)
# Expected: change point near index 500
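To see why preprocessing helps on this signal, the two steps can be approximated by hand with NumPy. This is only a sketch of the idea: a least-squares line stands in for linear detrending and per-phase means stand in for deseasonalization, which is not necessarily what the library's `"linear"` and `"stl_like"` methods do internally.

```python
import numpy as np

# Same synthetic signal as in the worked example.
t = np.arange(1000, dtype=np.float64)
signal = (
    2.0 * np.sin(2 * np.pi * t / 50)
    + 0.005 * t
    + np.where(t >= 500, 3.0, 0.0)
    + np.random.default_rng(42).normal(0, 0.5, 1000)
)

# Linear detrend: subtract the least-squares line.
slope, intercept = np.polyfit(t, signal, deg=1)
detrended = signal - (slope * t + intercept)

# Deseasonalize: subtract the mean of each phase of the 50-sample cycle.
period = 50
phase = np.arange(signal.size) % period
phase_means = np.array([detrended[phase == p].mean() for p in range(period)])
residual = detrended - phase_means[phase]

# Compare one full cycle on either side of the true change at index 500.
jump = residual[500:550].mean() - residual[450:500].mean()
print(f"Local jump at index 500: {jump:.2f}")
```

The measured jump comes out somewhat below the true shift of 3.0 because the fitted line absorbs part of the step, but the change remains clearly visible once the seasonal swing of amplitude 2.0 is removed.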
Multivariate awareness#
Offline: Doctor emits multivariate-specific guidance for cost model selection (diagonal vs full covariance tradeoffs)
Online: Doctor rejects multivariate inputs (d > 1) with a clear guidance error, as online detectors currently support only univariate data
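When preparing data for an online workflow, a similar check can be applied up front. The guard below is a hypothetical helper written for this example; the doctor's own error type and message may differ, and the layout assumption (rows are time steps, columns are dimensions) is ours.

```python
import numpy as np


def require_univariate(x):
    """Hypothetical guard: return a 1-D signal or raise if d > 1."""
    x = np.asarray(x)
    # Assumes shape (n_samples,) or (n_samples, d).
    n_dims = 1 if x.ndim == 1 else x.shape[1]
    if n_dims > 1:
        raise ValueError(
            f"online detection expects a univariate signal, got d={n_dims}; "
            "select one column or use the offline workflow"
        )
    return x.reshape(-1)


try:
    require_univariate(np.zeros((100, 3)))
except ValueError as err:
    message = str(err)
    print(message)
```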