Doctor Recommendation Engine#

The doctor analyzes your signal’s statistical properties and recommends ranked detector/cost/stopping pipelines with calibrated confidence scores.

What the doctor does#

  1. Diagnose – Computes signal statistics (distribution shape, autocorrelation, seasonality, missing data patterns, dimensionality)

  2. Classify – Maps the signal to one or more calibration families

  3. Recommend – Generates ranked pipeline recommendations scored by confidence and objective fit

  4. Execute – Recommendations can be directly executed via detect_offline(pipeline=...)

CLI workflow#

cpd doctor \
    --input /path/to/signal.csv \
    --objective balanced \
    --min-confidence 0.2 \
    --output doctor.json

The output is a JSON file with ranked recommendations, each containing:

  • Pipeline specification (detector, cost, stopping, constraints, preprocessing)

  • Confidence score and confidence interval

  • Resource estimates

  • Explanation and warnings

  • Objective fit scores

Python integration#

Execute a doctor recommendation directly:

import cpd
import json

# Load doctor output
with open("doctor.json") as f:
    recommendations = json.load(f)

# Use the top recommendation's pipeline
pipeline = recommendations[0]["pipeline"]
result = cpd.detect_offline(x, pipeline=pipeline)
print(result.breakpoints)

Objectives#

The objective parameter controls the tradeoff between speed, accuracy, and robustness in pipeline ranking:

Objective

Description

Balanced

Default. Balances accuracy, speed, and generality

Speed

Favors fast algorithms (PELT, CUSUM) with simpler cost models

Accuracy

Favors algorithms with stronger optimality guarantees (FPOP, SegNeigh)

Robustness

Favors non-parametric or masking-resistant approaches (WBS, Rank cost)

Calibration families#

The doctor classifies signals into families for calibration-aware scoring:

Family

Characteristics

Gaussian

Near-normal distribution, light tails

HeavyTailed

Excess kurtosis, outlier-prone

Autocorrelated

Significant temporal dependence

Seasonal

Periodic patterns detected

Multivariate

d > 1 dimensions

Binary

Values near 0 or 1 (within tolerance)

Count

Non-negative integer-valued data

Confidence formula#

Each recommendation includes a calibrated confidence score:

confidence = clamp(
    (intercept + slope * heuristic_confidence) * (1 - ood_penalty),
    0.01,
    0.99
)

Where:

  • intercept and slope are per-family calibration parameters

  • heuristic_confidence is the raw score from pipeline-data compatibility analysis

  • ood_penalty = clamp(1 - exp(-0.90 * diagnostic_divergence), 0.0, 0.80) penalizes out-of-distribution signals

  • Final confidence is clamped to [0.01, 0.99]

Preprocessing recommendations#

The doctor also recommends preprocessing based on signal diagnostics:

Signal property

Recommended preprocessing

Linear or polynomial trend

detrend

Seasonal pattern detected

deseasonalize

High outlier rate

winsorize

Scale instability across segments

robust_scale

Worked example#

Consider a seasonal signal with a trend and a change in mean at index 500:

import numpy as np
import cpd

# Seasonal + trend + change point
t = np.arange(1000, dtype=np.float64)
seasonal = 2.0 * np.sin(2 * np.pi * t / 50)
trend = 0.005 * t
shift = np.where(t >= 500, 3.0, 0.0)
noise = np.random.default_rng(42).normal(0, 0.5, 1000)
signal = seasonal + trend + shift + noise

# Doctor would recommend preprocessing + PELT
# After running doctor CLI or using the recommendation:
result = cpd.detect_offline(
    signal,
    detector="pelt",
    cost="l2",
    constraints={"min_segment_len": 10},
    stopping={"pen": "bic"},
    preprocess={
        "detrend": {"method": "linear"},
        "deseasonalize": {"method": "stl_like", "period": 50},
    },
)

print("Change points:", result.change_points)
# Expected: change point near index 500

Multivariate awareness#

  • Offline: Doctor emits multivariate-specific guidance for cost model selection (diagonal vs full covariance tradeoffs)

  • Online: Doctor rejects multivariate inputs (d > 1) with a clear guidance error, as online detectors currently support only univariate data