# Doctor Recommendation Engine

The **doctor** analyzes your signal's statistical properties and recommends ranked detector/cost/stopping pipelines with calibrated confidence scores.

## What the doctor does

1. **Diagnose** -- Computes signal statistics (distribution shape, autocorrelation, seasonality, missing data patterns, dimensionality)
2. **Classify** -- Maps the signal to one or more calibration families
3. **Recommend** -- Generates ranked pipeline recommendations scored by confidence and objective fit
4. **Execute** -- Recommendations can be directly executed via `detect_offline(pipeline=...)`
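The Diagnose step can be illustrated with a few of the statistics it names. This is a hypothetical `diagnose` helper built on NumPy, not cpd's implementation. It reports missing-data fraction and dimensionality, plus excess kurtosis (distribution shape) and lag-1 autocorrelation (temporal dependence) for the first column only. Seasonality detection is left out.

```python
import numpy as np


def diagnose(x) -> dict:
    """Hypothetical helper: a few statistics a Diagnose step could compute."""
    x = np.asarray(x, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, None]  # treat a univariate series as one column
    n, d = x.shape
    missing_fraction = float(np.isnan(x).mean())

    # Shape and dependence on the first column only; NaNs are dropped,
    # so lagged pairs may span gaps (acceptable for a rough sketch).
    col = x[:, 0][~np.isnan(x[:, 0])]
    centered = col - col.mean()
    ss = float(np.sum(centered**2))
    if ss == 0.0:
        excess_kurtosis = lag1_autocorr = float("nan")  # constant series
    else:
        var = ss / col.size
        excess_kurtosis = float(np.mean(centered**4) / var**2 - 3.0)
        # Standard lag-1 sample autocorrelation: lagged products / sum of squares.
        lag1_autocorr = float(np.sum(centered[:-1] * centered[1:]) / ss)

    return {
        "n": n,
        "d": d,
        "missing_fraction": missing_fraction,
        "excess_kurtosis": excess_kurtosis,
        "lag1_autocorr": lag1_autocorr,
    }
```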

## CLI workflow

```bash
cpd doctor \
    --input /path/to/signal.csv \
    --objective balanced \
    --min-confidence 0.2 \
    --output doctor.json
```

The output is a JSON file with ranked recommendations, each containing:
- Pipeline specification (detector, cost, stopping, constraints, preprocessing)
- Confidence score and confidence interval
- Resource estimates
- Explanation and warnings
- Objective fit scores
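The `--min-confidence 0.2` flag in the command above presumably drops recommendations whose calibrated confidence falls below the threshold. The sketch below encodes that assumed behavior: entries at or above the cutoff are kept in their existing rank order. `Recommendation` and `apply_min_confidence` are hypothetical names, not part of cpd's JSON schema or API.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    name: str          # hypothetical pipeline label, e.g. "pelt+l2"
    confidence: float  # calibrated confidence in [0.01, 0.99]


def apply_min_confidence(recs: list[Recommendation],
                         min_confidence: float) -> list[Recommendation]:
    """Assumed --min-confidence semantics: drop low-confidence entries, keep rank order."""
    return [r for r in recs if r.confidence >= min_confidence]


ranked = [
    Recommendation("pelt+l2", 0.62),
    Recommendation("binseg+normal", 0.14),
    Recommendation("wbs+rank", 0.31),
]
print([r.name for r in apply_min_confidence(ranked, 0.2)])
```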

## Python integration

Execute a doctor recommendation directly:

```python
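import json

import numpy as np
import cpd

# Load the signal the doctor analyzed (assumes one numeric column, no header row)
x = np.loadtxt("/path/to/signal.csv", delimiter=",")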

# Load doctor output
with open("doctor.json") as f:
    recommendations = json.load(f)

# Use the top recommendation's pipeline
pipeline = recommendations[0]["pipeline"]
result = cpd.detect_offline(x, pipeline=pipeline)
print(result.breakpoints)
```

## Objectives

The objective (`--objective` on the CLI) controls how pipelines are ranked, trading off speed, accuracy, and robustness:

| Objective | Description |
|---|---|
| `Balanced` | Default. Balances accuracy, speed, and generality |
| `Speed` | Favors fast algorithms (PELT, CUSUM) with simpler cost models |
| `Accuracy` | Favors algorithms with stronger optimality guarantees (FPOP, SegNeigh) |
| `Robustness` | Favors non-parametric or masking-resistant approaches (WBS, Rank cost) |
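One way to picture objective-aware ranking is as a weighted sum of per-pipeline fit scores, where each objective shifts weight toward one axis. The weights, the fit scores, and the `objective_score`/`rank` helpers are all hypothetical and chosen only to illustrate the idea. They are not the doctor's actual scoring.

```python
# Hypothetical weights over per-pipeline fit scores; not the doctor's actual values.
OBJECTIVE_WEIGHTS = {
    "balanced":   {"accuracy": 1 / 3, "speed": 1 / 3, "robustness": 1 / 3},
    "speed":      {"accuracy": 0.2, "speed": 0.6, "robustness": 0.2},
    "accuracy":   {"accuracy": 0.6, "speed": 0.2, "robustness": 0.2},
    "robustness": {"accuracy": 0.2, "speed": 0.2, "robustness": 0.6},
}


def objective_score(fit: dict[str, float], objective: str) -> float:
    """Weighted sum of fit scores (each assumed to lie in [0, 1])."""
    weights = OBJECTIVE_WEIGHTS[objective]
    return sum(w * fit[axis] for axis, w in weights.items())


def rank(candidates: dict[str, dict[str, float]], objective: str) -> list[str]:
    """Candidate names, best first, under the given objective."""
    return sorted(candidates,
                  key=lambda name: objective_score(candidates[name], objective),
                  reverse=True)


# Made-up fit scores for illustration only.
candidates = {
    "pelt+l2":  {"accuracy": 0.6, "speed": 0.9, "robustness": 0.5},
    "fpop+l2":  {"accuracy": 0.9, "speed": 0.6, "robustness": 0.4},
    "wbs+rank": {"accuracy": 0.6, "speed": 0.4, "robustness": 0.9},
}
print(rank(candidates, "robustness"))
```

The same candidate pool produces a different top pick under each objective, which is the behavior the table describes.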

## Calibration families

The doctor classifies signals into families for calibration-aware scoring:

| Family | Characteristics |
|---|---|
| `Gaussian` | Near-normal distribution, light tails |
| `HeavyTailed` | Excess kurtosis, outlier-prone |
| `Autocorrelated` | Significant temporal dependence |
| `Seasonal` | Periodic patterns detected |
| `Multivariate` | d > 1 dimensions |
| `Binary` | Values near 0 or 1 (within tolerance) |
| `Count` | Non-negative integer-valued data |
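A hypothetical rule set shows how diagnostics could map to the families above. The thresholds (`kurtosis_cut`, `acf_cut`, `tol`) are made up, seasonality detection is omitted, and the `stats` dict is assumed to carry excess kurtosis and lag-1 autocorrelation. A signal can land in several families at once.

```python
import numpy as np


def classify(x, stats: dict, kurtosis_cut: float = 1.0, acf_cut: float = 0.3,
             tol: float = 1e-9) -> list[str]:
    """Hypothetical family assignment; thresholds are illustrative, not calibrated."""
    x = np.asarray(x, dtype=np.float64)
    vals = x[~np.isnan(x)]  # flattens multivariate input
    families = []

    if x.ndim == 2 and x.shape[1] > 1:
        families.append("Multivariate")

    # Value-type checks: binary first, since {0, 1} data are also counts.
    is_binary = bool(np.all(np.minimum(np.abs(vals), np.abs(vals - 1.0)) <= tol))
    is_count = bool(np.all(vals >= 0) and np.all(np.abs(vals - np.round(vals)) <= tol))
    if is_binary:
        families.append("Binary")
    elif is_count:
        families.append("Count")

    # Distribution shape, applied only to continuous-valued data in this sketch.
    if not (is_binary or is_count):
        if stats["excess_kurtosis"] > kurtosis_cut:
            families.append("HeavyTailed")
        elif abs(stats["excess_kurtosis"]) <= kurtosis_cut:
            families.append("Gaussian")

    if abs(stats["lag1_autocorr"]) > acf_cut:
        families.append("Autocorrelated")
    return families
```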

## Confidence formula

Each recommendation includes a calibrated confidence score:

```
confidence = clamp(
    (intercept + slope * heuristic_confidence) * (1 - ood_penalty),
    0.01,
    0.99
)
```

Where:
- `intercept` and `slope` are per-family calibration parameters
- `heuristic_confidence` is the raw score from pipeline-data compatibility analysis
- `ood_penalty = clamp(1 - exp(-0.90 * diagnostic_divergence), 0.0, 0.80)` penalizes out-of-distribution signals
- Final confidence is clamped to [0.01, 0.99]
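The formula transcribes directly into Python. The calibration parameters and the divergence in the example call are made up for illustration. In practice they come from the per-family calibration and the doctor's diagnostics.

```python
import math


def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def ood_penalty(diagnostic_divergence: float) -> float:
    # Saturating penalty: 0 at zero divergence, capped at 0.80.
    return clamp(1.0 - math.exp(-0.90 * diagnostic_divergence), 0.0, 0.80)


def calibrated_confidence(intercept: float, slope: float,
                          heuristic_confidence: float,
                          diagnostic_divergence: float) -> float:
    raw = intercept + slope * heuristic_confidence  # per-family linear calibration
    return clamp(raw * (1.0 - ood_penalty(diagnostic_divergence)), 0.01, 0.99)


# Illustrative (made-up) calibration parameters and divergence.
print(round(calibrated_confidence(0.1, 0.8, 0.75, 0.5), 3))  # 0.446
```

Because the penalty is capped at 0.80, even a strongly out-of-distribution signal keeps 20% of its calibrated score before the final clamp.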

## Preprocessing recommendations

The doctor also recommends preprocessing based on signal diagnostics:

| Signal property | Recommended preprocessing |
|---|---|
| Linear or polynomial trend | `detrend` |
| Seasonal pattern detected | `deseasonalize` |
| High outlier rate | `winsorize` |
| Scale instability across segments | `robust_scale` |
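To make the table concrete, here are minimal NumPy stand-ins for the four transforms: a least-squares line, phase-mean seasonal removal, quantile clipping, and median/MAD scaling. These are simple textbook versions, not cpd's implementations. For example, the `stl_like` method used in the worked example below is presumably more elaborate than `deseasonalize` here.

```python
import numpy as np


def detrend_linear(x: np.ndarray) -> np.ndarray:
    """Subtract the least-squares line a + b*t."""
    t = np.arange(len(x), dtype=np.float64)
    slope, intercept = np.polyfit(t, x, deg=1)
    return x - (slope * t + intercept)


def deseasonalize(x: np.ndarray, period: int) -> np.ndarray:
    """Subtract the mean of each phase (t mod period); assumes len(x) >= period."""
    phase = np.arange(len(x)) % period
    phase_means = np.array([x[phase == p].mean() for p in range(period)])
    return x - phase_means[phase]


def winsorize(x: np.ndarray, lower_q: float = 0.01, upper_q: float = 0.99) -> np.ndarray:
    """Clip values outside the [lower_q, upper_q] empirical quantiles."""
    lo, hi = np.quantile(x, [lower_q, upper_q])
    return np.clip(x, lo, hi)


def robust_scale(x: np.ndarray) -> np.ndarray:
    """Center by the median and scale by the normal-consistent MAD."""
    median = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - median))  # ~ std. dev. under normality
    return (x - median) / scale if scale > 0 else x - median
```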

## Worked example

Consider a seasonal signal with a trend and a change in mean at index 500:

```python
import numpy as np
import cpd

# Seasonal + trend + change point
t = np.arange(1000, dtype=np.float64)
seasonal = 2.0 * np.sin(2 * np.pi * t / 50)
trend = 0.005 * t
shift = np.where(t >= 500, 3.0, 0.0)
noise = np.random.default_rng(42).normal(0, 0.5, 1000)
signal = seasonal + trend + shift + noise

# For this signal the doctor would typically recommend detrending and
# deseasonalizing before PELT; the call below spells that pipeline out:
result = cpd.detect_offline(
    signal,
    detector="pelt",
    cost="l2",
    constraints={"min_segment_len": 10},
    stopping={"pen": "bic"},
    preprocess={
        "detrend": {"method": "linear"},
        "deseasonalize": {"method": "stl_like", "period": 50},
    },
)

print("Change points:", result.change_points)
# Expected: change point near index 500
```

## Multivariate awareness

- **Offline:** Doctor emits multivariate-specific guidance for cost model selection (diagonal vs full covariance tradeoffs)
- **Online:** Doctor rejects multivariate inputs (d > 1) with a clear guidance error, as online detectors currently support only univariate data
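The diagonal-versus-full covariance tradeoff can be quantified by counting free parameters per segment, assuming a Gaussian cost with segment-specific mean and covariance. The function names below are hypothetical, not cpd cost identifiers. Full covariance grows quadratically in d, which inflates a BIC-style penalty per change point. It also requires at least d + 1 points per segment for the sample covariance to be nonsingular.

```python
def gaussian_segment_params(d: int, covariance: str) -> int:
    """Free parameters per segment for a Gaussian cost on d-dimensional data."""
    if covariance == "diagonal":
        return d + d  # d means + d variances
    if covariance == "full":
        return d + d * (d + 1) // 2  # d means + unique entries of a symmetric matrix
    raise ValueError(f"unknown covariance model: {covariance!r}")


def min_len_for_full_covariance(d: int) -> int:
    """Shortest segment whose sample covariance can be nonsingular (rank <= n - 1)."""
    return d + 1


for d in (1, 5, 20):
    print(d, gaussian_segment_params(d, "diagonal"), gaussian_segment_params(d, "full"))
```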