Reproducibility#
Reproducibility Modes and Deterministic Contracts#
This document defines the reproducibility contract for ReproMode and how to
interpret expected drift across runs, environments, and platforms.
Scope#
Applies to offline detection APIs and diagnostics metadata.
Covers
strict,balanced(default), andfastmodes.Covers both current implementation behavior and intended long-term contract.
Mode Contract Matrix#
Mode |
Guaranteed |
Not guaranteed |
Performance impact |
When to use |
|---|---|---|---|---|
|
Target: bitwise-identical results on the same machine, toolchain, and CPU feature set. Deterministic numeric accumulation paths are used in supported cost models. |
Bitwise identity across different target triples/architectures/compilers. |
Highest overhead of the three modes due to deterministic numeric reductions. |
Regulated or audit-heavy workflows requiring maximum run-to-run reproducibility. |
|
Deterministic breakpoint sets on the same target triple with the same inputs/configuration and execution settings. |
Bitwise identity across platforms or compiler/toolchain upgrades. |
Baseline production profile. |
Default for most production workloads. |
|
Throughput-first execution with valid outputs and practical segmentation quality. |
Deterministic floating-point ordering and bitwise-identical scores. |
Lowest overhead, highest throughput target. |
Large-scale or latency-sensitive workloads where small numeric drift is acceptable. |
Strict Mode#
strict is the highest reproducibility setting. The goal is bitwise-identical
outputs when all of the following match:
machine and architecture,
toolchain and build settings,
available CPU features.
Current implementation status:
CostL2MeanandCostNormalMeanVaruse compensated/Kahan prefix accumulation underReproMode::Strict.This is the deterministic-numerics path currently implemented in code for strict mode.
Tradeoff:
strict mode may run slower because deterministic compensated reductions are more expensive than non-compensated accumulation.
Balanced Mode (Default)#
balanced is the default (ExecutionContext and Python API defaults).
Contract:
On the same target triple, with the same effective config and execution settings, breakpoint sets are expected to be stable.
Current implementation status:
For current MVP-A offline detectors (
pelt,binseg), code paths are deterministic and do not use RNG-based branching.balancedandfastcurrently share the same numeric accumulation path in implemented cost models; they are separated as contract modes for future behavior/performance evolution.
Fast Mode#
fast is the throughput-oriented mode.
Contract:
Allows implementation strategies that can change floating-point operation ordering and hardware-specific numeric behavior while preserving practical segmentation quality.
Current implementation status:
fastis accepted and propagated in diagnostics and API surfaces.In current MVP-A implementations,
fastandbalancedbehave similarly in the core numeric path; future implementations may diverge to prioritize speed.
Per-Cost-Model Score Tolerances#
For exact-change-point matches in parity validation, score agreement currently uses this relative-error gate:
Cost model |
Score tolerance (exact-change-point cases) |
|---|---|
|
|
|
|
Breakpoint agreement is evaluated separately using tolerance-adjusted segmentation criteria:
default breakpoint tolerance:
±2indices,tolerant pass-rate thresholds:
pelt:l2 >= 0.95binseg:l2 >= 0.90pelt:normal >= 0.90binseg:normal >= 0.90
These thresholds are used for cross-implementation/platform comparability instead of bitwise-equal floating-point scores.
Cross-Platform Reproducibility#
Bitwise-identical results across platforms are generally unrealistic due to:
floating-point instruction and rounding differences,
SIMD feature differences,
compiler/codegen differences,
math library/backend differences.
What is guaranteed instead:
contract-level comparability through breakpoint tolerance and segmentation agreement thresholds,
explicit diagnostics metadata to help explain environment-dependent drift.
Configuration Examples#
Python:
import cpd
result_strict = cpd.detect_offline(x, detector="pelt", cost="l2", repro_mode="strict")
result_balanced = cpd.detect_offline(x, detector="pelt", cost="l2", repro_mode="balanced")
result_fast = cpd.detect_offline(x, detector="pelt", cost="l2", repro_mode="fast")
Rust:
use cpd_core::{Constraints, ExecutionContext, ReproMode};
let constraints = Constraints::default();
let ctx_strict = ExecutionContext::new(&constraints).with_repro_mode(ReproMode::Strict);
let ctx_balanced = ExecutionContext::new(&constraints).with_repro_mode(ReproMode::Balanced);
let ctx_fast = ExecutionContext::new(&constraints).with_repro_mode(ReproMode::Fast);
FAQ#
Why did my results change when I upgraded?#
Check these first:
engine_versionin diagnostics (runtime/library version changed),repro_mode(mode changed from strict/balanced/fast),algorithm/cost model combination (
algorithm,cost_model),target/platform/toolchain differences (for example ARM vs x86, compiler updates, BLAS/runtime differences).
Even with the same mode, upgrades can change low-level numerics or optimization paths. Compare breakpoints with tolerance-based criteria when evaluating cross-version drift.
Why do I get different scores on ARM vs x86?#
This is expected for floating-point-heavy workloads. ARM and x86 can differ in:
available SIMD instructions,
codegen decisions,
floating-point reduction ordering.
Treat cross-platform comparability primarily as a segmentation agreement problem using tolerance-adjusted breakpoint checks and documented parity thresholds, rather than exact score bitwise identity.