Regression
Regression detection and baseline management.
Detector
agentprobe.regression.detector
Regression detection by comparing baseline and current test results.
Flags regressions (score decreases) and improvements (score increases) based on configurable delta thresholds.
RegressionDetector
Compares current test results against a baseline to detect regressions.
Attributes:
| Name | Type | Description |
|---|---|---|
| `threshold` | | Minimum score delta to flag as regression/improvement. |
Source code in src/agentprobe/regression/detector.py
__init__(threshold=0.05)
Initialize the regression detector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Score delta threshold for flagging changes. | `0.05` |
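To make the role of `threshold` concrete, here is a minimal sketch of how a score delta might be classified. `classify_delta` is a hypothetical helper, not part of agentprobe, and treating a delta exactly equal to the threshold as flagged is an assumption; the actual detector may handle the boundary differently.

```python
def classify_delta(
    baseline_score: float,
    current_score: float,
    threshold: float = 0.05,
) -> str:
    """Label a score change relative to a baseline.

    Hypothetical helper for illustration only; not agentprobe's API.
    Assumption: a delta whose magnitude reaches the threshold is flagged.
    """
    delta = current_score - baseline_score
    if delta <= -threshold:
        return "regression"   # score dropped by at least the threshold
    if delta >= threshold:
        return "improvement"  # score rose by at least the threshold
    return "unchanged"        # change is too small to flag


dropped = classify_delta(0.9, 0.7)     # large decrease
rose = classify_delta(0.7, 0.9)        # large increase
steady = classify_delta(0.80, 0.82)    # change smaller than the threshold
print(dropped, rose, steady)
```

A larger threshold makes the check more tolerant of noisy scores, at the cost of missing small but real regressions.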
compare(baseline_name, baseline_results, current_results)
Compare current results against a baseline.
Tests are matched by name. Tests present in only one set are excluded from comparison.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `baseline_name` | `str` | Name of the baseline for reporting. | required |
| `baseline_results` | `Sequence[TestResult]` | Test results from the baseline run. | required |
| `current_results` | `Sequence[TestResult]` | Test results from the current run. | required |
Returns:
| Type | Description |
|---|---|
| `RegressionReport` | A RegressionReport with per-test comparisons. |
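The name-matching rule described for `compare` can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `ScoredResult` is a hypothetical stand-in for `TestResult`, its `name` and `score` fields are assumed, and `match_by_name` is an invented helper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredResult:
    """Hypothetical stand-in for TestResult; field names are assumptions."""
    name: str
    score: float


def match_by_name(
    baseline: Sequence[ScoredResult],
    current: Sequence[ScoredResult],
) -> list[tuple[str, float, float]]:
    """Pair results that share a name; tests in only one set are skipped."""
    baseline_by_name = {result.name: result for result in baseline}
    pairs = []
    for result in current:
        match = baseline_by_name.get(result.name)
        if match is not None:
            # (test name, baseline score, current score)
            pairs.append((result.name, match.score, result.score))
    return pairs


baseline = [ScoredResult("a", 0.9), ScoredResult("b", 0.5)]
current = [ScoredResult("b", 0.7), ScoredResult("c", 1.0)]
pairs = match_by_name(baseline, current)
print(pairs)  # only "b" appears in both runs
```

Because unmatched tests are dropped, a test that was renamed or removed between runs will not show up as a regression in this scheme.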
Baseline Manager
agentprobe.regression.baseline
Baseline management for regression testing.
Provides CRUD operations for named baselines stored as JSON files containing serialized TestResult lists.
BaselineManager
Manages baseline files for regression testing.
Stores sets of TestResult objects as JSON files, enabling comparison between historical and current test runs.
Attributes:
| Name | Type | Description |
|---|---|---|
| `baseline_dir` | | Directory where baseline files are stored. |
Source code in src/agentprobe/regression/baseline.py
__init__(baseline_dir='.agentprobe/baselines')
Initialize the baseline manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `baseline_dir` | `str \| Path` | Directory for baseline storage. | `'.agentprobe/baselines'` |
save(name, results)
Save test results as a named baseline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Baseline name. | required |
| `results` | `Sequence[TestResult]` | Test results to save. | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the saved baseline file. |
load(name)
Load a named baseline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Baseline name. | required |
Returns:
| Type | Description |
|---|---|
| `list[TestResult]` | List of saved TestResult objects. |
Raises:
| Type | Description |
|---|---|
| `RegressionError` | If the baseline does not exist. |
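The save and load behaviour described above amounts to a JSON round trip. The sketch below shows one way that could look using only the standard library. Everything in it is an assumption made for illustration: `ScoredResult` stands in for `TestResult`, `MissingBaselineError` stands in for `RegressionError`, the helper names are invented, and the `<name>.json` file layout is a guess rather than agentprobe's documented on-disk format.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ScoredResult:
    """Hypothetical stand-in for TestResult; field names are assumptions."""
    name: str
    score: float


class MissingBaselineError(Exception):
    """Hypothetical stand-in for RegressionError."""


def save_baseline(baseline_dir: Path, name: str, results: list[ScoredResult]) -> Path:
    """Write results as a JSON list and return the file path."""
    baseline_dir.mkdir(parents=True, exist_ok=True)
    path = baseline_dir / f"{name}.json"  # assumed naming scheme
    path.write_text(json.dumps([asdict(r) for r in results], indent=2))
    return path


def load_baseline(baseline_dir: Path, name: str) -> list[ScoredResult]:
    """Read a baseline back, raising if the file does not exist."""
    path = baseline_dir / f"{name}.json"
    if not path.exists():
        raise MissingBaselineError(f"baseline {name!r} not found")
    return [ScoredResult(**item) for item in json.loads(path.read_text())]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "baselines"
    saved = save_baseline(root, "main", [ScoredResult("greeting", 0.9)])
    restored = load_baseline(root, "main")
    print(saved.name, restored)
```

Raising a dedicated error for a missing baseline, rather than returning an empty list, keeps "no baseline saved yet" distinguishable from "baseline saved with no results".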
exists(name)
list_baselines()
delete(name)
Delete a named baseline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Baseline name. | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if deleted, False if not found. |
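The boolean return contract of `delete` can be illustrated with a small standard-library sketch. `delete_baseline` is a hypothetical helper, and the `<name>.json` file layout is an assumption rather than agentprobe's documented storage format.

```python
import tempfile
from pathlib import Path


def delete_baseline(baseline_dir: Path, name: str) -> bool:
    """Remove a baseline file; report whether anything was deleted.

    Hypothetical helper for illustration only; not agentprobe's API.
    """
    path = baseline_dir / f"{name}.json"  # assumed naming scheme
    if not path.exists():
        return False  # nothing to delete
    path.unlink()
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.json").write_text("[]")
    first = delete_baseline(root, "main")   # file exists, so it is removed
    second = delete_baseline(root, "main")  # already gone
    print(first, second)
```

Returning `False` instead of raising makes repeated deletes safe to call, which is convenient in cleanup scripts.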