Core¶
Models, runner, assertions, protocols, configuration, and exception hierarchy.
Models¶
agentprobe.core.models
¶
Core data models and enumerations for AgentProbe.
This module defines all Pydantic models used throughout the framework, including traces, test cases, results, and cost summaries. Output types are frozen (immutable); input/configuration types are mutable.
TestStatus
¶
Bases: StrEnum
Status of a single test case execution.
Source code in src/agentprobe/core/models.py
RunStatus
¶
Bases: StrEnum
Status of an overall agent run or test suite execution.
Source code in src/agentprobe/core/models.py
TurnType
¶
Bases: StrEnum
Type of event a turn represents within a trace timeline.
Source code in src/agentprobe/core/models.py
EvalVerdict
¶
Bases: StrEnum
Pass/fail/partial/error verdict produced by an evaluator.
Source code in src/agentprobe/core/models.py
LLMCall
¶
Bases: BaseModel
A single call to a language model within a trace.
Attributes:

| Name | Type | Description |
|---|---|---|
| call_id | str | Unique identifier for this call. |
| model | str | Model identifier string (e.g. 'claude-sonnet-4-5-20250929'). |
| input_tokens | int | Number of input/prompt tokens consumed. |
| output_tokens | int | Number of output/completion tokens produced. |
| input_text | str | The prompt or input sent to the model. |
| output_text | str | The response text from the model. |
| latency_ms | int | Round-trip latency in milliseconds. |
| metadata | dict[str, Any] | Additional provider-specific metadata. |
| timestamp | datetime | When the call was made. |
Source code in src/agentprobe/core/models.py
ToolCall
¶
Bases: BaseModel
A single tool invocation within a trace.
Attributes:

| Name | Type | Description |
|---|---|---|
| call_id | str | Unique identifier for this call. |
| tool_name | str | Name of the tool invoked. |
| tool_input | dict[str, Any] | Arguments passed to the tool. |
| tool_output | Any | Output returned by the tool. |
| success | bool | Whether the tool call succeeded. |
| error | str \| None | Error message if the call failed. |
| latency_ms | int | Round-trip latency in milliseconds. |
| timestamp | datetime | When the call was made. |
Source code in src/agentprobe/core/models.py
Turn
¶
Bases: BaseModel
A single turn (event) within a trace timeline.
Attributes:

| Name | Type | Description |
|---|---|---|
| turn_id | str | Unique identifier for this turn. |
| turn_type | TurnType | The type of event this turn represents. |
| content | str | Text content of the turn. |
| llm_call | LLMCall \| None | Associated LLM call, if this is an LLM turn. |
| tool_call | ToolCall \| None | Associated tool call, if this is a tool turn. |
| timestamp | datetime | When the turn occurred. |
Source code in src/agentprobe/core/models.py
Trace
¶
Bases: BaseModel
Complete execution trace of an agent run.
A trace captures the full timeline of LLM calls, tool invocations, and message exchanges during a single agent execution. Once assembled by the TraceRecorder, traces are immutable.
Attributes:

| Name | Type | Description |
|---|---|---|
| trace_id | str | Unique identifier for this trace. |
| agent_name | str | Name of the agent that produced this trace. |
| model | str \| None | Primary model used during the run. |
| input_text | str | The input/prompt given to the agent. |
| output_text | str | The final output produced by the agent. |
| turns | tuple[Turn, ...] | Ordered list of turns in the execution timeline. |
| llm_calls | tuple[LLMCall, ...] | All LLM calls made during the run. |
| tool_calls | tuple[ToolCall, ...] | All tool calls made during the run. |
| total_input_tokens | int | Aggregate input tokens across all LLM calls. |
| total_output_tokens | int | Aggregate output tokens across all LLM calls. |
| total_latency_ms | int | Total execution time in milliseconds. |
| tags | tuple[str, ...] | Tags for filtering and grouping. |
| metadata | dict[str, Any] | Additional run metadata. |
| created_at | datetime | When the trace was created. |
Source code in src/agentprobe/core/models.py
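The aggregate fields are simple sums over the per-call records. A minimal sketch of that relationship, using plain dicts as stand-ins for the Pydantic LLMCall model:

```python
# Stand-ins for LLMCall records (the real objects are Pydantic models).
llm_calls = [
    {"input_tokens": 120, "output_tokens": 40, "latency_ms": 350},
    {"input_tokens": 300, "output_tokens": 95, "latency_ms": 820},
]

# Trace.total_* fields correspond to sums across all recorded calls.
total_input_tokens = sum(c["input_tokens"] for c in llm_calls)
total_output_tokens = sum(c["output_tokens"] for c in llm_calls)
total_latency_ms = sum(c["latency_ms"] for c in llm_calls)
```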
EvalResult
¶
Bases: BaseModel
Result produced by an evaluator.
Attributes:

| Name | Type | Description |
|---|---|---|
| eval_id | str | Unique identifier for this evaluation. |
| evaluator_name | str | Name of the evaluator that produced this result. |
| verdict | EvalVerdict | Pass/fail/partial/error verdict. |
| score | float | Numeric score between 0.0 and 1.0. |
| reason | str | Human-readable explanation of the verdict. |
| metadata | dict[str, Any] | Additional evaluator-specific data. |
| created_at | datetime | When the evaluation was performed. |
Source code in src/agentprobe/core/models.py
AssertionResult
¶
Bases: BaseModel
Result of a single test assertion.
Attributes:

| Name | Type | Description |
|---|---|---|
| assertion_type | str | Type of assertion (e.g. 'contain', 'match'). |
| passed | bool | Whether the assertion passed. |
| expected | Any | The expected value. |
| actual | Any | The actual value. |
| message | str | Descriptive message about the result. |
Source code in src/agentprobe/core/models.py
TestCase
¶
Bases: BaseModel
A single test scenario to be executed against an agent.
TestCase is mutable because the runner populates fields during execution (e.g. status transitions, attaching results).
Attributes:

| Name | Type | Description |
|---|---|---|
| test_id | str | Unique identifier for this test case. |
| name | str | Human-readable name (usually from the @scenario decorator). |
| description | str | Detailed description of what this test validates. |
| input_text | str | The input prompt to send to the agent. |
| expected_output | str \| None | Optional expected output for comparison. |
| tags | list[str] | Tags for filtering and grouping. |
| timeout_seconds | float | Maximum allowed execution time. |
| evaluators | list[str] | Names of evaluators to run on this test. |
| metadata | dict[str, Any] | Additional test configuration. |
Source code in src/agentprobe/core/models.py
validate_name(v)
classmethod
¶
Ensure test name contains only valid characters.
Source code in src/agentprobe/core/models.py
TestResult
¶
Bases: BaseModel
Complete result of executing a single test case.
Attributes:

| Name | Type | Description |
|---|---|---|
| result_id | str | Unique identifier for this result. |
| test_name | str | Name of the test that was executed. |
| status | TestStatus | Final status of the test execution. |
| score | float | Aggregate score from evaluators (0.0 to 1.0). |
| duration_ms | int | Execution time in milliseconds. |
| trace | Trace \| None | The execution trace, if recording was enabled. |
| eval_results | tuple[EvalResult, ...] | Results from all evaluators run on this test. |
| assertion_results | tuple[AssertionResult, ...] | Results from all assertions. |
| error_message | str \| None | Error description if the test errored. |
| created_at | datetime | When the result was recorded. |
Source code in src/agentprobe/core/models.py
CostBreakdown
¶
Bases: BaseModel
Cost breakdown for a single model.
Attributes:

| Name | Type | Description |
|---|---|---|
| model | str | The model identifier. |
| input_tokens | int | Total input tokens for this model. |
| output_tokens | int | Total output tokens for this model. |
| input_cost_usd | float | Cost for input tokens in USD. |
| output_cost_usd | float | Cost for output tokens in USD. |
| total_cost_usd | float | Total cost in USD. |
| call_count | int | Number of calls to this model. |
Source code in src/agentprobe/core/models.py
CostSummary
¶
Bases: BaseModel
Aggregate cost summary for a trace or test suite.
Attributes:

| Name | Type | Description |
|---|---|---|
| total_llm_cost_usd | float | Total cost of all LLM calls in USD. |
| total_tool_cost_usd | float | Total cost of tool usage in USD. |
| total_cost_usd | float | Grand total cost in USD. |
| breakdown_by_model | dict[str, CostBreakdown] | Per-model cost breakdown. |
| total_input_tokens | int | Aggregate input tokens. |
| total_output_tokens | int | Aggregate output tokens. |
Source code in src/agentprobe/core/models.py
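How the cost fields relate can be illustrated with a small sketch. The per-million-token prices below are made-up placeholders, not real pricing, and the dict shape merely mirrors the CostBreakdown fields:

```python
# Hypothetical per-million-token prices: (input USD, output USD).
PRICE_PER_MTOK = {"model-a": (3.00, 15.00)}

def breakdown(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Compute a CostBreakdown-shaped dict for one model."""
    in_price, out_price = PRICE_PER_MTOK[model]
    input_cost = input_tokens / 1_000_000 * in_price
    output_cost = output_tokens / 1_000_000 * out_price
    return {
        "model": model,
        "input_cost_usd": input_cost,
        "output_cost_usd": output_cost,
        "total_cost_usd": input_cost + output_cost,
    }

b = breakdown("model-a", 500_000, 100_000)
```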
MetricType
¶
Bases: StrEnum
Category of a metric.
Source code in src/agentprobe/core/models.py
TrendDirection
¶
Bases: StrEnum
Direction of a metric trend over time.
Source code in src/agentprobe/core/models.py
PluginType
¶
Bases: StrEnum
Type of plugin.
Source code in src/agentprobe/core/models.py
ChaosType
¶
Bases: StrEnum
Type of chaos fault to inject during testing.
Source code in src/agentprobe/core/models.py
ConversationTurn
¶
Bases: BaseModel
Specification for a single turn in a multi-turn conversation test.
Attributes:

| Name | Type | Description |
|---|---|---|
| turn_id | str | Unique identifier for this turn. |
| input_text | str | The input to send for this turn. |
| expected_output | str \| None | Optional expected output for this turn. |
| evaluators | tuple[str, ...] | Evaluator names to run on this turn's result. |
| metadata | dict[str, Any] | Additional turn-level configuration. |
Source code in src/agentprobe/core/models.py
TurnResult
¶
Bases: BaseModel
Result from executing a single conversation turn.
Attributes:

| Name | Type | Description |
|---|---|---|
| turn_index | int | Zero-based index of this turn. |
| input_text | str | The input sent for this turn. |
| trace | Trace \| None | Execution trace from this turn. |
| eval_results | tuple[EvalResult, ...] | Results from evaluators run on this turn. |
| duration_ms | int | Execution time for this turn in milliseconds. |
Source code in src/agentprobe/core/models.py
ConversationResult
¶
Bases: BaseModel
Aggregate result from a multi-turn conversation test.
Attributes:

| Name | Type | Description |
|---|---|---|
| conversation_id | str | Unique identifier for this conversation. |
| agent_name | str | Name of the agent tested. |
| turn_results | tuple[TurnResult, ...] | Per-turn results in order. |
| total_turns | int | Number of turns executed. |
| passed_turns | int | Number of turns where all evaluators passed. |
| aggregate_score | float | Mean score across all turns. |
| total_duration_ms | int | Total execution time in milliseconds. |
Source code in src/agentprobe/core/models.py
StatisticalSummary
¶
Bases: BaseModel
Summary statistics from repeated evaluations.
Attributes:

| Name | Type | Description |
|---|---|---|
| evaluator_name | str | Name of the evaluator that produced these stats. |
| sample_count | int | Number of evaluation runs. |
| scores | tuple[float, ...] | Raw scores from each run (for reproducibility). |
| mean | float | Arithmetic mean of scores. |
| std_dev | float | Standard deviation of scores. |
| median | float | Median score. |
| p5 | float | 5th percentile score. |
| p95 | float | 95th percentile score. |
| ci_lower | float | Lower bound of 95% confidence interval. |
| ci_upper | float | Upper bound of 95% confidence interval. |
Source code in src/agentprobe/core/models.py
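A sketch of how such statistics can be derived from a tuple of raw scores using the standard library. The sample standard deviation and the normal-approximation confidence interval are common choices, assumed here; the framework's exact method is not specified on this page:

```python
import statistics

scores = (0.70, 0.80, 0.85, 0.90, 0.95)

mean = statistics.mean(scores)
median = statistics.median(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation (assumption)

# Normal-approximation 95% confidence interval for the mean.
half_width = 1.96 * std_dev / len(scores) ** 0.5
ci_lower, ci_upper = mean - half_width, mean + half_width
```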
TestComparison
¶
Bases: BaseModel
Comparison of a single test between baseline and current results.
Attributes:

| Name | Type | Description |
|---|---|---|
| test_name | str | Name of the compared test. |
| baseline_score | float | Score from the baseline run. |
| current_score | float | Score from the current run. |
| delta | float | Score change (current - baseline). |
| is_regression | bool | Whether the change constitutes a regression. |
| is_improvement | bool | Whether the change constitutes an improvement. |
Source code in src/agentprobe/core/models.py
RegressionReport
¶
Bases: BaseModel
Report from comparing current results against a baseline.
Attributes:

| Name | Type | Description |
|---|---|---|
| baseline_name | str | Name of the baseline used for comparison. |
| comparisons | tuple[TestComparison, ...] | Per-test comparisons. |
| total_tests | int | Number of tests compared. |
| regressions | int | Number of tests that regressed. |
| improvements | int | Number of tests that improved. |
| unchanged | int | Number of tests with no significant change. |
| threshold | float | Score delta threshold used for regression detection. |
Source code in src/agentprobe/core/models.py
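A minimal sketch of threshold-based classification in the spirit of TestComparison and RegressionReport (the 0.05 default is illustrative, not the framework's):

```python
def classify(baseline: float, current: float, threshold: float = 0.05) -> str:
    """Classify a score change as the comparison fields describe it."""
    delta = current - baseline
    if delta <= -threshold:
        return "regression"
    if delta >= threshold:
        return "improvement"
    return "unchanged"
```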
BudgetCheckResult
¶
Bases: BaseModel
Result of checking a cost against a budget.
Attributes:

| Name | Type | Description |
|---|---|---|
| within_budget | bool | Whether the cost is within the budget. |
| actual_cost_usd | float | The actual cost incurred. |
| budget_limit_usd | float | The budget limit. |
| remaining_usd | float | Budget remaining (may be negative if exceeded). |
| utilization_pct | float | Percentage of budget used. |
Source code in src/agentprobe/core/models.py
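The arithmetic behind these fields is straightforward; a dict-shaped sketch (not the actual implementation):

```python
def check_budget(actual_cost_usd: float, budget_limit_usd: float) -> dict:
    """Produce a BudgetCheckResult-shaped dict."""
    return {
        "within_budget": actual_cost_usd <= budget_limit_usd,
        "actual_cost_usd": actual_cost_usd,
        "budget_limit_usd": budget_limit_usd,
        # Negative when the budget is exceeded.
        "remaining_usd": budget_limit_usd - actual_cost_usd,
        "utilization_pct": actual_cost_usd / budget_limit_usd * 100,
    }

result = check_budget(actual_cost_usd=0.75, budget_limit_usd=1.00)
```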
DiffItem
¶
Bases: BaseModel
A single difference between two snapshots.
Attributes:

| Name | Type | Description |
|---|---|---|
| dimension | str | The dimension being compared (e.g. 'tool_calls', 'cost'). |
| expected | Any | The expected (baseline) value. |
| actual | Any | The actual (current) value. |
| similarity | float | Similarity score for this dimension (0.0 to 1.0). |
Source code in src/agentprobe/core/models.py
SnapshotDiff
¶
Bases: BaseModel
Comparison result between a snapshot and current output.
Attributes:

| Name | Type | Description |
|---|---|---|
| snapshot_name | str | Name of the snapshot being compared. |
| overall_similarity | float | Weighted average similarity across dimensions. |
| diffs | tuple[DiffItem, ...] | Per-dimension comparison details. |
| is_match | bool | Whether the overall similarity meets the threshold. |
| threshold | float | Similarity threshold used. |
Source code in src/agentprobe/core/models.py
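A sketch of the weighted-average relationship between per-dimension DiffItem scores and overall_similarity. The weights and threshold below are illustrative assumptions, not values from the framework:

```python
def overall_similarity(diffs: list[dict], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension similarity scores."""
    total_weight = sum(weights[d["dimension"]] for d in diffs)
    weighted = sum(d["similarity"] * weights[d["dimension"]] for d in diffs)
    return weighted / total_weight

diffs = [
    {"dimension": "output", "similarity": 0.9},
    {"dimension": "tool_calls", "similarity": 0.5},
]
score = overall_similarity(diffs, weights={"output": 3.0, "tool_calls": 1.0})
is_match = score >= 0.8  # compare against the configured threshold
```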
TraceStep
¶
Bases: BaseModel
A single step in a time-travel trace, with cumulative metrics.
Attributes:

| Name | Type | Description |
|---|---|---|
| step_index | int | Zero-based index of this step. |
| turn | Turn | The trace turn at this step. |
| cumulative_input_tokens | int | Total input tokens up to this step. |
| cumulative_output_tokens | int | Total output tokens up to this step. |
| cumulative_cost_usd | float | Estimated cumulative cost up to this step. |
| cumulative_latency_ms | int | Total latency up to this step. |
Source code in src/agentprobe/core/models.py
ReplayDiff
¶
Bases: BaseModel
Diff between an original trace and a replay trace.
Attributes:

| Name | Type | Description |
|---|---|---|
| original_trace_id | str | ID of the original trace. |
| replay_trace_id | str | ID of the replay trace. |
| tool_call_diffs | tuple[DiffItem, ...] | Differences in tool calls. |
| output_matches | bool | Whether the outputs match. |
| original_output | str | Output from the original trace. |
| replay_output | str | Output from the replay trace. |
Source code in src/agentprobe/core/models.py
ChaosOverride
¶
Bases: BaseModel
Configuration for a single chaos fault injection.
Attributes:

| Name | Type | Description |
|---|---|---|
| chaos_type | ChaosType | Type of fault to inject. |
| probability | float | Probability of applying this fault (0.0 to 1.0). |
| target_tool | str \| None | If set, only apply to this specific tool. |
| delay_ms | int | Delay in ms for SLOW type. |
| error_message | str | Custom error message for ERROR type. |
Source code in src/agentprobe/core/models.py
AgentRun
¶
Bases: BaseModel
A complete agent test run encompassing multiple test results.
Attributes:

| Name | Type | Description |
|---|---|---|
| run_id | str | Unique identifier for this run. |
| agent_name | str | Name of the agent tested. |
| status | RunStatus | Overall run status. |
| test_results | tuple[TestResult, ...] | All test results from this run. |
| total_tests | int | Total number of tests. |
| passed | int | Number of tests that passed. |
| failed | int | Number of tests that failed. |
| errors | int | Number of tests that errored. |
| skipped | int | Number of tests skipped. |
| cost_summary | CostSummary \| None | Aggregate cost for the run. |
| duration_ms | int | Total run duration in milliseconds. |
| tags | tuple[str, ...] | Tags for filtering. |
| metadata | dict[str, Any] | Additional run metadata. |
| created_at | datetime | When the run started. |
Source code in src/agentprobe/core/models.py
MetricDefinition
¶
Bases: BaseModel
Definition of a named metric that can be collected and tracked.
Attributes:

| Name | Type | Description |
|---|---|---|
| name | str | Unique metric identifier (e.g. 'latency_ms', 'token_cost_usd'). |
| metric_type | MetricType | Category of the metric. |
| description | str | Human-readable description. |
| unit | str | Unit of measurement (e.g. 'ms', 'usd', 'count'). |
| lower_is_better | bool | Whether lower values indicate better performance. |
Source code in src/agentprobe/core/models.py
MetricValue
¶
Bases: BaseModel
A single metric measurement at a point in time.
Attributes:

| Name | Type | Description |
|---|---|---|
| metric_name | str | Name of the metric this value belongs to. |
| value | float | The numeric measurement. |
| tags | tuple[str, ...] | Tags for filtering and grouping. |
| metadata | dict[str, Any] | Additional context about this measurement. |
| timestamp | datetime | When the measurement was taken. |
Source code in src/agentprobe/core/models.py
MetricAggregation
¶
Bases: BaseModel
Aggregated statistics for a collection of metric values.
Attributes:

| Name | Type | Description |
|---|---|---|
| metric_name | str | Name of the metric. |
| count | int | Number of values aggregated. |
| mean | float | Arithmetic mean. |
| median | float | Median value. |
| min_value | float | Minimum value. |
| max_value | float | Maximum value. |
| p95 | float | 95th percentile. |
| p99 | float | 99th percentile. |
| std_dev | float | Standard deviation. |
Source code in src/agentprobe/core/models.py
TraceDiffReport
¶
Bases: BaseModel
Report from comparing two independent traces.
Compares output text, tool call sequences, model usage, token counts, and latency between any two traces.
Attributes:

| Name | Type | Description |
|---|---|---|
| trace_a_id | str | ID of the first trace. |
| trace_b_id | str | ID of the second trace. |
| tool_call_diffs | tuple[DiffItem, ...] | Per-tool-call comparison items. |
| output_matches | bool | Whether the output texts match exactly. |
| token_delta | int | Difference in total tokens (B - A). |
| latency_delta_ms | int | Difference in total latency (B - A). |
| overall_similarity | float | Weighted similarity score (0.0 to 1.0). |
Source code in src/agentprobe/core/models.py
Runner¶
agentprobe.core.runner
¶
Test runner: orchestrates test execution with optional parallelism.
Discovers tests, invokes them against an adapter, runs evaluators, and assembles results into an AgentRun.
TestRunner
¶
Orchestrates test case execution against an agent adapter.
Supports sequential and parallel execution modes, per-test timeouts, and evaluator orchestration.
Attributes:

| Name | Description |
|---|---|
| config | The runner configuration. |
| evaluators | Evaluators to run on each test result. |
Source code in src/agentprobe/core/runner.py
__init__(config=None, evaluators=None)
¶
Initialize the test runner.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | AgentProbeConfig \| None | AgentProbe configuration. Uses defaults if None. | None |
| evaluators | list[EvaluatorProtocol] \| None | Evaluators to apply to test results. | None |
Source code in src/agentprobe/core/runner.py
run(test_cases, adapter)
async
¶
Execute test cases against an adapter and collect results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| test_cases | Sequence[TestCase] | The test cases to execute. | required |
| adapter | AdapterProtocol | The agent adapter to test. | required |

Returns:

| Type | Description |
|---|---|
| AgentRun | An AgentRun with all results. |
Source code in src/agentprobe/core/runner.py
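The parallel mode with per-test timeouts can be sketched with plain asyncio. This is a simplified stand-in for the runner's orchestration, not its actual code; test bodies, names, and the status strings are illustrative:

```python
import asyncio

async def run_one(name: str, coro, timeout: float) -> tuple[str, str]:
    """Run a single test coroutine, mapping outcomes to status strings."""
    try:
        await asyncio.wait_for(coro, timeout=timeout)
        return name, "passed"
    except asyncio.TimeoutError:
        return name, "timeout"
    except Exception:
        return name, "error"

async def run_all() -> dict[str, str]:
    sem = asyncio.Semaphore(2)  # max_workers-style concurrency cap

    async def bounded(name, coro, timeout):
        async with sem:
            return await run_one(name, coro, timeout)

    async def ok():
        await asyncio.sleep(0.01)

    async def slow():
        await asyncio.sleep(1.0)

    results = await asyncio.gather(
        bounded("fast_test", ok(), timeout=0.5),
        bounded("slow_test", slow(), timeout=0.05),
    )
    return dict(results)

statuses = asyncio.run(run_all())
```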
Assertions¶
agentprobe.core.assertions
¶
Fluent assertion API for validating agent outputs and tool calls.
Provides expect() and expect_tool_calls() entry points that
return chainable expectation objects.
OutputExpectation
¶
Fluent expectation chain for validating string output.
Each assertion method returns self for chaining. Results accumulate in the results attribute and can be checked with all_passed().
Source code in src/agentprobe/core/assertions.py
to_contain(substring)
¶
Assert that the output contains the given substring.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| substring | str | The substring to search for. | required |

Returns:

| Type | Description |
|---|---|
| OutputExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
to_not_contain(substring)
¶
Assert that the output does NOT contain the given substring.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| substring | str | The substring that should not appear. | required |

Returns:

| Type | Description |
|---|---|
| OutputExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
to_match(pattern)
¶
Assert that the output matches a regex pattern.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | Regular expression pattern. | required |

Returns:

| Type | Description |
|---|---|
| OutputExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
to_have_length_less_than(max_length)
¶
Assert that the output length is less than the given value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| max_length | int | Maximum allowed length. | required |

Returns:

| Type | Description |
|---|---|
| OutputExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
to_be_valid_json()
¶
Assert that the output is valid JSON.
Returns:

| Type | Description |
|---|---|
| OutputExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
to_contain_any_of(substrings)
¶
Assert that the output contains at least one of the substrings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| substrings | Sequence[str] | Substrings to check for. | required |

Returns:

| Type | Description |
|---|---|
| OutputExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
ToolCallExpectation
¶
Fluent expectation chain for validating tool call sequences.
Source code in src/agentprobe/core/assertions.py
to_contain(tool_name)
¶
Assert that a tool with the given name was called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tool_name | str | The expected tool name. | required |

Returns:

| Type | Description |
|---|---|
| ToolCallExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
to_have_sequence(expected_sequence)
¶
Assert that tools were called in the given order.
The expected sequence must appear as a contiguous subsequence in the actual tool call names.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| expected_sequence | Sequence[str] | Ordered tool names to match. | required |

Returns:

| Type | Description |
|---|---|
| ToolCallExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
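The contiguous-subsequence semantics can be sketched independently of the library (a stand-in for illustration, not the actual implementation):

```python
def has_contiguous_subsequence(calls: list[str], expected: list[str]) -> bool:
    """True if `expected` appears as a contiguous run inside `calls`."""
    n = len(expected)
    if n == 0:
        return True
    return any(calls[i:i + n] == expected for i in range(len(calls) - n + 1))

calls = ["search", "fetch", "summarize", "reply"]
```

Note that ["search", "summarize"] would not match here, because the two names are not adjacent in the actual call sequence.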
to_have_count(count)
¶
Assert the total number of tool calls.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| count | int | Expected number of tool calls. | required |

Returns:

| Type | Description |
|---|---|
| ToolCallExpectation | Self for chaining. |
Source code in src/agentprobe/core/assertions.py
expect(output)
¶
Create a fluent output expectation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| output | str | The agent output string to validate. | required |

Returns:

| Type | Description |
|---|---|
| OutputExpectation | An OutputExpectation for chaining assertions. |
Source code in src/agentprobe/core/assertions.py
expect_tool_calls(tool_calls)
¶
Create a fluent tool call expectation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tool_calls | Sequence[ToolCall] | The sequence of tool calls to validate. | required |

Returns:

| Type | Description |
|---|---|
| ToolCallExpectation | A ToolCallExpectation for chaining assertions. |
Source code in src/agentprobe/core/assertions.py
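The fluent pattern behind these entry points can be sketched with a simplified stand-in (not the actual OutputExpectation class):

```python
class MiniExpectation:
    """Simplified stand-in: each method records a result and returns self."""

    def __init__(self, output: str):
        self.output = output
        self.results: list[bool] = []

    def to_contain(self, substring: str) -> "MiniExpectation":
        self.results.append(substring in self.output)
        return self  # returning self is what enables chaining

    def to_not_contain(self, substring: str) -> "MiniExpectation":
        self.results.append(substring not in self.output)
        return self

    def all_passed(self) -> bool:
        return all(self.results)

check = MiniExpectation("The answer is 42.").to_contain("42").to_not_contain("error")
```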
Scenario Decorator¶
agentprobe.core.scenario
¶
Scenario decorator and registry for defining agent test cases.
The @scenario decorator marks functions as test scenarios and
registers them in a global registry for discovery by the test runner.
scenario(name=None, *, input_text='', expected_output=None, tags=None, timeout=30.0, evaluators=None)
¶
Decorator that registers a function as a test scenario.
The decorated function can optionally accept a TestCase argument
and mutate it (e.g. setting dynamic input). If it returns a string,
that string overrides input_text.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str \| None | Test name. Defaults to the function name. | None |
| input_text | str | The input prompt to send to the agent. | '' |
| expected_output | str \| None | Optional expected output for comparison. | None |
| tags | list[str] \| None | Tags for filtering and grouping. | None |
| timeout | float | Maximum execution time in seconds. | 30.0 |
| evaluators | list[str] \| None | Names of evaluators to run. | None |

Returns:

| Type | Description |
|---|---|
| Callable[[Callable[..., Any]], Callable[..., Any]] | A decorator that registers the function. |
Source code in src/agentprobe/core/scenario.py
get_scenarios(module_name=None)
¶
Retrieve registered scenarios.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| module_name | str \| None | If provided, return scenarios from this module only. If None, return all registered scenarios. | None |

Returns:

| Type | Description |
|---|---|
| list[TestCase] | A list of TestCase objects. |
Source code in src/agentprobe/core/scenario.py
Configuration¶
agentprobe.core.config
¶
Configuration loading and validation for AgentProbe.
Loads configuration from agentprobe.yaml with support for
${ENV_VAR} interpolation and sensible defaults.
RunnerConfig
¶
Bases: BaseModel
Configuration for the test runner.
Attributes:

| Name | Type | Description |
|---|---|---|
| parallel | bool | Whether to run tests in parallel. |
| max_workers | int | Maximum number of concurrent tests. |
| default_timeout | float | Default test timeout in seconds. |
Source code in src/agentprobe/core/config.py
EvalConfig
¶
Bases: BaseModel
Configuration for evaluators.
Attributes:

| Name | Type | Description |
|---|---|---|
| default_evaluators | list[str] | Evaluator names to apply to all tests. |
Source code in src/agentprobe/core/config.py
JudgeConfig
¶
Bases: BaseModel
Configuration for the judge evaluator.
Attributes:

| Name | Type | Description |
|---|---|---|
| model | str | Model to use for judging. |
| provider | str | API provider name. |
| temperature | float | Sampling temperature. |
| max_tokens | int | Maximum response tokens. |
Source code in src/agentprobe/core/config.py
TraceConfig
¶
Bases: BaseModel
Configuration for trace recording and storage.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether to record traces. |
| storage_backend | str | Storage backend type. |
| database_path | str | Path to SQLite database file. |
Source code in src/agentprobe/core/config.py
CostConfig
¶
Bases: BaseModel
Configuration for cost tracking.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether to track costs. |
| budget_limit_usd | float \| None | Maximum allowed cost per run. |
| pricing_dir | str \| None | Directory containing pricing YAML files. |
Source code in src/agentprobe/core/config.py
SafetyConfig
¶
Bases: BaseModel
Configuration for safety testing.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether to run safety tests. |
| suites | list[str] | List of safety suite names to run. |
Source code in src/agentprobe/core/config.py
ChaosConfig
¶
Bases: BaseModel
Configuration for chaos fault injection testing.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether chaos testing is enabled. |
| seed | int | Random seed for deterministic fault injection. |
| default_probability | float | Default probability of applying a fault. |
Source code in src/agentprobe/core/config.py
SnapshotConfig
¶
Bases: BaseModel
Configuration for snapshot/golden file testing.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether snapshot testing is enabled. |
| snapshot_dir | str | Directory for storing snapshot files. |
| update_on_first_run | bool | Whether to create snapshots on first run. |
| threshold | float | Similarity threshold for snapshot matching. |
Source code in src/agentprobe/core/config.py
BudgetConfig
¶
Bases: BaseModel
Configuration for per-test and per-suite cost budgets.
Attributes:

| Name | Type | Description |
|---|---|---|
| test_budget_usd | float \| None | Maximum cost per individual test. |
| suite_budget_usd | float \| None | Maximum cost per test suite run. |
Source code in src/agentprobe/core/config.py
RegressionConfig
¶
Bases: BaseModel
Configuration for regression detection.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether regression detection is enabled. |
| baseline_dir | str | Directory for storing baseline files. |
| threshold | float | Score delta threshold for flagging regressions. |
Source code in src/agentprobe/core/config.py
MetricsConfig
¶
Bases: BaseModel
Configuration for metric collection and trending.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether metric collection is enabled. |
| builtin_metrics | bool | Whether to collect built-in metrics automatically. |
| trend_window | int | Number of recent runs to use for trend analysis. |
Source code in src/agentprobe/core/config.py
PluginConfig
¶
Bases: BaseModel
Configuration for the plugin system.
Attributes:

| Name | Type | Description |
|---|---|---|
| enabled | bool | Whether the plugin system is enabled. |
| directories | list[str] | Additional directories to scan for plugins. |
| entry_point_group | str | Entry point group name for plugin discovery. |
Source code in src/agentprobe/core/config.py
ReportingConfig
¶
Bases: BaseModel
Configuration for result reporting.
Attributes:

| Name | Type | Description |
|---|---|---|
| formats | list[str] | Output format names. |
| output_dir | str | Directory for report files. |
Source code in src/agentprobe/core/config.py
AgentProbeConfig
¶
Bases: BaseModel
Top-level AgentProbe configuration.
Attributes:

| Name | Type | Description |
|---|---|---|
| project_name | str | Name of the project being tested. |
| test_dir | str | Directory containing test files. |
| runner | RunnerConfig | Test runner configuration. |
| eval | EvalConfig | Evaluator configuration. |
| judge | JudgeConfig | Judge evaluator configuration. |
| trace | TraceConfig | Trace recording configuration. |
| cost | CostConfig | Cost tracking configuration. |
| safety | SafetyConfig | Safety testing configuration. |
| reporting | ReportingConfig | Reporting configuration. |
Source code in src/agentprobe/core/config.py
load_config(path=None)
¶
Load configuration from a YAML file.
Searches for agentprobe.yaml or agentprobe.yml in the
current directory if no path is provided. Returns default config
if no file is found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path | None
|
Explicit path to a config file. |
None
|
Returns:
| Type | Description |
|---|---|
AgentProbeConfig
|
A validated AgentProbeConfig instance. |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If the file exists but is invalid. |
Source code in src/agentprobe/core/config.py
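The search behavior described above can be sketched with only `pathlib`. This is an illustrative stand-in, not the library's implementation: the helper name `find_config_file` and its signature are assumptions; only the `agentprobe.yaml`/`agentprobe.yml` filenames and the fall-back-to-defaults behavior come from the documentation.

```python
from pathlib import Path

# Documented candidate filenames, checked in order.
CANDIDATE_NAMES = ("agentprobe.yaml", "agentprobe.yml")

def find_config_file(start_dir=".", explicit_path=None):
    """Resolve the config file path, mirroring the documented search order."""
    if explicit_path is not None:
        # An explicit path wins; validation of its contents happens later.
        return Path(explicit_path)
    for name in CANDIDATE_NAMES:
        candidate = Path(start_dir) / name
        if candidate.is_file():
            return candidate
    return None  # caller falls back to the default config
```

When `find_config_file` returns `None`, `load_config` would return a default `AgentProbeConfig`; when the file exists but fails validation, it raises `ConfigError`.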
Protocols¶
agentprobe.core.protocols
¶
Protocol definitions for AgentProbe's pluggable architecture.
All protocols are runtime-checkable, allowing isinstance() verification of structural subtyping. Implementors do not need to inherit from these protocols — they only need to provide the required methods.
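The structural-subtyping behavior can be seen with a minimal sketch. `AdapterLike` below is a simplified stand-in for `AdapterProtocol` (its exact members here are assumptions); the point is that `MyAdapter` satisfies the protocol without inheriting from it:

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class AdapterLike(Protocol):
    """Simplified stand-in for AdapterProtocol."""
    @property
    def name(self) -> str: ...
    async def invoke(self, input_text: str, **kwargs: Any) -> Any: ...

class MyAdapter:  # note: no inheritance from AdapterLike
    @property
    def name(self) -> str:
        return "my-adapter"

    async def invoke(self, input_text: str, **kwargs: Any) -> Any:
        return {"output": input_text.upper()}

# isinstance() works because the protocol is @runtime_checkable;
# only the presence of members is checked, not their signatures.
assert isinstance(MyAdapter(), AdapterLike)
assert not isinstance(object(), AdapterLike)
```

This is why adapters, evaluators, and storage backends can be plugged in from any package: they only need to match the method shape.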
AdapterProtocol
¶
Bases: Protocol
Interface for agent framework adapters.
Adapters wrap specific agent frameworks (LangChain, CrewAI, etc.) and translate their execution into AgentProbe's Trace format.
Source code in src/agentprobe/core/protocols.py
name
property
¶
Return the adapter name.
invoke(input_text, **kwargs)
async
¶
Invoke the agent with the given input and return a trace.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input_text
|
str
|
The input prompt to send to the agent. |
required |
**kwargs
|
Any
|
Additional adapter-specific arguments. |
{}
|
Returns:
| Type | Description |
|---|---|
Trace
|
A complete execution trace. |
Source code in src/agentprobe/core/protocols.py
EvaluatorProtocol
¶
Bases: Protocol
Interface for test result evaluators.
Evaluators assess agent outputs against expectations, producing scored results with pass/fail verdicts.
Source code in src/agentprobe/core/protocols.py
name
property
¶
Return the evaluator name.
evaluate(test_case, trace)
async
¶
Evaluate an agent's output for a given test case.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
test_case
|
TestCase
|
The test case that was executed. |
required |
trace
|
Trace
|
The execution trace to evaluate. |
required |
Returns:
| Type | Description |
|---|---|
EvalResult
|
An evaluation result with score and verdict. |
Source code in src/agentprobe/core/protocols.py
StorageProtocol
¶
Bases: Protocol
Interface for persistence backends.
Storage implementations handle saving and loading traces, test results, and agent runs.
Source code in src/agentprobe/core/protocols.py
setup()
async
¶
Prepare the storage backend for use (for example, creating tables or directories).
save_trace(trace)
async
¶
Persist a trace to the backend.
load_trace(trace_id)
async
¶
Load a trace by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
trace_id
|
str
|
The unique identifier of the trace. |
required |
Returns:
| Type | Description |
|---|---|
Trace | None
|
The trace if found, otherwise None. |
list_traces(agent_name=None, limit=100)
async
¶
List traces with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
agent_name
|
str | None
|
Filter by agent name if provided. |
None
|
limit
|
int
|
Maximum number of traces to return. |
100
|
Returns:
| Type | Description |
|---|---|
Sequence[Trace]
|
A sequence of matching traces. |
Source code in src/agentprobe/core/protocols.py
save_result(result)
async
¶
Persist a test result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
TestResult
|
The test result to save. |
required |
load_results(test_name=None, limit=100)
async
¶
Load test results with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
test_name
|
str | None
|
Filter by test name if provided. |
None
|
limit
|
int
|
Maximum number of results to return. |
100
|
Returns:
| Type | Description |
|---|---|
Sequence[TestResult]
|
A sequence of matching test results. |
Source code in src/agentprobe/core/protocols.py
MetricStoreProtocol
¶
Bases: Protocol
Interface for metric persistence backends.
Metric storage is optional and separate from the main StorageProtocol, allowing implementations to opt in to metric tracking independently.
Source code in src/agentprobe/core/protocols.py
save_metrics(metrics)
async
¶
Persist a batch of metric values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metrics
|
Sequence[MetricValue]
|
The metric values to save. |
required |
load_metrics(metric_name=None, limit=1000)
async
¶
Load metric values with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
metric_name
|
str | None
|
Filter by metric name if provided. |
None
|
limit
|
int
|
Maximum number of values to return. |
1000
|
Returns:
| Type | Description |
|---|---|
Sequence[MetricValue]
|
A sequence of matching metric values. |
Source code in src/agentprobe/core/protocols.py
RunnerProtocol
¶
Bases: Protocol
Interface for test execution engines.
Source code in src/agentprobe/core/protocols.py
run(test_cases, adapter)
async
¶
Execute a batch of test cases against an agent adapter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
test_cases
|
Sequence[TestCase]
|
The test cases to execute. |
required |
adapter
|
AdapterProtocol
|
The agent adapter to test against. |
required |
Returns:
| Type | Description |
|---|---|
AgentRun
|
An AgentRun containing all results. |
Source code in src/agentprobe/core/protocols.py
Exceptions¶
agentprobe.core.exceptions
¶
Exception hierarchy for the AgentProbe framework.
All exceptions inherit from AgentProbeError, allowing callers to catch the base type for generic error handling or specific subclasses for targeted recovery.
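The layered handling this hierarchy enables can be sketched as follows. The class bodies are illustrative stand-ins (the real classes live in `agentprobe.core.exceptions`); the `classify` helper is a hypothetical name:

```python
class AgentProbeError(Exception):
    """Base for all framework errors (sketch)."""

class RunnerError(AgentProbeError):
    """Runner execution failure (sketch)."""

class TestTimeoutError(RunnerError):
    """Test exceeded its configured timeout (sketch)."""

def classify(exc: Exception) -> str:
    """Demonstrate targeted recovery first, generic handling second."""
    try:
        raise exc
    except TestTimeoutError:
        return "retry with a longer timeout"
    except RunnerError:
        return "abort this test, continue the suite"
    except AgentProbeError:
        return "generic framework error handling"
```

Because `TestTimeoutError` subclasses `RunnerError`, the most specific `except` clause must come first; a caller that only wants a safety net can catch `AgentProbeError` alone.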
AgentProbeError
¶
Base exception for all AgentProbe errors.
ConfigError
¶
Bases: AgentProbeError
Raised when configuration is invalid or missing.
RunnerError
¶
Bases: AgentProbeError
Raised when the test runner encounters an execution failure.
TestTimeoutError
¶
Bases: RunnerError
Raised when a test exceeds its configured timeout.
Source code in src/agentprobe/core/exceptions.py
AdapterError
¶
Bases: AgentProbeError
Raised when an agent adapter fails during invocation.
Source code in src/agentprobe/core/exceptions.py
EvaluatorError
¶
Bases: AgentProbeError
Base exception for evaluation errors.
JudgeAPIError
¶
Bases: EvaluatorError
Raised when the judge model API call fails.
Source code in src/agentprobe/core/exceptions.py
StorageError
¶
Bases: AgentProbeError
Raised when a storage backend operation fails.
TraceError
¶
Bases: AgentProbeError
Raised when trace recording or processing fails.
CostError
¶
Bases: AgentProbeError
Raised when cost calculation encounters an error.
BudgetExceededError
¶
Bases: CostError
Raised when a cost budget limit is exceeded.
Source code in src/agentprobe/core/exceptions.py
SafetyError
¶
Bases: AgentProbeError
Raised when a safety check fails or encounters an error.
SecurityError
¶
Bases: AgentProbeError
Raised when a security violation is detected.
MetricsError
¶
Bases: AgentProbeError
Raised when metric collection, aggregation, or trending fails.
PluginError
¶
Bases: AgentProbeError
Raised when a plugin fails to load or execute.
ChaosError
¶
Bases: AgentProbeError
Raised when a chaos fault injection causes a failure.
SnapshotError
¶
Bases: AgentProbeError
Raised when a snapshot operation fails.
ReplayError
¶
Bases: AgentProbeError
Raised when trace replay encounters an error.
RegressionError
¶
Bases: AgentProbeError
Raised when regression detection encounters an error.
ConversationError
¶
Bases: AgentProbeError
Raised when a multi-turn conversation test fails.
DashboardError
¶
Bases: AgentProbeError
Raised when dashboard operations fail.
AssertionFailedError
¶
Bases: AgentProbeError
Raised when a test assertion fails.
Attributes:
| Name | Type | Description |
|---|---|---|
assertion_type |
The type of assertion that failed (e.g. 'contain', 'match'). |
|
expected |
The expected value or pattern. |
|
actual |
The actual value received. |
Source code in src/agentprobe/core/exceptions.py
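A sketch of how an exception carrying these diagnostic attributes might look and be consumed; the constructor signature shown here is an assumption, not the documented API:

```python
class AssertionFailedError(Exception):
    """Sketch: an assertion failure carrying the documented diagnostic attributes."""

    def __init__(self, assertion_type, expected, actual):
        self.assertion_type = assertion_type  # e.g. 'contain', 'match'
        self.expected = expected
        self.actual = actual
        super().__init__(
            f"assertion '{assertion_type}' failed: "
            f"expected {expected!r}, got {actual!r}"
        )

# A reporter can read the attributes instead of parsing the message.
try:
    raise AssertionFailedError("contain", "hello", "goodbye")
except AssertionFailedError as err:
    failed_type = err.assertion_type
```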
Discovery¶
agentprobe.core.discovery
¶
Test discovery: finds and loads test modules with @scenario decorators.
Scans directories for Python files matching test patterns, imports them, and extracts registered test cases.
discover_test_files(test_dir, pattern='test_*.py')
¶
Find test files matching a pattern in the given directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
test_dir
|
str | Path
|
Root directory to search. |
required |
pattern
|
str
|
Glob pattern for test files. |
'test_*.py'
|
Returns:
| Type | Description |
|---|---|
list[Path]
|
Sorted list of matching file paths. |
Source code in src/agentprobe/core/discovery.py
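The documented behavior reduces to a glob plus a sort for deterministic ordering. This sketch assumes the search is recursive (`rglob`), which the documentation does not state explicitly:

```python
from pathlib import Path

def discover_test_files(test_dir, pattern="test_*.py"):
    """Glob for test files and sort for a deterministic run order (sketch)."""
    return sorted(Path(test_dir).rglob(pattern))
```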
load_test_module(file_path)
¶
Import a test module from a file path.
Uses importlib to load the module with a unique name derived
from the file path. The module is registered in sys.modules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
file_path
|
Path
|
Path to the Python test file. |
required |
Returns:
| Type | Description |
|---|---|
str
|
The module name used for registration. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If the module cannot be loaded. |
Source code in src/agentprobe/core/discovery.py
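The importlib mechanics described above look roughly like this. The module-name scheme (`agentprobe_tests.` + file stem) is an assumption; registering in `sys.modules` before `exec_module` follows the standard importlib recipe:

```python
import importlib.util
import sys
from pathlib import Path

def load_test_module(file_path: Path) -> str:
    """Import a module from a file path under a unique name (sketch)."""
    # Derive an importable name from the path; the scheme is an assumption.
    module_name = "agentprobe_tests." + file_path.stem
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, per the importlib documentation's recipe,
    # so imports inside the module (and @scenario registration) see it.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module_name
```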
extract_test_cases(test_dir, pattern='test_*.py')
¶
Discover and extract all test cases from a directory.
Finds test files, imports them (triggering @scenario registration), then collects all registered test cases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
test_dir
|
str | Path
|
Root directory to search. |
required |
pattern
|
str
|
Glob pattern for test files. |
'test_*.py'
|
Returns:
| Type | Description |
|---|---|
list[TestCase]
|
List of all discovered TestCase objects. |
Source code in src/agentprobe/core/discovery.py
Conversation Runner¶
agentprobe.core.conversation
¶
Multi-turn conversation runner for sequential dialogue testing.
Executes a series of conversation turns against an agent adapter, collecting per-turn traces and evaluation results, then aggregates into a ConversationResult.
ConversationRunner
¶
Runs multi-turn conversation tests against an agent.
Executes each turn sequentially, optionally passing the previous output as context to the next turn's input. Collects per-turn evaluation results and aggregates into a final ConversationResult.
Attributes:
| Name | Type | Description |
|---|---|---|
evaluators |
Mapping of evaluator names to instances. |
Source code in src/agentprobe/core/conversation.py
__init__(evaluators=None)
¶
Initialize the conversation runner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
evaluators
|
dict[str, EvaluatorProtocol] | None
|
Named evaluator instances for per-turn evaluation. |
None
|
Source code in src/agentprobe/core/conversation.py
run(adapter, turns, *, pass_context=True)
async
¶
Execute a multi-turn conversation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
adapter
|
AdapterProtocol
|
The agent adapter to invoke for each turn. |
required |
turns
|
Sequence[ConversationTurn]
|
The conversation turns to execute in order. |
required |
pass_context
|
bool
|
If True, prepend previous output to next turn's input. |
True
|
Returns:
| Type | Description |
|---|---|
ConversationResult
|
A ConversationResult with per-turn details and aggregate metrics. |
Raises:
| Type | Description |
|---|---|
ConversationError
|
If a critical error occurs during execution. |
Source code in src/agentprobe/core/conversation.py
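The sequential loop with context passing can be sketched as below. The separator used to join the previous output with the next input is an assumption, as is the `EchoAdapter` demo class:

```python
import asyncio

async def run_turns(adapter, turns, *, pass_context=True):
    """Sketch: execute turns in order, optionally feeding the previous output forward."""
    outputs = []
    previous = None
    for turn in turns:
        text = turn
        if pass_context and previous is not None:
            # Context format is an assumption; the real runner may differ.
            text = f"{previous}\n\n{turn}"
        previous = await adapter.invoke(text)
        outputs.append(previous)
    return outputs

class EchoAdapter:
    """Hypothetical adapter that echoes its input, for demonstration."""
    async def invoke(self, input_text, **kwargs):
        return f"echo:{input_text}"

result = asyncio.run(run_turns(EchoAdapter(), ["hi", "bye"]))
# The second turn's input contains the first turn's output.
```

With `pass_context=False`, each turn would see only its own input, which suits tests of independent prompts against shared agent state.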
Chaos Proxy¶
agentprobe.core.chaos
¶
Chaos fault injection proxy for testing agent resilience.
Wraps an adapter and modifies tool call results in the resulting trace to simulate failures, timeouts, malformed data, rate limits, slow responses, and empty responses.
ChaosProxy
¶
Wraps an adapter and injects chaos faults into tool call results.
After the real adapter produces a trace, ChaosProxy scans tool calls and probabilistically replaces their outputs with fault-injected variants. The modified trace is returned as a frozen copy.
Attributes:
| Name | Type | Description |
|---|---|---|
overrides |
Configured fault injection rules. |
Source code in src/agentprobe/core/chaos.py
name
property
¶
Return the adapter name with chaos prefix.
__init__(adapter, overrides, *, seed=42)
¶
Initialize the chaos proxy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
adapter
|
AdapterProtocol
|
The real adapter to wrap. |
required |
overrides
|
list[ChaosOverride]
|
Fault injection rules to apply. |
required |
seed
|
int
|
Random seed for deterministic fault injection. |
42
|
Source code in src/agentprobe/core/chaos.py
invoke(input_text, **kwargs)
async
¶
Invoke the wrapped adapter and inject faults.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input_text
|
str
|
Input text to send to the adapter. |
required |
**kwargs
|
Any
|
Additional adapter arguments. |
{}
|
Returns:
| Type | Description |
|---|---|
Trace
|
A modified trace with chaos faults injected. |
Source code in src/agentprobe/core/chaos.py
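The seeded, probabilistic replacement can be sketched with `random.Random`. The function name, fault payload, and `failure_rate` parameter are assumptions; the point is that a fixed seed makes the fault pattern reproducible across runs:

```python
import random

def inject_faults(tool_outputs, *, failure_rate=0.3, seed=42):
    """Sketch: probabilistically replace tool outputs, deterministically under a seed."""
    rng = random.Random(seed)  # seeded RNG -> identical fault placement every run
    modified = []
    for output in tool_outputs:
        if rng.random() < failure_rate:
            # Fault payload shape is an assumption.
            modified.append({"error": "injected fault", "original": output})
        else:
            modified.append(output)
    return modified
```

Determinism matters here: a flaky-by-design test is only useful if the same seed reproduces the same failures when debugging.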
Snapshot Manager¶
agentprobe.core.snapshot
¶
Snapshot (golden file) management for output comparison testing.
Saves, loads, compares, and updates agent output snapshots stored as JSON files. Supports multi-dimension comparison including tool calls, response structure, key facts, cost, and latency.
SnapshotManager
¶
Manages snapshot files for golden-file testing.
Saves traces as JSON snapshots and compares current traces against saved snapshots across multiple dimensions.
Attributes:
| Name | Type | Description |
|---|---|---|
snapshot_dir |
Directory where snapshot files are stored. |
|
threshold |
Similarity threshold for matching. |
Source code in src/agentprobe/core/snapshot.py
__init__(snapshot_dir='.agentprobe/snapshots', *, threshold=0.8)
¶
Initialize the snapshot manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
snapshot_dir
|
str | Path
|
Directory for snapshot storage. |
'.agentprobe/snapshots'
|
threshold
|
float
|
Similarity threshold for a match. |
0.8
|
Source code in src/agentprobe/core/snapshot.py
save(name, trace)
¶
Save a trace as a named snapshot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Snapshot name. |
required |
trace
|
Trace
|
Trace to save. |
required |
Returns:
| Type | Description |
|---|---|
Path
|
Path to the saved snapshot file. |
Source code in src/agentprobe/core/snapshot.py
load(name)
¶
Load a named snapshot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Snapshot name. |
required |
Returns:
| Type | Description |
|---|---|
Trace
|
The saved Trace. |
Raises:
| Type | Description |
|---|---|
SnapshotError
|
If the snapshot does not exist. |
Source code in src/agentprobe/core/snapshot.py
exists(name)
¶
Check whether a named snapshot exists.
list_snapshots()
¶
List the names of all saved snapshots.
delete(name)
¶
Delete a named snapshot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Snapshot name. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
True if deleted, False if not found. |
Source code in src/agentprobe/core/snapshot.py
compare(name, current)
¶
Compare a current trace against a saved snapshot.
Compares across dimensions: tool_calls, output, token_usage, latency, and metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
Snapshot name. |
required |
current
|
Trace
|
Current trace to compare. |
required |
Returns:
| Type | Description |
|---|---|
SnapshotDiff
|
A SnapshotDiff with per-dimension similarity scores. |
Raises:
| Type | Description |
|---|---|
SnapshotError
|
If the snapshot does not exist. |
Source code in src/agentprobe/core/snapshot.py
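The save/load/compare cycle over JSON files can be sketched as below. The on-disk layout, class name, and the Jaccard similarity over tool names are all assumptions; the documentation only states that snapshots are JSON files compared across several dimensions against a threshold:

```python
import json
from pathlib import Path

class SnapshotStore:
    """Sketch of golden-file storage; layout and scoring are assumptions."""

    def __init__(self, snapshot_dir, *, threshold=0.8):
        self.dir = Path(snapshot_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.threshold = threshold

    def save(self, name, data):
        path = self.dir / f"{name}.json"
        # sort_keys keeps snapshots stable under dict reordering
        path.write_text(json.dumps(data, indent=2, sort_keys=True))
        return path

    def load(self, name):
        path = self.dir / f"{name}.json"
        if not path.exists():
            raise FileNotFoundError(name)
        return json.loads(path.read_text())

    def compare_tools(self, name, current_tools):
        """One dimension only: Jaccard similarity over tool names."""
        saved = set(self.load(name).get("tool_calls", []))
        now = set(current_tools)
        if not saved and not now:
            return 1.0
        return len(saved & now) / len(saved | now)
```

A comparison would pass when each dimension's score meets `threshold`; here, a run that calls only one of two snapshotted tools scores 0.5 and would fail against the 0.8 default.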