Changelog¶
All notable changes to AgentProbe will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.5.0] - 2025-05-15¶
Added¶
- PostgreSQL
load_result(result_id)method for API parity with SQLite - Dedicated migration test suite (
test_migrations.py) TraceDifferfor structural comparison between any two tracesTraceDiffReportmodel with token/latency deltas and similarity scoring- Dashboard CLI command tests (
test_dashboard_cmd.py) - Pricing data for Google (Gemini), Mistral, and Cohere models
FieldEncryptorwith SHA-256 hashing and partial maskingAuditLoggerwith structured JSON event logging- Parametrized tests across 6 test modules for edge case coverage
- PyPI release workflow with trusted publisher
- MkDocs documentation deployment to GitHub Pages
- Dependabot configuration for pip and GitHub Actions
Changed¶
- Development status upgraded from Alpha to Beta
- Version bumped from 0.4.0 to 0.5.0
[0.4.0] - 2025-04-15¶
Added¶
- Dashboard REST API with FastAPI: health, traces, results, and metrics endpoints
agentprobe dashboardCLI command with lazy FastAPI/uvicorn imports- API reference documentation for all 12 top-level modules via mkdocstrings
- Test factories for cost, safety, regression, and trace replay models
- Example scripts for cost management, metrics trending, plugin creation, custom adapters, and dashboard usage
- Plugin system: loader, registry, base classes, and plugin manager
- Metric collection, aggregation, and trend analysis
- PostgreSQL storage backend with asyncpg
- Database migration system (V1: traces/results, V2: metrics)
- CLI commands for metrics (
metrics list,metrics summary) - Integration test suite covering all cross-module pipelines
- Full project documentation with MkDocs Material
- Docker support with multi-stage build
- CONTRIBUTING.md and LICENSE
Changed¶
- Consolidated CI into single workflow with test matrix, artifact uploads, and JUnit reporting
- Expanded
agentprobe.yaml.exampleandinittemplate to cover all 13 config sections - Version bumped from 0.1.0 to 0.4.0 to reflect Phases 1-4 feature completeness
[0.3.0] - 2025-03-15¶
Added¶
- Plugin system with
PluginBaseabstract class and four typed subclasses PluginRegistryfor managing plugin lifecyclePluginLoaderwith entry point and file-based discoveryPluginManagerfor dispatching lifecycle hooks with error isolationMetricCollectorfor stateless metric extraction from traces and resultsMetricAggregatorusing stdlibstatisticsfor summary computationMetricTrendfor trend analysis across multiple runs- Six built-in metric definitions (latency, token count, cost, score, tool calls, error rate)
- PostgreSQL storage backend with connection pooling via asyncpg
- Migration system for schema versioning
- SQLite extended with metrics table
metrics listandmetrics summaryCLI commands- Integration tests for plugin, metrics, and storage pipelines
[0.2.0] - 2025-02-15¶
Added¶
ChaosProxyfor fault injection into agent tool callsSnapshotManagerfor golden-file comparison testingBudgetEnforcerfor per-test and per-suite cost limitsStatisticalEvaluatorfor multi-run score aggregationTraceComparisonEvaluatorfor structural trace diffingConversationRunnerfor multi-turn dialogue testingRegressionDetectorandBaselineManagerfor behavioral diffReplayEngineandTimeTravelfor trace replay and inspection- Six safety test suites: prompt injection, data leakage, jailbreak, role confusion, hallucination, tool abuse
- PII redaction scanner in
security/pii.py - Six reporting formats: terminal, HTML, JUnit XML, JSON, Markdown, CSV
- CLI commands:
baseline,snapshot,cost,safety - Configuration sections:
chaos,snapshot,budget,regression
[0.1.0] - 2025-01-15¶
Added¶
- Core test framework:
TestRunner,TestCase,TestResult,TestRunmodels @scenariodecorator and test discovery system- Assertion builder with fluent API (
expect,expect_tool_calls) TraceRecorderwith async context manager for capturing agent executionTrace,LLMCall,ToolCalldata models (frozen Pydantic)- Configuration loading from
agentprobe.yamlwith${ENV_VAR}interpolation CostCalculatorwith YAML-based pricing data- Rules-based, embedding, and judge evaluators
- SQLite storage backend with WAL mode
- Framework adapters: LangChain, CrewAI, AutoGen, MCP
- Click-based CLI with
initandtestcommands - Custom exception hierarchy