Changelog
Unreleased
[0.23.0] — 2026-05-27
- `sigil run [PATHS...]`: minimal scenario runner. No `.sigil/sigil.toml` required. Walks files/directories for `*.lua` (recursive, `lib/` skipped), runs each via the in-process scenario runtime, and prints pass/fail plus a summary. Flags:
  - `--filter <SUBSTR>` (repeatable, OR’d; substring match on path and scenario title)
  - `--tag <T>` / `--exclude-tag <T>` (existing scenario tag semantics; exclude always wins)
  - `--endpoint <URL>` (optional; if unset, a clear error surfaces at the first HTTP call, so browser-only / client-side scenarios can run without one)
  - Exit codes: 0 all passed, 1 some failed, 2 zero scenarios matched (0 and 1 follow pytest convention)
- `[browser] headless` config plus `SIGIL_BROWSER_HEADLESS` env override: set to `false` to see the browser window during local development.
- Clearer `sigil install-browser` output: distinguishes `Reusing cached chrome …` on a cache hit from `Using system chrome …` (with `--use-system`) and `Installed chrome …` (fresh download).
- Native browser backend (the 0.22.x cutover default) now functional end-to-end. The CDP bridge was previously stubbed: every `sigil.browser.*` call returned `Err(Shutdown)` in <1ms without ever launching Chrome. This release wires `JobKind::CdpCall` / `AwaitEvent` / `AwaitDownload` through the live CDP client, event router, and download tracker; primes a default page session on launch; and routes `Page.*`, `Runtime.*`, etc. accordingly.
- `NativeBrowserManager::plan_call` handles 14 of 17 `BrowserCall` variants (was 7). Wired: `Fill`, `Wait`, `Html`, `Type`, `Press`, `Hover`, `Check`, `Select`, `Scroll`, `Checked`, `WaitDownload`, `Cookies`, `Pdf`, `Snapshot`, `Upload`.
- `sigil install-browser` HTTPS transport wired through asupersync: the pinned Chrome-for-Testing zip downloads, verifies, and extracts in a single command on a clean host. Distinct error variants for DNS / TLS / connect / partial-body / HTTP-status failures.
- `sigil install-browser` cache hit writes the `current` pointer, so `sigil browser doctor` doesn’t immediately report the binary as missing after a successful pre-populated install.
- `sigil browser doctor` `use_system_fallback` row reads coherently across all four states (Sigil install + no system, Sigil install + system available, system only, neither).
- `scenario run --deploy` retry race: back-to-back invocations no longer fail with “Error deploying service” after killing a stale process. Port-readiness polling after SIGTERM (with SIGKILL escalation after 2s); foreign-pid guard via `/proc/net/tcp{,6}`.
- `cdp/client` integration tests un-`#[ignore]`d (7 tests). Root cause was a dangling reader task keeping the server-side TCP half open.
- Workspace clippy gate (`cargo clippy --workspace --all-targets --locked -- -D warnings`) now exits 0 (was ~213 errors in test code). Added to CI.
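The selection and exit-code rules for `sigil run` can be sketched as follows. This is a minimal Rust model, not Sigil's implementation. `Scenario`, `selected`, and `exit_code` are hypothetical names. Treating repeated `--tag` values as OR'd is an assumption, since the entry only says that existing tag semantics apply.

```rust
/// Hypothetical view of a discovered scenario file.
struct Scenario {
    path: String,
    title: String,
    tags: Vec<String>,
}

/// Applies --filter / --tag / --exclude-tag to one scenario.
fn selected(s: &Scenario, filters: &[&str], tags: &[&str], exclude: &[&str]) -> bool {
    // --exclude-tag always wins, so it is checked before anything else.
    if s.tags.iter().any(|t| exclude.contains(&t.as_str())) {
        return false;
    }
    // Repeated --filter values are OR'd: one substring hit on path or title is enough.
    let filter_ok = filters.is_empty()
        || filters.iter().any(|f| s.path.contains(*f) || s.title.contains(*f));
    // Assumption: repeated --tag values are also OR'd.
    let tag_ok = tags.is_empty() || s.tags.iter().any(|t| tags.contains(&t.as_str()));
    filter_ok && tag_ok
}

/// Maps per-scenario pass/fail results to the documented exit codes.
fn exit_code(results: &[bool]) -> i32 {
    if results.is_empty() {
        2 // zero scenarios matched
    } else if results.iter().all(|&passed| passed) {
        0 // all passed
    } else {
        1 // some failed
    }
}
```

Checking exclusion first is what makes "exclude always wins" hold regardless of how many filters or tags also match.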
Security
- `sigil.browser.upload` routes every file path through a per-scenario path sandbox before `DOM.setFileInputFiles`. Previously, caller-supplied paths went straight to Chrome, letting an untrusted scenario attach any file on the host (`/etc/passwd`, SSH keys, etc.) to a form. The sandbox canonicalizes paths after symlink resolution, rejects escapes (`OutsideAllowedRoots`), and fails closed on a missing root. The allowed root is the scenario file’s parent directory, so uploads can only reach fixtures sitting next to the scenario.
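Here is a minimal Rust sketch of the sandbox check. It assumes a single allowed root (the scenario file's parent directory) and resolves relative upload paths against that root. `sandbox_upload_path` and `SandboxError` are hypothetical names; only `OutsideAllowedRoots` comes from the entry. Canonicalizing both the root and the candidate before the prefix check is what defeats `..` segments and symlinks that point outside the root.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type; only `OutsideAllowedRoots` is named in the changelog.
#[derive(Debug)]
enum SandboxError {
    /// The allowed root could not be resolved, so every upload is refused (fail closed).
    MissingRoot(io::Error),
    /// The requested file could not be resolved (for example, it does not exist).
    Unresolvable(io::Error),
    /// The resolved path lies outside the allowed root.
    OutsideAllowedRoots(PathBuf),
}

/// Resolves `requested` against the scenario's directory and rejects escapes.
fn sandbox_upload_path(scenario_file: &Path, requested: &Path) -> Result<PathBuf, SandboxError> {
    // Allowed root: the scenario file's parent directory ("." for a bare file name).
    let parent = match scenario_file.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let root = parent.canonicalize().map_err(SandboxError::MissingRoot)?;
    // Relative paths join onto the root; an absolute `requested` replaces it
    // and is then caught by the prefix check below.
    let resolved = root
        .join(requested)
        .canonicalize() // resolves `..` and follows symlinks to the real target
        .map_err(SandboxError::Unresolvable)?;
    // Path::starts_with compares whole components, so "/a/bc" is not under "/a/b".
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(SandboxError::OutsideAllowedRoots(resolved))
    }
}
```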
Performance
- Native browser only launches when a scenario declares the `"browser"` capability. Previously every scenario eagerly spawned Chrome (twice: once per PR/baseline env) before policy was even parsed. Pure-HTTP scenarios in `sigil eval --tag health` went from ~282ms/scenario to ~1ms/scenario.
- Lazy browser-init backstop: even browser-declared scenarios that never actually call the browser don’t pay launch cost.
- Mis-declared scenarios that call `sigil.browser.*` without the `"browser"` capability now fail with a clear policy error instead of silently launching Chrome anyway.
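Launch-on-first-use and the policy error can both be modelled with `std::sync::OnceLock`. This sketch assumes a per-scenario context that holds the declared capabilities. `ScenarioCtx`, `Browser`, and the error text are hypothetical stand-ins, not Sigil's types.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::sync::OnceLock;

/// Stand-in for a launched Chrome process (hypothetical).
struct Browser;

/// Hypothetical per-scenario context: declared capabilities plus a lazily launched browser.
struct ScenarioCtx {
    capabilities: HashSet<String>,
    browser: OnceLock<Browser>,
    /// Counts real launches so the at-most-once behavior is observable.
    launch_count: Cell<u32>,
}

impl ScenarioCtx {
    fn new(caps: &[&str]) -> Self {
        ScenarioCtx {
            capabilities: caps.iter().map(|c| c.to_string()).collect(),
            browser: OnceLock::new(),
            launch_count: Cell::new(0),
        }
    }

    /// Called from a `sigil.browser.*` call, never at scenario start, so
    /// scenarios that never touch the browser pay no launch cost.
    fn browser(&self) -> Result<&Browser, String> {
        // Policy check first: an undeclared call is an error, not a silent launch.
        if !self.capabilities.contains("browser") {
            return Err("policy: sigil.browser.* called without the \"browser\" capability".to_string());
        }
        // get_or_init runs the launch closure at most once per context.
        Ok(self.browser.get_or_init(|| {
            self.launch_count.set(self.launch_count.get() + 1);
            Browser
        }))
    }

    fn launched(&self) -> bool {
        self.browser.get().is_some()
    }
}
```

Because each PR and baseline run would get its own context, a pure-HTTP scenario never creates a `Browser` in either one.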
BREAKING (carried from prior cutover)
- `[browser] backend` default flipped from `cli` to `native`. Browser scenarios now run in-process via the `sigil-browser` crate by default. The CLI backend (`agent-browser`) shell-out path was removed. `backend = "cli"` still deserialises, so existing configs do not fail to parse, but every `sigil.browser.*` call returns a structured “removed” error directing operators to set `backend = "native"`.
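One way to keep `backend = "cli"` parseable while making every call fail is to keep the enum variant and reject it at dispatch time. This is a hypothetical Rust sketch: `Backend`, `BrowserError::Removed`, `parse_backend`, and `dispatch` are illustrative names. Rejecting unknown values at parse time is an assumption.

```rust
/// Hypothetical model of the `[browser] backend` setting.
#[derive(Debug, PartialEq)]
enum Backend {
    Native,
    /// Still accepted when parsing so old configs load, but no longer functional.
    Cli,
}

#[derive(Debug, PartialEq)]
enum BrowserError {
    /// Structured "removed" error pointing operators at the replacement.
    Removed { hint: &'static str },
}

fn parse_backend(value: Option<&str>) -> Result<Backend, String> {
    match value {
        // An unset key now means native (the flipped default).
        None | Some("native") => Ok(Backend::Native),
        Some("cli") => Ok(Backend::Cli),
        Some(other) => Err(format!("unknown browser backend: {other}")),
    }
}

fn dispatch(backend: &Backend, call: &str) -> Result<String, BrowserError> {
    match backend {
        Backend::Native => Ok(format!("native: {call}")),
        // Every call on the removed backend fails the same structured way.
        Backend::Cli => Err(BrowserError::Removed {
            hint: "the cli backend was removed; set backend = \"native\" under [browser]",
        }),
    }
}
```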
[0.21.0] — 2026-04-30
- Scenario DSL:
  - `sigil.sleep()` primitive for timing control with budget enforcement
  - `sigil.expect_status_class()` helper for HTTP response class assertions (2xx, 4xx, 5xx)
  - Per-scenario reset hook via `[scenario.reset]` config
  - Per-call `base_url` override on HTTP methods
- Scenario management:
  - `scenario promote` command for staging → holdout workflow
  - Holdout split support during promotion
  - `--seed` flag for deterministic scenario generation
  - Tag-based filtering: `--tag` and `--exclude-tag` selectors for all scenario commands
  - Scenario-level skip directives with reasons in reports
- Scenario generation:
  - `--filter` and `--limit` flags to scope the `scenario generate` plan
  - Per-case logging in scenario generation
- Judge system:
  - `--judge-model` flag to compare judge outputs across different models
  - `sigil compare` command for side-by-side judge evaluation
- Configuration:
  - Prompt injection of configured `SIGIL_SEED_KEYS` into generation
  - Enhanced few-shot examples for spec-to-logs mapping
  - Improved JSON error handling with mode hints
- Scenario `run` and `dry-run` now honor skip directives
- Promotion correctly handles staging paths
- Staging-category names no longer leak into scenario tags
- Judge output is now deterministic across runs (fixed parameter settings)
[0.20.1-rc.1] — 2026-04-21
- Scenario generation CLI:
  - `sigil scenario generate` orchestrator for end-to-end generation
  - Stage 3 execution validation (opt-in via `--verify` flag)
  - Scenario-level skip with reason surfacing in eval reports
  - `--tag` / `--exclude-tag` selectors for `scenario run` and `scenario dry-run`
  - `scenario promote` subcommand for staging → visible/holdout split
- Scenario DSL enhancements:
  - `--seed` flag for deterministic generation
  - `--filter` and `--limit` for generation scope control
- Judge system:
  - `--judge-model` flag for cross-model comparison
  - `sigil compare` command
  - Claude-code provider for judge with structured output
- CI integration:
  - `sigil ci` command for PR evaluation and GitHub status
  - Config context resolution via frontmatter
- Browser automation:
  - `sigil.browser` API: `open`, `click`, `fill`, `wait`, `text`, `html`, `title`, `url`, `screenshot`, `eval`, `cookies`, `snapshot`, `visible`
  - Session isolation per scenario
- Agentic intent:
  - `sigil.intent()` for LLM-driven scenario execution
  - Tool use with automatic tool descriptors
  - Capture fields for structured data extraction
  - Thinking model support
- CLI enhancements:
  - `sigil keys add-self` for key management
  - `sigil scenario run --all` for batch execution
  - `sigil feedback --last` for agent dev loop
  - `--no-baseline` flag for `sigil eval`
  - `--deploy` flag for `sigil scenario run` (self-contained execution)
  - Progress reporting for eval and scenario run
  - Format auto-detection and shell completion
- Judge parameter settings for deterministic output
- Judge provider argument handling
- Scenario CLI log buffer handling
- Scenario skip directive processing
[0.20.0] — 2026-03-05
- Kubernetes backend:
  - `sigil eval` now supports Kubernetes deployments via kubectl
  - Configure via `[deploy]` section in `sigil.toml`
- Container backends:
  - Bare container backend for `docker run` / `podman run` single-container services
  - Configurable compose command (docker-compose, podman-compose, etc.)
- Endpoint management:
  - `--pr-endpoint` flag to evaluate against a specific PR endpoint
  - `--baseline-endpoint` flag to evaluate against a specific baseline endpoint
- Documentation:
  - Comprehensive Scenario DSL user reference
  - CLI help text improvements with examples
[0.19.0] — 2026-03-05
- Deploy backend selection: Configure primary backend in `sigil.toml`
- Compose CLI flexibility: Support for podman-compose and docker-compose variants
- Endpoint control: `--pr-endpoint` and `--baseline-endpoint` flags for custom deployments
[0.18.0] — 2026-03-05
- GitHub Actions integration: Deploy and verify via GitHub workflows
- CLI improvements: Long help text, workflow examples, agent-friendly documentation
[0.17.0] — 2026-03-05
- Attestations: In-toto attestation generation and Ed25519 signing
- Output formats:
  - JSON format shorthand: `--json` alias for `--format json`
  - Format auto-detection: pretty for TTY, text for pipes
  - Shell completion generation
- Diagnostics:
  - Enhanced `sigil doctor` with comprehensive prerequisite checks
[0.16.0] — 2026-03-05
- Dashboard: Web UI for eval and trust overview
- Trust commands:
  - `sigil trust show`: View current trust state
  - `sigil trust history`: Review trust transitions
  - `sigil trust mode`: Check and transition trust levels
- Eval enhancements:
  - Failure-triggered baseline re-check
  - `sigil report`: Reconstruct eval reports from ledger
- Policy hooks: Optional OPA/Rego policy verification
[0.15.0] — 2026-03-05
- Adaptive evaluation: Early termination based on confidence
- LLM judge: `sigil.judge()` Lua API for semantic assertions
- Judge configuration: `[judge]` section in `sigil.toml` with provider selection
- Evaluation: `sigil diff` for comparing two evaluation results
- Judge providers: Support for multiple judge backends
[0.14.0] — 2026-03-05
- Replay: `sigil replay` to re-execute scenarios from recorded artifacts
- Reporting: `sigil report` to reconstruct reports from ledger
[0.13.0] — 2026-03-05
- Security gates:
  - Automated secret scanning (trufflehog)
  - Dependency vulnerability scanning (trivy)
  - Static analysis (semgrep) for code quality checks
[0.12.0] — 2026-03-05
- Parallel execution: Concurrent scenario runs for faster evaluation
- Judge consensus: Quorum voting across multiple judge instances
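Quorum voting can be sketched as counting agreeing verdicts against a threshold. The entry does not describe the vote format or the threshold, so the binary `Verdict` type and the `quorum` parameter here are assumptions. The sketch also assumes `quorum` is a strict majority, so that only one side can reach it.

```rust
/// Hypothetical judge verdict; the real judge output format is not described here.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Verdict {
    Pass,
    Fail,
}

/// Returns the verdict that at least `quorum` judges agree on, or None if neither side reaches it.
/// Assumes `quorum` is more than half of `votes.len()`.
fn consensus(votes: &[Verdict], quorum: usize) -> Option<Verdict> {
    let passes = votes.iter().filter(|&&v| v == Verdict::Pass).count();
    let fails = votes.len() - passes;
    if passes >= quorum {
        Some(Verdict::Pass)
    } else if fails >= quorum {
        Some(Verdict::Fail)
    } else {
        None // no consensus: caller decides (e.g. re-run or mark inconclusive)
    }
}
```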
[0.11.0] — 2026-03-05
- Trust model: Per-service trust scoring
- Judge fallback: Automatic fallback to secondary model on provider failure
[0.10.0] — 2026-03-05
- GitHub Actions: `sigil-action` workflow integration
[0.9.0] — 2026-03-05
- Policy engine: `sigil decide` with threshold-based approval
[0.8.0] — 2026-03-05
- Evaluation reports: JSON eval reports with detailed results
[0.7.0] — 2026-03-05
- Baseline comparison: `sigil eval` compares PR against baseline
- Satisfaction scoring: Quantified results vs baseline
[0.6.0] — 2026-03-05
- Scenario execution: `sigil eval` runs scenarios against deployed environments
- Type stubs: `sigil generate-types` for LuaLS IDE support
- Blob store: Content-addressed artifact storage with integrity verification
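A content-addressed store keys each blob by a digest of its bytes and re-hashes on read to verify integrity. This sketch is hypothetical: `BlobStore` and `digest` are illustrative names. It uses the standard library's non-cryptographic `DefaultHasher` only to stay dependency-free; a real artifact store would use a cryptographic hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest: DefaultHasher is NOT cryptographic and is used only to stay std-only.
fn digest(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory content-addressed store.
#[derive(Default)]
struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore {
    /// Stores `bytes` under their digest; identical content maps to the same key.
    fn put(&mut self, bytes: &[u8]) -> String {
        let key = digest(bytes);
        self.blobs.entry(key.clone()).or_insert_with(|| bytes.to_vec());
        key
    }

    /// Returns the blob only if its bytes still hash to `key`.
    fn get(&self, key: &str) -> Result<&[u8], String> {
        let bytes = self
            .blobs
            .get(key)
            .ok_or_else(|| format!("blob {key} not found"))?;
        if digest(bytes) == key {
            Ok(bytes.as_slice())
        } else {
            Err(format!("blob {key} failed integrity check"))
        }
    }
}
```

Deriving the key from the content gives deduplication for free. Re-hashing on `get` is what turns the key into an integrity check.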
[0.5.0] — 2026-03-05
- Scenario runner: `sigil scenario run <scenario>` for local development
- HTTP client: `sigil.get()`, `sigil.post()`, `sigil.put()`, `sigil.patch()`, `sigil.delete()`
- CLI runner: `sigil.exec()` for command execution
[0.4.0] — 2026-03-05
- Project setup: `sigil init` scaffolds new sigil projects
- Health checks: `sigil doctor` validates environment and dependencies
- Lua API: `sigil.*` globals: `env()`, `json()`, `yaml()`
[0.3.0] — 2026-03-05
- Scenario DSL:
  - `expect(expr)` with power assertions
  - `invariant(name, opts)` for property testing
  - Generators: `sigil.gen.string()`, `sigil.gen.int()`, etc.
- Key management: `sigil keys` commands for scenario encryption
- Holdout scenarios: Support for hidden test scenarios
- Scenario management: `sigil scenario list`, `sigil scenario dry-run`
[0.2.0] — 2026-03-05
- Initial public release
- Core evaluation engine
- Scenario support with Lua DSL
- Docker Compose deployment
- Basic evaluation reporting