Changelog

  • sigil run [PATHS...] — minimal scenario runner. No .sigil/sigil.toml required. Walks files/directories for *.lua (recursive, lib/ skipped), runs each via the in-process scenario runtime, prints pass/fail + summary. Flags:
    • --filter <SUBSTR> (repeatable, OR’d, substring on path and scenario title)
    • --tag <T> / --exclude-tag <T> (existing scenario tag semantics; exclude always wins)
    • --endpoint <URL> (optional; surfaces a clear error at first HTTP call if unset, so browser-only / client-side scenarios can run without one)
    • Exit codes: 0 all passed, 1 some failed, 2 zero scenarios matched (0 and 1 follow the pytest convention)
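
A minimal sketch of how the selection flags and exit codes above could combine. The `Scenario` record and the `matches` / `exit_code` helpers are hypothetical names used for illustration, not the runner's real internals, and the rule that `--tag` needs at least one matching tag is an assumption about the "existing scenario tag semantics".

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    # Hypothetical record; the runner's real types are not shown in this changelog.
    path: str
    title: str
    tags: set = field(default_factory=set)


def matches(scenario, filters, tags, exclude_tags):
    # Exclusion is checked first, so an excluded tag always wins.
    if scenario.tags & set(exclude_tags):
        return False
    # Assumption: when --tag is given, at least one listed tag must be present.
    if tags and not (scenario.tags & set(tags)):
        return False
    # --filter values are OR'd: one substring hit on path or title is enough.
    if filters and not any(f in scenario.path or f in scenario.title for f in filters):
        return False
    return True


def exit_code(selected, failed):
    if selected == 0:
        return 2  # zero scenarios matched
    return 1 if failed else 0
```

With these rules, `--filter checkout --filter "log in"` selects a scenario titled "user can log in", while adding `--exclude-tag slow` drops it again if it carries the `slow` tag.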
  • [browser] headless config + SIGIL_BROWSER_HEADLESS env override — set to false to see the browser window during local development.
  • Clearer sigil install-browser output — distinguishes Reusing cached chrome … on cache hit from Using system chrome … (with --use-system) and Installed chrome … (fresh download).
  • Native browser backend (the 0.22.x cutover default) now functional end-to-end. The CDP bridge was previously stubbed: every sigil.browser.* call returned Err(Shutdown) in <1ms without ever launching Chrome. This release wires JobKind::CdpCall / AwaitEvent / AwaitDownload through the live CDP client, event router, and download tracker; primes a default page session on launch and routes Page.*/Runtime.*/etc. accordingly.
  • NativeBrowserManager::plan_call handles 14 of 17 BrowserCall variants (was 7). Wired: Fill, Wait, Html, Type, Press, Hover, Check, Select, Scroll, Checked, WaitDownload, Cookies, Pdf, Snapshot, Upload.
  • sigil install-browser HTTPS transport wired through asupersync: the pinned Chrome-for-Testing zip now downloads, verifies, and extracts in a single command on a clean host. Distinct error variants for DNS / TLS / connect / partial-body / HTTP-status failures.
  • sigil install-browser cache-hit writes the current pointer so sigil browser doctor doesn’t immediately report the binary as missing after a successful pre-populated install.
  • sigil browser doctor use_system_fallback row reads coherently across all four states (Sigil install + no system, Sigil install + system available, system only, neither).
  • scenario run --deploy retry race: back-to-back invocations no longer fail with “Error deploying service” after killing a stale process. The fix polls for port readiness after SIGTERM (escalating to SIGKILL after 2s) and adds a foreign-pid guard via /proc/net/tcp{,6}.
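
The stop-then-poll sequence can be sketched roughly as follows. This is an illustration under assumptions, not the shipped code: `stop_process` and `wait_until_port_free` are hypothetical names, "readiness" is read here as "the port no longer accepts connections", and the foreign-pid guard via /proc/net/tcp{,6} is not covered.

```python
import socket
import subprocess
import time


def stop_process(proc, grace_seconds=2.0):
    """Send SIGTERM, wait up to grace_seconds, then escalate to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "killed"


def wait_until_port_free(port, host="127.0.0.1", timeout=5.0, interval=0.05):
    """Poll until a connect to (host, port) fails, or give up after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(interval)
            # Any connect failure is treated as "nothing is listening any more".
            if s.connect_ex((host, port)) != 0:
                return True
        time.sleep(interval)
    return False
```

Polling after the signal, rather than sleeping for a fixed interval, is what lets a second invocation start as soon as the stale process has released the port.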
  • cdp/client integration tests un-#[ignore]d (7 tests). Root cause was a dangling reader task keeping the server-side TCP half open.
  • Workspace clippy gate (cargo clippy --workspace --all-targets --locked -- -D warnings) now exits 0 (was ~213 errors in test code). Added to CI.
  • sigil.browser.upload routes every file path through a per-scenario path sandbox before DOM.setFileInputFiles. Previously caller-supplied paths went straight to Chrome, letting an untrusted scenario attach any file on the host (/etc/passwd, SSH keys, etc.) to a form. The sandbox canonicalizes paths after symlink resolution, rejects escapes (OutsideAllowedRoots), and fails closed on a missing root. The allowed root is the scenario file’s parent directory, so uploads can only reach fixtures sitting next to the scenario.
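
The sandbox check described above can be illustrated with a short sketch. It is written in Python for brevity and is not the sigil-browser implementation; `resolve_upload` is a hypothetical helper, and resolving relative paths against the scenario directory is an assumption. Only the `OutsideAllowedRoots` name comes from the entry itself.

```python
from pathlib import Path


class OutsideAllowedRoots(Exception):
    """Raised when a resolved path is not inside the allowed root."""


def resolve_upload(scenario_file, requested):
    # Allowed root: the scenario file's parent directory.
    # strict=True raises FileNotFoundError if the root is missing (fail closed).
    root = Path(scenario_file).parent.resolve(strict=True)
    # Assumption: relative paths are interpreted against the root.
    # resolve() follows symlinks and collapses "..", and strict=True also
    # raises FileNotFoundError when the requested file does not exist.
    candidate = (root / requested).resolve(strict=True)
    # Compare only after resolution, so neither "../" nor a symlink can escape.
    if not candidate.is_relative_to(root):
        raise OutsideAllowedRoots(str(candidate))
    return candidate
```

The ordering matters: checking the prefix before resolving symlinks would let a link inside the scenario directory point at any file on the host.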
  • Native browser only launches when a scenario declares "browser" capability. Previously every scenario eagerly spawned Chrome (twice — once per PR/baseline env) before policy was even parsed. Pure-HTTP scenarios in sigil eval --tag health went from ~282ms/scenario to ~1ms/scenario.
  • Lazy browser-init backstop: even browser-declared scenarios that never actually call the browser don’t pay launch cost.
  • Mis-declared scenarios that call sigil.browser.* without the "browser" capability now fail with a clear policy error instead of silently launching Chrome anyway.
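
Taken together, the three entries above amount to "check policy first, launch on first use". A minimal sketch of that ordering, with hypothetical names (`LazyBrowser`, `PolicyError`, and the `launch` callable are illustrative, not the crate's API):

```python
class PolicyError(Exception):
    """Raised when a scenario uses the browser without declaring the capability."""


class LazyBrowser:
    def __init__(self, capabilities, launch):
        self.capabilities = set(capabilities)
        self._launch = launch  # hypothetical callable that starts the browser
        self._session = None

    def call(self, method, *args):
        # Policy check comes first: undeclared use fails without launching anything.
        if "browser" not in self.capabilities:
            raise PolicyError(f'{method} requires the "browser" capability')
        # Launch on first use only, so scenarios that never call the browser pay nothing.
        if self._session is None:
            self._session = self._launch()
        return self._session(method, *args)
```

A pure-HTTP scenario never reaches the launch branch, which is consistent with the per-scenario cost dropping from Chrome startup time to near zero.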
  • [browser] backend default flipped from cli to native. Browser scenarios now run in-process via the sigil-browser crate by default. The CLI backend (agent-browser) shell-out path was removed; backend = "cli" still deserialises (so existing configs do not fail to parse) but every sigil.browser.* call returns a structured “removed” error directing operators to set backend = "native".
  • Scenario DSL:

    • sigil.sleep() primitive for timing control with budget enforcement
    • sigil.expect_status_class() helper for HTTP response class assertions (2xx, 4xx, 5xx)
    • Per-scenario reset hook via [scenario.reset] config
    • Per-call base_url override on HTTP methods
  • Scenario management:

    • scenario promote command for staging → holdout workflow
    • Holdout split support during promotion
    • --seed flag for deterministic scenario generation
    • Tag-based filtering: --tag and --exclude-tag selectors for all scenario commands
    • Scenario-level skip directives with reasons in reports
  • Scenario generation:

    • --filter and --limit flags for scenario generate plan scope
    • Per-case logging in scenario generation
  • Judge system:

    • --judge-model flag to compare judge outputs across different models
    • sigil compare command for side-by-side judge evaluation
  • Configuration:

    • Prompt injection of configured SIGIL_SEED_KEYS into generation
    • Enhanced few-shot examples for spec-to-logs mapping
    • Improved JSON error handling with mode hints
  • Scenario run and dry-run now honor skip directives
  • Promotion correctly handles staging paths
  • Staging-category names no longer leak into scenario tags
  • Judge output now deterministic across runs (fixed parameter settings)
  • Scenario generation CLI:

    • sigil scenario generate orchestrator for end-to-end generation
    • Stage 3 execution validation (opt-in via --verify flag)
    • Scenario-level skip with reason surfacing in eval reports
    • --tag / --exclude-tag selectors for scenario run and scenario dry-run
    • scenario promote subcommand for staging → visible/holdout split
  • Scenario DSL enhancements:

    • --seed flag for deterministic generation
    • --filter and --limit for generation scope control
  • Judge system:

    • --judge-model flag for cross-model comparison
    • sigil compare command
    • Claude-code provider for judge with structured output
  • CI integration:

    • sigil ci command for PR evaluation and GitHub status
    • Config context resolution via frontmatter
  • Browser automation:

    • sigil.browser API: open, click, fill, wait, text, html, title, url, screenshot, eval, cookies, snapshot, visible
    • Session isolation per scenario
  • Agentic intent:

    • sigil.intent() for LLM-driven scenario execution
    • Tool-use with automatic tool descriptors
    • Capture fields for structured data extraction
    • Thinking model support
  • CLI enhancements:

    • sigil keys add-self for key management
    • sigil scenario run --all for batch execution
    • sigil feedback --last for agent dev loop
    • --no-baseline flag for sigil eval
    • --deploy flag for sigil scenario run (self-contained execution)
    • Progress reporting for eval and scenario run
    • Format auto-detection and shell completion
  • Judge parameter settings for deterministic output
  • Judge provider argument handling
  • Scenario CLI log buffer handling
  • Scenario skip directive processing
  • Kubernetes backend:

    • sigil eval now supports Kubernetes deployments via kubectl
    • Configure via [deploy] section in sigil.toml
  • Container backends:

    • Bare container backend for docker run / podman run single-container services
    • Configurable compose command (docker-compose, podman-compose, etc.)
  • Endpoint management:

    • --pr-endpoint flag to evaluate against specific PR endpoint
    • --baseline-endpoint flag to evaluate against specific baseline endpoint
  • Documentation:

    • Comprehensive Scenario DSL user reference
    • CLI help text improvements with examples
  • Deploy backend selection: Configure primary backend in sigil.toml
  • Compose CLI flexibility: Support for podman-compose, docker-compose variants
  • Endpoint control: --pr-endpoint and --baseline-endpoint flags for custom deployments
  • GitHub Actions integration: Deploy and verify via GitHub workflows
  • CLI improvements: Long help text, workflow examples, agent-friendly documentation
  • Attestations: In-toto attestation generation and Ed25519 signing

  • Output formats:

    • JSON format shorthand: --json alias for --format json
    • Format auto-detection: pretty for TTY, text for pipes
    • Shell completion generation
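
One plausible way to resolve the output format is sketched below. The `pick_format` helper is hypothetical, and the precedence between an explicit `--format` and `--json` is an assumption; only the TTY/pipe defaults and the `--json` alias come from the entries above.

```python
import sys


def pick_format(explicit=None, json_flag=False, stream=None):
    # Assumption: an explicit --format value takes precedence over --json.
    if explicit:
        return explicit
    # --json is shorthand for --format json.
    if json_flag:
        return "json"
    stream = sys.stdout if stream is None else stream
    # Auto-detect: pretty on an interactive terminal, text when piped or redirected.
    return "pretty" if stream.isatty() else "text"
```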
  • Diagnostics:

    • Enhanced sigil doctor with comprehensive prerequisite checks
  • Dashboard: Web UI for eval and trust overview

  • Trust commands:

    • sigil trust show: View current trust state
    • sigil trust history: Review trust transitions
    • sigil trust mode: Check and transition trust levels
  • Eval enhancements:

    • Failure-triggered baseline re-check
    • sigil report: Reconstruct eval reports from ledger
  • Policy hooks: Optional OPA/Rego policy verification

  • Adaptive evaluation: Early termination based on confidence
  • LLM judge: sigil.judge() Lua API for semantic assertions
  • Judge configuration: [judge] section in sigil.toml with provider selection
  • Evaluation: sigil diff for comparing two evaluation results
  • Judge providers: Support for multiple judge backends
  • Replay: sigil replay to re-execute scenarios from recorded artifacts
  • Reporting: sigil report to reconstruct reports from ledger
  • Security gates:
    • Automated secret scanning (trufflehog)
    • Dependency vulnerability scanning (trivy)
    • Static analysis (semgrep) for code quality checks
  • Parallel execution: Concurrent scenario runs for faster evaluation
  • Judge consensus: Quorum voting across multiple judge instances
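
As an illustration of quorum voting in general, the sketch below returns a verdict only when enough judges agree. The changelog does not state the actual rule, so the strict-majority requirement and the `quorum_verdict` name are assumptions.

```python
from collections import Counter


def quorum_verdict(votes, quorum):
    """Return the verdict backed by at least `quorum` votes, else None."""
    if not votes:
        return None
    # Assumption: the quorum must be a strict majority, so at most one
    # verdict can ever reach it and ties cannot produce a winner.
    if quorum * 2 <= len(votes):
        raise ValueError("quorum must be a strict majority of the votes")
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict if count >= quorum else None
```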
  • Trust model: Per-service trust scoring
  • Judge fallback: Automatic fallback to secondary model on provider failure
  • GitHub Actions: sigil-action workflow integration
  • Policy engine: sigil decide with threshold-based approval
  • Evaluation reports: JSON eval reports with detailed results
  • Baseline comparison: sigil eval compares PR against baseline
  • Satisfaction scoring: Quantified results vs baseline
  • Scenario execution: sigil eval runs scenarios against deployed environments
  • Type stubs: sigil generate-types for LuaLS IDE support
  • Blob store: Content-addressed artifact storage with integrity verification
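
Content addressing with integrity verification is usually built on a cryptographic hash of the stored bytes. A minimal sketch of the idea, assuming SHA-256 and a flat directory layout (neither is confirmed by this changelog, and `BlobStore` here is a hypothetical class, not sigil's):

```python
import hashlib
from pathlib import Path


class BlobStore:
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # The digest of the content is its address; identical data is stored once.
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if not path.exists():
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        data = (self.root / digest).read_bytes()
        # Integrity check: the bytes on disk must still hash to their address.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob failed integrity check: " + digest)
        return data
```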
  • Scenario runner: sigil scenario run <scenario> for local development
  • HTTP client: sigil.get(), sigil.post(), sigil.put(), sigil.patch(), sigil.delete()
  • CLI runner: sigil.exec() for command execution
  • Project setup: sigil init scaffolds new sigil projects
  • Health checks: sigil doctor validates environment and dependencies
  • Lua API: sigil.* globals: env(), json(), yaml()
  • Scenario DSL:

    • expect(expr) with power assertions
    • invariant(name, opts) for property testing
    • Generators: sigil.gen.string(), sigil.gen.int(), etc.
  • Key management: sigil keys commands for scenario encryption

  • Holdout scenarios: Support for hidden test scenarios

  • Scenario management: sigil scenario list, sigil scenario dry-run

  • Initial public release
  • Core evaluation engine
  • Scenario support with Lua DSL
  • Docker Compose deployment
  • Basic evaluation reporting