# Quick Start

Bayesilisk runs locally and uses deterministic scenario data by default. A fixed seed plus the same inputs produces the same report.

## Install

Install directly from GitHub:

```sh
python3 -m pip install 'git+https://github.com/sashakolpakov/bayesilisk.git'
```

Or clone and install editable for development:

```sh
git clone https://github.com/sashakolpakov/bayesilisk.git
cd bayesilisk
python3 -m pip install -e .
```

From an existing repository checkout:

```sh
python3 -m pip install -e .
```

For development work:

```sh
python3 -m pip install -e '.[dev]'
```

For browser probing with Microsoft Playwright:

```sh
python3 -m pip install -e '.[dev,playwright]'
python3 -m playwright install chromium
```

For documentation work:

```sh
python3 -m pip install -r docs/requirements.txt
```

## Run the Verifier

```sh
python3 -m bayesilisk --seed 150 --format json
python3 -m bayesilisk --seed 150 --format markdown --output /tmp/bayesilisk.md
python3 -m bayesilisk --seed 150 --generated-count 16 --format json
```

The installed console entry point is equivalent:

```sh
bayesilisk --seed 150 --format json
```

## Run The MCP Server

After installation, start the local stdio MCP server with:

```sh
bayesilisk-mcp
```

The module entry point is equivalent when run from a checkout:

```sh
python3 -m bayesilisk.mcp_server
```

By default the MCP server writes only MCP JSON-RPC frames on `stdout` and stays quiet on `stderr`. Set `BAYESILISK_MCP_BANNER=1` when running it manually if you want the ASCII startup banner.

To use Bayesilisk from Codex, add an MCP server entry to Codex config:

```toml
[mcp_servers.bayesilisk]
command = "bayesilisk-mcp"
args = []
startup_timeout_sec = 60
tool_timeout_sec = 120
```

For a project-local config inside a Bayesilisk checkout, use an explicit checkout path. An absolute Python path is safest if Codex does not inherit your interactive shell `PATH`.

```toml
[mcp_servers.bayesilisk]
command = "python3"
args = ["-m", "bayesilisk.mcp_server"]
cwd = "/absolute/path/to/bayesilisk"
startup_timeout_sec = 60
tool_timeout_sec = 120
```

See {doc}`codex-mcp` for the full Codex connector workflow.

## Run With Context

Context is caller-provided JSON. It can include issue text, agent notes, repository facts, Playwright observations, muted fingerprints, confirmed fingerprints, and prior adjustments.

```sh
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-context.json --format json
```

Only `ready-for-issue` failed findings should be opened automatically:

```sh
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-context.json --issue-payloads
```

## Run the Playwright Demo

The bundled workflow demo is local-only. It contains twelve synthetic product-like user actions across Travel, Expenses, Billing, HR, Support, and DMS. Some are controls and some intentionally contain stale state, impossible ordering, duplicate submission, feature-flag exposure, tenant-boundary, and role lane failures so Bayesilisk can receive browser evidence without contacting production systems.

```sh
bayesilisk-demo
bayesilisk-demo --recording
python3 tools/playwright_probe.py --demo --output /tmp/bayesilisk-playwright-context.json
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json --format markdown
```

`bayesilisk-demo --recording` opens headed Chromium, slows the probe clicks, and holds the browser briefly for screen recording.

The transcript explains the trust boundary: Playwright observes, Grassmann routes, the scenario proposer lane is untrusted, generated catalog/attention scenarios expand coverage, and Bayesilisk's deterministic invariants judge. A `breakage.hard-to-find` verdict is still a deterministic invariant failure; the label means it required cross-role, cross-module, stale-state, or unusual workflow context to surface.
The transcript also defines `breakage.easy`, `finding.candidate-breakage`, and `control-confirmed`, and it translates status pairs such as `expected=409 observed=200` into the product meaning: a workflow that should reject inconsistent state returned success.

The transcript has two parts: a general multi-fixture verifier run, then a hard-to-find drill-down. The drill-down shows a route-matrix failure that is not the first obvious browser symptom; it requires connecting support takeover state, HR document access, route permissions, and module context before the deterministic verifier emits an issue-ready finding. It also shows a seeded sweep order. Changing `--seed` can make the same buried failure appear earlier or later, while remaining reproducible for that seed.

The demo rows are synthetic fixtures from `bayesilisk/demo.py::DEMO_PROBES`, not claims about an existing product. To test a real app, expose probe rows in that app and point Playwright at it:

```sh
python3 tools/playwright_probe.py --url http://localhost:3000/probe-page \
  --output /tmp/bayesilisk-real-context.json
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-real-context.json --format markdown
```

## Enable Optional Ollama Layers

Embeddings add a plane-similarity signal to Grassmann attention:

```sh
BAYESILISK_USE_OLLAMA_EMBEDDINGS=1 \
BAYESILISK_OLLAMA_MODEL=nomic-embed-text \
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json --format json
```

The scenario proposer model suggests extra candidate scenarios.
Bayesilisk validates those proposals before they enter the finite-state verifier:

```sh
BAYESILISK_USE_OLLAMA_SCENARIO_MODEL=1 \
BAYESILISK_OLLAMA_SCENARIO_MODEL=gemma4:e2b \
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json --format json
```

The same controls are available as explicit CLI flags:

```sh
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json \
  --enable-embeddings \
  --embedding-model nomic-embed-text \
  --enable-scenario-proposer \
  --scenario-provider ollama \
  --scenario-model gemma4:e2b \
  --scenario-proposal-limit 3 \
  --attention-threshold 0.4 \
  --attention-selection-limit 3 \
  --ollama-base-url http://localhost:11434
```

Reports include `effectiveConfiguration`, so a tester can see which attention, embedding, provider, model, proposal-limit, key-presence, and base-URL-class settings were actually used.

## Test

```sh
python3 -m pytest
```

GitHub CI deliberately runs the deterministic suite and docs build without Ollama, hosted model APIs, Playwright browsers, or local-only services:

```sh
python3 -m pytest -m "not live_playwright and not live_ollama"
sphinx-build -b html docs docs/_build/html
```

Live checks are opt-in local verification commands. They are useful before promotion or release work, but they are not required for the deterministic verifier to prove report compatibility:

```sh
python3 -m pytest tests/test_live_integrations.py -m live_playwright -rs
BAYESILISK_LIVE_OLLAMA=1 python3 -m pytest tests/test_live_integrations.py -m live_ollama -rs
```

## Build Documentation

```sh
sphinx-build -b html docs docs/_build/html
```
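
After installing, one way to spot-check the reproducibility described at the top of this guide (a fixed seed plus the same inputs produces the same report) is to run the verifier twice and compare the JSON reports. The script below is a minimal sketch, not part of Bayesilisk: the helper names `build_command`, `run_report`, `reports_match`, and `check_determinism` are hypothetical, and it assumes that `--format json` without `--output` prints the report to `stdout` and that the report contains no run-specific fields. It uses only flags shown in this guide.

```python
import json
import subprocess
import sys


def build_command(seed, context_path=None, generated_count=None):
    """Build the verifier argv using only flags documented in this guide."""
    cmd = [sys.executable, "-m", "bayesilisk", "--seed", str(seed), "--format", "json"]
    if context_path is not None:
        cmd += ["--context", str(context_path)]
    if generated_count is not None:
        cmd += ["--generated-count", str(generated_count)]
    return cmd


def run_report(cmd):
    """Run the verifier and parse stdout as JSON.

    Assumption: the JSON report is written to stdout. The exit code is not
    inspected because its meaning is not described in this guide.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(result.stdout)


def reports_match(first, second):
    """Compare two reports by canonical serialization, ignoring key order."""
    return json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)


def check_determinism(seed, context_path=None):
    """Return True when two runs with identical inputs yield identical reports."""
    cmd = build_command(seed, context_path=context_path)
    return reports_match(run_report(cmd), run_report(cmd))
```

Calling `check_determinism(150)` in an environment where Bayesilisk is installed should return `True` if the assumptions above hold. Passing the same `context_path` to both runs keeps the inputs identical, which is the condition the reproducibility claim depends on.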