Quick Start

Bayesilisk runs locally and uses deterministic scenario data by default. A fixed seed plus the same inputs produces the same report.

Install

Install directly from GitHub:

python3 -m pip install 'git+https://github.com/sashakolpakov/bayesilisk.git'

Or clone the repository and install it in editable mode:

git clone https://github.com/sashakolpakov/bayesilisk.git
cd bayesilisk
python3 -m pip install -e .

From an existing repository checkout:

python3 -m pip install -e .

For development work:

python3 -m pip install -e '.[dev]'

For browser probing with Microsoft Playwright:

python3 -m pip install -e '.[dev,playwright]'
python3 -m playwright install chromium

For documentation work:

python3 -m pip install -r docs/requirements.txt

Run the Verifier

python3 -m bayesilisk --seed 150 --format json
python3 -m bayesilisk --seed 150 --format markdown --output /tmp/bayesilisk.md
python3 -m bayesilisk --seed 150 --generated-count 16 --format json

The installed console entry point is equivalent:

bayesilisk --seed 150 --format json
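Since a fixed seed with the same inputs is expected to produce the same report, one way to check this locally is to run the same command twice and compare the output. The sketch below is an illustration, not part of Bayesilisk: the helper names output_digest and is_reproducible are hypothetical, and it assumes the report on stdout is byte-identical between runs. If a report ever carries run-specific fields, compare parsed fields instead.

```python
import hashlib
import subprocess
import sys

# Command taken from this page; it only works where Bayesilisk is installed
# for the interpreter in sys.executable.
VERIFIER_COMMAND = [sys.executable, "-m", "bayesilisk", "--seed", "150", "--format", "json"]


def output_digest(argv):
    """Run a command and return the SHA-256 hex digest of its stdout."""
    result = subprocess.run(argv, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_reproducible(argv, runs=2):
    """Return True if every run of argv produces byte-identical stdout."""
    digests = {output_digest(argv) for _ in range(runs)}
    return len(digests) == 1
```

Calling is_reproducible(VERIFIER_COMMAND) after installation should return True if the determinism claim holds for your inputs.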

Run the MCP Server

After installation, start the local stdio MCP server with:

bayesilisk-mcp

The module entry point is equivalent when run from a checkout:

python3 -m bayesilisk.mcp_server

By default the MCP server writes only MCP JSON-RPC frames on stdout and stays quiet on stderr. Set BAYESILISK_MCP_BANNER=1 when running it manually if you want the ASCII startup banner.
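Because stdout is reserved for protocol traffic, a client or smoke test can treat every non-blank stdout line as a JSON-RPC message and flag anything else. The sketch below assumes the stdio transport delivers one JSON-RPC 2.0 message per line. The helper names parse_frame and non_protocol_lines are hypothetical and not part of Bayesilisk.

```python
import json


def parse_frame(line):
    """Parse one stdout line as a JSON-RPC 2.0 message, or return None."""
    try:
        message = json.loads(line)
    except json.JSONDecodeError:
        return None
    # A JSON-RPC 2.0 message is an object carrying "jsonrpc": "2.0".
    if isinstance(message, dict) and message.get("jsonrpc") == "2.0":
        return message
    return None


def non_protocol_lines(lines):
    """Return the non-blank lines that are not JSON-RPC 2.0 messages."""
    return [line for line in lines if line.strip() and parse_frame(line) is None]
```

An empty result from non_protocol_lines over captured stdout is consistent with the quiet default. Banner text appearing there would suggest the banner is being written to stdout rather than stderr.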

To use Bayesilisk from Codex, add an MCP server entry to Codex config:

[mcp_servers.bayesilisk]
command = "bayesilisk-mcp"
args = []
startup_timeout_sec = 60
tool_timeout_sec = 120

For a project-local config inside a Bayesilisk checkout, set cwd to the explicit checkout path. If Codex does not inherit your interactive shell PATH, replace python3 with the absolute path to your Python interpreter.

[mcp_servers.bayesilisk]
command = "python3"
args = ["-m", "bayesilisk.mcp_server"]
cwd = "/absolute/path/to/bayesilisk"
startup_timeout_sec = 60
tool_timeout_sec = 120
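To fill in the absolute paths without typing them by hand, a small script can render the table from the interpreter that has Bayesilisk installed. This is a sketch under assumptions: codex_mcp_entry is a hypothetical helper, and json.dumps is used as an approximation of TOML basic-string quoting, which covers ordinary filesystem paths.

```python
import json
import sys
from pathlib import Path


def codex_mcp_entry(checkout_dir, python_path=sys.executable):
    """Render the [mcp_servers.bayesilisk] table with absolute paths."""
    checkout = Path(checkout_dir).resolve()
    lines = [
        "[mcp_servers.bayesilisk]",
        # json.dumps adds the quotes and escapes backslashes (Windows paths).
        f"command = {json.dumps(str(python_path), ensure_ascii=False)}",
        'args = ["-m", "bayesilisk.mcp_server"]',
        f"cwd = {json.dumps(str(checkout), ensure_ascii=False)}",
        "startup_timeout_sec = 60",
        "tool_timeout_sec = 120",
    ]
    return "\n".join(lines) + "\n"
```

Running print(codex_mcp_entry(".")) from the checkout root, with the virtual environment that holds Bayesilisk active, prints a table you can paste into the Codex config.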

See Codex MCP for the full Codex connector workflow.

Run With Context

Context is caller-provided JSON. It can include issue text, agent notes, repository facts, Playwright observations, muted fingerprints, confirmed fingerprints, and prior adjustments.

python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-context.json --format json

Only failed findings that are marked ready-for-issue should be opened as issues automatically. To emit their issue payloads:

python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-context.json --issue-payloads

Run the Playwright Demo

The bundled workflow demo is local-only. It contains twelve synthetic, product-like user actions across Travel, Expenses, Billing, HR, Support, and DMS. Some are controls. Others intentionally contain stale-state, impossible-ordering, duplicate-submission, feature-flag-exposure, tenant-boundary, and role-lane failures, so Bayesilisk can receive browser evidence without contacting production systems.

bayesilisk-demo
bayesilisk-demo --recording
python3 tools/playwright_probe.py --demo --output /tmp/bayesilisk-playwright-context.json
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json --format markdown

bayesilisk-demo --recording opens headed Chromium, slows the probe clicks, and holds the browser briefly for screen recording. The transcript explains the trust boundary: Playwright observes, Grassmann routes, the scenario proposer lane is untrusted, generated catalog/attention scenarios expand coverage, and Bayesilisk’s deterministic invariants judge. A breakage.hard-to-find verdict is still a deterministic invariant failure; the label means it required cross-role, cross-module, stale-state, or unusual workflow context to surface. The transcript also defines breakage.easy, finding.candidate-breakage, and control-confirmed, and it translates status pairs such as expected=409 observed=200 into the product meaning: a workflow that should reject inconsistent state returned success.
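The translation of a status pair into product meaning can be illustrated with a short parser. This is a sketch only: describe_status_pair is a hypothetical helper, the wording of its messages is invented for illustration, and the only thing taken from the transcript is the expected=NNN observed=NNN text format.

```python
import re

# Matches pairs such as "expected=409 observed=200".
STATUS_PAIR = re.compile(r"expected=(\d{3})\s+observed=(\d{3})")


def describe_status_pair(text):
    """Turn an expected/observed HTTP status pair into a plain-language note."""
    match = STATUS_PAIR.search(text)
    if match is None:
        return None
    expected, observed = int(match.group(1)), int(match.group(2))
    if expected == observed:
        return f"status {observed} matched the expectation"
    if expected >= 400 and 200 <= observed < 300:
        return f"request should have been rejected with {expected} but succeeded with {observed}"
    if 200 <= expected < 300 and observed >= 400:
        return f"request should have succeeded with {expected} but was rejected with {observed}"
    return f"expected status {expected} but observed {observed}"
```

For expected=409 observed=200 this yields the "should have been rejected" reading described above: the workflow accepted state it was supposed to refuse.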

The transcript has two parts: a general multi-fixture verifier run, then a hard-to-find drill-down. The drill-down shows a route-matrix failure that is not the first obvious browser symptom; it requires connecting support takeover state, HR document access, route permissions, and module context before the deterministic verifier emits an issue-ready finding. It also shows a seeded sweep order. Changing --seed can make the same buried failure appear earlier or later, while remaining reproducible for that seed.
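The effect of the seed on sweep order can be illustrated with a seeded shuffle. This is not Bayesilisk's actual ordering algorithm, which this page does not describe. It is a generic sketch showing how one seed fixes one order while a different seed can move the same item earlier or later. The probe names are hypothetical.

```python
import random

# Hypothetical probe names, for illustration only.
PROBES = ["travel-1", "expenses-2", "billing-3", "hr-4", "support-5", "dms-6"]


def sweep_order(items, seed):
    """Return a shuffled copy of items that depends only on the seed."""
    rng = random.Random(seed)  # private generator, no global state touched
    order = list(items)
    rng.shuffle(order)
    return order


def position_of(item, items, seed):
    """Return the zero-based position of item in the seeded sweep."""
    return sweep_order(items, seed).index(item)
```

Calling position_of("hr-4", PROBES, 150) repeatedly gives the same answer every time, while other seeds generally place the item elsewhere in the sweep.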

The demo rows are synthetic fixtures from bayesilisk/demo.py::DEMO_PROBES, not claims about an existing product. To test a real app, expose probe rows in that app and point Playwright at it:

python3 tools/playwright_probe.py --url http://localhost:3000/probe-page \
  --output /tmp/bayesilisk-real-context.json
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-real-context.json --format markdown

Enable Optional Ollama Layers

Embeddings add a plane-similarity signal to Grassmann attention:

BAYESILISK_USE_OLLAMA_EMBEDDINGS=1 \
BAYESILISK_OLLAMA_MODEL=nomic-embed-text \
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json --format json

The scenario proposer model suggests extra candidate scenarios. Bayesilisk validates those proposals before they enter the finite-state verifier:

BAYESILISK_USE_OLLAMA_SCENARIO_MODEL=1 \
BAYESILISK_OLLAMA_SCENARIO_MODEL=gemma4:e2b \
python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json --format json

The same controls are available as explicit CLI flags:

python3 -m bayesilisk --seed 150 --context /tmp/bayesilisk-playwright-context.json \
  --enable-embeddings \
  --embedding-model nomic-embed-text \
  --enable-scenario-proposer \
  --scenario-provider ollama \
  --scenario-model gemma4:e2b \
  --scenario-proposal-limit 3 \
  --attention-threshold 0.4 \
  --attention-selection-limit 3 \
  --ollama-base-url http://localhost:11434

Reports include effectiveConfiguration, so a tester can see which attention, embedding, provider, model, proposal-limit, key-presence, and base-URL-class settings were actually used.
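A tester can pull that block out of a JSON report with a few lines of Python. The sketch assumes effectiveConfiguration is a top-level object in the JSON report, which this page does not state explicitly. It makes no assumption about the keys inside it and simply returns them all. The function name is hypothetical.

```python
import json


def effective_configuration(report_text):
    """Return the effectiveConfiguration object from a JSON report string."""
    report = json.loads(report_text)
    # Assumption: the block sits at the top level of the report object.
    config = report.get("effectiveConfiguration") if isinstance(report, dict) else None
    if not isinstance(config, dict):
        raise KeyError("effectiveConfiguration")
    return config
```

Given the text of a report saved from --format json, iterating over sorted(effective_configuration(text).items()) lists each setting that was actually in effect for the run.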

Test

python3 -m pytest

GitHub CI deliberately runs the deterministic suite and docs build without Ollama, hosted model APIs, Playwright browsers, or local-only services:

python3 -m pytest -m "not live_playwright and not live_ollama"
sphinx-build -b html docs docs/_build/html

Live checks are opt-in local verification commands. They are useful before promotion or release work, but they are not required for the deterministic verifier to prove report compatibility:

python3 -m pytest tests/test_live_integrations.py -m live_playwright -rs
BAYESILISK_LIVE_OLLAMA=1 python3 -m pytest tests/test_live_integrations.py -m live_ollama -rs

Build Documentation

sphinx-build -b html docs docs/_build/html