diff --git a/docs/prds/0042-smolmachines-parity-tests.md b/docs/prds/0042-smolmachines-parity-tests.md
new file mode 100644
index 0000000..1f10f14
--- /dev/null
+++ b/docs/prds/0042-smolmachines-parity-tests.md
@@ -0,0 +1,85 @@
+# PRD 0042: smolmachines Cross-Backend Parity Tests
+
+- **Status:** Draft
+- **Author:** didericis-codex
+- **Created:** 2026-06-02
+- **Issue:** #139
+
+## Summary
+
+Add tests that prove secrets, forwarded env, resume, and remediation behave
+equivalently across Docker and smolmachines backends. The fixes in PRDs
+0038–0040 are unverifiable without this coverage.
+
+## Problem
+
+The existing unit suite is broad but backend-specific. There are no tests that
+run the same scenario against both Docker and smolmachines and assert the
+outcomes match. A regression in one backend goes undetected until a live run,
+and PRDs 0038–0040 can each pass their own unit tests while the backends still
+diverge at the integration boundary.
+
+## Goals / Success Criteria
+
+- A parity test suite that covers at least:
+  - Secret env injection: `?prompt` and `${HOST_VAR}` entries produce the same
+    guest env on both backends.
+  - Forwarded env: literal manifest env values reach the guest on both backends.
+  - Resume: a preserved bottle state dir round-trips correctly on both backends
+    (relies on PRD 0040 metadata).
+  - Remediation: capability-block approval routes to the correct backend handler
+    (relies on PRD 0039 dispatch).
+- Each scenario is parameterised so a failure names the backend that regressed.
+- Tests run without a live VM or Docker daemon (mock or stub backends).
+
+## Non-goals
+
+- No end-to-end agent execution tests.
+- No performance or load tests.
+- No changes to production code (test-only PRD).
+
+## Scope
+
+In scope:
+
+- New test file(s) under `tests/unit/` for parity scenarios.
+- Stub or mock implementations of smolmachines and Docker backends as needed.
+
+Out of scope:
+
+- Changes to `bot_bottle/` production code.
+- CI infrastructure changes beyond adding the new test file to the
+  `unittest discover` invocation.
+
+## Dependencies
+
+- PRD 0038 should land before the env parity tests are finalised.
+- PRDs 0039 and 0040 should land before the remediation and resume scenarios
+  are finalised; stubs can be written speculatively beforehand.
+
+## Design
+
+Parameterise each scenario over a list of backend factory functions. Each
+factory returns a bottle instance wired to a stub subprocess layer. The test
+body is backend-agnostic: it calls the same public API, captures the same
+observable output, and asserts equality.
+
+For env scenarios, capture the argv or env-file content passed to the guest
+and compare it against the resolved manifest values. For resume, write
+metadata with one backend class and read it back to verify backend selection.
+For remediation, assert that dispatch selects the per-backend handler.
+
+## Testing Strategy
+
+Run as part of standard unit-test discovery:
+
+- `python3 -m unittest discover -s tests/unit`
+
+Or directly:
+
+- `python3 -m unittest tests.unit.test_backend_parity`
+
+## Open Questions
+
+- Should parity tests live under `tests/unit/` (mock-based) or
+  `tests/integration/` (live infra)? Mock-based is preferred to keep CI simple.
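
As an illustration of the mock-based option the PRD prefers, here is a minimal sketch of the Design section's parameterised pattern, limited to the secret and forwarded env scenarios. Every name in it (`resolve_manifest_env`, `RecordingRunner`, `StubDockerBottle`, `StubSmolmachinesBottle`, `BACKEND_FACTORIES`) is hypothetical, and the argv and env-file shapes are assumptions, not how `bot_bottle` actually launches either backend. `unittest`'s `subTest` tags each loop iteration, so a regression is reported with a label such as `(backend='docker')`, which is one way to meet the "failure names the backend" criterion.

```python
import unittest
from typing import Callable, Dict, List

# Hypothetical stand-ins for the PRD's design, not the bot_bottle API. The stubs
# only mimic how each backend might hand environment variables to a guest.
Env = Dict[str, str]
Prompt = Callable[[str], str]


def resolve_manifest_env(manifest: Env, host_env: Env, prompt: Prompt) -> Env:
    """Resolve `?prompt` and `${HOST_VAR}` entries; pass literal values through."""
    resolved: Env = {}
    for key, value in manifest.items():
        if value == "?prompt":
            resolved[key] = prompt(key)
        elif value.startswith("${") and value.endswith("}"):
            resolved[key] = host_env[value[2:-1]]
        else:
            resolved[key] = value
    return resolved


class RecordingRunner:
    """Stub subprocess layer: records argv and env-file text, launches nothing."""

    def __init__(self) -> None:
        self.argv: List[str] = []
        self.env_file = ""

    def run(self, argv: List[str], env_file: str = "") -> None:
        self.argv = list(argv)
        self.env_file = env_file


class StubDockerBottle:
    """Assumed Docker shape: env is passed as repeated `-e KEY=VALUE` args."""

    def __init__(self, runner: RecordingRunner) -> None:
        self.runner = runner

    def start(self, manifest: Env, host_env: Env, prompt: Prompt) -> None:
        env = resolve_manifest_env(manifest, host_env, prompt)
        argv = ["docker", "run"]
        for key, value in sorted(env.items()):
            argv += ["-e", f"{key}={value}"]
        self.runner.run(argv + ["bottle-image"])  # placeholder image name

    @staticmethod
    def captured_env(runner: RecordingRunner) -> Env:
        """Decode what reached the guest from the recorded `-e` arguments."""
        pairs = [runner.argv[i + 1] for i, arg in enumerate(runner.argv) if arg == "-e"]
        return dict(pair.split("=", 1) for pair in pairs)


class StubSmolmachinesBottle:
    """Assumed smolmachines shape: env is written as KEY=VALUE lines to a file."""

    def __init__(self, runner: RecordingRunner) -> None:
        self.runner = runner

    def start(self, manifest: Env, host_env: Env, prompt: Prompt) -> None:
        env = resolve_manifest_env(manifest, host_env, prompt)
        text = "".join(f"{key}={value}\n" for key, value in sorted(env.items()))
        self.runner.run(["smolmachines-stub"], env_file=text)  # placeholder argv

    @staticmethod
    def captured_env(runner: RecordingRunner) -> Env:
        """Decode what reached the guest from the recorded env-file text."""
        lines = runner.env_file.splitlines()
        return dict(line.split("=", 1) for line in lines if line)


# One factory per backend; the test body never branches on which one it gets.
BACKEND_FACTORIES = {"docker": StubDockerBottle, "smolmachines": StubSmolmachinesBottle}


class BackendEnvParityTest(unittest.TestCase):
    MANIFEST = {"API_TOKEN": "?prompt", "HOME_DIR": "${HOME}", "MODE": "ci"}
    HOST_ENV = {"HOME": "/home/dev"}

    def test_secret_and_forwarded_env_reach_guest(self) -> None:
        def prompt(key: str) -> str:
            return f"secret-for-{key}"  # canned answer, no interactive prompt

        expected = resolve_manifest_env(self.MANIFEST, self.HOST_ENV, prompt)
        for name, backend_cls in BACKEND_FACTORIES.items():
            # subTest labels each iteration, so a failure names the backend.
            with self.subTest(backend=name):
                runner = RecordingRunner()
                backend_cls(runner).start(self.MANIFEST, self.HOST_ENV, prompt)
                self.assertEqual(backend_cls.captured_env(runner), expected)
```

Saved as `tests/unit/test_backend_parity.py`, a sketch like this would be picked up by the Testing Strategy commands without a Docker daemon or VM. The resume and remediation scenarios could reuse the same loop, asserting on written metadata or the selected handler instead of captured env.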