A grounded product architecture combining operational intelligence, controlled AI execution, observability integration, and a governed web-building experience. This document defines what is viable with technology available today—and what still requires human specialist validation before production.
A control and evidence layer that receives a task, evaluates identity and policy, issues narrowly scoped authority, supervises execution, and records the resulting decisions and outcomes.
Primary outcome: AI-assisted work becomes bounded, inspectable, interruptible and attributable rather than operating as an unrestricted tool call.
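
The authority check described above can be sketched in a few lines. This is a minimal illustration, not Halibut's implementation: it assumes the token signature has already been verified, and the names `Capability`, `Decision`, `request_digest` and `authorize` are hypothetical. It shows a verifier that allows an action only when audience, expiry, scope and request binding all pass.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Capability:
    """Claims from a token whose signature was already verified (assumption)."""
    subject: str
    audience: str
    scopes: FrozenSet[str]
    expires_at: float   # Unix seconds
    request_hash: str   # hex digest of the single request this token covers


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def request_digest(method: str, path: str, body: bytes) -> str:
    """Hash a request with length-prefixed fields so boundaries cannot shift."""
    h = hashlib.sha256()
    for part in (method.encode(), path.encode(), body):
        h.update(len(part).to_bytes(8, "little"))
        h.update(part)
    return h.hexdigest()


def authorize(cap: Capability, *, audience: str, required_scope: str,
              request_hash: str, now: Optional[float] = None) -> Decision:
    """Allow only if every check passes; the first failed check denies."""
    now = time.time() if now is None else now
    checks = [
        (cap.audience == audience, "audience mismatch"),
        (now < cap.expires_at, "token expired"),
        (required_scope in cap.scopes, "scope not granted"),
        (hmac.compare_digest(cap.request_hash, request_hash),
         "request binding mismatch"),
    ]
    for passed, reason in checks:
        if not passed:
            return Decision(False, reason)
    return Decision(True, "all checks passed")
```

Returning a reason with every decision keeps the outcome attributable: the same value can be written to the evidence record.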
A visual and conversational web-building interface that translates an approved intent into a versioned project, executes generation and tests in isolated runners, previews the result, and requires policy checks before deployment.
Primary outcome: users gain Wix-like accessibility while Halibut adds controlled execution, evidence, change review and deployment governance.
| Statement | Assessment | Production interpretation |
|---|---|---|
| Short-lived signed capability tokens can constrain agent actions. | TECHNICALLY VIABLE | Requires explicit scopes, audience, expiry, request binding and deny-by-default verification. Token format alone does not provide authorization correctness. |
| PASETO v4.public can use Ed25519 and AWS KMS now supports Ed25519 signing keys. | VIABLE, INTEGRATION SPIKE | The PASETO signing input must be constructed exactly to specification and signed through KMS. Existing libraries may assume direct access to a private key, so interoperability and test vectors must be validated. |
| Cloudflare Workers can serve as unrestricted build/test pods. | DO NOT CLAIM | Workers have CPU, memory and runtime constraints. Use isolated containers or Kubernetes jobs for general code generation/build/test. Workers may serve edge APIs, lightweight policy checks or orchestration. |
| Opik is a security information and event management system. | DO NOT CLAIM | Opik is documented for LLM/agent observability, evaluation and tracing. Security decisions should also flow to a security log platform or SIEM. |
| Halibut is SOC 2 compliant after implementing controls. | DO NOT CLAIM | Controls can be designed to support a future SOC 2 examination. Only an independent qualified CPA firm can issue a SOC 2 report. |
| Fixed latency, cost, throughput or four-month audit outcomes are guaranteed. | REMOVE | These depend on workload, regions, vendors, system boundary and operating maturity. They must be measured in the prototype and pilot. |
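
The PASETO row above turns on constructing the signing input exactly. The sketch below follows the pre-authentication encoding (PAE) and `v4.public` token layout from the PASETO specification. The signer is passed in as a callable: the name `sign` and the idea of backing it with a KMS call are assumptions for illustration, and no real KMS API is invoked here. Any real integration still needs to be checked against the official test vectors.

```python
import base64
from typing import Callable, Sequence

HEADER = b"v4.public."


def le64(n: int) -> bytes:
    """Unsigned 64-bit little-endian encoding with the top bit cleared."""
    return (n & 0x7FFFFFFFFFFFFFFF).to_bytes(8, "little")


def pae(pieces: Sequence[bytes]) -> bytes:
    """Pre-authentication encoding: piece count, then each length and piece."""
    out = le64(len(pieces))
    for piece in pieces:
        out += le64(len(piece)) + piece
    return out


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in PASETO tokens."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_v4_public(message: bytes,
                   sign: Callable[[bytes], bytes],
                   footer: bytes = b"",
                   implicit: bytes = b"") -> str:
    """Build a v4.public token; `sign` is a hypothetical Ed25519 signer
    (for example a wrapper around a remote key service)."""
    signing_input = pae([HEADER, message, footer, implicit])
    signature = sign(signing_input)
    if len(signature) != 64:
        raise ValueError("Ed25519 signatures must be 64 bytes")
    token = HEADER.decode("ascii") + b64url(message + signature)
    if footer:
        token += "." + b64url(footer)
    return token
```

Because the signer only ever receives the PAE output, the private key never needs to be available to the token-building code, which is the property the integration spike is meant to confirm.
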
Halibut maintains the task, project, environment, approvals, policy version and execution result as explicit state—not hidden inside a conversation.
The first production version should use deterministic rules and measured signals. AI may summarize or recommend; it should not silently override authorization.
High-impact operations—production deployment, secret access, data export and policy modification—should require stronger authentication or explicit approval.
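
A minimal sketch of such a deterministic gate follows. The action names, the `Request` fields and the `decide` function are hypothetical, and the two low-impact actions are invented for illustration. The point it shows is that an AI recommendation is carried as advisory data and never read by the decision logic.

```python
from dataclasses import dataclass
from typing import Optional

# High-impact operations named in the text above.
HIGH_IMPACT = frozenset({
    "production_deploy", "secret_access", "data_export", "policy_modification",
})
# Hypothetical low-impact actions; anything not listed is denied.
KNOWN_ACTIONS = HIGH_IMPACT | {"preview_build", "run_tests"}


@dataclass(frozen=True)
class Request:
    action: str
    step_up_authenticated: bool = False      # stronger authentication done
    approved_by: Optional[str] = None        # explicit human approver
    ai_recommendation: Optional[str] = None  # advisory only, never enforced


def decide(req: Request) -> str:
    """Return 'allow', 'require_approval' or 'deny' from fixed rules only."""
    if req.action not in KNOWN_ACTIONS:
        return "deny"  # deny by default
    if req.action in HIGH_IMPACT:
        if req.step_up_authenticated or req.approved_by:
            return "allow"
        return "require_approval"
    return "allow"
```

For example, `decide(Request("production_deploy", ai_recommendation="allow"))` still returns `"require_approval"`, because the recommendation field is not part of any rule.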
Halibut
Proposed scope: governed authority, execution, evidence and controlled web generation.
Opik
Documented scope: open-source observability, debugging, evaluation and optimization for LLM applications and agents.
Halibut can send AI traces to Opik. It should not duplicate Opik unless a product requirement demands native trace analysis.
Wix
Documented scope: visual/AI-assisted site creation plus a JavaScript development platform, frontend/backend APIs and external integrations.
Halibut can learn from the accessibility model. It should not claim that Wix lacks security; the differentiation is task-level AI execution governance and portable evidence.
Before Halibut existed as software, its core operating logic was exercised manually during AI-assisted product and infrastructure work. In that workflow, Isha acted as the human Halibut layer: interpreting objectives, controlling which AI handled each task, observing model reasoning and outputs, stopping or redirecting work when needed, and approving movement to the next phase.
| Manual workflow behavior | Halibut product equivalent | Required validation |
|---|---|---|
| Isha decided which model should perform a task. | Model and tool routing policy based on task type, sensitivity, cost and capability. | AI systems expert validates routing criteria and fallback behavior. |
| Isha narrowed the task before Claude generated code. | Structured task manifest defining allowed actions, files, limits and acceptance criteria. | Security architect validates scope semantics and deny-by-default behavior. |
| Isha watched reasoning and redirected drift. | Execution trace, checkpoints, policy events and human-intervention controls. | AI expert validates what signals are reliable enough for supervision. |
| Work moved through deliberate phases. | State machine for plan, approve, execute, test, review and release. | Platform engineer validates durability, retries, idempotency and recovery. |
| Outputs were built and tested before acceptance. | Isolated runner, automated checks, artifact versioning and release gate. | AppSec and SRE validate isolation, dependency controls and operational resilience. |
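
The phase state machine in the table above could look roughly like the sketch below. The backward transitions (rejection returning to plan, failed tests returning to execute) and the `transition_id` used to make retries idempotent are assumptions for illustration; durability and recovery are out of scope here and remain items for the platform engineer to validate.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    APPROVE = "approve"
    EXECUTE = "execute"
    TEST = "test"
    REVIEW = "review"
    RELEASE = "release"


# Allowed moves; anything not listed is rejected.
ALLOWED = {
    Phase.PLAN: {Phase.APPROVE},
    Phase.APPROVE: {Phase.EXECUTE, Phase.PLAN},   # assumed: rejection replans
    Phase.EXECUTE: {Phase.TEST},
    Phase.TEST: {Phase.REVIEW, Phase.EXECUTE},    # assumed: failures re-execute
    Phase.REVIEW: {Phase.RELEASE, Phase.PLAN},
    Phase.RELEASE: set(),
}


class TaskStateMachine:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.phase = Phase.PLAN
        self.history = [Phase.PLAN]
        self._applied = set()  # transition ids already processed

    def advance(self, target: Phase, transition_id: str) -> Phase:
        """Move to `target`; replaying the same transition_id is a no-op."""
        if transition_id in self._applied:
            return self.phase
        if target not in ALLOWED[self.phase]:
            raise ValueError(
                f"{self.phase.value} -> {target.value} is not allowed")
        self.phase = target
        self.history.append(target)
        self._applied.add(transition_id)
        return self.phase
```

Keeping `history` alongside the current phase means the sequence of approvals and releases can be replayed as evidence rather than inferred from logs.
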
| Validator | Required decision | Evidence expected |
|---|---|---|
| Principal Security Architect | Threat model, trust boundaries, token choice, key custody, revocation, runner isolation and fail-closed behavior. | Signed architecture review, misuse cases, security acceptance criteria and unresolved-risk register. |
| Senior Platform / SRE Engineer | Deployment topology, capacity model, queues, databases, disaster recovery, observability and operational runbooks. | Infrastructure prototype, load-test results, recovery exercise and cost model from measured usage. |
| Application Security Engineer | Path traversal, dependency attacks, prompt-to-tool escalation, token replay, confused deputy and cross-tenant isolation. | Independent review report, penetration tests and remediation verification. |
| Specialized AI Systems Expert | Agent architecture, model/tool contract, evaluation set, hallucination containment, fallback and human approval placement. | Evaluation protocol, failure taxonomy, model-agnostic interfaces and measured task reliability. |
| GRC / Privacy Counsel | System boundary, data classification, vendor/subprocessor obligations, regional requirements and control mapping. | Control matrix, data-flow inventory, retention schedule and policy set. |
| Claude / other frontier model review | Use as an analysis aid to critique manifests, policies, failure cases and developer documentation. | Recorded prompts and findings reviewed by a human expert. Model output is not an endorsement, certification or security sign-off. |
These are planning ranges, not delivery guarantees. The sequence is designed for a fast-moving AI market while preventing architecture assumptions from becoming production liabilities.
Definition and threat model
Confirm use cases, system boundary, data classes, trust boundaries, token alternatives and validation team.
EXIT: ARCHITECTURE DECISION
Vertical technical prototype
One task flow from request → policy → token → isolated execution → evidence → preview.
EXIT: MEASURED POC
Controlled design partner pilot
One builder stack, limited integrations, human approvals, operational dashboards and incident exercises.
EXIT: PILOT REVIEW
Production hardening
Independent AppSec, tenancy isolation, DR, SLOs, privacy controls, support model and evidence readiness.
EXIT: GO / NO-GO
Work can overlap where risk permits, but security architecture, execution isolation and data handling must not be deferred until after external use.
Track requests, approvals and delivery status.
Submit a request
Production deploy requires owner approval and a fresh authority token.
This visual is an interface concept, not a functioning security implementation. It demonstrates how creation, validation, authority and evidence can be made visible in one workflow.
PASETO v4.public: the official specification defines Ed25519 signing and verification.
PASETO Version 4 specification
AWS KMS: current documentation describes Ed25519 key support, private-key isolation, signing and external verification using a downloaded public key.
AWS KMS key specifications
AWS KMS Sign API
Cloudflare Workers: official limits show runtime constraints; this supports using Workers selectively rather than presenting them as unrestricted build containers.
Opik: official documentation defines the product as LLM/agent observability, evaluation and optimization, with OpenTelemetry support.
Opik documentation
Opik OpenTelemetry integration
Wix: official documentation confirms visual/AI-assisted site creation, frontend/backend development, APIs and external integrations.
Wix Velo documentation
Wix Harmony overview
Wix site backend
SOC: AICPA describes SOC as CPA-provided examination and reporting services, supporting the distinction between readiness work and an issued SOC report.