Partnership Exploration • Technical Validation • Product Definition

Halibut: Governed AI Operations and Web-Building Infrastructure

A grounded product architecture combining operational intelligence, controlled AI execution, observability integration, and a governed web-building experience. This document defines what is viable with technology available today—and what still requires human specialist validation before production.

No funding request • No unverified endorsements • Human expert validation required • US-first, globally adaptable

1. Product Definition

PRODUCT DIRECTION

Halibut Operational Intelligence

A control and evidence layer that receives a task, evaluates identity and policy, issues narrowly scoped authority, supervises execution, and records the resulting decisions and outcomes.

Primary outcome: AI-assisted work becomes bounded, inspectable, interruptible and attributable, instead of running as an unrestricted tool call.

PRODUCT MODE TO VALIDATE

Halibut Governed Web Builder

A visual and conversational web-building interface that translates an approved intent into a versioned project, executes generation and tests in isolated runners, previews the result, and requires policy checks before deployment.

Primary outcome: users gain Wix-like accessibility while Halibut adds controlled execution, evidence, change review and deployment governance.

Conclusive product boundary: Halibut is not proposed as a replacement for every observability platform or website platform. Its defensible role is the governed operational layer connecting intent, authority, execution, evidence and release.

2. Grounded Reality Check

Statement | Assessment | Production interpretation
Short-lived signed capability tokens can constrain agent actions. | TECHNICALLY VIABLE | Requires explicit scopes, audience, expiry, request binding and deny-by-default verification. Token format alone does not provide authorization correctness.
PASETO v4.public can use Ed25519 and AWS KMS now supports Ed25519 signing keys. | VIABLE, INTEGRATION SPIKE | The PASETO signing input must be constructed exactly to specification and signed through KMS. Existing libraries may assume direct access to a private key, so interoperability and test vectors must be validated.
Cloudflare Workers can serve as unrestricted build/test pods. | DO NOT CLAIM | Workers have CPU, memory and runtime constraints. Use isolated containers or Kubernetes jobs for general code generation/build/test. Workers may serve edge APIs, lightweight policy checks or orchestration.
Opik is a security information and event management system. | DO NOT CLAIM | Opik is documented for LLM/agent observability, evaluation and tracing. Security decisions should also flow to a security log platform or SIEM.
Halibut is SOC 2 compliant after implementing controls. | DO NOT CLAIM | Controls can be designed to support a future SOC 2 examination. Only an independent qualified CPA firm can issue a SOC 2 report.
Fixed latency, cost, throughput or four-month audit outcomes are guaranteed. | REMOVE | These depend on workload, regions, vendors, system boundary and operating maturity. They must be measured in the prototype and pilot.
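The first row's requirements (scopes, audience, expiry, request binding, deny-by-default) can be illustrated with a minimal sketch. It assumes the token signature has already been verified elsewhere; the `Capability` shape and the `is_allowed` function are hypothetical names used for illustration, not a defined Halibut interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Capability:
    """Claims from an already signature-verified token (hypothetical shape)."""
    audience: str          # service the token was issued for
    scopes: frozenset      # actions explicitly granted
    request_id: str        # request the token is bound to
    expires_at: datetime   # absolute, timezone-aware expiry


def is_allowed(cap, *, audience, action, request_id, now):
    """Return True only when every check passes; anything else is a deny."""
    if cap is None:
        return False                     # no token, no authority
    if cap.audience != audience:
        return False                     # issued for a different service
    if cap.request_id != request_id:
        return False                     # not bound to this request
    if now >= cap.expires_at:
        return False                     # expired
    return action in cap.scopes          # only explicitly listed actions


# Illustrative values only.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
cap = Capability("runner", frozenset({"build"}), "req-1", issued + timedelta(minutes=8))
```

The function never returns True by falling through: the only allow path is the final scope membership test, which is one way to keep the default outcome a denial.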

3. High-Level Architecture — Technical Handoff

Experience Plane
  • Operator Console: workflows, approvals
  • Web Builder: prompt + visual editor
  • Partner APIs: external systems

Halibut Control Plane
  • Admission: identity + request
  • Policy Engine: RBAC/ABAC + rules
  • Authority Service: short-lived token
  • Gate: verify + enforce

Execution & Evidence Plane
  • Isolated Runner: generate/build/test
  • Artifact Store: versions + outputs
  • Decision Ledger: append-only events
  • Telemetry Router: OTel / alerts

Reference responsibilities

  • Admission: authenticates caller, resolves tenant, task and requested action.
  • Policy: evaluates roles, attributes, project state, environment and risk constraints.
  • Authority: creates short-lived, audience-bound authority with a key identifier.
  • Gate: verifies the authority token locally where practical; checks policy version, request identifier and limits.
  • Runner: executes inside an isolated environment with constrained network, filesystem, CPU and time.
  • Ledger: records immutable decision facts; a stronger tamper-evident design may add hash chaining and WORM retention.
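
The hash chaining mentioned for the Ledger can be sketched as follows. This is an in-memory illustration under stated assumptions: the `DecisionLedger` class, the `GENESIS` constant and the event fields are hypothetical, SHA-256 is an assumed choice, and WORM retention and durable storage are out of scope.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash, event):
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class DecisionLedger:
    """Append-only list in which each entry commits to the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {
            "prev": prev_hash,
            "event": dict(event),            # copy so callers cannot mutate it
            "hash": _digest(prev_hash, event),
        }
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


ledger = DecisionLedger()
ledger.append({"request_id": "req-1", "decision": "allow"})
ledger.append({"request_id": "req-2", "decision": "deny"})
```

A chain like this only detects modification after the fact; it does not prevent it, which is why the text pairs it with retention controls.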

Recommended technology choices—not mandates

  • Go, Rust, Java or TypeScript services based on team capability; avoid unnecessary polyglot complexity in the first version.
  • OPA/Rego or an equivalent policy engine after a policy-model spike.
  • PostgreSQL for transactional state; Redis only where atomic short-lived state adds value.
  • Container-based runners for generated code; Kubernetes is optional until scale or isolation requirements justify it.
  • OpenTelemetry as vendor-neutral instrumentation; Opik for AI traces; a separate security-log destination for alerts and investigations.
  • AWS KMS, GCP Cloud KMS or another managed HSM-backed signer selected after deployment-region and compliance review.
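
Whichever managed signer is selected, a PASETO v4.public integration has to hand it the exact signing input defined by the specification. The sketch below builds that input with the specification's pre-authentication encoding (PAE) and delegates signing to a caller-supplied `sign` function. That callable is a hypothetical stand-in for a KMS call and is assumed to perform plain Ed25519 over the full message, not a pre-hashed variant; this should be confirmed against the provider's documentation and the official test vectors.

```python
import base64
import struct

HEADER = b"v4.public."


def le64(n):
    # Unsigned 64-bit little-endian integer with the top bit cleared,
    # as the PASETO specification requires.
    return struct.pack("<Q", n & 0x7FFFFFFFFFFFFFFF)


def pae(pieces):
    # Pre-authentication encoding: piece count, then each length and piece.
    out = le64(len(pieces))
    for piece in pieces:
        out += le64(len(piece)) + piece
    return out


def b64url(data):
    # URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def sign_v4_public(message, sign, footer=b"", implicit=b""):
    """Assemble a v4.public token; `sign` is a hypothetical external signer."""
    m2 = pae([HEADER, message, footer, implicit])
    signature = sign(m2)
    if len(signature) != 64:
        raise ValueError("an Ed25519 signature must be 64 bytes")
    token = HEADER + b64url(message + signature)
    if footer:
        token += b"." + b64url(footer)
    return token.decode("ascii")
```

Keeping the signer behind a plain callable is one way to test the encoding with local keys first and swap in the managed signer later, which is the interoperability spike the Reality Check calls for.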

4. Halibut Intelligence in the Operational Flow

1. Intent: human or AI asks
2. Interpret: task manifest
3. Authorize: policy decision
4. Execute: bounded runner
5. Observe: trace + metrics
6. Decide: release / stop

Operational state

Halibut maintains the task, project, environment, approvals, policy version and execution result as explicit state—not hidden inside a conversation.

Decision intelligence

The first production version should use deterministic rules and measured signals. AI may summarize or recommend; it should not silently override authorization.

Human control

High-impact operations—production deployment, secret access, data export and policy modification—should require stronger authentication or explicit approval.
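
A deterministic decision rule of this kind might look like the sketch below. The action names, the `Request` fields and the three outcome labels are assumptions chosen to mirror the text; the AI recommendation is carried as data but deliberately never consulted by the decision.

```python
from dataclasses import dataclass

# High-impact operations named in the text; identifiers are hypothetical.
HIGH_IMPACT = frozenset(
    {"production_deploy", "secret_access", "data_export", "policy_modification"}
)


@dataclass(frozen=True)
class Request:
    action: str
    policy_allows: bool                # outcome of deterministic policy rules
    human_approved: bool = False       # explicit approval recorded for this request
    ai_recommends_allow: bool = False  # advisory only, ignored by decide()


def decide(req):
    """Return 'deny', 'needs_approval' or 'allow' from explicit inputs only."""
    if not req.policy_allows:
        return "deny"
    if req.action in HIGH_IMPACT and not req.human_approved:
        return "needs_approval"
    return "allow"
```

Because `decide` reads no AI field, a model recommendation can be shown to the operator without being able to change the outcome.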

5. Halibut as a Marketable Governed Web Builder

Viable first product experience

  1. User describes a website or selects a structured template.
  2. Halibut creates a project manifest: pages, components, data needs, integrations and constraints.
  3. An AI model proposes code and content inside an isolated workspace.
  4. Automated checks run: dependency policy, secret scan, lint, unit tests, accessibility and build.
  5. The user previews a versioned artifact and reviews changes.
  6. Halibut Gate evaluates deployment authority and environment policy.
  7. Approved artifacts deploy through a supported adapter; evidence is recorded.
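
Step 4 and its hand-off to the Gate in step 6 can be sketched as a small aggregation over check results. The check identifiers, the `Status` values and the returned labels are hypothetical; the one design assumption worth noting is that a missing result is treated as a failure.

```python
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    REVIEW = "review_required"


# Checks listed in step 4; identifiers are illustrative.
REQUIRED_CHECKS = (
    "dependency_policy",
    "secret_scan",
    "lint",
    "unit_tests",
    "accessibility",
    "build",
)


def release_readiness(results):
    """Map check name -> Status into a single readiness outcome.

    A check with no recorded result counts as failed, so an artifact
    cannot reach the Gate because a check silently did not run.
    """
    blocking = [
        name for name in REQUIRED_CHECKS
        if results.get(name, Status.FAILED) is Status.FAILED
    ]
    review = [name for name in REQUIRED_CHECKS if results.get(name) is Status.REVIEW]
    if blocking:
        return "blocked", blocking
    if review:
        return "needs_review", review
    return "ready_for_gate", []
```

Even a "ready_for_gate" outcome only makes the artifact eligible; deployment authority is still decided separately by the Gate.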

Do not attempt in the first release

  • Full parity with mature visual editors.
  • Arbitrary untrusted plugins without a sandbox and review model.
  • Automatic production deployment without explicit controls.
  • Claims of universal framework or cloud support.
  • Compliance guarantees embedded in marketing language.

Product-grade scope: begin with a constrained component system and one deployable stack. A smaller deterministic builder with strong governance is more credible than a broad generator that cannot reliably reproduce, test or control its output.

6. Parallel Positioning: Halibut, Opik and Wix

Halibut

Proposed scope: governed authority, execution, evidence and controlled web generation.

  • Admission and policy
  • Scoped execution authority
  • Isolated build/test
  • Decision ledger
  • Approval and release gates

Opik

Documented scope: open-source observability, debugging, evaluation and optimization for LLM applications and agents.

  • Traces and spans
  • LLM evaluation
  • Experiment visibility
  • OpenTelemetry integration

Halibut can send AI traces to Opik. It should not duplicate Opik unless a product requirement demands native trace analysis.

Wix

Documented scope: visual/AI-assisted site creation plus a JavaScript development platform, frontend/backend APIs and external integrations.

  • Site creation experience
  • Visual editing
  • Velo / JavaScript SDK
  • Backend functions and APIs

Halibut can learn from the accessibility model. It should not claim that Wix lacks security; the differentiation is task-level AI execution governance and portable evidence.

The missing-layer thesis, stated carefully: Based on their published product scopes, Opik focuses on observing AI behavior and Wix focuses on building and operating sites. Halibut can occupy the complementary layer that decides what an AI-assisted operation is allowed to do, executes it within bounded infrastructure, and records evidence across the workflow.

7. The Halibut Model Was First Tested as a Human-Orchestrated Workflow

Before Halibut existed as software, its core operating logic was exercised manually during AI-assisted product and infrastructure work. In that workflow, Isha acted as the human Halibut layer: interpreting objectives, controlling which AI handled each task, observing model reasoning and outputs, stopping or redirecting work when needed, and approving movement to the next phase.

  • Business Intent: goal, constraints, quality bar
  • Human Halibut: Isha routes and governs
  • Architecture Partner: challenge, sequence, guardrails
  • Claude: implementation and analysis
  • Human Review: inspect reasoning and output
  • Approved Result: build, test, continue or stop

What Isha performed

  • Translated business intent into narrow operational instructions.
  • Selected the appropriate AI model according to task strength and risk.
  • Observed model reasoning and detected possible drift before code creation.
  • Controlled handoffs between planning, implementation, testing and approval.
  • Protected cost, quality and sequence by preventing unnecessary reruns.

What Claude performed

  • Executed defined implementation and analysis tasks.
  • Produced code, documentation and technical reasoning under human direction.
  • Responded to corrections and architecture constraints before committing work.
  • Exposed intermediate reasoning signals that supported human supervision.

What the architecture partner performed

  • Helped decompose complex tasks before implementation.
  • Identified when a stronger model or different sequence was appropriate.
  • Challenged assumptions and clarified acceptance criteria.
  • Supported the human operator in deciding when Claude should proceed, pause or revise.

Observed result: this collaboration demonstrated the practical value of a governed orchestration layer between human intent and AI execution. It showed that task routing, visible reasoning, staged approvals and deliberate handoffs can reduce avoidable rework and improve implementation discipline. It does not prove production security, autonomous reliability or regulatory compliance; those still require software controls, measured testing and independent human expert review.

Manual behavior → Halibut product function

Manual workflow behavior | Halibut product equivalent | Required validation
Isha decided which model should perform a task. | Model and tool routing policy based on task type, sensitivity, cost and capability. | AI systems expert validates routing criteria and fallback behavior.
Isha narrowed the task before Claude generated code. | Structured task manifest defining allowed actions, files, limits and acceptance criteria. | Security architect validates scope semantics and deny-by-default behavior.
Isha watched reasoning and redirected drift. | Execution trace, checkpoints, policy events and human-intervention controls. | AI expert validates what signals are reliable enough for supervision.
Work moved through deliberate phases. | State machine for plan, approve, execute, test, review and release. | Platform engineer validates durability, retries, idempotency and recovery.
Outputs were built and tested before acceptance. | Isolated runner, automated checks, artifact versioning and release gate. | AppSec and SRE validate isolation, dependency controls and operational resilience.

How Claude should be represented: Claude contributed directly to the manual prototype through implementation, technical reasoning and responsiveness to human guidance. This is meaningful product-discovery evidence. It must not be presented as an Anthropic endorsement, independent certification or completed specialist security review unless such a review is formally commissioned and documented.
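
The state machine named in the mapping above (plan, approve, execute, test, review, release) could be sketched as an explicit transition table. The backward edges to "plan" and the "stopped" state are assumptions added to reflect the "revise" and "stop" behaviors described in the manual workflow; the class and state names are hypothetical, and durability, retries and idempotency are not addressed.

```python
# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    "plan": {"approve", "stopped"},
    "approve": {"execute", "plan", "stopped"},
    "execute": {"test", "stopped"},
    "test": {"review", "plan", "stopped"},
    "review": {"release", "plan", "stopped"},
    "release": set(),   # terminal
    "stopped": set(),   # terminal
}


class TaskFlow:
    """Tracks one task's phase and refuses transitions that skip a step."""

    def __init__(self):
        self.state = "plan"
        self.history = ["plan"]

    def advance(self, target):
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target
        self.history.append(target)
```

Recording `history` alongside `state` keeps the path a task took available as evidence, not just its current phase.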

8. Human and Specialized AI Expert Validation

Validator | Required decision | Evidence expected
Principal Security Architect | Threat model, trust boundaries, token choice, key custody, revocation, runner isolation and fail-closed behavior. | Signed architecture review, misuse cases, security acceptance criteria and unresolved-risk register.
Senior Platform / SRE Engineer | Deployment topology, capacity model, queues, databases, disaster recovery, observability and operational runbooks. | Infrastructure prototype, load-test results, recovery exercise and cost model from measured usage.
Application Security Engineer | Path traversal, dependency attacks, prompt-to-tool escalation, token replay, confused deputy and cross-tenant isolation. | Independent review report, penetration tests and remediation verification.
Specialized AI Systems Expert | Agent architecture, model/tool contract, evaluation set, hallucination containment, fallback and human approval placement. | Evaluation protocol, failure taxonomy, model-agnostic interfaces and measured task reliability.
GRC / Privacy Counsel | System boundary, data classification, vendor/subprocessor obligations, regional requirements and control mapping. | Control matrix, data-flow inventory, retention schedule and policy set.
Claude / other frontier model review | Use as an analysis aid to critique manifests, policies, failure cases and developer documentation. | Recorded prompts and findings reviewed by a human expert. Model output is not an endorsement, certification or security sign-off.

9. Careful but Fast Validation Timeline

These are planning ranges, not delivery guarantees. The sequence is designed for a fast-moving AI market while preventing architecture assumptions from becoming production liabilities.

Phase 0
2–3 weeks

Definition and threat model

Confirm use cases, system boundary, data classes, trust boundaries, token alternatives and validation team.

EXIT: ARCHITECTURE DECISION

Phase 1
4–6 weeks

Vertical technical prototype

One task flow from request → policy → token → isolated execution → evidence → preview.

EXIT: MEASURED POC

Phase 2
6–10 weeks

Controlled design partner pilot

One builder stack, limited integrations, human approvals, operational dashboards and incident exercises.

EXIT: PILOT REVIEW

Phase 3
8–16+ weeks

Production hardening

Independent AppSec, tenancy isolation, DR, SLOs, privacy controls, support model and evidence readiness.

EXIT: GO / NO-GO

Work can overlap where risk permits, but security architecture, execution isolation and data handling must not be deferred until after external use.

10. Product Prototype — Governed Builder Console

HALIBUT / PROJECT: CUSTOMER PORTAL
Environment: Preview • Policy v0.3
LIVE PREVIEW — artifact #hbt-0241

Customer Operations Portal

Track requests, approvals and delivery status.

Submit a request
Open requests: 12
Pending approval: 3

Execution decision: ALLOW PREVIEW
  • Scope: project/hbt-0241
  • Max files: 24
  • Network: approved registries only
  • Expires: 8 minutes

Checks
  ✓ Build passed
  ✓ Secret scan passed
  ✓ Dependency policy passed
  ! Accessibility review required

Release

Production deploy requires owner approval and a fresh authority token.

This visual is an interface concept, not a functioning security implementation. It demonstrates how creation, validation, authority and evidence can be made visible in one workflow.

11. Product-Grade Feasibility Conclusion

Why Halibut is possible now

  • Managed asymmetric signing and downloadable public keys are available.
  • Open policy engines, container isolation, CI runners and transactional databases are mature.
  • OpenTelemetry provides vendor-neutral traces and metrics.
  • Opik offers an integration destination for AI/agent observability.
  • Modern site platforms demonstrate that visual and AI-assisted building experiences are technically and commercially understood.

What determines whether it becomes real

  • A narrow first use case with a repeatable outcome.
  • Human validation of security, AI behavior and operations.
  • Measured evidence instead of forecasted performance claims.
  • A controlled runner and explicit policy model.
  • A design-partner pilot that tests usability and willingness to adopt—not only technical completion.

Final position: Halibut is technically feasible as a governed AI operations layer and as a constrained, governed web builder using technology available today. Production viability is not yet proven; it must be earned through the specified architecture review, vertical prototype, controlled pilot and independent security validation.

12. Grounding Sources — Official Documentation

PASETO v4.public: the official specification defines Ed25519 signing and verification.

PASETO Version 4 specification

AWS KMS: current documentation describes Ed25519 key support, private-key isolation, signing and external verification using a downloaded public key.

AWS KMS key specifications
AWS KMS Sign API

Cloudflare Workers: official limits show runtime constraints; this supports using Workers selectively rather than presenting them as unrestricted build containers.

Cloudflare Workers limits

Opik: official documentation defines the product as LLM/agent observability, evaluation and optimization, with OpenTelemetry support.

Opik documentation
Opik OpenTelemetry integration

Wix: official documentation confirms visual/AI-assisted site creation, frontend/backend development, APIs and external integrations.

Wix Velo documentation
Wix Harmony overview
Wix site backend

SOC: AICPA describes SOC as CPA-provided examination and reporting services, supporting the distinction between readiness work and an issued SOC report.

AICPA SOC suite
AICPA Trust Services Criteria