## Documentation Index
Fetch the complete documentation index at: https://docs.blink.so/docs/llms.txt
Use this file to discover all available pages before exploring further.
## Overview
The Blink Server acts as the control plane for AI agent deployments. Key architectural principles:
- Agents are HTTP servers deployed as Docker containers
- The control loop runs inside the server, not in agents
- Communication is over HTTP; chat streaming reaches clients via SSE and WebSockets
- State is centralized in PostgreSQL (chats, runs, deployments, files, logs, traces, KV)
The server also hosts the web UI and the HTTP API used by the CLI and SDK.
## Server Components
- API server handles chats, agents, webhooks, files, logs, traces, and devhook routing
- WebSocket server is used for chat streaming and auth token handshakes
- Startup runs database migrations before accepting traffic
## Agent Execution Model
Agents are deployed as Docker containers using a configurable image (default: `ghcr.io/coder/blink-agent:latest`).
### Container Structure
Each agent container includes:
| Component | Purpose |
|---|---|
| Agent bundle | Built files staged into /app from the deployment output |
| Runtime wrapper | Starts the agent and internal API server, proxies requests, injects auth |
| Internal API server | Serves /kv, /chat, and /otlp/v1/traces for agent code and forwards to the Blink Server |
| OpenTelemetry Collector | Collects agent logs and forwards them to the server |
### Runtime Wiring (self-hosted)
On deployment the server:
- Downloads deployment output files, writes them to a temp dir, and adds a runtime wrapper (`__wrapper.js`).
- Launches a container and sets environment variables such as `ENTRYPOINT`, `PORT`, `INTERNAL_BLINK_API_SERVER_URL`, `INTERNAL_BLINK_API_SERVER_LISTEN_PORT`, `BLINK_REQUEST_URL`, `BLINK_REQUEST_ID`, and `BLINK_DEPLOYMENT_TOKEN`.
- The wrapper starts an internal API server inside the container and patches `fetch` so internal API calls include `x-blink-internal-auth`.
- The wrapper runs the agent entrypoint on `PORT+1` and proxies incoming requests on `PORT` to the agent.
- The OpenTelemetry collector starts and reads the agent log pipe.
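The wrapper's wiring can be sketched as follows. The helper names (`patchFetch`, `agentPort`) are illustrative; the real internals of `__wrapper.js` are not shown in this document.

```typescript
// Minimal shape of a fetch-like function for the sketch below.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<unknown>;

// Wrap fetch so calls targeting the internal API server gain the
// x-blink-internal-auth header, as described above. Other calls pass
// through untouched.
function patchFetch(
  baseFetch: FetchLike,
  internalApiUrl: string,
  token: string,
): FetchLike {
  return (url, init = {}) => {
    const headers = { ...(init.headers ?? {}) };
    if (url.startsWith(internalApiUrl)) {
      headers["x-blink-internal-auth"] = token;
    }
    return baseFetch(url, { ...init, headers });
  };
}

// The wrapper listens on PORT and proxies to the agent entrypoint,
// which it starts on PORT+1.
function agentPort(wrapperPort: number): number {
  return wrapperPort + 1;
}
```

The patched `fetch` only decorates requests aimed at the internal API base URL, so agent code can keep making ordinary outbound HTTP calls without leaking the auth header.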
## Control Loop
The control loop is the core orchestration mechanism. It runs inside the server, not in agents.
### Request Flow
1. External event arrives (API call, Slack message, GitHub webhook)
2. Server routes the event to the appropriate agent deployment
3. Server invokes the agent's `/_agent/chat` endpoint with an invocation token
4. Agent processes the request and streams a response back (SSE)
5. Server persists messages and run/step state to PostgreSQL and fans out to clients
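The agent's streamed response is a standard SSE stream. A minimal sketch of extracting `data:` payloads from a raw SSE body (framing simplified; the server's real parser, and its handling of `event:`/`id:` fields and partial chunks, are not shown here):

```typescript
// Minimal SSE parser sketch: split the stream into blank-line-delimited
// events and collect each event's "data:" payload lines.
function parseSseData(raw: string): string[] {
  return raw
    .split("\n\n")
    .map((block) =>
      block
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice("data:".length).trim())
        .join("\n"),
    )
    .filter((data) => data.length > 0);
}
```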
### Chat Run Lifecycle
- Each chat run has one or more steps stored in the DB.
- The server selects the latest step, invokes the active deployment, and streams chunks as they arrive.
- If the response includes tool calls, the server creates a new step and continues the loop.
- Interrupts cancel an in-flight step and restart with the latest state.
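The step loop above can be modeled as a toy function. The types and the `maxSteps` cap are illustrative, not the actual Blink schema or limits:

```typescript
// Illustrative result of invoking the active deployment for one step.
interface StepResult {
  text: string;
  toolCalls: string[];
}

// Invoke the deployment once per step; tool calls in a response mean the
// server creates a new step and continues the loop. A no-tool-call
// response (or hitting the cap) ends the run.
function runChatLoop(
  invoke: (step: number) => StepResult,
  maxSteps = 25,
): { steps: number; final: string } {
  let step = 1;
  for (;;) {
    const result = invoke(step);
    if (result.toolCalls.length === 0 || step >= maxSteps) {
      return { steps: step, final: result.text };
    }
    step += 1; // response contained tool calls: create the next step
  }
}
```

An interrupt corresponds to cancelling the in-flight `invoke` and re-entering the loop at the latest persisted step.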
### Streaming and Buffering
- The server broadcasts `message.chunk.added` events to WebSocket and SSE clients.
- The current streaming buffer is kept in memory to allow reconnects.
- This in-memory session state is the main blocker for horizontal scaling today.
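A sketch of such an in-memory buffer (illustrative; the server's actual session state is not shown in this document):

```typescript
// Per-stream buffer of emitted chunks that lets a reconnecting client
// catch up before live streaming resumes.
class ChunkBuffer {
  private chunks: string[] = [];

  append(chunk: string): void {
    this.chunks.push(chunk);
  }

  // A reconnecting client reports how many chunks it already received
  // and gets the remainder replayed.
  replayFrom(offset: number): string[] {
    return this.chunks.slice(offset);
  }
}
```

Because a buffer like this lives in one process's memory, every client of a given chat must reach the same node, which is exactly the horizontal-scaling constraint noted above.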
### Why the Control Loop is Server-Side
Running the control loop in the server rather than agents provides:
- Centralized state in PostgreSQL
- Agent simplicity (no orchestration logic)
- Observability and auditability
- Consistent tool-call looping behavior
For more details about the control loop, see the agent structure guide.
## Request Routing
For details on webhook routing and devhooks, see the webhooks and devhooks guide.
## Communication
### Server -> Agent
The server communicates with agents via HTTP:
| Endpoint | Method | Purpose |
|---|---|---|
| `/_agent/health` | GET | Health check |
| `/_agent/chat` | POST | Chat request, SSE response |
| `/_agent/capabilities` | GET | Check supported handlers |
| `/_agent/ui` | GET | UI schema for dynamic inputs |
| `/_agent/flush-otel` | POST | Flush telemetry buffers |
| `/_agent/*` | ANY | Custom request handler |
Older deployments may still be called via `/sendMessages` or `/_agent/send-messages`.
All server -> agent calls include `x-blink-invocation-token`. Chat runs also include run, step, and chat ID headers.
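A hypothetical helper assembling these headers. Only `x-blink-invocation-token` is documented above; the run/step/chat header names below are placeholders, not the real ones:

```typescript
// Build the header set for a server -> agent call. Chat runs attach the
// run/step/chat IDs; other invocations send only the invocation token.
function invocationHeaders(
  token: string,
  ids?: { runId: string; stepId: string; chatId: string },
): Record<string, string> {
  const headers: Record<string, string> = {
    "x-blink-invocation-token": token, // documented header
  };
  if (ids) {
    headers["x-run-id"] = ids.runId; // placeholder header name
    headers["x-step-id"] = ids.stepId; // placeholder header name
    headers["x-chat-id"] = ids.chatId; // placeholder header name
  }
  return headers;
}
```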
### Agent -> Server
Agent code running in a container does not call the public API directly. Instead, the wrapper exposes an internal API server:
- `/kv` for agent key-value storage
- `/chat` for chat CRUD and message operations
- `/otlp/v1/traces` for trace export (logs are forwarded by the collector)
The wrapper forwards these to the Blink Server using the invocation token and the deployment token.
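A sketch of agent-side access to the internal `/kv` endpoint. Only the `/kv` prefix appears above; the class name, HTTP verbs, and path shape here are assumptions:

```typescript
// Minimal fetch-like shape so the client can be exercised without a network.
type Fetcher = (
  url: string,
  init?: { method?: string; body?: string },
) => Promise<{ text(): Promise<string> }>;

class InternalKvClient {
  // base would come from INTERNAL_BLINK_API_SERVER_URL in the container.
  constructor(private base: string, private fetcher: Fetcher) {}

  get(key: string): Promise<string> {
    return this.fetcher(`${this.base}/kv/${encodeURIComponent(key)}`).then(
      (res) => res.text(),
    );
  }

  set(key: string, value: string): Promise<void> {
    return this.fetcher(`${this.base}/kv/${encodeURIComponent(key)}`, {
      method: "PUT", // assumed verb; not specified above
      body: value,
    }).then(() => undefined);
  }
}
```

With the wrapper's patched `fetch`, calls like these automatically carry `x-blink-internal-auth` and are forwarded to the Blink Server.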
## Data and Storage
PostgreSQL stores:
- chat messages, runs, and steps
- agents, deployments, and deployment targets
- files and attachments
- logs and traces (self-hosted)
Migrations run automatically at server startup.
## Limitations
Current architectural constraints to be aware of:
| Limitation | Details |
|---|---|
| Single node only | In-memory chat streaming buffers prevent horizontal scaling |
| Docker required | Agents must run as Docker containers (no Kubernetes, ECS, etc.) |
| Local Docker daemon | Server must have direct access to Docker socket |
These limitations exist because Blink is in early access. We plan to support horizontal scaling and other deployment options in the future.