# Architecture

> Technical architecture of the Blink Server and how it orchestrates AI agents.

## Overview

The Blink Server acts as the control plane for AI agent deployments. Key architectural principles:

* Agents are HTTP servers deployed as Docker containers
* The control loop runs inside the server, not in agents
* All communication is HTTP: agents stream chat responses to the server over SSE, and the server streams them to clients over WebSocket or SSE
* State is centralized in PostgreSQL (chats, runs, deployments, files, logs, traces, KV)

The server also hosts the web UI and the HTTP API used by the CLI and SDK.

## Server Components

* API server handles chats, agents, webhooks, files, logs, traces, and devhook routing
* WebSocket server is used for chat streaming and auth token handshakes
* Startup runs database migrations before accepting traffic
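The startup ordering in the last bullet can be sketched as follows. `StartupHooks`, `runMigrations`, and `listen` are hypothetical names standing in for the server's real startup code. Only the ordering comes from this page: migrations run first, and traffic is accepted only after they succeed.

```typescript
// Hypothetical hooks: these names stand in for the server's real startup code.
// Only the ordering (migrate, then accept traffic) reflects the docs.
interface StartupHooks {
  runMigrations: () => Promise<void>;
  listen: () => Promise<void>;
}

async function startServer(hooks: StartupHooks): Promise<void> {
  // A rejected migration aborts startup here, so `listen` is never reached.
  await hooks.runMigrations();
  await hooks.listen();
}
```

Because `listen` is only awaited after `runMigrations` resolves, a failed migration leaves the server unreachable instead of serving requests against an outdated schema.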

## Agent Execution Model

Agents are deployed as Docker containers using a configurable image (default: `ghcr.io/coder/blink-agent:latest`).

### Container Structure

Each agent container includes:

| Component               | Purpose                                                                                      |
| ----------------------- | -------------------------------------------------------------------------------------------- |
| Agent bundle            | Built files staged into `/app` from the deployment output                                    |
| Runtime wrapper         | Starts the agent and internal API server, proxies requests, injects auth                     |
| Internal API server     | Serves `/kv`, `/chat`, and `/otlp/v1/traces` for agent code and forwards to the Blink Server |
| OpenTelemetry Collector | Collects agent logs and forwards them to the server                                          |

### Runtime Wiring (self-hosted)

On deployment, the server and the runtime wrapper set up the container as follows:

1. The server downloads the deployment output files, writes them to a temporary directory, and adds a runtime wrapper (`__wrapper.js`).
2. The server launches a container and sets environment variables such as `ENTRYPOINT`, `PORT`, `INTERNAL_BLINK_API_SERVER_URL`, `INTERNAL_BLINK_API_SERVER_LISTEN_PORT`, `BLINK_REQUEST_URL`, `BLINK_REQUEST_ID`, and `BLINK_DEPLOYMENT_TOKEN`.
3. The wrapper starts an internal API server inside the container and patches `fetch` so that calls to the internal API include the `x-blink-internal-auth` header.
4. The wrapper runs the agent entrypoint on `PORT+1` and proxies incoming requests from `PORT` to the agent.
5. The OpenTelemetry Collector starts and reads the agent's log pipe.
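Two wrapper duties from the steps above, the `PORT`/`PORT+1` split and the `fetch` patch, can be sketched as follows. This is a minimal illustration, not the contents of `__wrapper.js`. `resolvePorts`, `withInternalAuth`, and `FetchLike` are hypothetical names, and the value the real wrapper sends in `x-blink-internal-auth` is not documented here, so `secret` is a placeholder.

```typescript
// Hypothetical sketch: only the env var names, the `x-blink-internal-auth`
// header, and the PORT / PORT+1 split come from the docs.
type FetchLike = (input: string | URL | Request, init?: RequestInit) => Promise<Response>;

// The wrapper proxy listens on PORT; the agent entrypoint listens on PORT+1.
function resolvePorts(env: Record<string, string | undefined>): { proxyPort: number; agentPort: number } {
  const proxyPort = Number(env.PORT);
  if (!Number.isInteger(proxyPort) || proxyPort <= 0) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  return { proxyPort, agentPort: proxyPort + 1 };
}

// Wrap a fetch implementation so that requests to the internal API server carry
// the internal auth header. Requests to any other origin pass through unchanged.
function withInternalAuth(baseFetch: FetchLike, internalBaseUrl: string, secret: string): FetchLike {
  const internalOrigin = new URL(internalBaseUrl).origin;
  return (input, init) => {
    const request = new Request(input, init);
    if (new URL(request.url).origin !== internalOrigin) {
      return baseFetch(request);
    }
    const headers = new Headers(request.headers);
    headers.set("x-blink-internal-auth", secret); // placeholder value
    return baseFetch(new Request(request, { headers }));
  };
}

// Installed once, before agent code loads (sketch only):
// globalThis.fetch = withInternalAuth(globalThis.fetch.bind(globalThis),
//   process.env.INTERNAL_BLINK_API_SERVER_URL!, secret);
```

Matching on origin keeps the header scoped to the internal API server, so the header never leaks to third-party hosts that the agent calls.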

## Control Loop

The control loop is the core orchestration mechanism. It runs inside the server, not in agents.

### Request Flow

1. External event arrives (API call, Slack message, GitHub webhook)
2. Server routes the event to the appropriate agent deployment
3. Server invokes the agent's `/_agent/chat` endpoint with an invocation token
4. Agent processes the request and streams a response back (SSE)
5. Server persists messages and run/step state to PostgreSQL and fans out to clients
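Steps 3 and 4 can be sketched from the server's side as a POST followed by reading the SSE response. The `/_agent/chat` path and the `x-blink-invocation-token` header come from this page. The request body shape, the JSON payload of each event, and the names `invokeAgentChat` and `parseSseData` are assumptions. The parser follows the standard SSE framing: events are separated by a blank line, and `data:` lines are joined with newlines.

```typescript
// Split decoded SSE text into the `data:` payloads of complete events.
// Returns the payloads plus any trailing, not-yet-complete event text.
function parseSseData(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // the last block may still be arriving
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop one optional space
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}

// Hypothetical invocation helper: POST to the agent and yield each event payload.
async function* invokeAgentChat(
  agentBaseUrl: string,
  invocationToken: string,
  body: unknown, // assumed JSON body; the real shape is not documented here
): AsyncGenerator<string> {
  const res = await fetch(new URL("/_agent/chat", agentBaseUrl), {
    method: "POST",
    headers: {
      "content-type": "application/json",
      accept: "text/event-stream",
      "x-blink-invocation-token": invocationToken,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`agent returned ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let pending = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseData(pending);
    pending = rest;
    yield* events; // in the server, each chunk would be persisted and fanned out
  }
  pending += decoder.decode(); // flush any buffered multi-byte sequence
  yield* parseSseData(pending).events;
}
```

Keeping the unfinished tail in `pending` matters because network chunks do not line up with SSE event boundaries.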

### Chat Run Lifecycle

* Each chat run has one or more steps stored in the DB.
* The server selects the latest step, invokes the active deployment, and streams chunks as they arrive.
* If the response includes tool calls, the server creates a new step and continues the loop.
* An interrupt cancels the in-flight step, and the run restarts from the latest state.
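The lifecycle above can be sketched as a loop that keeps creating steps while the agent returns tool calls. This is a simplified model, not the server's code. `Step`, `StepResult`, `InvokeAgent`, `runChat`, and the `maxSteps` cap are hypothetical, and an in-memory array stands in for the step rows in PostgreSQL. An interrupt is modeled with an `AbortSignal`. The caller aborts the current run and starts a new one from the latest state.

```typescript
// Hypothetical types standing in for DB rows and the agent invocation.
interface ToolCall { name: string; args: unknown }
interface StepResult { text: string; toolCalls: ToolCall[] }
interface Step { index: number; result?: StepResult }

type InvokeAgent = (history: Step[], signal: AbortSignal) => Promise<StepResult>;

async function runChat(invoke: InvokeAgent, signal: AbortSignal, maxSteps = 10): Promise<Step[]> {
  const steps: Step[] = [];
  while (steps.length < maxSteps) {
    // An interrupt stops the loop; the caller restarts a fresh run afterwards.
    if (signal.aborted) break;
    const step: Step = { index: steps.length };
    steps.push(step); // the real server stores each step in PostgreSQL
    const result = await invoke(steps, signal);
    step.result = result;
    // No tool calls means the agent is done; otherwise loop into a new step.
    if (result.toolCalls.length === 0) break;
  }
  return steps;
}
```

The `maxSteps` cap is an assumption added to keep the sketch from looping forever. This page does not say whether the server enforces a similar limit.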

### Streaming and Buffering

* The server broadcasts `message.chunk.added` events to WebSocket and SSE clients.
* The current streaming buffer is kept in memory to allow reconnects.
* This in-memory session state is the main blocker for horizontal scaling today.
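A minimal in-process buffer shows both the reconnect behavior and why it limits scaling. A reconnecting client replays everything buffered so far, then receives live chunks, but the buffer only exists in one server process. `StreamBuffer`, `Listener`, and the event object shape are hypothetical; only the `message.chunk.added` event name comes from this page.

```typescript
// Hypothetical event shape; only the event name comes from the docs.
type ChunkEvent = { type: "message.chunk.added"; chunk: string };
type Listener = (event: ChunkEvent) => void;

// Per-chat buffer held in memory by a single server process.
class StreamBuffer {
  private chunks: string[] = [];
  private listeners = new Set<Listener>();

  // Called for every chunk the agent streams back; broadcasts to live clients.
  push(chunk: string): void {
    this.chunks.push(chunk);
    for (const listener of this.listeners) {
      listener({ type: "message.chunk.added", chunk });
    }
  }

  // A (re)connecting client first replays buffered chunks, then receives live ones.
  subscribe(listener: Listener): () => void {
    for (const chunk of this.chunks) {
      listener({ type: "message.chunk.added", chunk });
    }
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```

If a second server node handled the reconnect, it would have no `StreamBuffer` for the chat. Horizontal scaling therefore needs this state moved to a shared store or the connections pinned to one node.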

### Chat Run Sequence

```mermaid
sequenceDiagram
  participant Client as Web UI or SDK
  participant Server as Blink Server
  participant DB as Postgres
  participant Agent as Agent Container

  Client->>Server: POST /api/chats/{id}/sendMessages
  Server->>DB: create run + step, store messages
  Server->>Agent: POST /_agent/chat (x-blink-invocation-token)
  Agent-->>Server: SSE stream (message chunks)
  Server->>DB: write response message + update step
  Server-->>Client: stream chunks over WS or SSE
```

### Why the Control Loop is Server-Side

Running the control loop in the server rather than agents provides:

* Centralized state in PostgreSQL
* Agent simplicity (no orchestration logic)
* Observability and auditability
* Consistent tool-call looping behavior

For more details about the control loop, see the [agent structure](/docs/essentials/agent-structure) guide.

## Request Routing

For details on webhook routing and devhooks, see the [webhooks and devhooks](/docs/server/webhooks) guide.

## Communication

### Server -> Agent

The server communicates with agents via HTTP:

| Endpoint               | Method | Purpose                      |
| ---------------------- | ------ | ---------------------------- |
| `/_agent/health`       | GET    | Health check                 |
| `/_agent/chat`         | POST   | Chat request, SSE response   |
| `/_agent/capabilities` | GET    | Check supported handlers     |
| `/_agent/ui`           | GET    | UI schema for dynamic inputs |
| `/_agent/flush-otel`   | POST   | Flush telemetry buffers      |
| `/_agent/*`            | ANY    | Custom request handler       |

Older deployments may instead be invoked through the legacy `/sendMessages` or `/_agent/send-messages` endpoints.

All server -> agent calls include `x-blink-invocation-token`. Chat runs also include run, step, and chat ID headers.
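The following hand-rolled handler illustrates the agent side of this contract. It uses the web-standard `Request`/`Response` types, which are globals in Node 18+. It is not the Blink SDK's implementation. The response bodies, the capabilities payload, the 401 for a missing token, and the name `handleAgentRequest` are assumptions. Only the paths, methods, and the `x-blink-invocation-token` header come from the table and the note above.

```typescript
// Hypothetical agent-side handler for the server -> agent HTTP contract.
function handleAgentRequest(req: Request): Response {
  const { pathname } = new URL(req.url);

  // Assumption: reject calls without the token the server always sends.
  if (!req.headers.get("x-blink-invocation-token")) {
    return new Response("missing invocation token", { status: 401 });
  }

  if (req.method === "GET" && pathname === "/_agent/health") {
    return new Response("ok");
  }

  if (req.method === "GET" && pathname === "/_agent/capabilities") {
    // Hypothetical payload shape.
    return new Response(JSON.stringify({ chat: true }), {
      headers: { "content-type": "application/json" },
    });
  }

  if (req.method === "POST" && pathname === "/_agent/chat") {
    const encoder = new TextEncoder();
    const body = new ReadableStream<Uint8Array>({
      start(controller) {
        // Each SSE event is "data: <payload>\n\n"; the JSON payload is hypothetical.
        for (const text of ["Hello", ", world"]) {
          controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text })}\n\n`));
        }
        controller.close();
      },
    });
    return new Response(body, { headers: { "content-type": "text/event-stream" } });
  }

  // Other /_agent/* paths would be routed to the agent's custom request handler.
  return new Response("not found", { status: 404 });
}
```

Runtimes with a fetch-style server API, such as `Bun.serve` or `Deno.serve`, can host a handler of this shape directly. On Node, it needs a small adapter around `node:http`.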

### Agent -> Server

Inside a container, agents do not call the public API directly. Instead, the wrapper exposes an internal API server:

* `/kv` for agent key-value storage
* `/chat` for chat CRUD and message operations
* `/otlp/v1/traces` for trace export (logs are forwarded by the collector)

The wrapper forwards these to the Blink Server using the invocation token and the deployment token.
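The internal API server's forwarding decision can be sketched as an allowlist check plus credential attachment. Only the three path prefixes come from this page. The upstream URL mapping, the two header names (`x-example-invocation-token`, `x-example-deployment-token`), and the name `resolveForward` are hypothetical placeholders, since this page does not document how the wrapper presents the tokens.

```typescript
// Path prefixes served by the internal API server (from the list above).
const FORWARDED_PREFIXES = ["/kv", "/chat", "/otlp/v1/traces"];

interface ForwardTarget {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical: decide whether an internal request is forwarded, and how.
function resolveForward(
  pathAndQuery: string,
  upstreamBase: string,
  invocationToken: string,
  deploymentToken: string,
): ForwardTarget | null {
  const { pathname } = new URL(pathAndQuery, "http://internal");
  const allowed = FORWARDED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  if (!allowed) return null; // anything else is not part of the internal API
  return {
    // Placeholder mapping: same path on the upstream server.
    url: new URL(pathAndQuery, upstreamBase).toString(),
    headers: {
      // Placeholder header names; the real wire format is not documented here.
      "x-example-invocation-token": invocationToken,
      "x-example-deployment-token": deploymentToken,
    },
  };
}
```

Matching whole path segments rejects look-alike paths such as `/kvx`, so agent code can only reach the documented endpoints.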

## Data and Storage

PostgreSQL stores:

* chat messages, runs, and steps
* agents, deployments, and deployment targets
* files and attachments
* logs and traces (self-hosted)

Migrations run automatically at server startup.

## Limitations

Current architectural constraints to be aware of:

| Limitation              | Details                                                         |
| ----------------------- | --------------------------------------------------------------- |
| **Single node only**    | In-memory chat streaming buffers prevent horizontal scaling     |
| **Docker required**     | Agents must run as Docker containers (no Kubernetes, ECS, etc.) |
| **Local Docker daemon** | Server must have direct access to the Docker socket             |

These limitations exist because Blink is in early access. We plan to support horizontal scaling and other deployment options in the future.