Nick Hoff

How to Architect Backend-to-Frontend Messaging for Second Brain

September 17, 2025

I'm building Second Brain, a voice AI assistant composed of multiple agents: the conversational agent you talk to, a backend "brain", and separate document and email tools. The flow is simple in prose: the user speaks, and the backend "brain" processes the request, calling tools and databases as needed. Responses can arrive as several messages: intermediate status updates, a final answer, and occasionally unsolicited notifications triggered by backend events ("That email just arrived").

Vercel serverless functions feel like the natural place to put that brain: deploy it and call it. But as you try this in the real world, three problems emerge immediately:

  1. Intermediate messages ("I'm searching your email for that PDF, hang on") need to get to the client while the brain continues to run.
  2. The brain can run longer than a single function invocation before finishing. If the function gets close to a timeout, how do you continue the work and still keep the client informed?
  3. A backend-initiated event occurs ("Hey - you just received that email you've been waiting for.") but there's no open HTTP connection available to send it to the user.

In this post I'll walk through the requirements, the tradeoffs, and a few practical architectures. I focus on architecture decisions: what should run where (Vercel, a realtime layer, queues, state stores) and how messages should be passed between the frontend, the backend brain, and realtime components.

Why this is harder than it sounds

Before we dive into solutions, it's helpful to be explicit about what the "brain" needs:

  • It must push intermediate status messages to the client while it is still working.
  • It must be able to run longer than a single function invocation, handing work off without losing the client.
  • It must deliver backend-initiated notifications at arbitrary times, when no HTTP request is in flight.

If you sketch that out, you quickly see the mismatch with pure serverless HTTP functions: they are request→response. No persistent connection, no background processes.

Option 1 — Server-Sent Events (SSE)

SSE (Server-Sent Events) is a uni-directional, server→client streaming mechanism over HTTP. The client opens an EventSource; the server responds with Content-Type: text/event-stream, keeps the HTTP response open, and sends framed events as lines like:

event: progress
id: 1
data: checking email

The browser's EventSource API handles automatic reconnection and delivers those text events to your client. SSE is great for text-based server push, lightweight, and compatible with normal HTTP stacks (proxies, CDNs) that allow long-lived responses.

Important specifics for this post:

  • SSE is one-directional. Client→server messages still travel over ordinary HTTP requests, which is fine here since the front-to-back path is plain HTTP anyway.
  • The server must hold the HTTP response open for the life of the stream, which is exactly what serverless HTTP functions are not designed to do.
  • The optional id: field enables resumption: on reconnect the browser sends the last seen id in a Last-Event-ID header, but replaying missed events is your job.

Pros:

  • Plain HTTP: no protocol upgrade, works through proxies and CDNs that tolerate long-lived responses.
  • EventSource gives you automatic reconnection for free.
  • Lightweight text framing that's trivial to implement and debug.

Cons:

  • Server→client only.
  • Requires one long-lived open response per client, which doesn't fit serverless HTTP functions.
  • One held connection per client gets expensive and brittle at scale.

If you have a handful of users and intermittent messages, SSE is perfectly fine. At scale, it's expensive and brittle unless you put the SSE connection onto a stateful service (containers, managed realtime providers).

// client.js
const es = new EventSource('/api/assistant/stream?session=abc');
// Frames sent with an explicit `event:` field fire named listeners,
// not onmessage (which only receives the default "message" event).
es.addEventListener('progress', (e) => console.log('progress', e.data));
es.onmessage = (e) => console.log('msg', e.data);
es.onerror = (err) => console.error('sse error', err);

Server-side you must format lines according to the text/event-stream spec and keep the HTTP response open. Vercel's serverless functions aren't designed for long-lived connections, so for meaningful SSE you'll typically need either a streaming-capable edge runtime or to host the SSE endpoint on a service designed for long-lived connections.
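Concretely, the server side of that framing looks something like this. This is a minimal sketch assuming a long-running Node server with a classic res stream; the sendEvent helper is just my illustration, not anyone's API:

// A hypothetical helper that writes one frame per the text/event-stream spec.
function sendEvent(res, { event, id, data }) {
  if (event) res.write(`event: ${event}\n`);
  if (id !== undefined) res.write(`id: ${id}\n`);
  res.write(`data: ${data}\n\n`); // the blank line terminates the frame
}

// Usage on a long-lived endpoint (not a standard serverless function):
// res.writeHead(200, {
//   'Content-Type': 'text/event-stream',
//   'Cache-Control': 'no-cache',
// });
// sendEvent(res, { event: 'progress', id: 1, data: 'checking email' });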

Option 2 — A dedicated server that terminates a websocket

Run a persistent WebSocket endpoint (e.g. AWS API Gateway WebSocket, a fleet of containers behind an ALB/NLB, or a managed provider). The frontend holds a socket, backend pushes to those sockets. For the purposes of this option, treat it as a pure websocket solution: the frontend connects to a long-running websocket endpoint that the backend uses for all server→client realtime messages.

Pros:

  • True bidirectional push: the backend can message the client at any moment, which directly solves the backend-initiated-event problem.
  • No invocation time limits on the delivery path; intermediate and final messages ride the same channel.

Cons:

  • It's stateful infrastructure: you now own scaling, health checks, reconnection, and connection draining.
  • A second deployment target and operational surface alongside Vercel.

This option keeps all realtime delivery on the websocket layer; Vercel can still host short-lived business logic, but any server-originated push goes through the websocket tier.
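For a sense of what that tier involves, here's a minimal sketch assuming Node and the ws package, with a single-process in-memory registry. A real deployment would need auth and a shared registry across instances; pushToUser and the user query parameter are my illustration:

// A hypothetical minimal connection tier using the 'ws' package.
import { WebSocketServer, WebSocket } from 'ws';

const wss = new WebSocketServer({ port: 8080 });
const socketsByUser = new Map(); // userId -> socket (single process only)

wss.on('connection', (socket, req) => {
  const userId = new URL(req.url, 'http://placeholder').searchParams.get('user');
  socketsByUser.set(userId, socket);
  socket.on('close', () => socketsByUser.delete(userId));
});

// Called by backend logic whenever a server-originated message exists.
export function pushToUser(userId, message) {
  const socket = socketsByUser.get(userId);
  if (socket && socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify(message));
  }
}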

Option 3 — Streaming HTTP responses from Vercel functions

Modern platforms (including Vercel Edge Functions and many serverless runtimes) can stream responses via ReadableStream or chunked Transfer-Encoding. That solves intermediate messages for a single request: you can flush multiple partial responses over one HTTP call.

Pros:

  • Everything stays on Vercel: no extra infrastructure for the common case.
  • Cleanly solves intermediate messages within a single request.

Cons:

  • The stream lives only as long as the invocation, so it helps with neither timeouts nor backend-initiated events when no request is open.
  • The client must keep the request open and parse a stream instead of a simple response.

// api/assistant/stream.js - Vercel Edge
export const config = { runtime: 'edge' };

export default async function handler(req) {
  const encoder = new TextEncoder();
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();

  // Run the work without blocking the Response; close the stream only after
  // processInBackground (the long-running brain work) has finished, so no
  // writes land after close.
  (async () => {
    await writer.write(encoder.encode('event: init\ndata: checking email\n\n'));
    await processInBackground(async (progress) => {
      await writer.write(encoder.encode(`event: progress\ndata: ${progress}\n\n`));
    });
    await writer.write(encoder.encode('event: done\ndata: all good\n\n'));
    await writer.close();
  })();

  return new Response(readable, {
    headers: { 'Content-Type': 'text/event-stream' },
  });
}

That’s nice for one-off interactions but not a silver bullet.

What I actually implemented

The implementation keeps almost all brain logic in Vercel serverless functions while using a websocket as the realtime delivery channel for messages that cannot be handled synchronously.

Here's the overall setup:

high-level architecture diagram

Key elements of the implementation

  1. Front-to-back requests from the voice agent to the brain are standard HTTP requests. The frontend calls a secured route such as /api/voiceToBrain. That route validates and authorizes the request and then invokes the brain entrypoint brainGo().

  2. The /api/voiceToBrain route validates the request body (zod or similar) and calls brainGo(). brainGo() returns either a websocket message object or null. If it returns a websocket message object and the request is in a mode that allows an immediate HTTP response, /api/voiceToBrain returns that message in the HTTP response. If brainGo() returns null, /api/voiceToBrain returns 200 and the voice agent expects the reply via the websocket.

  3. The frontend tool (askBrain) handles both outcomes: if it receives an immediate message in the HTTP response, it treats that as the result; if it receives null, it returns a short explanatory message like "the brain is working, the response will arrive later" so the voice agent knows the call succeeded but the answer is still coming. Note that this message is not spoken to the user; it is only given to the voice agent.

  4. brainGo is the central server-side function that implements the brain logic. It accepts userId and recentContext and returns a Promise<Message | null>. Behavior (a sketch follows this list):

    • Attempt to acquire the per-user lock. If lock acquisition fails, return null immediately: a brain is already running for that user. The user's message is still recorded in the database, so it will be processed once the current brain invocation finishes.
    • Call doOneBrainLoop(recentContext). doOneBrainLoop returns { needsAnotherLoop: boolean, websocketMessage?: Message }.
    • If needsAnotherLoop is false and there is a websocketMessage, brainGo returns that message directly (this becomes the HTTP response when invoked via /api/voiceToBrain).
    • If needsAnotherLoop is true and there's a message, brainGo sends the message over the websocket (sendWebsocketMessage) and then continues into the next loop.
    • If needsAnotherLoop is false and there's no message, brainGo returns null.
    • If needsAnotherLoop is true and there's no message, brainGo proceeds to the next loop.
    • The value of needsAnotherLoop is primarily determined by whether the LLM calls the sayToUserAndContinue tool or the sayToUserAndQuit tool.
    • If brainGo detects it will run out of execution time and must continue later, it persists state and triggers the reinvocation path (brainToBrain) and then returns null.
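To make that concrete, here is a minimal sketch of the entrypoint and the brainGo loop. The helper names (requestSchema, acquireUserLock, nearTimeout, persistState) are stand-ins rather than the actual Second Brain code, and lock release and error handling are omitted:

// api/voiceToBrain.js - hypothetical shape of the HTTP entrypoint
export default async function handler(req, res) {
  const parsed = requestSchema.safeParse(req.body); // zod-style validation
  if (!parsed.success) return res.status(400).json({ error: 'bad request' });

  const message = await brainGo(parsed.data.userId, parsed.data.recentContext);
  // Either an immediate answer, or null meaning "watch the websocket".
  return res.status(200).json({ message });
}

async function brainGo(userId, recentContext) {
  if (!(await acquireUserLock(userId))) {
    return null; // a brain is already running; the persisted message will be picked up
  }
  let context = recentContext;
  while (true) {
    if (nearTimeout()) {
      await persistState(userId);      // continue later via /api/brainToBrain
      await reinvokeOnTimeout(userId);
      return null;
    }
    const { needsAnotherLoop, websocketMessage } = await doOneBrainLoop(context);
    if (!needsAnotherLoop) {
      return websocketMessage ?? null; // terminal message rides the HTTP response
    }
    if (websocketMessage) {
      await sendWebsocketMessage(userId, websocketMessage); // intermediate update
    }
    context = undefined; // later loops rebuild context from persisted messages
  }
}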

mid-level architecture diagram

  5. doOneBrainLoop accepts recentContext as an optional argument so the first loop can include the most recent client-side message (which may not yet be persisted). It fetches persisted messages and tool results as needed and composes the full context for LLM/tool calls. It returns whether another loop is required and an optional websocketMessage to deliver now (see the sketch after this list).

  6. Back-to-front messages have two delivery paths in the current architecture:

    • Direct HTTP response: if brainGo is invoked directly by /api/voiceToBrain and brainGo determines it is finished and wants to send a final message, it returns that message via the HTTP response. This is the fastest path for immediate answers.
    • Websocket: for intermediate updates, background events, or when the brain reinvokes itself via brainToBrain, messages are sent via the websocket. Breadcrumbs and backend event notifications always go down the websocket.

  7. The voice interface has handlers for both delivery methods: it can accept an immediate HTTP response from askBrain or listen to the websocket for system messages and breadcrumbs. The UI tracks websocket connection status and presence so it can surface delays or reconnections.

  8. Tools that send messages to the user include a doneAfterThis flag. At low levels, tools indicate whether the message should be terminal (suitable for the HTTP path) or not (must be delivered via the websocket). This flag propagates up through doOneBrainLoop to brainGo so delivery decisions are deterministic.

  9. /api/brainToBrain is the reinvocation path: when the brain will outlive the current invocation's time limit, it persists state and enqueues or reinvokes a Vercel function to continue processing (this is the reinvokeOnTimeout() function). In this reinvocation mode, returning messages via HTTP is not allowed; all messages must use the websocket.

  10. There is a websocket→brain HTTP route (kept for compatibility) which validates schema and auth but is effectively a no-op for client→brain messages in normal operation because front-to-back traffic flows over HTTP. This allows other messaging setups in the future, like sending user messages to the brain via the websocket.
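And the matching sketch of doOneBrainLoop, with the same caveat: sayToUserAndContinue and sayToUserAndQuit are the real tool names from above, while fetchPersistedContext, runLLM, persistToolResult, and the exact return shapes are stand-ins:

// Hypothetical shape of one brain loop.
async function doOneBrainLoop(recentContext) {
  // First loop: fold in the freshest client-side message, which may not be
  // persisted yet; later loops rebuild context from the database alone.
  const persisted = await fetchPersistedContext();
  const context = recentContext ? [...persisted, recentContext] : persisted;

  const toolCall = await runLLM(context);

  if (toolCall?.name === 'sayToUserAndQuit') {
    // Terminal (doneAfterThis): safe to deliver on the HTTP path.
    return { needsAnotherLoop: false, websocketMessage: toolCall.message };
  }
  if (toolCall?.name === 'sayToUserAndContinue') {
    // Intermediate: must go down the websocket while work continues.
    return { needsAnotherLoop: true, websocketMessage: toolCall.message };
  }
  if (toolCall) {
    // Document/email tools: record the result and loop again.
    await persistToolResult(toolCall);
    return { needsAnotherLoop: true };
  }
  return { needsAnotherLoop: false }; // nothing to say, nothing left to do
}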

Why this architecture is good

  • Almost all brain logic stays in Vercel serverless functions: simple deploys, automatic scaling, no fleet to manage for the business logic.
  • The stateful surface area is small: the websocket layer only delivers messages and holds no brain logic.
  • Immediate answers still take the fastest path (the HTTP response), while everything else has a reliable fallback channel.
  • Delivery decisions are deterministic: the doneAfterThis flag fixes HTTP vs. websocket at the point the message is produced.

Why this architecture is bad

  • There are two delivery paths, and the frontend must handle both, which means more code and more failure modes than a single channel.
  • The reinvocation machinery (persist state, brainToBrain, per-user locks) exists only to work around invocation limits; it's accidental complexity.
  • You still run stateful websocket infrastructure; serverless didn't eliminate it, only narrowed it.

When the decision would be different

  • With a handful of users and intermittent messages, plain SSE from a long-lived endpoint is simpler and perfectly adequate.
  • If every interaction finished within one invocation, streaming HTTP responses alone would cover intermediate messages with no websocket tier at all.
  • If realtime load dominated, I'd invert the design: move the brain onto the stateful connection tier and drop the reinvocation machinery entirely.

Conclusion

What I would do next if I had more time

If the product matures and the realtime load justifies it, I’d move to a globally distributed connection tier (containers + edge routing) and a small control-plane service that assigns brains to connections, persists state, and deals with reconnections. I’d keep the Vercel brain as a migration path and only move latency-critical parts out of it.

Or - hope that Vercel builds a websocket service and sprinkles their magic dust on it. I'd be user #1.

The Moral of the Story

The serverless model is amazing for a huge class of problems, but it’s intentionally constrained. When your app needs durable connections and background-initiated messages, accept that you’ll need a small amount of stateful infrastructure.

You can buy performance at the cost of complexity by moving more functionality into the stateful compute.