Building a Remote MCP Server With OAuth: What the Spec Doesn't Tell You [2026]

TL;DR

A remote MCP server (streamable HTTP) means zero install for users, a real auth story, and one always-current deployment — in exchange for owning sessions, timeouts, and scaling yourself
“Add OAuth” actually means: two discovery endpoints (RFC 9728 + RFC 8414), dynamic client registration, PKCE, a consent screen, token validation on every request, refresh rotation, and revocation
Rate-limit your registration endpoint — it is anonymous by design
Migrating from API keys? Dispatch on the token prefix and append a deprecation notice to tool responses instead of breaking existing connections
Tune your request timeout to your slowest tool (ours is 130 seconds because one AI-backed tool can take up to 120), and run blocking tools in supervised tasks so they can never stall the server

When we shipped Hooklistener's MCP server, the protocol part took days. The parts nobody warns you about — the OAuth stack the spec expects, keeping legacy API keys working during migration, a request timeout that one slow tool kept blowing through — took weeks. This is the write-up we wish we had read first. Everything below is grounded in the code that runs app.hooklistener.com/api/mcp in production today: a Phoenix/Elixir service exposing 46 tools to Claude Code, Cursor, and any other MCP client.

Why Remote (Streamable HTTP) Beats stdio for a SaaS

Most MCP tutorials show you a stdio server: a local process the client spawns and talks to over stdin/stdout. That model is great for tools that wrap local resources — a filesystem, a local database. For a SaaS, it is the wrong default for three reasons:

No install step. With a remote server, connecting is one command or one config entry pointing at a URL. There is no npm package to ship, no Node version matrix to support, no “works on my machine” tickets.
A real auth story. A stdio server inherits whatever credentials live on the user's machine. A remote server can run a proper OAuth flow: the user signs in through a browser, picks an organization, and gets short-lived scoped tokens that you can revoke server-side.
Always current. When we ship a new tool, every connected client sees it on the next tools/list. There is no stale-version long tail.

What you take on in exchange is everything a long-running HTTP service implies. Our transport is streamable HTTP: JSON-RPC 2.0 over POST to a single endpoint, an optional GET with Accept: text/event-stream that opens an SSE stream (we send keepalive comments every 15 seconds so proxies don't kill idle connections), and DELETE to close a session. Session state is tracked via an mcp-session-id header; on our side each session is a supervised Elixir process started on demand, so a crashed session takes down exactly one conversation, not the server.
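To make the "one supervised process per session" idea concrete, here is a minimal sketch using only Elixir's standard library: a Registry maps each mcp-session-id to a process, and a DynamicSupervisor starts that process on first use. The module and registry names are hypothetical and this is not Hooklistener's actual code; transport concerns (JSON-RPC parsing, the SSE stream) are left out.

```elixir
defmodule McpSessionSketch do
  # Hypothetical per-session process. `restart: :temporary` means a crashed
  # session is not restarted automatically; the next request starts a fresh one.
  use GenServer, restart: :temporary

  @registry McpSessionSketch.Registry
  @sup McpSessionSketch.Sup

  # Start the registry and dynamic supervisor (normally in your application tree).
  def start_infrastructure do
    children = [
      {Registry, keys: :unique, name: @registry},
      {DynamicSupervisor, strategy: :one_for_one, name: @sup}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  def start_link(session_id) do
    GenServer.start_link(__MODULE__, session_id, name: {:via, Registry, {@registry, session_id}})
  end

  # Find the process for an mcp-session-id, starting it on first use.
  def lookup_or_start(session_id) do
    case Registry.lookup(@registry, session_id) do
      [{pid, _value}] ->
        {:ok, pid}

      [] ->
        case DynamicSupervisor.start_child(@sup, {__MODULE__, session_id}) do
          {:ok, pid} -> {:ok, pid}
          # Another request won the race to start this session.
          {:error, {:already_started, pid}} -> {:ok, pid}
          error -> error
        end
    end
  end

  @impl true
  def init(session_id), do: {:ok, %{session_id: session_id, requests: 0}}

  # Example per-session state: count requests handled in this conversation.
  @impl true
  def handle_call(:handle_request, _from, state) do
    state = %{state | requests: state.requests + 1}
    {:reply, state.requests, state}
  end
end
```

Registry is node-local, which is exactly why the scaling caveat that follows matters: a lookup on a different node finds nothing and starts a fresh session.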

One honest caveat about horizontal scaling: per-session processes mean session affinity matters. A follow-up request that lands on a node without that session's process needs the session to be re-established. The BEAM gives us cheap supervised processes and a registry, which makes the single-node story excellent — but if you are building on a stateless runtime, plan where session state lives before you ship, not after.

The OAuth Stack a Remote MCP Server Actually Needs

The MCP authorization spec has been revised more than once since remote servers became a thing, so we won't pin claims to a spec version here — check the current spec before you build. But the shape that clients like Claude Code expect in practice is stable: OAuth 2.0 with discovery metadata, dynamic client registration, and PKCE. Here is every endpoint we ended up shipping:

GET  /.well-known/oauth-protected-resource     RFC 9728 resource metadata
GET  /.well-known/oauth-authorization-server   RFC 8414 server metadata
POST /oauth/register                           Dynamic client registration (rate limited)
GET  /oauth/authorize                          Consent screen (browser session)
POST /oauth/authorize                          Approve / deny decision
POST /oauth/token                              Code exchange + refresh
POST /oauth/revoke                             Token revocation
POST /api/mcp                                  The MCP endpoint itself

That is seven OAuth-related routes for one MCP endpoint. None of them are optional if you want unattended client onboarding. Let's walk through what each one took.

Discovery: how clients find your auth server

The flow starts with a failure. When a client hits /api/mcp without credentials, we return 401 with a WWW-Authenticate header pointing at the protected resource metadata:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer resource_metadata="https://app.hooklistener.com/.well-known/oauth-protected-resource"

{
  "error": "unauthorized",
  "error_description": "Authentication required. Use OAuth 2.0 or provide an API key."
}

The client follows that URL to the RFC 9728 document, which identifies the resource and points at the authorization server:

{
  "resource": "https://app.hooklistener.com/api/mcp",
  "authorization_servers": ["https://app.hooklistener.com"],
  "scopes_supported": ["full_access"],
  "bearer_methods_supported": ["header"]
}
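Because the header and the metadata document must agree, it helps to derive both from one base URL. A minimal sketch, assuming a hypothetical helper module; the field values are copied from the examples above:

```elixir
defmodule ResourceMetadataSketch do
  # Hypothetical helper: derive the 401 challenge and the RFC 9728 document
  # from one base URL so the two can never point at different places.
  @metadata_path "/.well-known/oauth-protected-resource"

  def metadata_url(base_url), do: base_url <> @metadata_path

  # Value for the WWW-Authenticate header on an unauthenticated request.
  def www_authenticate(base_url) do
    ~s(Bearer resource_metadata="#{metadata_url(base_url)}")
  end

  # Body served at /.well-known/oauth-protected-resource.
  def protected_resource_metadata(base_url) do
    %{
      "resource" => base_url <> "/api/mcp",
      "authorization_servers" => [base_url],
      "scopes_supported" => ["full_access"],
      "bearer_methods_supported" => ["header"]
    }
  end
end
```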

From there it fetches the RFC 8414 authorization server metadata, which advertises every capability the client needs to self-configure:

{
  "issuer": "https://app.hooklistener.com",
  "authorization_endpoint": "https://app.hooklistener.com/oauth/authorize",
  "token_endpoint": "https://app.hooklistener.com/oauth/token",
  "registration_endpoint": "https://app.hooklistener.com/oauth/register",
  "revocation_endpoint": "https://app.hooklistener.com/oauth/revoke",
  "response_types_supported": ["code"],
  "grant_types_supported": ["authorization_code", "refresh_token"],
  "code_challenge_methods_supported": ["S256"],
  "token_endpoint_auth_methods_supported": ["none", "client_secret_post"],
  "scopes_supported": ["full_access"],
  "client_id_metadata_document_supported": true
}

Both controllers are trivial — ours fit in 35 lines combined. The work is in deciding what to advertise, because everything you list here is a promise the rest of the stack has to keep.
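One way to keep those promises honest, sketched under the assumption that you control both the metadata controller and the endpoints: store each advertised list once and have the endpoints check against the same values. The module is hypothetical and covers only a subset of the fields shown above.

```elixir
defmodule AuthServerMetadataSketch do
  # Hypothetical module: the lists advertised in RFC 8414 metadata are the
  # same module attributes the endpoints consult, so they cannot drift apart.
  @grant_types ["authorization_code", "refresh_token"]
  @code_challenge_methods ["S256"]
  @token_auth_methods ["none", "client_secret_post"]

  def metadata(base_url) do
    %{
      "issuer" => base_url,
      "authorization_endpoint" => base_url <> "/oauth/authorize",
      "token_endpoint" => base_url <> "/oauth/token",
      "registration_endpoint" => base_url <> "/oauth/register",
      "revocation_endpoint" => base_url <> "/oauth/revoke",
      "response_types_supported" => ["code"],
      "grant_types_supported" => @grant_types,
      "code_challenge_methods_supported" => @code_challenge_methods,
      "token_endpoint_auth_methods_supported" => @token_auth_methods,
      "scopes_supported" => ["full_access"]
    }
  end

  # The token endpoint asks the same question before dispatching on grant_type.
  def grant_supported?(grant_type), do: grant_type in @grant_types

  # The authorize endpoint rejects any other challenge method.
  def challenge_method_supported?(method), do: method in @code_challenge_methods
end
```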

Dynamic client registration — and why you must rate-limit it

MCP clients are not pre-registered apps. Your server has never heard of a given Claude Code or Cursor install until it shows up, so it needs to register itself: a POST to /oauth/register with a client name and redirect URIs, returning a generated client_id. Public clients (auth method none) get no secret; confidential clients get a secret we store only as a hash.
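A minimal sketch of the registration handler's core logic, assuming a hypothetical module that returns the row to persist instead of writing to a database. SHA-256 is a reasonable hash for high-entropy random secrets like these; the request and response field names follow RFC 7591, and the error atom mirrors its invalid_client_metadata code.

```elixir
defmodule ClientRegistrationSketch do
  # Hypothetical sketch of POST /oauth/register. Persistence is left out:
  # register/1 returns the row to store and the JSON-ready response.
  # A real handler would also validate each redirect URI's scheme and host.
  @auth_methods ["none", "client_secret_post"]

  def register(%{"client_name" => name, "redirect_uris" => [_ | _] = uris} = params) do
    auth_method = Map.get(params, "token_endpoint_auth_method", "none")

    if auth_method in @auth_methods do
      client_id = random_token(16)
      # Public clients ("none") get no secret; confidential clients get one,
      # returned exactly once, and only its SHA-256 hash is kept.
      secret = if auth_method == "client_secret_post", do: random_token(32)
      secret_hash = secret && hash(secret)

      row = %{
        client_id: client_id,
        client_name: name,
        redirect_uris: uris,
        auth_method: auth_method,
        secret_hash: secret_hash
      }

      response =
        %{
          "client_id" => client_id,
          "client_name" => name,
          "redirect_uris" => uris,
          "token_endpoint_auth_method" => auth_method
        }
        |> maybe_put("client_secret", secret)

      {:ok, row, response}
    else
      {:error, :invalid_client_metadata}
    end
  end

  # Missing client_name or an empty redirect_uris list.
  def register(_params), do: {:error, :invalid_client_metadata}

  defp random_token(bytes), do: :crypto.strong_rand_bytes(bytes) |> Base.url_encode64(padding: false)
  defp hash(secret), do: :crypto.hash(:sha256, secret) |> Base.encode16(case: :lower)

  defp maybe_put(map, _key, nil), do: map
  defp maybe_put(map, key, value), do: Map.put(map, key, value)
end
```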

Here is the part that is easy to miss: this endpoint is anonymous and writes to your database. Anyone on the internet can create rows in your clients table. We put a dedicated rate-limit plug in front of it — in the router it is the only OAuth route with its own pipeline:

scope "/oauth", HooklistenerServiceWeb.OAuth do
  pipe_through [:api, HooklistenerServiceWeb.Plugs.OAuthRegistrationRateLimit]

  post "/register", RegistrationController, :create
end
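The internals of OAuthRegistrationRateLimit are not shown here, so the following is a hypothetical stand-in: a fixed-window counter in ETS keyed by client IP, the kind of check such a plug could run before answering 429. The limit and window values are placeholders.

```elixir
defmodule RegistrationRateLimitSketch do
  # Hypothetical fixed-window limiter: at most `limit` registrations per IP
  # per window. A plug would call check/2 and respond 429 on {:error, :rate_limited}.
  @table :registration_rate_limit_sketch

  def init_table do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
  end

  def check(ip, opts \\ []) do
    limit = Keyword.get(opts, :limit, 10)
    window_ms = Keyword.get(opts, :window_ms, 60_000)
    now_ms = Keyword.get(opts, :now_ms, System.system_time(:millisecond))

    # One counter per {ip, window index}. Expired windows are never deleted
    # here; a real limiter would sweep them periodically.
    key = {ip, div(now_ms, window_ms)}
    count = :ets.update_counter(@table, key, {2, 1}, {key, 0})

    if count <= limit, do: :ok, else: {:error, :rate_limited}
  end
end
```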

We also support URL-based client IDs (the client_id_metadata_document_supported flag above): if the client_id is an HTTPS URL with a path, we fetch the metadata document it points to and auto-register the client from that. It is a newer pattern in the MCP ecosystem that avoids the registration round-trip entirely, and supporting both cost us little.
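Deciding which path a client_id takes is a small parsing step. A sketch, assuming the rule stated above (an HTTPS URL with a non-empty path) and a hypothetical module name; fetching and validating the document itself is out of scope here.

```elixir
defmodule ClientIdKindSketch do
  # Hypothetical classifier: treat a client_id as a metadata-document URL only
  # if it is https, has a host, and has a path beyond "/". Anything else is
  # looked up as a previously registered client ID.
  def classify(client_id) when is_binary(client_id) do
    case URI.parse(client_id) do
      %URI{scheme: "https", host: host, path: path}
      when is_binary(host) and host != "" and is_binary(path) and path not in ["", "/"] ->
        {:metadata_document, client_id}

      _ ->
        {:registered, client_id}
    end
  end
end
```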

PKCE: required, not optional

MCP clients are public clients — a CLI on a laptop cannot keep a secret. PKCE is what makes the authorization-code flow safe for them, so we made it mandatory: the authorize endpoint rejects requests without a code_challenge, only S256 is accepted, and the token endpoint refuses a missing or empty verifier explicitly rather than treating absence as a pass:

defp validate_pkce(_auth_code, nil), do: {:error, :missing_code_verifier}
defp validate_pkce(_auth_code, ""), do: {:error, :missing_code_verifier}

defp validate_pkce(auth_code, code_verifier) do
  computed = Base.url_encode64(:crypto.hash(:sha256, code_verifier), padding: false)

  if Plug.Crypto.secure_compare(computed, auth_code.code_challenge),
    do: :ok,
    else: {:error, :invalid_code_verifier}
end

Those first two clauses matter. The subtle failure mode in PKCE implementations is a client that registers a challenge and then omits the verifier — if your check only runs when a verifier is present, you have silently disabled PKCE.
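For completeness, here is the other half: how a client derives the S256 challenge from its verifier, plus the same server-side checks as validate_pkce. This sketch swaps Plug.Crypto.secure_compare/2 for a hand-rolled constant-time comparison so it runs without dependencies; the module and function names are hypothetical.

```elixir
defmodule PkceSketch do
  import Bitwise

  # Client side (RFC 7636, S256): 32 random bytes, base64url-encoded without
  # padding, give a 43-character verifier.
  def generate_pair do
    verifier = :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)
    {verifier, challenge(verifier)}
  end

  def challenge(verifier) do
    :crypto.hash(:sha256, verifier) |> Base.url_encode64(padding: false)
  end

  # Server side, mirroring validate_pkce: a missing verifier is an error, not a pass.
  def verify(_stored_challenge, nil), do: {:error, :missing_code_verifier}
  def verify(_stored_challenge, ""), do: {:error, :missing_code_verifier}

  def verify(stored_challenge, verifier) do
    if constant_time_equal?(challenge(verifier), stored_challenge),
      do: :ok,
      else: {:error, :invalid_code_verifier}
  end

  # Stand-in for Plug.Crypto.secure_compare/2: OR together every XORed byte
  # so the loop never exits early on the first mismatch.
  defp constant_time_equal?(a, b) when byte_size(a) == byte_size(b) do
    diff =
      for <<byte <- :crypto.exor(a, b)>>, reduce: 0 do
        acc -> bor(acc, byte)
      end

    diff == 0
  end

  defp constant_time_equal?(_a, _b), do: false
end
```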

The consent screen — with organization selection

/oauth/authorize is the only browser-facing piece. If the user is not signed in, we stash the OAuth params in the session, bounce through our normal login, and then return. The consent screen shows which client is asking and, in the step generic OAuth guides skip, asks the user to pick an organization. Hooklistener users can belong to several orgs, and an MCP token has to be scoped to exactly one, because every tool call is answered in an organization's context. The selected org ID is baked into the authorization code and flows into every token minted from it.

Two implementation details that saved us pain: pending authorization requests live server-side in the session with a 10-minute TTL and are consumed on first use, so the approve POST cannot be replayed with tampered parameters. And redirect URI matching ignores the port for http://localhost and http://127.0.0.1 callbacks — per RFC 8252, native clients bind whatever loopback port is free, and exact-match validation would break every CLI client on the second run.
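The loopback rule is easy to get subtly wrong, so a sketch may help. It assumes a hypothetical matcher that compares scheme, host, path, query, and the remaining parts exactly, and drops only the port, and only when both URIs are http on localhost or 127.0.0.1:

```elixir
defmodule RedirectUriSketch do
  # Hypothetical matcher: exact string match, except that for http loopback
  # callbacks (localhost / 127.0.0.1) the port is ignored, as RFC 8252 asks.
  @loopback_hosts ["localhost", "127.0.0.1"]

  def matches?(registered, requested) do
    reg = URI.parse(registered)
    req = URI.parse(requested)

    if loopback?(reg) and loopback?(req) do
      without_port(reg) == without_port(req)
    else
      registered == requested
    end
  end

  defp loopback?(%URI{scheme: "http", host: host}), do: host in @loopback_hosts
  defp loopback?(_uri), do: false

  # Every component that must still match exactly; the port is left out.
  defp without_port(%URI{} = u), do: {u.scheme, u.userinfo, u.host, u.path, u.query, u.fragment}
end
```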

Tokens: validate on every request, store only hashes

Our token model is deliberately boring. Access tokens live 1 hour, refresh tokens 30 days, authorization codes 10 minutes. Tokens are random strings stored as a lookup prefix plus a hash, never in plaintext — validation finds candidates by prefix, verifies the hash, then checks revoked_at and expires_at. Every single MCP request re-validates the token and loads the user and organization. No caching of auth decisions in session state.
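A minimal sketch of the prefix-plus-hash scheme, assuming an 8-character lookup prefix, SHA-256 digests, Unix-second timestamps, and an in-memory list standing in for the database query. None of these parameters are confirmed as Hooklistener's; they only illustrate the order of checks described above.

```elixir
defmodule TokenStoreSketch do
  # Hypothetical token scheme: the client sees the random token once; the
  # server keeps only a short lookup prefix and a SHA-256 hash of it.
  @prefix_len 8

  def mint(now_s, ttl_s) do
    token = :crypto.strong_rand_bytes(32) |> Base.url_encode64(padding: false)

    record = %{
      prefix: binary_part(token, 0, @prefix_len),
      hash: hash(token),
      expires_at: now_s + ttl_s,
      revoked_at: nil
    }

    {token, record}
  end

  # Find candidates by prefix, confirm the hash, then check revocation and expiry.
  def validate(token, records, now_s) when byte_size(token) > @prefix_len do
    prefix = binary_part(token, 0, @prefix_len)
    digest = hash(token)

    case Enum.find(records, &(&1.prefix == prefix and &1.hash == digest)) do
      nil -> {:error, :invalid_token}
      %{revoked_at: revoked} when not is_nil(revoked) -> {:error, :revoked}
      %{expires_at: exp} when exp <= now_s -> {:error, :expired}
      record -> {:ok, record}
    end
  end

  def validate(_token, _records, _now_s), do: {:error, :invalid_token}

  defp hash(token), do: :crypto.hash(:sha256, token)
end
```

With the lifetimes above, an access token would be minted as mint(now, 3_600).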

Refresh is rotation, not renewal: exchanging a refresh token revokes it and all access tokens linked to it, then mints a new pair. And /oauth/revoke returns 200 even for tokens it has never seen. RFC 7009 specifies a 200 response for invalid tokens too, so revocation never becomes an oracle for whether a token exists.
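Rotation is mostly bookkeeping. This sketch models it with plain maps (a hypothetical shape, not a schema) to show which records flip to revoked when a refresh token is exchanged, and why a second exchange of the same token fails with invalid_grant, the RFC 6749 error code for that case.

```elixir
defmodule RefreshRotationSketch do
  # Hypothetical in-memory model. state is
  #   %{refresh: %{id => %{revoked: bool}},
  #     access: %{id => %{refresh_id: id, revoked: bool}}}
  def rotate(state, refresh_id) do
    case Map.fetch(state.refresh, refresh_id) do
      {:ok, %{revoked: false}} ->
        # Revoke every access token minted from this refresh token.
        access =
          Map.new(state.access, fn
            {id, %{refresh_id: ^refresh_id} = tok} -> {id, %{tok | revoked: true}}
            other -> other
          end)

        new_refresh = new_id()
        new_access = new_id()

        state = %{
          refresh:
            state.refresh
            |> Map.put(refresh_id, %{revoked: true})
            |> Map.put(new_refresh, %{revoked: false}),
          access: Map.put(access, new_access, %{refresh_id: new_refresh, revoked: false})
        }

        {:ok, {new_access, new_refresh}, state}

      # Unknown or already-rotated refresh tokens are rejected.
      _ ->
        {:error, :invalid_grant}
    end
  end

  defp new_id, do: :crypto.strong_rand_bytes(16) |> Base.url_encode64(padding: false)
end
```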

Running OAuth and Legacy API Keys Side by Side

Our MCP server launched with API-key auth: paste a hklst_ key into an Authorization header and go. When we added OAuth we could not break those connections, so the server now dispatches on the shape of the credential. Hooklistener API keys have a recognizable prefix, which makes the dispatch a pattern match:

defp authenticate_from_headers(frame) do
  case frame.context.headers["authorization"] do
    "Bearer hklst_" <> _ = bearer ->
      "Bearer " <> key = bearer
      authenticate_with_api_key(key, frame)

    "Bearer " <> token ->
      authenticate_with_oauth_token(token, frame)

    _ ->
      case frame.context.headers["x-api-key"] do
        nil -> {:error, authentication_required_error(), frame}
        key -> authenticate_with_api_key(key, frame)
      end
  end
end

A bearer token starting with hklst_ is an API key; any other bearer token is validated as an OAuth access token; a bare X-API-Key header still works too. If your API keys are not prefix-distinguishable from your OAuth tokens, fix that before you attempt a dual-auth migration: it is the one property that makes this dispatch trivial.

Then comes the soft-migration nudge. When a request is authenticated with an API key, we flag it, and the JSON reply helper appends a deprecation notice to the tool response itself:

@deprecation_notice "Note: API key authentication for MCP is deprecated. Please reconnect using OAuth. See https://hooklistener.com/docs/mcp-oauth"

This turned out to be the most effective deprecation channel we have. The notice rides along in the tool result, so the AI assistant reads it and frequently relays it to the user unprompted — “by the way, your Hooklistener connection uses a deprecated auth method”. No email campaign required. OAuth-authenticated requests never see it.
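A sketch of such a reply helper, assuming the tool result uses MCP's content array of text items and that authentication has already recorded a hypothetical auth_method value. The notice text is the one quoted above.

```elixir
defmodule DeprecationNoticeSketch do
  # Hypothetical reply helper: when the request was authenticated with a
  # legacy API key, append the notice as one more text item in the result.
  @deprecation_notice "Note: API key authentication for MCP is deprecated. Please reconnect using OAuth. See https://hooklistener.com/docs/mcp-oauth"

  def tool_result(text, auth_method) do
    content = [%{"type" => "text", "text" => text}]

    content =
      if auth_method == :api_key,
        do: content ++ [%{"type" => "text", "text" => @deprecation_notice}],
        else: content

    %{"content" => content}
  end
end
```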

Operational Lessons From Production

Your request timeout is set by your slowest tool

Web framework defaults assume requests finish in seconds. MCP tool calls do not. Our compare_requests tool sends captured webhooks to an AI provider for analysis, and that call can take up to 120 seconds. The transport timeout has to clear it with margin, and the reason deserves a comment in the router, because someone will eventually try to “fix” that number:

scope "/api" do
  forward "/mcp", HooklistenerServiceWeb.MCP.StreamableHTTPPlug,
    server: HooklistenerServiceWeb.MCP.Server,
    # compare_requests can wait on the AI provider for up to 120s.
    request_timeout: 130_000
end

Also check the rest of the chain: load balancer idle timeouts and reverse-proxy read timeouts will cut a 120-second response off at 60 seconds if you forget them, and 60 seconds is a common default (nginx's proxy_read_timeout, for example).
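A cheap guard against that failure is a startup or CI check that compares every hop against the slowest tool. The module, the layer names, and the 10-second default margin are assumptions chosen to match the numbers above:

```elixir
defmodule TimeoutBudgetSketch do
  # Hypothetical check: every hop between client and tool must allow at least
  # the slowest tool's duration plus a margin. Returns the hops that fall short.
  def layers_too_short(slowest_tool_ms, layers, margin_ms \\ 10_000) do
    required = slowest_tool_ms + margin_ms

    for {name, timeout_ms} <- layers, timeout_ms < required, do: name
  end
end
```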

Blocking tools must not block the server

Our most agent-friendly tool is wait_for_request: an assistant triggers a webhook, then calls this tool to block until the request actually arrives on an endpoint. Blocking is the feature, but it cannot be allowed to block the session process that handles all of that client's requests. So the wait runs in a supervised Task that subscribes to the endpoint's PubSub topic and sits in a receive loop. The timeout is clamped to 0–60 seconds (default 30), the await gets a 5-second grace on top, and a crashed task is reported as a timeout rather than an error. The same pattern backs wait_for_email. If your runtime does not have cheap supervised concurrency, blocking tools are where a remote MCP server gets genuinely hard.
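The pattern reads roughly like this in plain Elixir. The sketch assumes stdlib stand-ins: a Registry with duplicate keys plays the role of Phoenix.PubSub, and a Task.Supervisor runs the wait. Module names and the message shape are hypothetical; the clamp, default, grace period, and crash-as-timeout behavior follow the description above.

```elixir
defmodule WaitForRequestSketch do
  @registry WaitForRequestSketch.PubSub
  @task_sup WaitForRequestSketch.TaskSup
  @default_s 30
  @max_s 60
  @grace_ms 5_000

  # Registry (duplicate keys) stands in for Phoenix.PubSub in this sketch.
  def start_infrastructure do
    children = [
      {Registry, keys: :duplicate, name: @registry},
      {Task.Supervisor, name: @task_sup}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end

  def clamp_timeout(nil), do: @default_s
  def clamp_timeout(s) when is_integer(s), do: s |> max(0) |> min(@max_s)

  # Called when a webhook lands on an endpoint.
  def publish(endpoint_id, request) do
    Registry.dispatch(@registry, endpoint_id, fn entries ->
      for {pid, _} <- entries, do: send(pid, {:request_received, endpoint_id, request})
    end)
  end

  def wait_for_request(endpoint_id, timeout_s \\ nil) do
    timeout_ms = clamp_timeout(timeout_s) * 1_000

    # The blocking receive runs in its own supervised task, not in the caller.
    task =
      Task.Supervisor.async_nolink(@task_sup, fn ->
        {:ok, _} = Registry.register(@registry, endpoint_id, nil)

        receive do
          {:request_received, ^endpoint_id, request} -> {:ok, request}
        after
          timeout_ms -> :timeout
        end
      end)

    # Grace period on top of the wait; a crashed task is reported as a timeout.
    case Task.yield(task, timeout_ms + @grace_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, {:ok, request}} -> {:ok, request}
      {:ok, :timeout} -> {:error, :timeout}
      {:exit, _reason} -> {:error, :timeout}
      nil -> {:error, :timeout}
    end
  end
end
```

The task subscribes before it waits; in real use that subscription has to exist before the assistant triggers the webhook, or the event can slip past.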

Gate plans at the tool layer, not the transport layer

It is tempting to reject the whole connection if the user's plan does not cover every feature. Don't. Free-plan users can connect to our server and use most of the 46 tools; the gated ones check entitlements inside the tool and return a plain-language explanation. create_monitor can answer “Uptime monitors are not available on your plan”, and create_endpoint can answer “Debug endpoint limit reached for your plan. Upgrade to create more.” The assistant reads that string and tells the user exactly what happened and what to do about it. A 403 at the transport layer would just read as “server broken”.
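A sketch of gating inside a tool, assuming a hypothetical org map with a list of feature atoms. The denial goes back as an ordinary MCP tool result with isError set, so the client shows the text instead of treating the server as down:

```elixir
defmodule PlanGateSketch do
  # Hypothetical tool handler: the connection is accepted for every plan;
  # only the tool that needs the feature checks the entitlement.
  def create_monitor(org, params) do
    if :uptime_monitors in org.features do
      {:ok, %{"content" => [%{"type" => "text", "text" => "Monitor created for #{params["url"]}"}]}}
    else
      # A normal tool result flagged isError, not a transport-level 403,
      # so the assistant can relay the reason to the user.
      {:ok,
       %{
         "isError" => true,
         "content" => [%{"type" => "text", "text" => "Uptime monitors are not available on your plan."}]
       }}
    end
  end
end
```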

46 Tools Is a Lot: Fighting Context Bloat

Every tool you expose is text in the model's context window on every conversation. Our server has grown to 46 tools across nine categories — endpoints, captured requests, automations, schedules, secrets, datastore, uptime monitors, email inboxes, and AI analysis — which is enough that discipline stops being optional:

Boring, consistent names. list_endpoints, get_endpoint, create_endpoint, delete_endpoint. The model picks the right tool by name first; cleverness here costs you wrong tool calls.
Guided creators instead of one mega-tool. Endpoint automations take a type and a config object whose schema varies wildly by type. One generic tool meant models guessing at config shapes. We split it into per-type tools (create_http_request_action, create_condition_action, create_extract_json_action, create_run_script_action and friends), each with precise parameters and good defaults, keeping the raw create_endpoint_action only as a fallback for advanced types. Seven extra tools, far fewer malformed calls.
Respect client tool budgets. Some MCP clients cap or warn about the total number of enabled tools across all connected servers, and users stack several servers at once. Every tool has to earn its slot; we cut anything a model would not plausibly reach for.

Descriptions matter as much as names. Ours state what the tool does, its defaults, and its constraints in one or two sentences — written for a model deciding between 46 options, not for a human browsing docs.
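As an illustration of a guided creator with a model-facing description, here is a hypothetical definition in the shape MCP clients receive from tools/list (name, description, inputSchema), plus its expansion into a generic action payload. The parameters, the POST default, and the config shape are assumptions for illustration, not Hooklistener's actual schema.

```elixir
defmodule GuidedCreatorSketch do
  # Hypothetical guided creator: flat, typed parameters and a short
  # description written for a model choosing between many tools.
  def definition do
    %{
      "name" => "create_http_request_action",
      "description" =>
        "Add an action that forwards each captured request to a URL. Method defaults to POST.",
      "inputSchema" => %{
        "type" => "object",
        "properties" => %{
          "endpoint_id" => %{"type" => "string"},
          "url" => %{"type" => "string"},
          "method" => %{"type" => "string", "enum" => ["GET", "POST", "PUT", "PATCH", "DELETE"]}
        },
        "required" => ["endpoint_id", "url"]
      }
    }
  end

  # Expand the flat arguments into the generic type + config payload, filling defaults.
  def to_generic_action(%{"endpoint_id" => endpoint_id, "url" => url} = args) do
    %{
      "endpoint_id" => endpoint_id,
      "type" => "http_request",
      "config" => %{"url" => url, "method" => Map.get(args, "method", "POST")}
    }
  end
end
```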

Should You Build One?

Honestly: most teams should not. If your data already lives in a product with a good MCP server, use that server. The OAuth stack above is real work, it is security-sensitive work, and it is undifferentiated — nobody chooses a product for its token endpoint.

Building pays when you own both the data and the workflows around it. For us, the MCP server is not a wrapper over a REST API. Tools like wait_for_request and diagnose_request exist because an AI agent debugging a webhook integration needs primitives a human-facing API never needed: block until the event arrives, then explain what went wrong. That is the test. If your MCP server would just mirror your existing endpoints, an off-the-shelf OpenAPI-to-MCP bridge gets you 90% of the value for 5% of the effort. If agents need workflows your API does not have, build, and budget more time for auth than for tools.

If you want to see the result of all this from the user side, the Hooklistener MCP server connects with OAuth in about a minute, and our setup guide covers Claude Code, Cursor, Windsurf, and Codex CLI. New to the protocol entirely? Start with an introduction to what an MCP server is.
