Implementing End-User OAuth Identity Passthrough for Remote MCP Servers

Engineer's guide to OAuth identity passthrough for MCP servers: token refresh patterns, revocation on disconnect, audit trails for delegation chains, and incident response runbooks.

Uday Gajavalli · 24 min read

If you are about to expose your B2B SaaS platform to AI agents over the Model Context Protocol (MCP), the single most important decision you will make is how the agent inherits the end-user's identity. Get it wrong, and you ship a shared service account masquerading as personalization. Get it right, and the agent operates inside the exact same permissions, scopes, and audit trail your human user already has.

Orchestrating an AI agent on a local machine is a solved problem. You define a persona, hand it a few Python functions, paste a vendor API key into your environment variables, and watch the agent reason through tasks. But deploying that same agent into a production B2B SaaS environment exposes a massive architectural gap. When your AI agent needs to act on behalf of your users inside external systems (reading Jira tickets, updating Salesforce opportunities, or pulling BambooHR employee records), you suddenly have to manage multi-tenant OAuth 2.0 lifecycles and enforce strict isolation between what different agents are allowed to access. We explored that challenge in our guide to architecting a multi-tenant MCP server.

End-user OAuth identity passthrough for remote MCP servers means the access token an AI agent presents to your MCP server is bound to a specific human user, scoped to their permissions in the downstream SaaS, and validated by your authorization server as the resource owner—never a static API key shared across tenants. That sentence is the whole architectural battle.

This guide breaks down the architectural patterns required to securely expose your SaaS platform to AI agents. We will cover the mechanics of OAuth 2.1 identity passthrough, handling the brutal realities of token lifecycles, managing third-party rate limits, and why dynamic tool generation is the only way to scale MCP servers in production. This is written for senior PMs and engineering leads who have already shipped OAuth integrations and are now being asked to make their product "AI-agent ready" without becoming the next confused-deputy CVE.

The AI Agent Identity Crisis in B2B SaaS

The scale of the problem is no longer theoretical. A recent Cloud Security Alliance survey found that 82% of enterprises have unknown AI agents operating in their environments, and the same research reports 65% have experienced AI agent incidents in the past year, ranging from silent data exfiltration to operational outages. Microsoft's Cyber Pulse data adds that roughly 80% of Fortune 500 companies are already running active AI agents in production workflows.

Most early multi-agent deployments rely on service accounts and static API keys. The developer hardcodes a credential, and the agent uses it to authenticate against an external system. In a single-tenant environment, this is risky. In a multi-tenant B2B SaaS environment, it is a catastrophic vulnerability waiting to happen.

Currently, most agents are authenticating with one of three patterns, and all three are broken at enterprise scale:

  1. A shared service-account API key copied into a .env file or secrets manager.
  2. A long-lived OAuth refresh token issued to a single internal user, reused across every customer.
  3. A bearer token minted by the SaaS itself with no upstream tie to the end-user's identity.

Each of these collapses the principle of least privilege. When an AI agent uses a shared API key, it inherits the aggregate privileges of that service account. If the agent is compromised—via prompt injection or a hallucinated reasoning loop—it has unbounded access to the entire connected application. The blast radius is massive. There is no way to revoke access for one user without breaking everyone.

Enterprise buyers know this. They are no longer treating AI agent security as a future-state roadmap item. Access tokens must be scoped and rotated in ephemeral, cloud-native environments, and agents blur the audit trail by blending delegated user authority with autonomous action. Identity passthrough—especially when combined with a zero data retention architecture for SOC 2 compliance—is the only pattern that survives a rigorous security review.

What is End-User OAuth Identity Passthrough?

End-user OAuth identity passthrough is an architectural pattern where an AI agent does not authenticate to a remote system using its own service credentials. Instead, the agent authenticates as the specific human user who invoked it, inheriting their exact permissions, scopes, and role-based access controls (RBAC) in the target SaaS application.

If User A asks an agent to summarize their open Jira tickets, the agent connects to the remote MCP server using an OAuth token explicitly tied to User A's Jira identity. The MCP server validates the token, extracts the user identity, and proxies the request to Jira using User A's specific credentials. If the agent hallucinates and tries to read User B's tickets, the target SaaS API rejects the request because User A's token lacks the necessary permissions.

Identity passthrough is a chain of three OAuth relationships that share a single subject claim: the human user.

sequenceDiagram
    participant U as End User
    participant A as AI Agent / MCP Client
    participant M as Your MCP Server<br>(Resource Server)
    participant AS as Authorization Server
    participant T as Third-Party SaaS<br>(Salesforce, Jira, etc.)

    A->>M: tools/call (no token)
    M-->>A: 401 + WWW-Authenticate<br>resource_metadata=...
    A->>AS: Authorization Code + PKCE (user consents)
    AS-->>A: Access token (aud=MCP, sub=user)
    A->>M: tools/call (Bearer token)
    M->>M: Validate aud, sub, scope & extract identity
    M->>T: API call with user's<br>downstream OAuth token
    T-->>M: Data scoped to user's permissions
    M-->>A: Tool response

The agent never sees the downstream SaaS token. Your MCP server never accepts a token minted for some other audience. The end-user explicitly consents to the scopes the agent will use. If the user is fired in your HRIS, deprovisioning their identity in the IdP cascades through every agent acting on their behalf, eliminating privilege creep.

Warning

A common mistake is to take the access token the MCP client presents and forward it directly to the downstream API. Never pass through the token received from the MCP client: doing so creates confused deputy vulnerabilities, where downstream services may incorrectly trust tokens that were never intended for them. The June 2025 spec explicitly prohibits MCP servers from passing tokens through to the APIs they call. Mint or look up a separate, audience-scoped token for each downstream call.

The Three Token Relationships

The sequence diagram above shows the request flow. The security architecture underneath relies on three distinct tokens with separate audiences that never cross boundaries:

flowchart TD
    U["👤 End User"] -->|"1. Authenticates +<br>consents to scopes"| AS["Authorization Server<br>your IdP"]
    AS -->|"2. Issues JWT<br>sub=user_id<br>aud=mcp.yoursaas.com"| Agent["AI Agent /<br>MCP Client"]
    Agent -->|"3. Bearer token in<br>Authorization header"| MCP["MCP Server<br>Resource Server"]
    MCP -->|"4. Validates JWT: aud, sub,<br>scope, expiry"| MCP
    MCP -->|"5. Looks up stored<br>downstream credentials<br>for sub=user_id"| Store["Token Store<br>encrypted"]
    Store -->|"6. Returns user's<br>provider-specific<br>OAuth token"| MCP
    MCP -->|"7. API call with user's<br>downstream token"| SaaS["Third-Party SaaS<br>Salesforce, Jira, etc."]

Three tokens, three audiences, one human subject:

  • Token A (MCP Access Token): Issued by your authorization server with aud=https://mcp.yoursaas.com and sub=user_id. This is what the AI agent presents to the MCP server. It proves the user authorized this specific agent to call your tools.
  • Token B (Downstream SaaS Token): A provider-specific OAuth access token issued during the original account connection flow. Stored encrypted in your database, keyed by user identity. The MCP server retrieves this based on the sub claim extracted from Token A.
  • Token C (Downstream Refresh Token): The long-lived token that keeps Token B alive. Never leaves your server. Proactively rotated before expiry.

The MCP server is the bridge between these two token universes. It validates Token A (checking audience, signature, expiry, and scopes), extracts the user identity, then retrieves Token B to make the downstream API call. At no point does the AI agent see or handle Token B or Token C. The sound default is to prefer short-lived, narrowly scoped tokens over long-lived broad ones - a leaked token with a five-minute lifetime and read-only scope is a contained problem.
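
The lookup in steps 5 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `VerifiedClaims`, `StoredCredential`, `TokenStore`, and `resolveDownstreamAuth` are hypothetical names, and the in-memory map stands in for an encrypted database table.

```typescript
// Hypothetical shapes for illustration; not part of any MCP SDK.
interface VerifiedClaims {
  sub: string // user identity taken from the already-validated Token A
}

interface StoredCredential {
  accessToken: string // Token B
  refreshToken: string // Token C, never returned to callers
}

// Keyed by `${userId}:${provider}`; stands in for an encrypted database table.
type TokenStore = Map<string, StoredCredential>

// Builds the Authorization header for the downstream call from stored
// credentials only. The inbound MCP token is never forwarded.
function resolveDownstreamAuth(
  claims: VerifiedClaims,
  provider: string,
  store: TokenStore,
): { Authorization: string } {
  const credential = store.get(`${claims.sub}:${provider}`)
  if (!credential) {
    throw new Error(`No ${provider} account connected for user ${claims.sub}`)
  }
  return { Authorization: `Bearer ${credential.accessToken}` }
}
```

Because the function accepts only verified claims and never the raw inbound token, forwarding Token A downstream by accident becomes a type error rather than a code-review catch.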

Implementing End-User OAuth Identity Passthrough for Remote MCP Servers

The Model Context Protocol's authorization specification is now explicit about what compliant remote servers must do to secure AI agent access. The November 2025 revision settled the architecture: MCP auth implementations MUST implement OAuth 2.1 with appropriate security measures for both confidential and public clients. Building this requires a strict separation between the MCP client (the agent framework) and the remote MCP server (the integration middleware).

There are six moving parts you need to implement to do this correctly:

1. Treat your MCP server as a resource server, not an authorization server

MCP servers act as OAuth 2.1 resource servers only, validating tokens issued by external, dedicated authorization servers. The MCP server's job is to validate tokens and enforce scopes internally, but not to manage user logins or token issuance. If you already run an IdP (WorkOS, Auth0, Okta, your own Keycloak), point your MCP server at it. Do not build a parallel authentication stack inside the protocol handler.

2. Publish Protected Resource Metadata (RFC 9728)

When an MCP client hits a protected tool without a token, you must respond with a 401 and a WWW-Authenticate header that points to a discovery document:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer realm="mcp",
  resource_metadata="https://api.yoursaas.com/.well-known/oauth-protected-resource"

That document declares your authorization server, supported scopes, and bearer methods. MCP servers MUST implement OAuth 2.0 Protected Resource Metadata (RFC 9728), and MCP clients MUST use it for authorization server discovery. Skip this, and Claude, ChatGPT, and Cursor's connector UIs literally cannot bootstrap your server.
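
As a rough sketch of what that discovery document and challenge header might contain: the field names below come from RFC 9728, while the URLs, scope names, and the helper names `buildResourceMetadata` and `buildChallenge` are placeholders chosen for illustration.

```typescript
// Field names follow RFC 9728; the URLs and scope names are placeholders.
interface ProtectedResourceMetadata {
  resource: string
  authorization_servers: string[]
  scopes_supported: string[]
  bearer_methods_supported: string[]
}

function buildResourceMetadata(
  resource: string,
  authorizationServers: string[],
  scopes: string[],
): ProtectedResourceMetadata {
  return {
    resource,
    authorization_servers: authorizationServers,
    scopes_supported: scopes,
    // Tokens are accepted only in the Authorization header
    bearer_methods_supported: ["header"],
  }
}

// Value for the WWW-Authenticate header on a 401 response.
function buildChallenge(metadataUrl: string): string {
  return `Bearer realm="mcp", resource_metadata="${metadataUrl}"`
}

const metadata = buildResourceMetadata(
  "https://api.yoursaas.com/mcp",
  ["https://auth.yoursaas.com"],
  ["read:contacts", "read:deals"],
)
console.log(JSON.stringify(metadata, null, 2))
console.log(
  buildChallenge("https://api.yoursaas.com/.well-known/oauth-protected-resource"),
)
```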

3. Mandate PKCE on every authorization code exchange

OAuth 2.1 mandates PKCE for all clients using the Authorization Code flow. By adopting the 2.1 stack, MCP inherits this PKCE-by-default security posture, transforming PKCE from a patch into a foundational security layer. There are no exceptions for "trusted" agents. The MCP client (Claude, ChatGPT, or an internal LangGraph runner in a multi-agent framework) generates a code_verifier, hashes it into a code_challenge, and the authorization server only redeems the code if the verifier matches.
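
The verifier and challenge relationship can be sketched in a few lines. This assumes a Node.js runtime (for `node:crypto`) and the S256 method from RFC 7636; the function names are illustrative, and in practice your OAuth client library will usually do this for you.

```typescript
import { createHash, randomBytes } from "node:crypto"

// A random, URL-safe verifier. 32 random bytes encode to 43 characters,
// the minimum length RFC 7636 allows.
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url")
}

// S256 challenge: BASE64URL(SHA256(ASCII(code_verifier)))
function deriveCodeChallenge(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url")
}

// What the authorization server does at the token endpoint: recompute the
// challenge from the presented verifier and compare it with the stored one.
function verifierMatches(codeVerifier: string, storedChallenge: string): boolean {
  return deriveCodeChallenge(codeVerifier) === storedChallenge
}
```

The client sends only the challenge in the authorization request and reveals the verifier at the token endpoint, so an intercepted authorization code is useless without it.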

4. Validate the audience claim and extract context

The single most common production exploit in MCP deployments is audience confusion: a server accepts a token minted for a sibling service. An attacker can compromise an MCP server if it accepts tokens issued for other resources. When an MCP server doesn't verify that tokens were specifically intended for it via the audience claim, it may accept tokens originally issued for other services.

When the MCP client sends a JSON-RPC request to the remote MCP server, it includes the user's access token in the Authorization: Bearer header. The remote MCP server must intercept this request before parsing the JSON-RPC payload. The middleware validates the JWT signature, checks the expiration, verifies the audience claim, and extracts the user identity (e.g., the sub claim). It then uses this identity to look up the corresponding third-party SaaS credentials (the integrated account) stored securely in your database.

A minimal middleware check in any HTTP server framework:

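// Sketch using the jose library; the JWKS URL is an assumed placeholder
import { createRemoteJWKSet, jwtVerify } from 'jose'

const jwks = createRemoteJWKSet(new URL('https://auth.yoursaas.com/.well-known/jwks.json'))

async function validateMcpToken(req, res, next) {
  try {
    // extractBearer is a hypothetical helper: returns the token or throws
    const token = extractBearer(req.headers.authorization)
    const { payload: claims } = await jwtVerify(token, jwks, {
      issuer: 'https://auth.yoursaas.com',
      audience: 'https://api.yoursaas.com/mcp', // exact resource URI is mandatory
    })
    if (!claims.sub) return res.status(401).end()

    // Identity used later to look up this user's SaaS credentials
    const scopes = typeof claims.scope === 'string' ? claims.scope.split(' ') : []
    req.user = { id: claims.sub, scopes }
    next()
  } catch (err) {
    // Missing token, bad signature, wrong issuer or audience, or expired token
    res.status(401).end()
  }
}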
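
The middleware attaches the granted scopes to the request, but something still has to check them on each tool call. A minimal sketch of that check follows; the tool-to-scope mapping and the name `checkToolScopes` are hypothetical, and the scope strings are illustrative.

```typescript
// Hypothetical mapping from tool name to the scopes it requires.
const requiredScopesByTool: Record<string, string[]> = {
  list_all_contacts: ["read:contacts"],
  create_opportunity: ["read:deals", "write:deals"],
}

interface ScopeCheck {
  allowed: boolean
  missing: string[] // scopes the caller would need to be granted
}

function checkToolScopes(toolName: string, grantedScopes: string[]): ScopeCheck {
  const required = requiredScopesByTool[toolName]
  // Unknown tools are denied by default rather than treated as scope-free
  if (required === undefined) return { allowed: false, missing: [] }
  const missing = required.filter((scope) => !grantedScopes.includes(scope))
  return { allowed: missing.length === 0, missing }
}
```

Returning the missing scopes, rather than a bare boolean, lets the server tell the client which additional consent it would need to request.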

5. Decide what to do about Dynamic Client Registration (DCR)

In a production environment, you cannot always hardcode client credentials into every deployed agent. MCP clients and authorization servers SHOULD support the OAuth 2.0 Dynamic Client Registration Protocol (RFC 7591) to allow MCP clients to obtain OAuth client IDs without user interaction. Claude Desktop, in particular, requires Dynamic Client Registration support and does not yet support a way for users to specify a client ID or secret for OAuth-based remote servers.

DCR solves a real problem: without DCR, three MCP clients and five servers means potentially fifteen separate OAuth app registrations to manage. DCR collapses that - clients register once, dynamically, and the authorization server handles the rest. It also ensures that every agent instance has a unique cryptographic identity, making it possible to revoke access for a single compromised agent without impacting the rest of the fleet.

But it ships with sharp edges. Using DCR on a remote MCP server, you're effectively letting anyone in the world register as a client with your OAuth provider. There are no guardrails by default. Servers must accept registration requests from clients they've never seen before. Harden your /register endpoint with rate limits, redirect URI allowlists (no wildcards, ever), and ideally a software-statement signature so only verified MCP hosts can register.
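
One of those guardrails, the exact-match redirect URI allowlist, might look like the sketch below. The `redirect_uris` and `client_name` fields are standard RFC 7591 client metadata; the function name, the result shape, and the idea of a pre-vetted allowlist are assumptions made for illustration.

```typescript
// Subset of RFC 7591 client metadata relevant to this check.
interface RegistrationRequest {
  client_name?: string
  redirect_uris: string[]
}

type RegistrationResult = { ok: true } | { ok: false; reason: string }

function validateRegistration(
  request: RegistrationRequest,
  allowlist: ReadonlySet<string>,
): RegistrationResult {
  if (request.redirect_uris.length === 0) {
    return { ok: false, reason: "redirect_uris must not be empty" }
  }
  for (const uri of request.redirect_uris) {
    // Exact string comparison only: no prefix, pattern, or wildcard matching
    if (uri.includes("*") || !allowlist.has(uri)) {
      return { ok: false, reason: `redirect_uri not allowed: ${uri}` }
    }
  }
  return { ok: true }
}
```

Rate limiting and software-statement verification would sit in front of this check; they are omitted here to keep the sketch short.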

6. Conditional API Token Authentication

Relying solely on a bearer token in the header can be risky if the MCP server URL is ever leaked in logs or configuration files. A defense-in-depth approach involves conditional API token authentication. The remote MCP server enforces a secondary check: the caller must provide both a valid MCP token (to identify the target integration) and a valid platform session cookie or API token (to prove they are an authenticated user of your SaaS application).

Architectural Challenges: Token Lifecycles and Rate Limits

Implementing the authentication handshake is only the first step. Operating a multi-tenant MCP server at scale introduces severe operational challenges around concurrency and vendor API quirks. Token lifecycle management and rate limits are what actually wake up your on-call engineer.

The Thundering Herd: OAuth Token Refresh

Access tokens expire, typically every 30 to 60 minutes. For a human user, the client application refreshes a Salesforce access token invisibly in the background. An AI agent making 200 tool calls in a five-minute reasoning loop, however, can hit token expiry mid-conversation. If you wait until the token expires to refresh it, the API request fails, throwing an error back to the agent and stalling its reasoning chain.

Worse, if an agent makes parallel tool calls (e.g., fetching 10 different Salesforce records concurrently) while the token is expired, you hit a severe race condition. Ten concurrent requests attempt to refresh the exact same OAuth token simultaneously. A provider that rotates refresh tokens issues a new token to the first request and immediately revokes the old refresh token, so the other nine requests fail with an invalid_grant error. Depending on the provider, this can break the connection until the user re-authenticates.

To solve this, your infrastructure requires two mechanisms:

  1. Proactive Refresh: Schedule background tasks to refresh OAuth tokens 60 to 180 seconds before they actually expire, rather than on demand.
  2. Mutex Locks for Refresh Operations: When a token must be refreshed, the operation must be protected by a distributed lock. The first request acquires the lock and initiates the HTTP call to the provider. Any concurrent requests for the same integrated account must await the resolution of that single promise. Multiple tool calls for the same user must be serialized. However, Tenant A's refresh must never block Tenant B's—the mutex key is the integrated account ID.

flowchart LR
    A[Token stored<br>with expires_at] --> B{Time until<br>expiry < 60s?}
    B -- Yes --> C[Refresh now]
    B -- No --> D[Schedule refresh<br>60-180s before expiry]
    D --> E[Background worker<br>fires]
    E --> C
    C --> F{Refresh<br>succeeded?}
    F -- Yes --> G[Update token,<br>reschedule]
    F -- 4xx --> H[Mark needs_reauth,<br>fire webhook]
    F -- 5xx --> I[Retry with<br>exponential backoff]

For a deeper treatment of refresh failure modes, see our guide to handling OAuth token refresh failures in production.

Short-Lived Token and Automatic Refresh Patterns

The flowchart above describes the strategy. Here are the concrete code patterns that implement proactive refresh, mutex-protected rotation, and pre-call token validation.

Schedule proactive refresh after every token update:

async function scheduleTokenRefresh(account: IntegratedAccount) {
  const expiresAt = new Date(account.tokenExpiresAt)
  const now = new Date()
 
  // Randomize buffer between 60-180s to spread load across accounts
  const bufferSeconds = Math.floor(Math.random() * 120) + 60
  const refreshAt = new Date(expiresAt.getTime() - bufferSeconds * 1000)
 
  if (refreshAt > now) {
    // Normal case: schedule well before expiry
    await scheduler.schedule(refreshAt, 'refresh_token', {
      accountId: account.id,
    })
  } else if (expiresAt > now) {
    // Buffer window passed but token still valid - refresh immediately
    await scheduler.schedule(now, 'refresh_token', {
      accountId: account.id,
    })
  }
  // If token already expired, skip scheduling - next API call handles it
}

The randomized buffer (60 to 180 seconds) prevents a spike of refresh requests when many accounts were connected around the same time. Without randomization, accounts created in the same minute will all try to refresh simultaneously, creating the same thundering herd problem you are trying to avoid.

Mutex-protected refresh to prevent concurrent token invalidation:

const activeLocks = new Map<string, Promise<TokenResult>>()
 
async function refreshWithMutex(accountId: string): Promise<TokenResult> {
  // If a refresh is already in flight for this account, wait for it
  const inFlight = activeLocks.get(accountId)
  if (inFlight) return inFlight
 
  const refreshPromise = (async () => {
    try {
      const account = await db.getIntegratedAccount(accountId)
      const credentials = resolveCredentials(account)
      const newToken = await oauthClient.refreshToken({
        refreshToken: account.refreshToken,
        clientId: credentials.clientId,
        clientSecret: credentials.clientSecret,
        scope: account.grantedScopes,
      })
 
      if (!newToken.access_token || newToken.error) {
        throw new Error(
          `Refresh failed: ${newToken.error ?? 'no access_token in response'}`
        )
      }
 
      await db.updateToken(accountId, {
        accessToken: newToken.access_token,
        refreshToken: newToken.refresh_token ?? account.refreshToken,
        expiresAt: newToken.expires_at,
      })
 
      // Reactivate if account was previously in needs_reauth
      if (account.status === 'needs_reauth') {
        await db.updateStatus(accountId, 'active')
        await webhooks.emit('integrated_account:reactivated', { accountId })
      }
 
      await scheduleTokenRefresh({
        ...account,
        tokenExpiresAt: newToken.expires_at,
      })
      return newToken
    } catch (err) {
      await handleRefreshFailure(accountId, err)
      throw err
    } finally {
      activeLocks.delete(accountId)
    }
  })()
 
  activeLocks.set(accountId, refreshPromise)
  return refreshPromise
}

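The mutex key is the integrated account ID. Two refreshes for the same account are serialized - the second caller awaits the first caller's result without making a duplicate HTTP request. Two refreshes for different accounts run independently and in parallel. This prevents the race condition where concurrent calls invalidate each other's refresh tokens while avoiding cross-tenant blocking. Note that the in-memory Map shown here only coordinates callers inside a single process. If you run several MCP server instances, the same per-account key needs to be backed by a distributed lock.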

Pre-call token validation with a 30-second safety margin:

async function ensureFreshToken(account: IntegratedAccount): Promise<string> {
  const expiresAt = new Date(account.tokenExpiresAt).getTime()
  const safetyMarginMs = 30 * 1000
 
  if (Date.now() + safetyMarginMs >= expiresAt) {
    const newToken = await refreshWithMutex(account.id)
    return newToken.access_token
  }
  return account.accessToken
}

The 30-second safety margin ensures that tokens about to expire are refreshed before the API call starts, not during it. Without this buffer, a request could begin with a valid token and arrive at the provider after it has expired.

Handling Rate Limits the Right Way

Third-party vendor APIs have terrible, inconsistent rate limits. When your downstream API returns an HTTP 429 Too Many Requests, your MCP server has two honest choices: swallow it and retry, or surface it and let the caller decide.

The instinct of most backend engineers is to handle the retry and exponential backoff inside the middleware. For AI agents, this is an architectural mistake. Agents make burst patterns that look nothing like human traffic. If the middleware absorbs the 429 and stalls the HTTP connection for 30 seconds while it retries, the LLM waiting on the other side of the MCP connection will likely time out. Furthermore, a hidden retry storm inside your server will trip circuit breakers on the third-party API and get every tenant rate-limited.

The pragmatic pattern is to fail fast, normalize the rate-limit signal, and pass the error through. If the agent knows the API is exhausted, it can pivot its reasoning strategy, perhaps falling back to a different tool, querying a local cache, or asking the user to wait. The middleware should normalize the chaotic vendor-specific rate limit headers into the header fields proposed in the IETF RateLimit headers draft:

HTTP/1.1 429 Too Many Requests
ratelimit-limit: 100
ratelimit-remaining: 0
ratelimit-reset: 12

By returning these standardized headers alongside the JSON-RPC response, you let the agent framework (or the developer writing the orchestration logic) read ratelimit-reset and decide whether to retry, switch to another tool, or surface the delay to the user.
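
A minimal sketch of that normalization step is shown below. It handles two illustrative vendor conventions only (an `x-ratelimit-reset` Unix timestamp and a numeric `retry-after`); real providers vary, so treat the input header names and the `normalizeRateLimit` function as assumptions rather than a complete mapping.

```typescript
interface NormalizedRateLimit {
  "ratelimit-limit"?: string
  "ratelimit-remaining": string
  "ratelimit-reset": string // seconds until the window resets
}

// Assumes header names are already lowercased, that "x-ratelimit-reset" is a
// Unix timestamp in seconds, and that "retry-after" is a number of seconds.
function normalizeRateLimit(
  vendorHeaders: Record<string, string | undefined>,
  nowEpochSeconds: number,
): NormalizedRateLimit | null {
  const limit = vendorHeaders["x-ratelimit-limit"]
  const remaining = vendorHeaders["x-ratelimit-remaining"]
  const retryAfter = Number(vendorHeaders["retry-after"])
  const resetAt = Number(vendorHeaders["x-ratelimit-reset"])

  let resetInSeconds: number | null = null
  if (Number.isFinite(retryAfter)) {
    resetInSeconds = retryAfter
  } else if (Number.isFinite(resetAt)) {
    resetInSeconds = Math.max(0, resetAt - nowEpochSeconds)
  }
  // Nothing usable from the vendor: let the caller decide on a default
  if (resetInSeconds === null) return null

  const normalized: NormalizedRateLimit = {
    "ratelimit-remaining": remaining ?? "0", // a 429 implies nothing is left
    "ratelimit-reset": String(Math.ceil(resetInSeconds)),
  }
  if (limit !== undefined) normalized["ratelimit-limit"] = limit
  return normalized
}
```

Passing the current time in as a parameter keeps the function deterministic, which makes the per-provider mappings straightforward to unit test.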

Tip

If you build retries into your MCP server, you remove the agent's ability to reason about cost and latency. A reasoning loop that thinks one tool call took 200ms when it actually took 14 seconds will make bad planning decisions. Honesty in errors beats clever absorption.

Token Revocation on Disconnect

When a user disconnects a third-party account - whether they are offboarding, revoking agent access, or switching providers - your MCP server must clean up every token associated with that connection. Leaving orphaned tokens in your database or at the provider is a liability. By promptly revoking tokens that are no longer needed or have been compromised, you can prevent unauthorized access to protected resources.

RFC 7009 defines an API for revoking tokens at the authorization server. The client sends the token it wishes to revoke in a POST request along with its credentials to the token revocation endpoint. The server validates the request, revokes the token, and responds with an empty body using HTTP status code 200.

Here is the full disconnect sequence:

async function disconnectAccount(accountId: string) {
  const account = await db.getIntegratedAccount(accountId)
 
  // 1. Revoke tokens at the provider (best effort)
  await revokeUpstreamTokens(account)
 
  // 2. Cancel all scheduled refresh jobs
  await scheduler.cancel('refresh_token', { accountId })
 
  // 3. Delete stored tokens from your database
  await db.deleteTokens(accountId)
 
  // 4. Update account status
  await db.updateStatus(accountId, 'disconnected')
 
  // 5. Emit webhook so downstream systems can react
  await webhooks.emit('integrated_account:disconnected', {
    accountId,
    provider: account.provider,
    userId: account.userId,
  })
}
 
async function revokeUpstreamTokens(account: IntegratedAccount) {
  const config = await getProviderConfig(account.provider)
  if (!config.revocationEndpoint) return
 
  for (const [token, hint] of [
    [account.refreshToken, 'refresh_token'],
    [account.accessToken, 'access_token'],
  ] as const) {
    if (!token) continue
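    try {
      const res = await fetch(config.revocationEndpoint, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/x-www-form-urlencoded',
          Authorization: `Basic ${btoa(
            `${config.clientId}:${config.clientSecret}`
          )}`,
        },
        body: new URLSearchParams({ token, token_type_hint: hint }),
      })
      // fetch only rejects on network errors, so check the status explicitly
      if (!res.ok) {
        logger.warn(`Token revocation returned ${res.status} for ${account.provider}`)
      }
    } catch (err) {
      // Best effort: log and continue so the disconnect flow is never blocked
      logger.warn(`Token revocation failed for ${account.provider}`, err)
    }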
  }
}

The response from a revocation endpoint is always HTTP 200 if the token is revoked or otherwise invalid - the client should disregard the body. This means your disconnect handler should treat any 200 as success and log non-200 responses without blocking the disconnect flow.

Warning

Not every SaaS provider supports RFC 7009. Google, Microsoft, Salesforce, and Slack all have revocation endpoints. Some smaller providers only invalidate tokens when the user revokes access through their own UI. For providers without a revocation endpoint, deleting the tokens from your database and canceling scheduled refreshes is the best you can do - the access token will expire naturally within its TTL.

Audit Fields to Tie API Calls to Human Delegation Chains

When an AI agent acts on behalf of a human user, every downstream API call must be traceable back to the human who authorized it. This is not optional for SOC 2 or GDPR compliance - it is the primary mechanism auditors use to verify that access was authorized and properly scoped.

Teams building structured delegation logging track records like "User A authorized Agent B to perform Action C on Resource D," stored alongside every API call as the audit trail that proves the delegation chain was intact.

What to Log on Every Tool Call

Every API call your MCP server proxies to a third-party SaaS should include these fields in its structured log entry:

interface McpAuditEntry {
  // Request identity
  request_id: string              // UUID for this specific API call
  correlation_id: string          // Groups all calls in a single agent reasoning chain
  timestamp: string               // ISO 8601
 
  // Human delegation chain
  user_id: string                 // The human who authorized the agent (JWT sub claim)
  agent_client_id: string         // The MCP client / agent identity
  session_id: string              // Agent conversation or session
 
  // Integration context
  integrated_account_id: string   // The connected third-party account
  provider: string                // e.g., "salesforce", "jira", "hubspot"
  environment_id: string          // Tenant or environment
 
  // Action performed
  tool_name: string               // e.g., "list_all_contacts", "create_opportunity"
  http_method: string             // Downstream HTTP method
  resource_path: string           // Downstream API path (redact sensitive params)
 
  // Outcome
  status_code: number             // Downstream response status
  error_type?: string             // Classified: "rate_limited", "auth_failed", "not_found"
  token_refreshed: boolean        // Whether this call triggered a token refresh
  duration_ms: number             // End-to-end latency including any refresh time
}

The correlation_id ties together a multi-step agent reasoning chain. When an agent calls list_opportunities, then get_contact, then create_note in sequence, all three calls share the same correlation_id. This lets your security team reconstruct exactly what an agent did during a single user interaction.

The user_id + agent_client_id pair is the delegation chain. Every log entry answers the question: "Which human authorized this, and which agent executed it?" If an agent takes an unexpected action, you can trace it back to the exact user session and the exact MCP client that performed the call.
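
The resource_path field calls for redacting sensitive parameters before the entry is written. One possible sketch of that step is below; the list of sensitive parameter names and the function name `redactResourcePath` are illustrative assumptions, and your own list will depend on the providers you proxy.

```typescript
// Parameter names treated as sensitive here are an illustrative assumption.
const SENSITIVE_PARAMS = new Set(["email", "token", "api_key", "ssn"])

// Replaces the values of sensitive query parameters while keeping the path
// and the non-sensitive parameters intact for debugging.
function redactResourcePath(pathWithQuery: string): string {
  const queryStart = pathWithQuery.indexOf("?")
  if (queryStart === -1) return pathWithQuery

  const path = pathWithQuery.slice(0, queryStart)
  const redactedQuery = pathWithQuery
    .slice(queryStart + 1)
    .split("&")
    .map((pair) => {
      const key = pair.split("=")[0] ?? ""
      return SENSITIVE_PARAMS.has(key.toLowerCase()) ? `${key}=REDACTED` : pair
    })
    .join("&")
  return `${path}?${redactedQuery}`
}

console.log(redactResourcePath("/v3/contacts?email=a%40b.com&limit=10"))
```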

The act Claim for Structured Delegation

An IETF draft (draft-oauth-ai-agents-on-behalf-of-user) extends the OAuth 2.0 Authorization Framework to enable AI agents to securely obtain access tokens for acting on behalf of users. It introduces the requested_actor parameter in authorization requests to identify the specific agent requiring delegation.

The resulting access token carries an act (actor) claim that names both the human subject and the agent:

{
  "sub": "user_abc123",
  "iss": "https://auth.yoursaas.com",
  "aud": "https://api.yoursaas.com/mcp",
  "act": {
    "sub": "agent:claude-session-xyz",
    "client_id": "mcp-client-9f8e7d"
  },
  "scope": "read:contacts read:deals",
  "exp": 1719600000,
  "iat": 1719596400
}

Tokens must carry delegation metadata that downstream systems can log, store, and query. The IETF draft supports this through fields like act (actor) and obo (on-behalf-of) claims, which enable structured logging and traceability across microservices or external APIs.

This is still an emerging standard. It is an Internet-Draft (revision -02, August 2025), not yet adopted by the OAuth working group, but the core ideas - front-channel consent, a named actor, an act claim - are likely to persist. Even before your IdP supports the act claim natively, you can implement equivalent traceability by extracting the sub from the JWT and recording the client_id of the MCP client alongside it in every audit log entry.
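
That fallback can be sketched as a small extraction function. It prefers the act claim when present and otherwise falls back to a top-level client_id claim; the interface and function names are hypothetical, and whether your IdP emits client_id in the token is an assumption you would need to confirm.

```typescript
interface DelegationClaims {
  sub?: string
  client_id?: string
  act?: { sub?: string; client_id?: string }
}

interface DelegationChain {
  user_id: string // the human who authorized the agent
  agent_client_id: string // the MCP client that executed the call
  agent_subject: string | null // present only when the token carries act.sub
}

function extractDelegationChain(claims: DelegationClaims): DelegationChain {
  if (!claims.sub) throw new Error("token has no sub claim")
  // Prefer the structured actor claim; fall back to the token's client_id
  const agentClientId = claims.act?.client_id ?? claims.client_id
  if (!agentClientId) throw new Error("token does not identify a client")
  return {
    user_id: claims.sub,
    agent_client_id: agentClientId,
    agent_subject: claims.act?.sub ?? null,
  }
}
```

The returned user_id and agent_client_id map directly onto the audit entry fields of the same names, so the log format stays the same before and after your IdP adopts the act claim.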

Dynamic Tool Generation vs. Static Endpoints

The other half of an MCP server is the tool list. If you want to expose a SaaS integration to an AI agent, the naive approach is to hand-code tool definitions or feed the agent the entire OpenAPI specification for that vendor.

This fails immediately in production. A typical enterprise SaaS OpenAPI spec (like Salesforce or Microsoft Dynamics) contains thousands of endpoints. Feeding that into an LLM context window consumes massive amounts of tokens, increases latency, and guarantees hallucinations. The agent will attempt to call endpoints that the user's OAuth scopes do not permit, or it will invent query parameters that do not exist.

Furthermore, static tools rot. Your Salesforce custom objects change. Your Jira workflow adds a status. A vendor deprecates an endpoint. Every change means a code deploy on your MCP server, a new release for the AI client to discover, and a guaranteed window where the agent calls a tool that no longer exists.

Documentation-Driven Tool Derivation

The alternative is to derive tools dynamically from a documentation-driven manifest at request time, scoped specifically to the integrated account.

  1. The integration declares its resources and methods as data, not code.
  2. Each (resource, method) pair has a JSON Schema for query and body parameters plus a human-readable description.
  3. A tool only appears in the tools/list response if it has a documentation record. No docs, no tool.

This acts as a strict quality gate. It prevents half-built endpoints from leaking into agent context, and it gives PMs a clean lever to control which surface area is AI-ready.
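
The "no docs, no tool" gate can be sketched as a filter over the declared methods. The `MethodDefinition` shape, the `deriveTools` function, and the `method_provider_resource` naming scheme are all assumptions made for illustration, not a description of any particular product's manifest format.

```typescript
// Hypothetical manifest entry: one (resource, method) pair declared as data.
interface MethodDefinition {
  resource: string // e.g. "contacts"
  method: string // e.g. "list"
  docs?: { description: string; inputSchema: Record<string, unknown> }
}

interface ToolDefinition {
  name: string
  description: string
  inputSchema: Record<string, unknown>
}

function deriveTools(provider: string, methods: MethodDefinition[]): ToolDefinition[] {
  const tools: ToolDefinition[] = []
  for (const definition of methods) {
    // No documentation record, no tool
    if (!definition.docs) continue
    tools.push({
      name: `${definition.method}_${provider}_${definition.resource}`,
      description: definition.docs.description,
      inputSchema: definition.docs.inputSchema,
    })
  }
  return tools
}
```

Running this at request time against the integrated account's manifest means a documentation change alters the tools/list response without a deploy.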

When a tool is dynamically generated, it provides a strict JSON Schema to the LLM:

{
  "name": "list_all_hubspot_contacts",
  "description": "Fetch a list of contacts from HubSpot. Use the next_cursor from previous responses to paginate.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "limit": {
        "type": "string",
        "description": "The number of records to fetch"
      },
      "next_cursor": {
        "type": "string",
        "description": "Pass back exactly the cursor value you received without decoding or modifying it."
      }
    }
  }
}

Notice the explicit instruction regarding pagination. LLMs are notoriously bad at handling opaque cursor strings and love to "helpfully" decode them, breaking the pagination. The tool description must explicitly instruct the model to pass the cursor back unmodified.

For more on this pattern, see our walkthrough of generating MCP servers for your SaaS users.

Operational Runbook: Handling Token Failures and Incident Response

Token failures in agent-driven systems behave differently from token failures in human-driven systems. A human sees a "please re-authenticate" prompt and handles it. An agent hits a 401, retries, fails again, and potentially enters an error loop that burns rate limits and generates noise. Here is how to handle the most common failure modes.

Failure Mode Reference

| Failure | Detection | Immediate Action | Recovery |
|---|---|---|---|
| invalid_grant on refresh | HTTP 400 from the provider token endpoint with invalid_grant in the error body (RFC 6749 specifies 400; some providers return 401) | Stop all retries. Mark account needs_reauth. Cancel scheduled refreshes. Fire authentication_error webhook. | User must re-authenticate. |
| Refresh token revoked by user in provider UI | Indistinguishable from invalid_grant at the protocol level | Same as above | Same: re-authentication is the only fix |
| Provider token endpoint 5xx | HTTP 500+ from the token endpoint | Log error. Schedule retry with exponential backoff (start at 3 hours). | Auto-recovers when provider stabilizes. Keep existing token if not yet expired. |
| Thundering herd race condition | Multiple concurrent invalid_grant errors for same account within seconds | If you see this, your mutex is broken | Fix the mutex implementation. Manually trigger a single refresh for affected accounts. |
| Token response missing access_token | Response validation fails: no access_token field, or an error field is present | Treat as non-retryable auth failure. Mark needs_reauth. | Investigate whether provider changed their token response format. |
| Expired token hits downstream API | HTTP 401 from the third-party SaaS (not from your auth server) | Trigger on-demand refresh via mutex. Retry the original API call once. | If refresh also fails, fall through to needs_reauth flow. |
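The refresh-related rows of this table reduce to a small classifier over the token endpoint's response. The sketch below is one possible encoding; the `RefreshAction` and `TokenEndpointResponse` types and the function name are hypothetical. It keys off the error body rather than the status code, since providers differ on which 4xx they return.

```typescript
type RefreshAction =
  | { kind: "use_token"; accessToken: string }
  | { kind: "needs_reauth"; reason: string }
  | { kind: "retry_later"; reason: string };

interface TokenEndpointResponse {
  status: number;
  body: { access_token?: unknown; error?: unknown };
}

function classifyRefreshResponse(res: TokenEndpointResponse): RefreshAction {
  // Provider outage: transient, so keep the current token and back off.
  if (res.status >= 500) {
    return { kind: "retry_later", reason: `token endpoint returned ${res.status}` };
  }
  // Any OAuth error (invalid_grant included) is treated as non-retryable.
  const err = res.body.error;
  if (typeof err === "string" && err !== "") {
    return { kind: "needs_reauth", reason: err };
  }
  // A response without a usable access_token is also non-retryable.
  const token = res.body.access_token;
  if (typeof token !== "string" || token === "") {
    return { kind: "needs_reauth", reason: "token response missing access_token" };
  }
  return { kind: "use_token", accessToken: token };
}
```

Keeping the classification pure makes the side effects (cancelling scheduled refreshes, firing the authentication_error webhook) easy to test separately from the decision itself.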

Distinguishing Your 401s from Their 401s

A subtle but critical operational detail: when your MCP server receives a 401, you need to know whether it came from your authorization server (the MCP token is invalid) or from the downstream third-party API (the SaaS token is invalid). These require completely different responses.

Mark 401s from downstream APIs with a flag (e.g., is_remote_error: true) in your error handling middleware. If the 401 is from the downstream provider, trigger an on-demand refresh and retry. If it is from your own auth layer, return the 401 to the MCP client - the agent needs to re-authenticate with your authorization server.
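A minimal sketch of that branching, assuming your middleware attaches `status` and `is_remote_error` to the errors it throws. The `TaggedHttpError` shape, `decideOn401`, and `callWithRefresh` are hypothetical names, and the `refresh` callback is assumed to be serialized per account by your mutex.

```typescript
interface TaggedHttpError {
  status: number;
  is_remote_error: boolean; // true when the downstream SaaS produced the status
}

function isTaggedHttpError(e: unknown): e is TaggedHttpError {
  if (typeof e !== "object" || e === null) return false;
  const c = e as { status?: unknown; is_remote_error?: unknown };
  return typeof c.status === "number" && typeof c.is_remote_error === "boolean";
}

type Decision = "refresh_and_retry" | "return_to_client" | "needs_reauth" | "not_a_401";

function decideOn401(err: unknown, alreadyRetried: boolean): Decision {
  if (!isTaggedHttpError(err) || err.status !== 401) return "not_a_401";
  // Our own auth layer rejected the MCP token: the agent must re-authenticate.
  if (!err.is_remote_error) return "return_to_client";
  // The downstream token is stale: refresh once, then give up.
  return alreadyRetried ? "needs_reauth" : "refresh_and_retry";
}

async function callWithRefresh<T>(
  call: () => Promise<T>,
  refresh: () => Promise<void>,
): Promise<T> {
  try {
    return await call();
  } catch (err) {
    // Anything other than a first downstream 401 propagates unchanged.
    if (decideOn401(err, false) !== "refresh_and_retry") throw err;
    await refresh();
    // Retry exactly once; a second failure propagates to the caller.
    return await call();
  }
}
```

Capping the retry at one attempt is what keeps an agent from looping on a dead credential and burning the provider's rate limit.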

Emergency Credential Revocation

When you detect a compromised tenant or a security incident, you need to revoke all tokens for that environment immediately. This is not a graceful disconnect - it is a kill switch.

async function emergencyRevocation(environmentId: string, reason: string) {
  const accounts = await db.listIntegratedAccounts({
    environmentId,
    status: 'active',
  })
 
  logger.critical(
    `Emergency revocation for env ${environmentId}: ${accounts.length} accounts`,
    { reason, environmentId, account_count: accounts.length }
  )
 
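  // allSettled (not all) so one failing account never aborts the rest of the sweep.
  const results = await Promise.allSettled(
    accounts.map(async (account) => {
      // Cancel the scheduled refresh first so no new token is minted mid-revocation.
      await scheduler.cancel('refresh_token', { accountId: account.id })
      // Revoke at the provider while the stored token is still available to send.
      await revokeUpstreamTokens(account)
      // Only then remove local credentials and mark the account revoked.
      await db.deleteTokens(account.id)
      await db.updateStatus(account.id, 'revoked')
    })
  )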
 
  const failed = results.filter((r) => r.status === 'rejected').length
 
  await webhooks.emit('environment:emergency_revocation', {
    environmentId,
    reason,
    total: accounts.length,
    succeeded: accounts.length - failed,
    failed,
  })
 
  if (failed > 0) {
    logger.error(
      `Emergency revocation incomplete: ${failed}/${accounts.length} failed`
    )
  }
}
Danger: Emergency revocation is destructive. Every affected user will need to re-authenticate their third-party accounts. Only trigger this for confirmed security incidents, not for routine token failures. Always log the reason field for post-incident review.

How Truto Solves MCP Server Authentication

Building this infrastructure from scratch—managing distributed locks for token refreshes, normalizing 429 headers across 100+ APIs, and dynamically generating JSON-RPC tools—is an engineering black hole. It pulls your team away from building your core product. For a single integration, it is reasonable to build this yourself. Once you cross the threshold of two or three providers per category, the math stops working.

Truto provides a production-ready unified API architecture for this exact problem. When a customer connects their third-party account (e.g., Salesforce, HubSpot, Jira) through Truto, the platform can automatically expose that specific connected account as an MCP-compatible tool server.

Here is how Truto handles the heavy lifting end-to-end:

  • Per-Tenant Isolation: Each connected account gets its own MCP endpoint at /mcp/<token>. The token cryptographically binds the server to one integrated account, so a request to one tenant's MCP URL physically cannot return data for another tenant.
  • Automated Token Lifecycles: Truto automatically manages the OAuth token lifecycle. Tokens are proactively refreshed on a randomized schedule (60-180 seconds before expires_at), serialized per account through a mutex lock to prevent thundering-herd race conditions.
  • Conditional Defense in Depth: Truto supports an optional require_api_token_auth flag on the MCP server, requiring the agent to present both the URL-embedded token and a valid platform API token in the Authorization header to execute tools.
  • Dynamic Tool Derivation: MCP servers in Truto are dynamically generated per-tenant based on integration documentation and schemas. Adding documentation for a new endpoint surfaces it as a tool on the next tools/list call. No deploy required.
  • Transparent Rate Limiting: Truto normalizes upstream rate limit info into standardized headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset) per the IETF draft. When an upstream API returns HTTP 429, Truto passes that error directly to the caller, leaving retry and backoff strategy to whoever owns the agent loop.
  • Method and Tag Filters: A single connected account can produce multiple MCP servers. One can be scoped to read-only support tools, another to read/write directory tools, all backed by the same underlying credentials.
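Since retry strategy is left to the agent loop, the caller needs to read the normalized headers itself. The sketch below is illustrative only: `parseRateLimit` and `backoffMs` are hypothetical helpers, and it assumes `ratelimit-reset` carries the number of seconds until the window resets, which you should confirm against the responses you actually receive.

```typescript
interface RateLimitInfo {
  limit: number | undefined;
  remaining: number | undefined;
  resetSeconds: number | undefined;
}

// `get` is any header lookup returning null when absent (e.g. fetch's Headers.get).
function parseRateLimit(get: (name: string) => string | null): RateLimitInfo {
  const num = (name: string): number | undefined => {
    const raw = get(name);
    if (raw === null || raw.trim() === "") return undefined;
    const n = Number(raw);
    // Reject NaN, Infinity and negative values rather than trusting them.
    return isFinite(n) && n >= 0 ? n : undefined;
  };
  return {
    limit: num("ratelimit-limit"),
    remaining: num("ratelimit-remaining"),
    resetSeconds: num("ratelimit-reset"),
  };
}

// How long the agent loop should wait before retrying, in milliseconds.
function backoffMs(status: number, info: RateLimitInfo, fallbackMs = 1000): number {
  if (status !== 429) return 0;
  // Assumption: ratelimit-reset is a delta in seconds, not a timestamp.
  return info.resetSeconds !== undefined ? info.resetSeconds * 1000 : fallbackMs;
}
```

With a fetch response this would be called as `parseRateLimit((name) => response.headers.get(name))`; the orchestration layer can then decide whether to wait, switch tools, or surface the delay to the user.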

Where to Take This Next

The gap between "my agent works on my laptop" and "my agent passes a Fortune 500 security review" is mostly the work outlined above. If you are about to ship MCP for your platform, here are your immediate action items:

  1. Audit your current agent auth. If anything in production is a shared service-account API key, that is the first thing to retire.
  2. Wire up Protected Resource Metadata and audience-validated JWTs. This is the table-stakes work that makes external connectors actually function.
  3. Decide your DCR posture. If you want Claude Desktop users to self-connect, you need DCR or a control plane in front of your IdP.
  4. Move refresh off the request path. Proactive, mutex-protected refresh is the difference between an agent that pauses for OAuth latency mid-reasoning and one that operates smoothly.
  5. Stop hardcoding tools. Derive them from schema and documentation so your AI surface evolves with your API, not behind it.
  6. Implement token revocation on disconnect. Call the provider's RFC 7009 endpoint, delete stored tokens, and cancel scheduled refreshes - do not leave orphaned credentials.
  7. Log the delegation chain on every call. Record user_id, agent_client_id, correlation_id, and tool_name so your audit trail links every agent action back to a human authorization.
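For item 6, an RFC 7009 revocation call is a form-encoded POST carrying the token and an optional `token_type_hint`, authenticated as the client. The sketch below only builds the request; the `RevocationRequest` shape and `buildRevocationRequest` helper are hypothetical, the endpoint URL comes from each provider's documentation, and HTTP Basic client authentication is an assumption that not every provider shares.

```typescript
interface RevocationRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRevocationRequest(
  revocationEndpoint: string,
  token: string,
  tokenTypeHint: "refresh_token" | "access_token",
  clientId: string,
  clientSecret: string,
): RevocationRequest {
  // RFC 7009 section 2.1: parameters are sent as application/x-www-form-urlencoded.
  const body = new URLSearchParams({
    token,
    token_type_hint: tokenTypeHint,
  }).toString();
  return {
    url: revocationEndpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/x-www-form-urlencoded",
      // Assumed client authentication; btoa only handles Latin-1 credentials.
      Authorization: `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
    },
    body,
  };
}
```

Revoking the refresh token first is usually the better default, since many providers invalidate the associated access tokens along with it. RFC 7009 also has the server answer 200 for tokens it does not recognize, so a success response confirms the request was accepted rather than proving the token had been live.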

If you are building an AI agent that needs to act on external SaaS data, you cannot rely on shared API keys and static OpenAPI specs. You need a secure, multi-tenant middleware layer that respects end-user identity. The protocol is finally stable enough to build against. The remaining decisions are yours.

FAQ

What is end-user OAuth identity passthrough for AI agents?
It is an architectural pattern where an AI agent authenticates to external APIs using the specific OAuth credentials of the human user who invoked it, rather than a shared service account. The token is validated by an OAuth 2.1 authorization server, scoping the agent to the user's exact permissions.
Why shouldn't I give my AI agent a shared API key?
Shared API keys grant the agent aggregate privileges. If the agent hallucinates or is compromised via prompt injection, it can access or modify data across your entire application, leading to massive data leakage, privilege creep, and a failed enterprise security review.
Should my MCP server forward the user's access token to downstream APIs?
No. The MCP specification explicitly prohibits token passthrough because it creates confused deputy vulnerabilities. Your MCP server should validate the incoming token's audience claim, then use a separately stored OAuth token (scoped specifically for the downstream API) to make the actual third-party call.
How should an MCP server handle HTTP 429 rate-limit errors?
Pass them through to the caller rather than absorbing them with hidden retries. Normalize the response with standardized headers like ratelimit-limit, ratelimit-remaining, and ratelimit-reset per the IETF draft so the agent or orchestration layer can decide whether to back off, switch tools, or surface the delay to the user.
Why is feeding an OpenAPI spec to an AI agent a bad idea?
Enterprise OpenAPI specs are massive and can contain thousands of endpoints. Feeding them directly into an LLM consumes excessive context window tokens, increases latency, and makes hallucinations far more likely, with the agent trying to call undocumented or unauthorized endpoints. Tools should instead be derived dynamically from curated documentation.
