BYOK

Bring Your Own Key (BYOK) is how Mesrai uses LLMs by default on every plan: Free, Pro, and Enterprise. You connect your own provider account, choose any compatible model, and the token bill lands on your provider’s dashboard, not ours. Mesrai never adds a margin to tokens and never stores your key in plain text.

How this maps to plans

BYOK is free and unlimited on the Free plan. On Pro it costs ₹499 per active developer per month, with unlimited reviews. On Enterprise it is one of two modes; the other is AI Included, which uses a Mesrai-managed model.

Getting started

The BYOK settings page gives you two paths: pick a recommended model from the curated catalog (fastest, covers most cases), or configure any provider manually (used for custom endpoints or models we haven’t benchmarked).

  1. Open BYOK Settings

     Go to /organization/byok.

  2. Pick a recommended model

     The Main model section shows a grid of curated models we’ve benchmarked for code review. Click any card to start connecting it.

  3. Paste your API key and test

     Each card expands inline with a single input: the API key. Click Test to probe the provider, or Test & save to run the test and save the config only if it passes.

  4. Add a Fallback (recommended)

     Once the Main model is configured, a Fallback model section appears. If your main provider hits rate limits or goes down, Mesrai falls back to it automatically.

Test before saving. The Test button probes your provider with a cheap metadata call (no LLM inference is performed). It catches invalid keys, wrong base URLs, and network issues before they break your first code review.
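The kind of probe described above can be sketched for an OpenAI-format provider as a single GET to the models endpoint, with the HTTP status mapped to a readable result. This is a minimal sketch, not Mesrai's implementation. It assumes the provider exposes GET {baseURL}/models with Bearer authentication. The names modelsUrl, classifyStatus, probeProvider, and ProbeResult are hypothetical.

```typescript
// Minimal sketch of a key/endpoint probe for an OpenAI-format provider.
// Assumption: the provider serves GET {baseURL}/models and accepts a Bearer token.
// All names here are hypothetical, not Mesrai's API.

type ProbeResult =
  | "ok"
  | "invalid_key"
  | "endpoint_not_found"
  | "network_error"
  | "unexpected_status";

// Join the base URL and "models" without doubling or dropping the slash.
function modelsUrl(baseURL: string): string {
  return baseURL.endsWith("/") ? `${baseURL}models` : `${baseURL}/models`;
}

// Map the metadata call's HTTP status to a user-facing result.
function classifyStatus(status: number): ProbeResult {
  if (status >= 200 && status < 300) return "ok";
  if (status === 401 || status === 403) return "invalid_key";
  if (status === 404) return "endpoint_not_found";
  return "unexpected_status";
}

// Send the metadata call. No completion is requested, so no inference runs.
async function probeProvider(
  baseURL: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<ProbeResult> {
  try {
    const res = await fetchImpl(modelsUrl(baseURL), {
      method: "GET",
      headers: { Authorization: `Bearer ${apiKey.trim()}` },
    });
    return classifyStatus(res.status);
  } catch {
    // DNS failures, refused connections, TLS errors, and similar.
    return "network_error";
  }
}
```

Passing fetchImpl keeps the probe testable without network access. Other providers authenticate differently (Anthropic, for example, expects an x-api-key header), so a real multi-provider probe would branch per provider.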

Recommended models

These six models are curated for code review. They all appear in the catalog on /organization/byok and come pre-tuned with sensible defaults for temperature and max output tokens, with reasoning effort set to medium.

Claude Sonnet 4.6

Best balance of quality and cost

Anthropic’s latest Sonnet. Adaptive extended thinking, strong cross-file analysis, 200K context window.

Claude Opus 4.7

Flagship quality

Top-tier Anthropic model for the hardest reviews. 1M context, premium price.

Gemini 3.1 Pro (custom tools)

Largest context

Google’s flagship with custom-tools support. 1M context window — strongest on large PRs and monorepos.

GPT-5.4

Fast and consistent

OpenAI’s latest flagship. Reliable low latency, broad knowledge, 400K context.

Kimi K2.6 Coding

Coding-specialized, cheap

Moonshot AI’s coding-tuned model. Two plans: Developer API (pay-per-token) or Kimi Code Plan (subscription with dedicated endpoint).

GLM 5.1

Best subscription value

Z.ai’s latest. Two plans: Developer API (pay-per-token) or GLM Coding Plan (flat-rate subscription).

Our default recommendation: Start with Claude Sonnet 4.6 for the best overall code-review experience. If cost is the priority, GLM 5.1 on the Coding Plan or Kimi K2.6 on the Kimi Code Plan give flat-rate subscriptions that cap your monthly spend.

Plan selector (GLM 5.1 and Kimi K2.6)

Z.ai and Moonshot both offer a subscription plan with a different endpoint than their pay-per-token Developer API. The curated card for each of these models shows a Plan selector so you can pick the right endpoint before pasting your key.

GLM 5.1 (Z.ai):

Plan | Endpoint | Keys from | Best for
Developer API | https://api.z.ai/api/paas/v4/ | z.ai/manage-apikey | Bursty workloads, pay-per-token
Coding Plan | https://api.z.ai/api/coding/paas/v4 | z.ai/subscribe | Predictable team volume, flat monthly fee

GLM Coding Plan keys only work on /api/coding/paas/v4. The Lite and Pro tiers are often capped at 1 concurrent request — Mesrai pre-fills maxConcurrentRequests=1 when you pick this plan. Bump it in Advanced settings if you’re on the Max tier (up to 30).
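To show how the Plan selector's choice could translate into an endpoint and a pre-filled concurrency cap, here is a minimal sketch using the GLM values from this page. GlmPlan, PlanDefaults, and resolveGlmPlan are hypothetical names; the Kimi plans would follow the same pattern with their own endpoints.

```typescript
// Sketch of the GLM 5.1 Plan selector. Endpoints and defaults come from this page;
// GlmPlan, PlanDefaults, and resolveGlmPlan are hypothetical names.
type GlmPlan = "developer_api" | "coding_plan";

interface PlanDefaults {
  baseURL: string;
  // undefined = leave the field empty, i.e. no Mesrai-side cap.
  maxConcurrentRequests?: number;
}

const GLM_PLANS: Record<GlmPlan, PlanDefaults> = {
  // Pay-per-token Developer API: limits scale per key, so no pre-filled cap.
  developer_api: { baseURL: "https://api.z.ai/api/paas/v4/" },
  // Coding Plan keys only work on this endpoint; Lite/Pro allow 1 in-flight request.
  coding_plan: { baseURL: "https://api.z.ai/api/coding/paas/v4", maxConcurrentRequests: 1 },
};

const CODING_PLAN_MAX_TIER_CAP = 30;

function resolveGlmPlan(plan: GlmPlan, concurrencyOverride?: number): PlanDefaults {
  if (
    plan === "coding_plan" &&
    concurrencyOverride !== undefined &&
    concurrencyOverride > CODING_PLAN_MAX_TIER_CAP
  ) {
    throw new RangeError(`Coding Plan allows at most ${CODING_PLAN_MAX_TIER_CAP} concurrent requests`);
  }
  const defaults = GLM_PLANS[plan];
  return {
    ...defaults,
    // A value raised in Advanced settings (e.g. on the Max tier) wins over the pre-fill.
    maxConcurrentRequests: concurrencyOverride ?? defaults.maxConcurrentRequests,
  };
}
```

For example, resolveGlmPlan("coding_plan", 30) models a Max-tier subscriber who has raised the cap in Advanced settings.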

Configure manually

When the model you want isn’t in the curated list (custom endpoint, self-hosted LLM, or a provider we haven’t benchmarked), click Configure manually at the bottom of the catalog. This opens /organization/byok/manual?slot=main — a step-by-step wizard:

  1. Pick a provider

     Choose from OpenAI, Anthropic, Google Gemini, OpenRouter, Novita, or OpenAI Compatible (for any OpenAI-format endpoint).

  2. Enter the base URL (if required)

     OpenAI-compatible providers need an explicit base URL. The field only appears when you pick that provider.

  3. Pick or type the model ID

     If Mesrai can list models from the provider, you get a dropdown. Otherwise (e.g. self-hosted or when platform keys aren’t configured), type the exact model ID manually.

  4. Paste the API key

     The key field appears once provider and model are set.

  5. Tune advanced settings (optional)

     Temperature, max tokens, reasoning effort, and max concurrent requests are all optional. Defaults are sensible for most providers.

  6. Test and save

     Click Test & save to run the connection probe and persist on success.

The same manual route works for Fallback — navigate with ?slot=fallback, or use the Add fallback link after Main is saved.
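The wizard's required-field rules can be summarized as a small validation function. This is a minimal sketch under stated assumptions: the ManualConfig shape and its field names are hypothetical, and the provider identifiers are the BYOK provider keys listed in the namespace table further down this page.

```typescript
// Minimal sketch of the manual wizard's required-field rules.
// ManualConfig and validateManualConfig are hypothetical, not Mesrai's API.
type Provider =
  | "openai"
  | "anthropic"
  | "google_gemini"
  | "open_router"
  | "novita"
  | "openai_compatible";

interface ManualConfig {
  provider: Provider;
  baseURL?: string; // only required for openai_compatible
  modelId: string;
  apiKey: string;
  temperature?: number;
  maxTokens?: number;
  maxConcurrentRequests?: number;
}

// Returns a list of problems; an empty list means the config can be tested.
function validateManualConfig(cfg: ManualConfig): string[] {
  const errors: string[] = [];
  if (cfg.provider === "openai_compatible") {
    if (!cfg.baseURL) {
      errors.push("baseURL is required for OpenAI Compatible providers");
    } else {
      try {
        new URL(cfg.baseURL); // throws on malformed URLs
      } catch {
        errors.push("baseURL is not a valid URL");
      }
    }
  }
  if (cfg.modelId.trim() === "") errors.push("modelId is required");
  if (cfg.apiKey.trim() === "") errors.push("apiKey is required");
  if (
    cfg.maxConcurrentRequests !== undefined &&
    (!Number.isInteger(cfg.maxConcurrentRequests) || cfg.maxConcurrentRequests < 1)
  ) {
    errors.push("maxConcurrentRequests must be a positive integer");
  }
  return errors;
}
```

The optional tuning fields are not required, which matches the wizard's "all optional" advanced step.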

Supported Providers

OpenAI

Best for: Latest GPT models and reliable performance.

Get an API key:

  1. Visit OpenAI API Keys (platform.openai.com/api-keys)
  2. Create a new key for Mesrai
  3. Add billing information

Reasoning / Extended Thinking

All six recommended models support reasoning. The BYOK form exposes a Thinking toggle (Off / Low / Medium / High / Custom) under Advanced settings, pre-filled to Medium for every recommended model.

Preset levels

When you pick Low / Medium / High, Mesrai translates the level to each provider’s native format automatically:

Provider | How “medium” maps
Anthropic (Claude Sonnet 4.6 / Opus 4.7) | thinking: { type: "adaptive" } + outputConfig: { effort: "medium" }
Google (Gemini 3.1 Pro) | thinkingConfig: { thinkingLevel: "medium" }
OpenAI (GPT-5.4) | reasoningEffort: "medium"
OpenRouter | reasoning: { effort: "medium" }
OpenAI-compatible (Kimi K2.6 / GLM 5.1) | thinking: { type: "enabled" } (binary on/off, level ignored)

Kimi and GLM currently expose reasoning as a single on/off flag. Picking Low, Medium, or High all emit the same payload (thinking enabled). When their APIs add level granularity, Mesrai will start forwarding it.
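The mapping table above can be expressed as one function that turns a preset level into namespaced provider options. This is a minimal sketch: the "medium" shapes come from the table, while the assumptions that Low and High follow the same shapes with the level substituted, and that Off sends no reasoning options, are not confirmed by this page. presetProviderOptions is a hypothetical name.

```typescript
// Sketch of the Low/Medium/High preset mapping. "medium" shapes are from the table;
// the Off behavior and the Low/High substitution are assumptions.
type Effort = "off" | "low" | "medium" | "high";
type Namespace = "anthropic" | "google" | "openai" | "openrouter" | "openaiCompatible";
type ProviderOptions = Record<string, Record<string, unknown>>;

function presetProviderOptions(ns: Namespace, effort: Effort): ProviderOptions {
  // Assumption: Off sends no reasoning options at all.
  if (effort === "off") return {};
  switch (ns) {
    case "anthropic":
      return { anthropic: { thinking: { type: "adaptive" }, outputConfig: { effort } } };
    case "google":
      return { google: { thinkingConfig: { thinkingLevel: effort } } };
    case "openai":
      return { openai: { reasoningEffort: effort } };
    case "openrouter":
      return { openrouter: { reasoning: { effort } } };
    case "openaiCompatible":
      // Kimi / GLM: a binary flag, so the level is intentionally ignored.
      return { openaiCompatible: { thinking: { type: "enabled" } } };
    default: {
      // Compile-time exhaustiveness check over Namespace.
      const unreachable: never = ns;
      throw new Error(`unknown namespace: ${String(unreachable)}`);
    }
  }
}
```

Because the openaiCompatible branch never reads effort, Low, Medium, and High produce identical payloads for Kimi and GLM, as described above.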

Custom JSON override

Picking Custom in the Thinking toggle reveals a JSON textarea. Paste the provider options directly — Mesrai auto-wraps them under the active provider’s namespace. You don’t need to know the Vercel AI SDK routing rules.

Use this when:

  • You need a specific budgetTokens value for Claude (instead of the preset effort mapping)
  • You want to enable/disable thinking on a per-model basis for OpenAI-compatible providers
  • You want fields beyond reasoning — caching, service tier, safety settings, user tagging, etc. The override is merged into providerOptions, so any adapter field passes through
  • The provider ships a new field Mesrai hasn’t wrapped yet

Examples (paste directly — no namespace needed)

Override Claude’s thinking budget to exactly 20,000 tokens:

{
  "thinking": { "type": "enabled", "budgetTokens": 20000 }
}

Enable prompt caching (non-reasoning example):

{
  "cacheControl": { "type": "ephemeral" }
}

Going manual with namespaces (power users)

If your JSON already contains a known namespace key at the top level (anthropic, google, openai, openrouter, openaiCompatible), Mesrai leaves it untouched. Useful if you want to mix multiple provider namespaces or be explicit:

{
  "openrouter": {
    "reasoning": { "effort": "high" },
    "provider": { "order": ["moonshot"], "allow_fallbacks": false }
  }
}

Under the hood, these are the namespace mappings Mesrai uses:

BYOK provider | Namespace key
anthropic | anthropic
google_gemini / google_vertex | google
openai | openai
open_router | openrouter
openai_compatible / novita | openaiCompatible
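The auto-wrap rule can be sketched with the table above as a lookup. This is a minimal sketch of the behavior described on this page (already-namespaced JSON is left untouched, unknown providers pass through), not Mesrai's source; wrapOverride and NAMESPACE_BY_PROVIDER are hypothetical names.

```typescript
// Sketch of the Custom JSON auto-wrap rule, using the namespace table above.
// wrapOverride and NAMESPACE_BY_PROVIDER are hypothetical names.
type Json = Record<string, unknown>;

const NAMESPACE_BY_PROVIDER: Record<string, string> = {
  anthropic: "anthropic",
  google_gemini: "google",
  google_vertex: "google",
  openai: "openai",
  open_router: "openrouter",
  openai_compatible: "openaiCompatible",
  novita: "openaiCompatible",
};

const KNOWN_NAMESPACES = new Set(Object.values(NAMESPACE_BY_PROVIDER));

function wrapOverride(byokProvider: string, override: Json): Json {
  // A known namespace at the top level means the user was explicit: leave it alone.
  if (Object.keys(override).some((key) => KNOWN_NAMESPACES.has(key))) return override;
  const ns = NAMESPACE_BY_PROVIDER[byokProvider];
  // Unknown provider: pass the JSON through as-is.
  if (ns === undefined) return override;
  return { [ns]: override };
}
```

For example, pasting {"thinking": {...}} while the provider is anthropic yields {"anthropic": {"thinking": {...}}}.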

Gotchas

  • Valid JSON only. Missing commas or trailing commas break the parse and Mesrai ignores the override.
  • Precedence: the JSON override fully replaces the effort-preset’s namespace block — if you override anthropic.thinking but forget anthropic.outputConfig, that field won’t be sent. OpenRouter routing (Pin providers / Allow fallbacks) is the one exception: it deep-merges with your override under openrouter.
  • Unknown provider = no wrap. If your BYOK provider isn’t in the namespace table above, Mesrai passes the JSON through as-is. Rare — only applies if you configure a provider Mesrai doesn’t recognize.
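The precedence rules above can be sketched as follows. Assumptions: the override has already been namespaced, and on a conflict between the UI routing fields and your override under openrouter.provider, the override wins; this page does not say which side wins. buildProviderOptions and deepMerge are hypothetical names.

```typescript
// Sketch of the Gotchas precedence rules. Assumes the override is already namespaced
// and that the override wins routing conflicts. All names are hypothetical.
type Json = Record<string, unknown>;

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursive merge: override values win, nested objects merge key by key.
function deepMerge(base: Json, override: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const prev = out[key];
    out[key] = isPlainObject(prev) && isPlainObject(value) ? deepMerge(prev, value) : value;
  }
  return out;
}

// preset: output of the Low/Medium/High mapping.
// rawOverride: contents of the Custom textarea.
// routing: Pin providers / Allow fallbacks fields, when the provider is OpenRouter.
function buildProviderOptions(preset: Json, rawOverride: string, routing?: Json): Json {
  let override: Json = {};
  try {
    const parsed: unknown = JSON.parse(rawOverride);
    if (isPlainObject(parsed)) override = parsed;
  } catch {
    // Invalid JSON: the override is ignored and the preset is kept.
  }
  // Namespace-level spread: an overridden namespace replaces the preset's whole block.
  const merged: Json = { ...preset, ...override };
  if (routing !== undefined) {
    // OpenRouter routing is the exception: it is deep-merged with the override.
    const current = merged["openrouter"];
    merged["openrouter"] = deepMerge({ provider: routing }, isPlainObject(current) ? current : {});
  }
  return merged;
}
```

The shallow spread is what causes the outputConfig gotcha: overriding anthropic.thinking alone drops the preset's anthropic.outputConfig.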

Pinning OpenRouter providers

OpenRouter is a router — when you request a model (e.g. moonshotai/kimi-k2.5), it forwards the call to one of several upstream providers (Moonshot direct, Together, Groq, Fireworks, Novita…). Each call can land on a different backend. That’s convenient, but it introduces silent variance:

  • Quality drift: upstreams run different precisions (FP8, INT4, full) and give subtly different outputs for identical prompts
  • Tool-calling inconsistency: some backends don’t support function calling the same way, leading to malformed tool use
  • Reasoning format variance: one upstream honors reasoning_effort, another only thinking.enabled, another ignores both
  • Latency swings: response time can jump from 800 ms to 4 s between calls as routing changes
  • Rate-limit surprises: you hit quota on a backend you didn’t explicitly choose

How to pin

When your BYOK provider is OpenRouter, the Advanced settings panel shows an OpenRouter routing section with two fields:

  • Pin providers (in order) — comma-separated list of upstream names (e.g. moonshot, together). OpenRouter tries them in order and uses the first available.
  • Allow fallbacks — when off, requests hard-fail if none of the pinned providers are available. When on (default), OpenRouter can fall back to any other upstream that serves the model.

For a stable setup, pin a single provider and turn off fallbacks (Pin: moonshot, Allow fallbacks: off). Requests will always hit the same upstream or fail loudly — no silent quality changes. The tradeoff is zero resilience if that one upstream goes down; pair it with a different BYOK Fallback (e.g. Anthropic) to absorb outages.

Upstream names must match OpenRouter’s catalog. Check the provider tags on openrouter.ai/docs/features/provider-routing — common values include moonshot, together, groq, fireworks, novita.

Under the hood, Mesrai emits this into the Vercel AI SDK call:

{
  "openrouter": {
    "provider": {
      "order": ["moonshot", "together"],
      "allow_fallbacks": false
    }
  }
}
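A minimal sketch of how the two routing fields could produce the payload above follows. openRouterRouting is a hypothetical helper; trimming and de-duplicating names, and sending nothing when no provider is pinned, are assumptions rather than documented behavior.

```typescript
// Hypothetical helper: converts the "Pin providers (in order)" text field and the
// "Allow fallbacks" toggle into the providerOptions block shown above.
function openRouterRouting(pinInput: string, allowFallbacks = true): Record<string, unknown> {
  const order = pinInput
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    // Drop repeats but keep the first occurrence, so priority order is preserved.
    .filter((name, i, all) => all.indexOf(name) === i);
  // Assumption: with nothing pinned, no routing block is sent and OpenRouter routes freely.
  if (order.length === 0) return {};
  return { openrouter: { provider: { order, allow_fallbacks: allowFallbacks } } };
}
```

openRouterRouting("moonshot, together", false) reproduces the JSON above exactly.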

Advanced: raw JSON override

If you need fields beyond order and allow_fallbacks (e.g. ignore, data_collection, require_parameters), switch Thinking to Custom in Advanced settings and paste the full routing payload. It’s merged into providerOptions, and your reasoning config can sit in the same openrouter block:

{
  "openrouter": {
    "provider": {
      "order": ["moonshot"],
      "allow_fallbacks": false,
      "ignore": ["deepinfra"],
      "data_collection": "deny"
    },
    "reasoning": { "effort": "medium" }
  }
}

Concurrency and rate limits

The maxConcurrentRequests field (under Advanced settings) caps how many in-flight requests Mesrai sends to your provider in parallel. The default is usually fine, but subscription plans with strict concurrency caps need it set explicitly.
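To make the idea of a concurrency cap concrete, here is a minimal sketch of one: a counting semaphore that lets at most maxConcurrentRequests calls run at once and queues the rest in FIFO order. ConcurrencyLimiter is a hypothetical name; this illustrates the technique, not Mesrai's internal scheduler.

```typescript
// Counting-semaphore sketch of a maxConcurrentRequests cap.
// ConcurrencyLimiter is hypothetical; it is not Mesrai's scheduler.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Queue until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      // Hand the slot directly to the next waiter; otherwise release it.
      if (next) next();
      else this.active--;
    }
  }
}
```

Queued calls do not reach the provider until a slot frees up. With a cap of 1, calls are fully serialized. That is why 1 avoids 429s on GLM Coding Plan Lite/Pro, at the cost of slower reviews on multi-file PRs.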

Defaults Mesrai pre-fills

Provider / plan | Pre-filled value | Why
GLM Coding Plan (Lite/Pro) | 1 | Subscription allows only one in-flight request. Going higher triggers 429s.
GLM Coding Plan (Max) | 1 (bump manually) | Max allows up to 30, but we default to the safe value. Raise it in Advanced settings.
Kimi Code Plan | 30 | Moonshot’s documented cap on the coding endpoint.
GLM Developer API | (empty) | Limits scale per key; no sensible global default.
Kimi Developer API | (empty) | Scales with your recharge tier (Tier 1 ≈ 50, Tier 5 ≈ 1000).
Anthropic / OpenAI / Google / OpenRouter | (empty) | Providers enforce their own TPM/RPM; Mesrai doesn’t cap.

When to tune it

Raise it

  • You have a high-tier recharge on Moonshot/OpenRouter and want higher throughput on big PRs
  • You bumped your GLM Coding Plan to Max and want to use the full 30-concurrent budget
  • Reviews feel serialized on multi-file PRs and you’re not seeing 429s

Lower it

  • You see 429 or Too much concurrency errors in review logs
  • Your provider warns about rate limits on the dashboard
  • You want to stretch your Coding Plan usage windows (5-hour and weekly) across more PRs

Concurrency vs. RPM vs. TPM. maxConcurrentRequests only caps parallel in-flight requests. Many providers also enforce separate RPM (requests per minute) and TPM (tokens per minute) limits. If you’re hitting RPM/TPM limits while concurrency looks fine, the fix is usually to upgrade your tier or spread load over time, not to change maxConcurrentRequests.

Fallback interaction. When Main hits a 429 and Mesrai fails over to the Fallback model, the Fallback’s own maxConcurrentRequests applies. Setting a generous Fallback on a different provider is a good way to absorb bursts when your Main is on a tight subscription.

Best Practices

Security

Dedicated Keys

Create separate API keys for Mesrai. Makes usage auditing and key rotation easier.

Regular Rotation

Rotate keys periodically and update them in BYOK settings.

Monitor Usage

Check your provider dashboards for unusual patterns.

Secure Storage

Never commit keys to repositories. Mesrai encrypts keys at rest and in transit.

Fallback Strategy

  • Use a different provider for Main and Fallback (e.g. Anthropic main, Google fallback). Protects against provider-specific outages.
  • Subscriptions with tight concurrency limits (GLM Coding Plan Lite/Pro, Kimi Code Plan) make poor solo configurations — pair them with a pay-per-token Fallback so bursty PRs don’t starve.

Troubleshooting

'Invalid API key' when clicking Test
  • Copy the key without extra spaces, quotes, or trailing newlines.
  • Confirm billing is enabled and the account has credits.
  • For GLM Coding Plan / Kimi Code Plan keys, make sure you picked the matching Plan in the card — subscription keys don’t work on the Developer API endpoint and vice versa.
'Endpoint not found' when clicking Test
  • Verify the base URL matches the provider exactly (trailing slash matters for some).
  • For OpenAI-compatible providers, the models endpoint is usually {baseURL}/models.
Model not found at review time (key test passed)
  • The Test button validates the key/endpoint but doesn’t verify the specific model ID. If you typed a model that doesn’t exist (typo), the first real review fails.
  • Cross-check the model ID against the provider’s catalog before saving.
'Rate limited' or 'Too much concurrency'
  • Lower Max concurrent requests in Advanced settings.
  • On GLM Coding Plan Lite/Pro, stay at 1 concurrent. Upgrade to Max (30 concurrent) if you need more throughput.
  • On Kimi Code Plan, the documented cap is 30 concurrent.
Self-hosted env vars not showing
  • If Mesrai is configured via .env (self-hosted Fixed Mode), the BYOK screen shows a blue info banner with the active provider/model — the key is never displayed for security.
  • Saving a BYOK config on top of .env prompts a confirm dialog before overriding.
High or unexpected costs
  • Reasoning adds tokens. If cost is spiking, lower Thinking from Medium to Low, or switch to a cheaper model for Main.
  • Check your provider dashboard for the per-model breakdown.
  • Set a monthly cap at the provider level.
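Two of the failures above (stray characters in a pasted key, and a mistyped model ID that the Test button does not catch) can be checked before saving. This is a minimal sketch under assumptions: cleanPastedKey and findModel are hypothetical helpers, and the catalog is assumed to be an OpenAI-format GET {baseURL}/models response of the form { data: [{ id }] }. The model IDs in the test are illustrative.

```typescript
// Hypothetical pre-save checks for two Troubleshooting cases.
// Assumes an OpenAI-format /models response: { data: [{ id: string }] }.

// Remove surrounding whitespace/newlines and one pair of matching quotes.
function cleanPastedKey(raw: string): string {
  let key = raw.trim();
  const first = key[0];
  if (key.length >= 2 && (first === '"' || first === "'") && key[key.length - 1] === first) {
    key = key.slice(1, -1).trim();
  }
  return key;
}

interface ModelsResponse {
  data: Array<{ id: string }>;
}

// Exact match first; otherwise offer case-insensitive substring matches as "did you mean".
function findModel(modelId: string, catalog: ModelsResponse): { found: boolean; suggestions: string[] } {
  const ids = catalog.data.map((m) => m.id);
  if (ids.includes(modelId)) return { found: true, suggestions: [] };
  const needle = modelId.toLowerCase();
  const suggestions = ids.filter((id) => {
    const candidate = id.toLowerCase();
    return candidate.includes(needle) || needle.includes(candidate);
  });
  return { found: false, suggestions };
}
```

The model check complements the key probe: the probe proves the key and endpoint work, and this check proves the model ID exists in that endpoint's catalog.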

Frequently Asked Questions

Can I switch providers anytime?

Yes. The change takes effect for the next review — no redeploy required.

What happens if my API key runs out of credits?

Reviews automatically switch to the Fallback model if one is configured. Without a Fallback, the review fails and returns an error. Always configure a Fallback.

How does the primary/fallback system work?

Main handles every review by default. If it fails (rate limit, 5xx, timeout), Mesrai retries once on Fallback. You pay only for the provider that actually processed the review.
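The Main/Fallback flow above can be sketched as a single try/catch with one retry. ReviewFn, ProviderFailure, isRetryable, and reviewWithFallback are hypothetical names. Treating HTTP 402 as retryable is an assumption about how some providers report exhausted credits; 429, 5xx, and timeouts come from the answer above.

```typescript
// Sketch of "Main first, retry once on Fallback". All names are hypothetical.
type ReviewFn = (prompt: string) => Promise<string>;

interface ProviderFailure {
  status?: number;
  timedOut?: boolean;
}

// 429, 5xx, and timeouts trigger failover per the FAQ; 402 is an assumption
// for providers that report exhausted credits that way.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { status, timedOut } = err as ProviderFailure;
  if (timedOut === true) return true;
  if (status === undefined) return false;
  return status === 402 || status === 429 || (status >= 500 && status <= 599);
}

async function reviewWithFallback(
  prompt: string,
  main: ReviewFn,
  fallback?: ReviewFn,
): Promise<{ text: string; servedBy: "main" | "fallback" }> {
  try {
    return { text: await main(prompt), servedBy: "main" };
  } catch (err) {
    // No Fallback configured, or a non-retryable error (e.g. 400): the review fails.
    if (fallback === undefined || !isRetryable(err)) throw err;
    // Exactly one attempt on Fallback; if it also fails, that error propagates.
    return { text: await fallback(prompt), servedBy: "fallback" };
  }
}
```

Only one provider returns the review text, so only that provider's tokens are billed for the successful attempt.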

Should I use the same provider for Main and Fallback?

No. Different providers protect against provider-specific outages. A common pairing: Anthropic Main + Google Fallback, or GLM Coding Plan Main + Anthropic Fallback for spike coverage.

Do you store our API keys securely?

Yes. Keys are encrypted at rest and in transit and never logged in plain text. The BYOK status endpoint never returns the raw key.

Can I use a self-hosted LLM (e.g. Ollama, vLLM)?

Yes — via the OpenAI Compatible provider in the manual wizard. Enter your endpoint’s base URL, the model ID it exposes, and a placeholder API key (most self-hosted runtimes ignore the key header but still require one).
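As an illustration of what "OpenAI-compatible" means for a self-hosted runtime, the sketch below builds an OpenAI-format chat completions request without sending it. Assumptions: Ollama serves its OpenAI-compatible API at http://localhost:11434/v1 (vLLM's default is http://localhost:8000/v1), and the model ID qwen2.5-coder is only an example. buildChatRequest is a hypothetical helper, not part of Mesrai.

```typescript
// Sketch of an OpenAI-format request to a self-hosted runtime (not sent here).
// Assumptions: Ollama at http://localhost:11434/v1; model ID is illustrative.
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(baseURL: string, modelId: string, apiKey: string, prompt: string): ChatRequest {
  const url = baseURL.endsWith("/") ? `${baseURL}chat/completions` : `${baseURL}/chat/completions`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Most self-hosted runtimes ignore this value, but a placeholder still has to be supplied.
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: modelId, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

const localRequest = buildChatRequest("http://localhost:11434/v1", "qwen2.5-coder", "placeholder", "Review this diff");
// With a runtime listening locally you could send it via: fetch(localRequest.url, localRequest.init)
```

The three values the wizard asks for (base URL, model ID, API key) are exactly the three inputs this request needs.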