Demo Ideal Setup Models

This guide walks through an ideal workflow for selecting and comparing LLMs inside the app’s model browser. You’ll learn how to read model cards, compare token limits and pricing, switch providers, and decide when to use your own API keys or cloud-hosted LLMs.

Quick overview: model capabilities and limits

When you open the model browser you’ll see a provider selector (e.g., Anthropic, Google, xAI) and a list of models. Each model card typically shows:

Supported modalities (e.g., images)
Browser access (the extension performs in-extension web browsing when supported)
Prompt caching support
Max output token limit (context window / provider limit)
Pricing broken down by input tokens, cache writes, cache reads, and output tokens

Knowing these fields helps you pick the best trade-off between cost, speed, and capability.

A dark-themed app settings panel showing API provider and model selection, with "anthropic/claude-3.7-sonnet" highlighted among options like google/gemini-2.5-pro and x-ai/grok-3. The pane also displays model features and token pricing details.

Model comparison (consolidated)

Below is a concise comparison of the models shown in the examples. Use this table to quickly compare max output tokens and the four pricing dimensions so you can estimate cost for your workload.

Model	Max output	Input price	Cache writes price	Cache reads price	Output price	Notes
Claude 3.7 Sonnet	64,000 tokens	$3.00 / million tokens	$3.75 / million tokens	$0.30 / million tokens	$15.00 / million tokens	High-capability model for quality-sensitive tasks
Gemini 2.5 Pro	65,536 tokens	$1.25 / million tokens	$1.63 / million tokens	$0.31 / million tokens	$10.00 / million tokens	Good cost/quality balance for experimentation
Grok 3	Varies	Free (at time of capture)	N/A	N/A	N/A	Feature parity may vary by provider; may lack images/browsing/caching

Prices and features change frequently. Always verify token limits and pricing in the UI before running production jobs.

Feature parity and switching models

Many model cards list similar capabilities (images, browsing, prompt caching) and comparable token limits. For example, Gemini 2.5 Pro reports a slightly larger numeric max output (65,536 tokens) than some alternatives. A typical workflow:

Start with a lower-cost model (e.g., Gemini) for experimentation and iteration.
If quality or features are insufficient, switch to a higher-capability model (e.g., Claude 3.7 Sonnet or GPT-4 variants) for critical tasks.
Test the same prompt across models to measure quality differences vs cost.

Below is the editor/provider selection UI where you can pick providers and models:

A dark-themed code editor window with a left panel open showing an "API Provider" dropdown and a model selector highlighting "google/gemini-2.5-pro" plus its capabilities and pricing details. The rest of the screen is a large empty blue editor area.

Using your own API keys and cloud-hosted LLMs

If you have provider accounts, you can configure your own API keys in the extension to bill usage to your account. Common options:

Enter an API key and optionally set a custom base URL.
Keys are typically stored locally by the extension and used only to make API requests from your device.
Add cloud-hosted options (Amazon Bedrock, LLM Studio, Ollama) by supplying region and credentials where required.

Using your own key can avoid double-billing through the app and may be cheaper depending on your provider plan. Example UI excerpt (model + pricing summary) you may see when configuring a provider:

API Provider: Anthropic
Model: claude-opus-4-20250514
✓ Supports images
✓ Supports browser use
✓ Supports prompt caching
Max output: 8,192 tokens
Input price: $15.00 / million tokens
Cache writes price: $18.75 / million tokens
Cache reads price: $1.50 / million tokens
Output price: $75.00 / million tokens

(Your API key is stored locally and only used to make API requests from this extension.)

Protect your API keys: only enter keys you control, verify local storage behavior, and avoid sharing sensitive credentials. When using cloud providers, follow their recommended credential and region configuration processes.

Recommendations (practical checklist)

Start with a lower-cost model to iterate quickly; only escalate to higher-capability models when necessary.
Compare token limits and all four pricing dimensions (input, cache writes, cache reads, output) to estimate costs precisely.
Add your own provider API key if you have one — this can reduce cost and centralize billing.
Double-check feature availability (images, browsing, caching) in the UI before relying on a capability in production.
Run short A/B tests across candidate models to measure quality vs cost trade-offs for your prompts.

Links and references

LLM Studio: https://llm.studio
Ollama: https://ollama.com
Amazon Bedrock: https://aws.amazon.com/bedrock/
Provider documentation (general): OpenAI, Anthropic, Google Cloud documentation

That covers how to read model cards, compare pricing and token limits, switch providers, and decide when to bring your own keys or use cloud-hosted LLMs.

​Quick overview: model capabilities and limits

​Model comparison (consolidated)

​Feature parity and switching models

​Using your own API keys and cloud-hosted LLMs

​Recommendations (practical checklist)

​Links and references

Watch Video

Quick overview: model capabilities and limits

Model comparison (consolidated)

Feature parity and switching models

Using your own API keys and cloud-hosted LLMs

Recommendations (practical checklist)

Links and references