# Types of LLMs
Large language models vary along two practical axes: how you access them (hosted API vs. self-hosted weights) and how large they are.
## By Access Pattern

### Proprietary API Models
Hosted by companies like OpenAI, Anthropic, and Google. You call them via a REST API, pay per token, and never see the weights. Examples: GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro.
Pros: Highest capability, no infrastructure to manage, updated by the provider without any redeployment on your side.
Cons: Data leaves your systems, per-token cost, rate limits.
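To make the per-token pricing concrete, here is a minimal cost estimator. The `estimate_cost` helper and its default prices are illustrative placeholders, not any vendor's actual rates:

```python
# Hypothetical per-token cost model for a hosted API.
# The default prices are illustrative, not real vendor pricing.
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1m: float = 0.15,
                  output_price_per_1m: float = 0.60) -> float:
    """Estimated USD cost of one request, given prices per 1M tokens."""
    return (prompt_tokens * input_price_per_1m
            + completion_tokens * output_price_per_1m) / 1_000_000

# 2,000 prompt tokens + 500 completion tokens at the placeholder rates
cost = estimate_cost(2_000, 500)
```

Output tokens typically cost several times more than input tokens, so generation-heavy workloads (long completions) are disproportionately expensive compared to prompt-heavy ones.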
### Open-Weight Models
Weights are publicly released. You download them and run inference yourself. Examples: Llama 3.1, Mistral 7B, Qwen 2.5.
Pros: Data stays local, no per-token cost, customizable.
Cons: Requires GPU/CPU infrastructure, operational overhead.
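The infrastructure requirement is dominated by weight memory, which you can estimate from the parameter count. The helper below is a rough sketch that counts weights only, ignoring KV cache, activations, and framework overhead:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed for the weights alone.
    2.0 bytes/param corresponds to fp16/bf16; ~0.5 for 4-bit quantization.
    Excludes KV cache, activations, and framework overhead."""
    return params_billion * bytes_per_param

# An 8B model in bf16 needs roughly 16 GB just for weights;
# 4-bit quantization brings that down to roughly 4 GB.
mem_bf16 = weight_memory_gb(8)      # 16.0
mem_q4 = weight_memory_gb(8, 0.5)   # 4.0
```

This is why 7B–8B models fit on a single consumer GPU while 70B-class models generally require multiple GPUs or aggressive quantization.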
### Fine-Tuned Models
A base model further trained on domain-specific data. Can be proprietary or open-weight. Examples: Code Llama (code), BioMedLM (biomedical).
## By Size Class
| Class | Parameters | Typical Use Case |
|---|---|---|
| Small | 1B–7B | Edge inference, classification |
| Medium | 8B–30B | Chat, summarization, RAG |
| Large | 70B–200B | Complex reasoning, agentic tasks |
| Frontier | >200B | Research, hardest tasks |
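The classes in the table can be expressed as a simple lookup. The `size_class` name and the cutoffs chosen inside the gaps between published ranges (e.g., 30B–70B) are assumptions for illustration:

```python
def size_class(params_billion: float) -> str:
    """Bucket a parameter count into the size classes from the table above.
    Boundaries inside the gaps between ranges are arbitrary choices."""
    if params_billion < 8:
        return "small"
    if params_billion <= 30:
        return "medium"
    if params_billion <= 200:
        return "large"
    return "frontier"
```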
## Choosing the Right Model
For most production applications, start with a cost-efficient mid-tier API model (e.g., GPT-4o-mini, Claude 3.5 Haiku) and measure accuracy on your specific task. Upgrade to a larger model only if the measured gain justifies the cost difference.
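The upgrade decision above can be sketched as a gain-per-cost check. The function name, the threshold, and all numbers below are hypothetical:

```python
def should_upgrade(small_acc: float, large_acc: float,
                   cost_multiplier: float,
                   min_gain_per_unit_cost: float = 0.01) -> bool:
    """Upgrade only if accuracy gain per unit of extra cost clears a bar.
    At the default threshold, a 5x more expensive model must add at
    least 5 accuracy points to be worth it."""
    return (large_acc - small_acc) / cost_multiplier >= min_gain_per_unit_cost

# An 8-point gain at 5x cost clears the bar; a 1-point gain at 10x does not.
should_upgrade(0.82, 0.90, 5)    # True
should_upgrade(0.88, 0.89, 10)   # False
```

The threshold is a business decision, not a technical one: latency-sensitive or high-volume applications will tolerate a much smaller cost multiplier than low-volume, accuracy-critical ones.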