The open-source LLM landscape
Duration: 10 min · Prerequisites: chapter 04 (you understand what an LLM is and how to drive it)
Key idea
“Open-source LLM” is not a single model: it’s a galaxy of families, each with its own publisher, licence, architecture and specialty. What distinguishes these models is not the tooling (Ollama, LangChain and vLLM work with all of them). It’s the licence, the architecture (Dense vs MoE), the modality (text, image, audio, code) and the quality of the publisher’s fine-tuning.
“Is Mistral the best?” Honest answer: there isn’t one best. Mistral excels at European languages and MoE; Meta (Llama) remains the reference for reliable tool calling; Alibaba (Qwen) dominates code and multimodal; DeepSeek has become unavoidable for reasoning. It depends on your task. The rest of this chapter gives you the keys to choose.
The 8 major open-source families (May 2026)
| Publisher | Model family | Country / lab | Headline innovation | Licence |
|---|---|---|---|---|
| Meta | Llama (2 → 3.1 → 3.2 → 3.3 → 4) | USA | Native tool calling since 3.1; Llama 3.2 Vision (multimodal); Llama 4 MoE | Llama Community License (open with restrictions) |
| Mistral AI | Mistral / Mixtral / Codestral / Pixtral / Ministral | France | MoE (Mixtral 8x7B, 8x22B), European multilingual, Pixtral 12B (image) | Apache 2.0 (most) |
| Google DeepMind | Gemma (1 → 2 → 3) | USA | Very efficient small models, multimodal Gemma 3 | Gemma Terms (open but conditional) |
| Alibaba | Qwen (2 → 2.5 → 3), Qwen-Coder, Qwen-VL, QwQ | China | Qwen2.5-Coder (top on code), Qwen-VL (vision), QwQ (reasoning) | Apache 2.0 (most sizes) |
| DeepSeek | DeepSeek-V3, DeepSeek-R1, DeepSeek-Coder | China | DeepSeek-R1 (open “think” reasoning), MoE 671B (37B active) | MIT (very permissive) |
| Microsoft | Phi (3, 3.5, 4) | USA | Small models (≤ 14B) that beat 5× bigger models on light reasoning | MIT |
| xAI | Grok (1 → 1.5+) | USA | Grok-1 = 314B MoE open (largest open weights at release) | Apache 2.0 (Grok-1) |
| AllenAI | OLMo (1 → 2) | USA (academic) | “Truly” open: weights + data + training recipes | Apache 2.0 |
What about IBM and Cohere? IBM Granite (Apache 2.0, enterprise-oriented) and Cohere Command (some sizes open, the rest commercial) are serious outsiders but less common in the classroom. Honourable mention for Stability AI (StableLM) and TII (Falcon).
Hugging Face: the hub, not a publisher
Hugging Face does not train its own flagship LLMs (apart from small models such as SmolLM and collaborative projects such as BigCode’s StarCoder). It is a platform:
- over 1 million models hosted (the “12 000” figure you’ll see in some courses has been outdated since ~2023);
- standard file formats (`safetensors`, `gguf`);
- the `transformers` library that everyone uses;
- Datasets, Spaces (one-click deploy) and leaderboards (rankings of models per metric).
For this course, you download models via Ollama (which uses its own registry), but when you want to fine-tune (chapter 14), you go through Hugging Face. The two worlds coexist.
Yes, there are LLMs for images. And audio. And code.
This is probably the most common confusion. Let’s clear it up.
Image-IN — Vision-Language Models (VLM)
You give the model an image; it describes what it sees or answers a question about it.
| VLM | Publisher | Size | Ollama command |
|---|---|---|---|
| Llama 3.2 Vision | Meta | 11B / 90B | ollama pull llama3.2-vision |
| Pixtral | Mistral | 12B | ollama pull pixtral (when available) |
| Qwen 2-VL / 2.5-VL | Alibaba | 2B / 7B / 72B | ollama pull qwen2.5vl |
| Gemma 3 | Google | 4B / 12B / 27B | ollama pull gemma3 |
| MiniCPM-V | OpenBMB | 8B | ollama pull minicpm-v |
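```python
from ollama import Client

client = Client()  # talks to the local Ollama server

resp = client.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "What does this image show?",
        "images": ["./photo.jpg"],  # path to a local image file
    }],
)
print(resp["message"]["content"])  # the model's description of the image
```

Image-OUT — these are NOT LLMs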
Stable Diffusion, FLUX, DALL·E, Imagen 3, Midjourney: these are diffusion models, not LLMs. Different architecture (they build an image by iteratively denoising it, not by predicting the next token), different libraries (`diffusers`, ComfyUI). They’re often confused because they’re also “generative” and “AI”, but under the hood they work very differently from a chat LLM.
Audio-IN — Speech-to-Text
| ASR | Publisher | Note |
|---|---|---|
| Whisper | OpenAI (open!) | De-facto standard, multilingual |
| Distil-Whisper | HF | ~6× faster distilled Whisper (English-focused) |
| Qwen2-Audio | Alibaba | Chat by speaking, not just transcription |
| Voxtral | Mistral | Audio-first (released 2025) |
Audio-OUT — Text-to-Speech
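Coqui XTTS, F5-TTS, OuteTTS: these are dedicated speech-synthesis models rather than chat LLMs (some, such as XTTS and OuteTTS, do use a language-model backbone to generate audio tokens). Their weights are openly available and they run locally, but check each licence: several are non-commercial.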
Code models
“Code” models are normal LLMs heavily fine-tuned on code. Model choice gets its own chapter (05b); for our Java demo, we go with Llama 3.1 (reliable tool calling) or Qwen2.5-Coder (clean Java).
| Code model | Publisher | Specialty |
|---|---|---|
| Qwen2.5-Coder (1.5B → 32B) | Alibaba | Most versatile and accurate across 40+ languages |
| Codestral (22B) | Mistral | Very strong on C/C++/Python; Mistral Non-Production License (commercial use requires a separate licence) |
| DeepSeek-Coder (6.7B → 33B) | DeepSeek | Strong on Python/JS; code under MIT, weights under the DeepSeek License, which allows commercial use |
| StarCoder2 (3B / 7B / 15B) | BigCode (HF + ServiceNow) | Trained on The Stack v2, transparent about data |
Reasoning (“thinking models”)
Recent generation (late 2024 → 2026): models that generate an explicit chain of thought before answering.
| Model | Publisher |
|---|---|
| DeepSeek-R1 + its distilled models (DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, …) | DeepSeek |
| QwQ-32B | Alibaba |
| Phi-4-reasoning | Microsoft |
Cost: these models are slow (5× to 20× more tokens generated because of the visible reasoning) but much better at maths/logic.
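DeepSeek-R1 and its distilled models typically write their reasoning between `<think>` and `</think>` tags before the final answer. A minimal sketch, assuming that tag convention (other thinking models may separate their reasoning differently), of how you might keep only the answer while still measuring how much extra text the reasoning cost you. `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Assumed convention (DeepSeek-R1 style): reasoning sits inside <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion.

    If no <think> block is found, reasoning is empty and the whole text is the answer.
    """
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is the user-facing answer.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer

raw = "<think>17 * 3 = 51, then 51 + 4 = 55.</think>\nThe result is 55."
reasoning, answer = split_reasoning(raw)
print(answer)  # The result is 55.
print(len(reasoning.split()), "reasoning words vs", len(answer.split()), "answer words")
```

Counting words on both sides over a few of your own prompts gives a quick feel for the 5× to 20× token overhead on your actual tasks.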
Embeddings (for RAG)
Not generative, but central to RAG:
- `nomic-embed-text` (Apache 2.0)
- `bge-m3` (multilingual, all-in-one)
- `mxbai-embed-large`
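Whichever embedding model you pick, the retrieval step of RAG boils down to comparing vectors. A minimal sketch using cosine similarity, assuming you already have one vector per document chunk. The toy 3-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions. The helper names `cosine` and `top_k` are our own.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Rank (text, vector) pairs by similarity to the query vector, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Made-up vectors standing in for real embeddings of each chunk.
docs = [
    ("Ollama runs models locally", [0.9, 0.1, 0.0]),
    ("Mixtral is a mixture-of-experts model", [0.1, 0.9, 0.1]),
    ("Whisper transcribes audio", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I run an LLM on my laptop?"
print(top_k(query, docs, k=1))  # ['Ollama runs models locally']
```

In a real pipeline, the vectors come from the embedding model itself (recent versions of the `ollama` Python package expose an `embed` call for this), and a vector store replaces the linear scan once you have many chunks.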
To pull one of them with Ollama:

```shell
ollama pull nomic-embed-text
```

What distinguishes open-source models from each other
Not the tooling: Ollama, vLLM and llama.cpp run all of these models. The real differences, roughly in order of practical importance:
| Axis | Concrete consequence |
|---|---|
| Licence | Can you use it commercially, redistribute it, embed it in a product? Apache 2.0 / MIT = free. Llama licence = open with restrictions (for example, services above 700 million monthly active users need a separate licence from Meta). Gemma = conditional. Always read the LICENSE before a professional project. |
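| Architecture | Dense (Llama 3.1 8B: all 8B parameters are active for every token) or MoE (Mixtral 8x7B: ~46B parameters in total, but only ~13B active per token, so roughly the per-token compute of a 13B model with quality closer to a much larger one). MoE = better quality/speed ratio, but every expert must be loaded, so it needs RAM for the full ~46B. |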
| Modality | Text-only / + image (VLM) / + audio / + specialised code. No “universal” model — each modality has a cost. |
| Native tool calling | This is fine-tuned into the model. Llama 3.1+ and Qwen 2.5+ emit structured tool_calls. Older or smaller models (Phi, Gemma 2, early Mistral 7B releases) often need a fallback parser. See chapter 05b. |
| Specialty | Code / reasoning / multilingual / instruction-following / chat. A “code” model is just a text model heavily fine-tuned on code. |
| Size | 1B / 3B / 7B / 8B / 14B / 32B / 70B / 100B+. Bigger models reason better, but are slower and hungrier for RAM/VRAM. |
| Quantization | The same model exists in FP16 (~2 bytes/weight), Q8 (~1 byte), Q4 (~0.5 byte). Q4_K_M = good compromise for Ollama. |
| Training data | Influences what it “knows”: The Stack for StarCoder, filtered Common Crawl, GitHub code, etc. Most do not say precisely. OLMo is the exception (everything is public). |
| Knowledge cutoff | Llama 3.1 and Llama 3.3 = December 2023. Llama 4 = August 2024. Beyond that date, the model knows nothing. RAG is needed for anything that changes. |
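The Architecture and Quantization rows combine into a quick back-of-the-envelope check before you pull a model. A rough sketch, assuming weight memory ≈ total parameters × bytes per weight, and ignoring the KV cache and runtime overhead (so real usage is higher). The byte figures are the approximate ones from the table. It shows why a MoE model is cheap per token yet still memory-hungry.

```python
# Approximate bytes per weight, from the Quantization row. Real quantized files
# (e.g. Q4_K_M) mix block formats and metadata, so they are a bit larger.
BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5}

def weight_memory_gb(total_params_billion: float, fmt: str) -> float:
    """Memory for the weights alone: 1e9 params x bytes/weight = GB.

    Excludes the KV cache (grows with context length) and runtime overhead.
    """
    return total_params_billion * BYTES_PER_WEIGHT[fmt]

# (total, active per token) parameters in billions, as quoted in the table.
MODELS = {
    "Llama 3.1 8B (dense)": (8, 8),
    "Mixtral 8x7B (MoE)": (46, 13),
}

for name, (total, active) in MODELS.items():
    for fmt in ("FP16", "Q4"):
        gb = weight_memory_gb(total, fmt)
        print(f"{name:<22} {fmt:<5} ~{gb:5.1f} GB of weights, ~{active}B params used per token")
```

With these numbers, the 8B dense model needs about 4 GB at Q4, while Mixtral needs about 23 GB even though each token only runs through ~13B parameters: that is the “needs lots of RAM” caveat of MoE.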
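For models without structured `tool_calls`, the usual workaround is to ask for JSON in the prompt and parse the reply yourself. A minimal, hypothetical sketch of such a fallback parser: the `{"name": ..., "arguments": {...}}` shape and the `parse_tool_call` name are our own assumptions, not a standard; see chapter 05b for the course’s setup. The parser refuses unknown tools rather than guessing.

```python
import json
import re
from typing import Optional

# Hypothetical convention: the system prompt asks the model to reply with a JSON
# object {"name": ..., "arguments": {...}} whenever it wants to call a tool.
JSON_OBJECT_RE = re.compile(r"\{.*\}", re.DOTALL)  # greedy: first "{" to last "}"

def parse_tool_call(text: str, known_tools: set[str]) -> Optional[dict]:
    """Best-effort extraction of a tool call from free-form model output.

    Returns {"name": ..., "arguments": {...}}, or None when nothing usable is found.
    """
    match = JSON_OBJECT_RE.search(text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))  # a valid "{...}" span is always a dict
    except json.JSONDecodeError:
        return None  # malformed JSON: re-prompt or treat the reply as plain text
    name = payload.get("name")
    args = payload.get("arguments", {})
    if name not in known_tools or not isinstance(args, dict):
        return None  # unknown tool or wrong argument shape
    return {"name": name, "arguments": args}

raw = 'Sure, let me check.\n{"name": "get_weather", "arguments": {"city": "Lyon"}}'
print(parse_tool_call(raw, {"get_weather"}))
# {'name': 'get_weather', 'arguments': {'city': 'Lyon'}}
```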
Decision tree: which model to pick?
```mermaid
flowchart TD
    Start([What is your need]) --> Modal{Which modality}
    Modal -->|Text only| Task{Task type}
    Modal -->|Text plus image| VLM[/Vision-Language Models<br/>Llama 3.2 Vision<br/>Pixtral 12B<br/>Qwen2.5-VL<br/>Gemma 3/]
    Modal -->|Audio to text| ASR[/Speech-to-Text<br/>Whisper<br/>Distil-Whisper/]
    Modal -->|Code mostly| Code[/Code-specialised<br/>Qwen2.5-Coder<br/>Codestral<br/>DeepSeek-Coder/]
    Modal -->|Image generation| Diff([Diffusion models<br/>Stable Diffusion / FLUX<br/>Out of LLM scope])
    Task -->|Reasoning<br/>math, logic, multi-step| Reason[/Thinking models<br/>DeepSeek-R1<br/>QwQ-32B<br/>Phi-4-reasoning/]
    Task -->|Multilingual<br/>French important| Multi[/Mistral Small or Large<br/>Llama 3.x<br/>Qwen 2.5/]
    Task -->|Reliable tool calling<br/>for an agent| Agents[/Llama 3.1 plus 8B<br/>Qwen 2.5 7B plus<br/>Mistral Large/]
    Task -->|General conversation| HW{Which machine}
    HW -->|8 GB RAM or less<br/>modest laptop| Small[/Small models<br/>Llama 3.2:3b<br/>Gemma 2:2b<br/>Phi-3 mini/]
    HW -->|16 GB RAM<br/>solid laptop| Mid[/Mid-range<br/>Llama 3.1:8b<br/>Qwen 2.5:7b<br/>Mistral 7B/]
    HW -->|Dedicated GPU 24 GB VRAM or more| Big[/Big models<br/>Llama 3.x:70b<br/>Mixtral 8x22B<br/>DeepSeek-V3/]
    Small --> License{Commercial use}
    Mid --> License
    Big --> License
    Agents --> License
    Multi --> License
    Reason --> License
    VLM --> License
    Code --> License
    License -->|Yes unrestricted| FreeLic[/Prefer Apache 2.0 or MIT<br/>Qwen, Mistral OSS,<br/>DeepSeek, Phi, OLMo/]
    License -->|No internal only| AnyLic([Any open model works])
```
How to read the tree: start at the top, follow your constraints modality → task → hardware → licence. You’ll end up with 3 or 4 candidates. Test, measure, choose. Chapter 05b helps you measure; the demo 2 comparator lets you pit two or three models head-to-head.
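The licence step at the bottom of the tree is mechanical enough to script. A minimal sketch, assuming the illustrative licence labels below for a few Ollama tags (our own summary at the time of writing; the LICENSE file shipped with each model is the only authoritative source). `filter_by_licence` is a hypothetical helper; models missing from the table are dropped on purpose.

```python
# Illustrative labels only: always confirm against each model's LICENSE file.
LICENCES = {
    "llama3.1:8b": "Llama 3.1 Community License",
    "gemma2:2b": "Gemma Terms of Use",
    "mistral:7b": "Apache-2.0",
    "qwen2.5:7b": "Apache-2.0",
    "phi3:mini": "MIT",
    "codestral:22b": "Mistral Non-Production License",
}
PERMISSIVE = {"Apache-2.0", "MIT"}

def filter_by_licence(candidates: list[str], need_unrestricted: bool) -> list[str]:
    """Last node of the tree: keep only Apache 2.0 / MIT models when required."""
    if not need_unrestricted:
        return list(candidates)  # internal use only: any open model works
    # Unknown models are excluded: no known licence means we assume it is not permissive.
    return [m for m in candidates if LICENCES.get(m) in PERMISSIVE]

mid_range = ["llama3.1:8b", "qwen2.5:7b", "mistral:7b"]  # "16 GB RAM" branch
print(filter_by_licence(mid_range, need_unrestricted=True))
# ['qwen2.5:7b', 'mistral:7b']
```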
“OK, but as of May 2026, who wins?”
Honest answer: nobody wins everywhere. Here’s who dominates each dimension, based on leaderboards and the ecosystem:
| Dimension | Dominant family (May 2026) | Why |
|---|---|---|
| Reliable local tool calling | Meta (Llama 3.x, Llama 4) | Native tool_calls format since Llama 3.1, mature ecosystem. |
| General-purpose code | Alibaba (Qwen2.5-Coder) | Covers 40+ languages, remarkable Java/Python/Go quality. |
| Reasoning | DeepSeek (R1 and its distillations) | Redefined what’s expected of an open reasoning model. |
| European multilingual | Mistral AI | French remains their core market, plus the Mixtral MoE. |
| Efficient small models (≤ 4B) | Microsoft (Phi-4-mini) + Google (Gemma 3) | Unbeatable quality-to-VRAM ratio. |
| Multimodal (image + text) | Google (Gemma 3) + Meta (Llama Vision) + Alibaba (Qwen-VL) | Three families converging, hard to rank. |
| Truly open (weights + data + recipes) | AllenAI (OLMo 2) | The only family in this list to publish everything. The standard for academic research. |
| Largest open weights | DeepSeek (V3, 671B MoE) and xAI (Grok-1, 314B MoE) | Very few users have the hardware to run them locally. |
Mistral is NOT necessarily the best at everything. It’s the best at multilingual + MoE + French, which makes it the logical choice for a course in France or Quebec. Elsewhere, the answer changes.
The leaderboard trap
When you read a benchmark saying “model X beats model Y by 2 points on MMLU”, be careful:
- Date: a benchmark published 6 months ago is already stale.
- Metric: MMLU, HumanEval, GSM8K, MT-Bench, Arena-Hard… do not measure the same thing. A model can dominate MMLU and flop on real agent tasks.
- Contamination: some families saw the test sets during training. It’s usually accidental, but the inflated scores are very real.
- Inference: a model that wins in FP16 on an H100 can collapse in Q4 on your laptop.
The only benchmark that really matters: your own prompts, on your own machine, on your own task. Demo 2 lets you do this in 30 seconds.
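If you want to script the same idea outside demo 2, here is a minimal sketch, assuming each model is wrapped as a plain `prompt -> answer` function and each of your prompts comes with your own pass/fail check. The `compare` helper and the stub models are hypothetical, so the example runs without a server; the comment shows how an Ollama-backed function could look.

```python
import time
from typing import Callable

Ask = Callable[[str], str]     # prompt -> answer
Check = Callable[[str], bool]  # answer -> pass/fail

def compare(models: dict[str, Ask], cases: list[tuple[str, Check]]) -> dict:
    """Send every prompt to every model; return pass rate and mean latency per model."""
    report = {}
    for name, ask in models.items():
        passed, elapsed = 0, 0.0
        for prompt, check in cases:
            start = time.perf_counter()
            answer = ask(prompt)
            elapsed += time.perf_counter() - start
            if check(answer):
                passed += 1
        report[name] = {
            "pass_rate": passed / len(cases),
            "mean_latency_s": elapsed / len(cases),
        }
    return report

# Your own prompts, each with your own pass/fail rule.
cases = [
    ("What is 17 * 3 + 4? Answer with a number only.", lambda a: "55" in a),
    ("What is the capital of France?", lambda a: "paris" in a.lower()),
]

# Stub "models" so the sketch runs anywhere. With Ollama, something like:
#   client = ollama.Client()
#   lambda p: client.chat(model="qwen2.5:7b",
#                         messages=[{"role": "user", "content": p}])["message"]["content"]
fake_models = {
    "model-a": lambda p: "55" if "17 * 3" in p else "Paris",
    "model-b": lambda p: "I think it is 54.",
}

report = compare(fake_models, cases)
for name, stats in report.items():
    print(name, stats)
```

Keep the checks as strict as your real use case: a model that “almost” answers is a failure for an agent that parses the output.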
Key takeaways
- “Open-source LLM” = about 8 publisher families, not a single model. Meta, Mistral, Google, Alibaba, DeepSeek, Microsoft, xAI, AllenAI are the May 2026 pillars.
- Mistral isn’t best everywhere: it dominates European multilingual and MoE, period. For tool calling we pick Llama, for code Qwen-Coder, for reasoning DeepSeek-R1.
- Hugging Face is not a publisher, it’s the hub where everyone uploads (over a million models today).
- Yes, there are LLMs for images (Llama 3.2 Vision, Pixtral, Qwen-VL, Gemma 3). But image generators (Stable Diffusion, FLUX) are not LLMs.
- What distinguishes models is NOT the tooling (shared by all via Ollama / vLLM / llama.cpp). The real axes: licence, architecture (Dense/MoE), modality, native tool-calling, specialty, size, quantization, data.
- How to choose: follow the decision tree (modality → task → hardware → licence), keep 3 candidates, measure them with demo 2 on your prompts. Chapter 05b walks you through that final step.