AI Glossary 2026: 50 Terms Every Beginner Should Know
Plain-English definitions for the AI terms that actually matter in 2026 — context windows, agents, RAG, MoE, OSWorld, and dozens more, grouped by category.
Why this glossary
The AI industry invents a new piece of jargon every week. Most of it is marketing. Some of it is genuinely useful for understanding what’s happening when you use these tools. This glossary covers 50 terms that actually matter in May 2026 — grouped by category, defined in plain English, with examples.
If you’re new to AI, read straight through. If you keep hitting a specific term you don’t know, jump to the section.
Models and architecture
1. Foundation model. A very large, general-purpose AI model trained on massive amounts of data, then adapted for specific uses. GPT, Claude, Gemini, Llama, and DeepSeek are all foundation models. See Foundation models explained.
2. LLM (large language model). A foundation model trained primarily on text to predict the next token. The “large” refers to its parameter count and the amount of training data.
3. Transformer. The neural network architecture, introduced in 2017, that powers nearly every modern AI model. It reads input as a sequence of tokens, uses attention to weigh how those tokens relate to each other, and predicts what comes next.
4. Diffusion model. The architecture behind most image generators (Midjourney, FLUX, Stable Diffusion). Starts from random noise, gradually denoises it into a coherent image. See How AI image generators actually work.
5. MoE (Mixture of Experts). An efficiency trick where only some of the model’s parameters activate for each token. Lets a 600B-parameter model run with roughly the per-token compute of a 30B-parameter one, although all the parameters still have to be held in memory. Used in DeepSeek V3 and many 2026 frontier models.
6. Multimodal. A model that handles more than text — usually text + images, often + audio + video. Gemini 3.1 Pro is multimodal across text, images, audio, and video.
7. Open weights. A model whose trained parameters are public — you can download and run it yourself. DeepSeek and Llama are open-weights. ChatGPT and Claude are not.
8. Open source (in AI). Loosely used. Strict definition: code, weights, and training data are all public. Llama is open-weights, not strictly open-source. Few major models are truly open-source.
9. Parameter count. The size of a model — billions or trillions of numbers that get adjusted during training. Larger isn’t always better, but it correlates with capability up to a point.
10. Distillation. Training a smaller model to mimic a larger one’s outputs. Most “fast” or “mini” model variants (GPT-5.5 Mini, Claude Haiku) are distilled.
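To make entry 5 (MoE) concrete, here is a minimal sketch of top-k routing for a single token. Everything in it is a toy assumption: the four "experts" are hypothetical functions that just scale a number, the router scores are hard-coded rather than learned, and real models route vectors through neural sub-networks. It only illustrates the idea that a router picks a few experts and the rest stay idle.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Hypothetical 'expert': a function that simply scales its input."""
    return lambda x: scale * x

def moe_forward(x, router_scores, experts, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    probs = softmax(router_scores)
    # Indices of the top_k experts for this token; the others are never called.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize the chosen experts' weights so they sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = sum((probs[i] / norm) * experts[i](x) for i in chosen)
    return output, chosen

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]  # 4 toy experts
output, chosen = moe_forward(1.0, [0.1, 2.0, 0.3, 1.5], experts)
print(f"experts used: {chosen} of {len(experts)}; output: {output:.3f}")
```

With top_k=2, only two of the four experts do any work for this token, which is where the speed-up comes from. All four still have to exist in memory.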
Capability terms
11. Context window. The amount of text the model can read in one prompt, measured in tokens. In 2026, frontier models support 1M+ tokens — enough for an entire codebase or 900-page PDF.
12. Token. The unit a model reads and generates. One token is roughly 0.75 words in English. Pricing is usually per million tokens.
13. Reasoning model. A model that produces explicit intermediate “thinking” steps before answering. Claude Opus 4.7, GPT-5.5, and Gemini 3.1 Pro now blend reasoning into the default response. DeepSeek R1 and R2 are dedicated reasoning models.
14. Chain of thought (CoT). The pattern of reasoning step-by-step before answering. The output you see when a reasoning model “shows its work.”
15. Prompt. The input you give to a model. The art of writing prompts is called prompt engineering.
16. System prompt. A hidden prompt that tells the model how to behave (tone, persona, restrictions) before any user message. Custom GPTs use system prompts to specialize.
17. Few-shot prompting. Including 1-5 examples in your prompt to show the model the pattern you want. Often dramatically improves output.
18. Hallucination. When a model confidently states something false. Models are statistical pattern-matchers, not databases — they generate plausible-sounding text, which sometimes isn’t true.
19. Grounding. Connecting model outputs to verified sources. RAG and search-grounded models like Perplexity reduce hallucinations by grounding answers in retrieved documents.
20. RAG (retrieval-augmented generation). A technique that fetches relevant documents at query time and includes them in the model’s context. Used in Perplexity, NotebookLM, and most enterprise AI search.
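Entries 19 and 20 (grounding and RAG) are easier to picture with a toy example. The sketch below is an assumption-heavy simplification: the three "documents" are made up, and retrieval uses plain word overlap instead of the embedding search that production systems typically use. No model is called; the code only shows the retrieve-then-insert-into-the-prompt step.

```python
import re

# Hypothetical mini knowledge base; a real system would index many documents.
DOCUMENTS = [
    "The context window is the amount of text a model can read in one prompt.",
    "Quantization compresses a model by reducing the precision of its numbers.",
    "Inpainting regenerates one portion of an image without changing the rest.",
]

def words(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_n=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_n]

def build_prompt(question, documents):
    """Place the retrieved text in the prompt so the answer can be grounded in it."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("What does quantization do to a model?", DOCUMENTS)
print(prompt)
```

The resulting prompt is what would be sent to the model: the question plus the most relevant source, so the answer can cite something concrete instead of relying on memory alone.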
Agents and copilots
21. Agent. An AI that takes a goal, plans steps, and executes them autonomously. Examples: ChatGPT Agent, Claude Code, Manus, Devin. See Chatbots vs agents vs copilots.
22. Copilot. AI integrated inside an existing tool, suggesting changes you accept or reject. GitHub Copilot, Notion AI, Microsoft Copilot.
23. Computer use. A model’s ability to control a real screen: clicking buttons, typing, scrolling. Claude Sonnet 4.6 and GPT-5.5 both reach 72-75% on the OSWorld benchmark.
24. OSWorld. The standard benchmark for computer use. Tests AI on real-world tasks inside desktop apps such as browsers, office suites, and code editors. The human baseline reported in the original OSWorld paper is roughly 72%.
25. SWE-bench. The standard benchmark for autonomous coding agents. Tests them on real GitHub issues. Claude 4.6 currently leads at 75.6%.
26. Tool use / function calling. When a model calls external functions or APIs to do things outside its own context — fetch a web page, query a database, send an email.
27. Cascading agents. Multiple agents working together — one plans, another executes, another verifies. Common in production agentic systems.
28. Autonomy level. How much the agent does between human checkpoints. A “low autonomy” copilot suggests changes; a “high autonomy” agent runs end-to-end without you.
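Entry 26 (tool use / function calling) boils down to a simple loop: the model emits a structured request, and your application runs the matching function. The sketch below assumes a made-up JSON shape with "tool" and "arguments" keys and two hypothetical tools; real providers each define their own request format, and no model is actually called here.

```python
import json

def get_weather(city):
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"

def add(a, b):
    """Hypothetical tool that adds two numbers."""
    return a + b

# The application, not the model, owns this registry of callable tools.
TOOLS = {"get_weather": get_weather, "add": add}

def run_tool_call(model_message):
    """Parse a JSON tool request and run the matching function."""
    request = json.loads(model_message)
    name = request["tool"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**request["arguments"])

# Stand-in for what a model might emit when it decides to use a tool.
fake_model_message = '{"tool": "add", "arguments": {"a": 2, "b": 3}}'
result = run_tool_call(fake_model_message)
print(result)
```

In an agent, the result would be fed back to the model so it can decide on the next step. The registry check matters: the model can only trigger functions the application has explicitly exposed.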
Image and video models
29. Latent space. The compressed mathematical representation a model works in. For diffusion models, the actual generation happens in latent space, then gets decoded to pixels.
30. Embedding. A list of numbers representing text, images, or other content semantically. “Dog” and “puppy” have similar embeddings; “dog” and “spreadsheet” do not.
31. Style reference (--sref). A Midjourney feature: include a reference image and the model matches that visual style.
32. Inpainting. Regenerating a specific portion of an image (or audio) without changing the rest. Udio’s killer feature for music. Photoshop’s Generative Fill is also inpainting.
33. ControlNet. A Stable Diffusion technique that lets you steer image generation with additional inputs (sketches, depth maps, poses).
34. Upscaling. Increasing image resolution using AI. Often a separate model after generation.
35. Text-to-video. Generating a video clip from a text prompt. Veo 3.1, Kling 3.0, Runway Gen-4.5, Pika.
36. Image-to-video. Animating a still image with AI. The default starting point for most consumer video generation in 2026.
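The "dog" and "puppy" comparison in entry 30 (embedding) is usually measured with cosine similarity. The sketch below uses hand-made three-number vectors purely for illustration; real embeddings come out of a model and have hundreds or thousands of dimensions, so the specific values here are assumptions, not real data.

```python
import math

# Hand-made vectors for illustration only; not produced by any real model.
EMBEDDINGS = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "spreadsheet": [0.1, 0.05, 0.95],
}

def cosine_similarity(a, b):
    """Return a score near 1 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

dog_puppy = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["puppy"])
dog_sheet = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["spreadsheet"])
print(f"dog vs puppy: {dog_puppy:.2f}")
print(f"dog vs spreadsheet: {dog_sheet:.2f}")
```

The first score comes out much higher than the second, which is all "similar embeddings" means in practice. Search, recommendation, and RAG systems rank content by scores like these.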
Pricing and infrastructure
37. Per-token pricing. API cost per million tokens of input or output. GPT-5.5 is around $2.50 input / $15 output per million.
38. Cached input. When you reuse the same context across requests, providers offer steep discounts. Gemini drops cached input to $0.20/million tokens — a 90% cut.
39. Inference. Running a trained model to generate output. The expensive part of operating AI at scale.
40. Fine-tuning. Continuing a model’s training on your own data to specialize it. Suno V5.5 lets users fine-tune music models on their own catalog.
41. LoRA. Low-Rank Adaptation. A cheap, fast way to fine-tune a large model by training only a small set of additional parameters. Common in Stable Diffusion ecosystems.
42. Quantization. Compressing a model to use less memory by reducing the precision of its numbers, usually at a small cost in quality. Lets you run a 70B model on a laptop with enough memory.
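The arithmetic behind entries 37 (per-token pricing) and 42 (quantization) fits in a few lines. This is a back-of-the-envelope sketch: the default rates are the GPT-5.5 figures quoted above, the 0.75 words-per-token ratio is the rough rule from entry 12, and the memory estimate counts model weights only, ignoring everything else a running model needs. The function names are hypothetical.

```python
def estimate_tokens(word_count, words_per_token=0.75):
    """Convert an English word count to an approximate token count."""
    return round(word_count / words_per_token)

def estimate_cost(input_tokens, output_tokens, input_rate=2.50, output_rate=15.00):
    """Return the cost in dollars; rates are in dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

def weight_memory_gb(parameters, bits_per_parameter):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return parameters * bits_per_parameter / 8 / 1e9

tokens_in = estimate_tokens(3000)     # a 3,000-word document
cost = estimate_cost(tokens_in, 500)  # plus a 500-token reply
print(f"{tokens_in} input tokens, about ${cost:.4f}")
print(f"70B at 16-bit: {weight_memory_gb(70e9, 16):.0f} GB")
print(f"70B at 4-bit: {weight_memory_gb(70e9, 4):.0f} GB")
```

Two things stand out when you play with the numbers: output tokens dominate the bill because they cost several times more than input tokens, and dropping from 16-bit to 4-bit precision cuts weight memory to a quarter.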
Search and integration
43. AI search. Search engines (Perplexity, Google AI Mode, ChatGPT search) that use a language model to synthesize answers from web sources, with citations.
44. AI Overviews. Google Search’s AI-generated answer block at the top of results. The thing that’s made traditional SEO harder in 2026.
45. llms.txt. An emerging convention: a markdown file at the root of your website that summarizes its contents for AI crawlers. Like robots.txt but for LLMs.
46. GEO (Generative Engine Optimization). SEO for AI search engines. Optimizing your content to be cited and surfaced in ChatGPT, Perplexity, Claude, Gemini answers.
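For entry 45 (llms.txt), the file itself is just markdown: a title, a one-line summary, and sections of annotated links. The sketch below generates one, loosely following the layout of the public llms.txt proposal. The site name, URLs, and descriptions are placeholders, and since the convention is still emerging, treat the exact layout as an assumption rather than a fixed standard.

```python
def build_llms_txt(site_name, summary, sections):
    """Build llms.txt content: a title, a blockquote summary, then link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

# Placeholder site and URLs, for illustration only.
llms_txt = build_llms_txt(
    "Example AI Guide",
    "Plain-English explainers about AI tools.",
    {
        "Guides": [
            ("AI Glossary", "https://example.com/glossary.md", "50 core AI terms"),
            ("What is generative AI?", "https://example.com/generative-ai.md", "Beginner overview"),
        ]
    },
)
print(llms_txt)
```

You would save the output as llms.txt at the root of your site, next to robots.txt.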
Safety and policy
47. Jailbreak. A prompt that bypasses a model’s safety training. The classic example is the “ignore previous instructions” pattern.
48. Alignment. Making sure a model’s behavior matches human values and intentions. It is a central problem in AI safety research, both academic and industrial.
49. Constitutional AI. Anthropic’s approach to alignment — training models against a set of written principles rather than only via human feedback.
50. RLHF (reinforcement learning from human feedback). Training method where humans rank model outputs and the model learns to produce more of what humans prefer. Used in ChatGPT and most consumer AI assistants.
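How does a human ranking in entry 50 (RLHF) become a training signal? One common approach trains a separate "reward model" with a pairwise preference loss, then uses that reward model to steer the main model. The sketch below shows only the loss for a single comparison, with made-up reward scores. It is a simplified illustration of the idea, not any vendor's actual pipeline.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Probability the reward model assigns to the human-preferred output winning."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def preference_loss(reward_chosen, reward_rejected):
    """Small when the reward model agrees with the human ranking, large when it disagrees."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# Made-up reward scores for one human comparison between two outputs.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(f"loss when the reward model agrees with the human: {agrees:.3f}")
print(f"loss when it disagrees: {disagrees:.3f}")
```

Training pushes the loss down across many comparisons, so the reward model gradually scores outputs the way human raters would.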
What this glossary leaves out
The terms above are the ones you’ll actually encounter using AI tools in 2026. There’s a separate vocabulary of research-only concepts (mixture of depths, speculative decoding, KV-cache compression) that doesn’t show up in product pages, and a separate vocabulary of marketing buzzwords (synergy, AI-powered, intelligent) that don’t mean anything specific. Skip both unless you’re working in the space directly.
For broader context, see What is generative AI? and The state of AI tools in 2026.