What is Inference and Why Does it Happen After Training?
What is Inference and Why Does it Happen After Training?
To understand the world of inference, let’s start with a simple analogy: Imagine a student who spends months studying for an exam. They read books, practice questions, and correct mistakes - this is the training phase. But on the day of the exam - they’re no longer learning. They’re simply applying what they’ve already learned. This is exactly what happens in an AI model.
Phase 1 - Training
In this phase, the model learns from vast amounts of data. It adjusts its internal “weights” (mathematical values representing knowledge) to map inputs to correct outputs. This is a slow, resource-intensive process that requires specialized hardware - GPUs, TPUs, or expensive accelerators.
Phase 2 - Inference
Now that the model is trained, the inference phase begins. Instead of learning, it simply “computes” the new answer based on the knowledge it already has. It’s like the student in the exam who doesn’t read a book - they just answer based on what they remember.
In inference, only one computation happens - the forward pass - a single pass through the neural network. There’s no weight updating, no error correction, just producing an answer.
Why is This Important?
Every question you ask a model like ChatGPT, or every image a model generates, is essentially one inference. That’s why the efficiency of this process determines:
- How quickly you get an answer (latency),
- How many answers can be processed simultaneously (throughput),
- And how much electricity/hardware it costs.
Bottom Line
Training = Learning.
Inference = Applying what’s been learned.
The speed of inference is what makes AI a practical tool and not just a lab experiment.
In the next post, we’ll learn about: How the Inference process actually works, what happens behind the scenes when the model receives input and returns an answer.
📚 More in this Series: Inference Deep Dive
- Part 2 How Does Inference Actually Work?
- Part 3 What Happens Behind the Scenes When the Model Answers You? (Prefill, Decoding, and KV Cache)
- Part 4 Why Isn't Your Model Running as Fast as Expected? Bottlenecks in Inference
- Part 5 Inference Optimization - Making Models Work Faster, Not Just Better
- Part 6 What is an Inference Engine - and Why is it So Important?