How Do You Measure the Speed of an AI Model?

👤 Efrat Bdil 📅 1/7/2026 ⏱️ 2 min read

Performance Metrics #Throughput #Latency

When we call a model fast, what do we actually mean? During inference - the phase where an already-trained model returns answers - two main metrics shape the user experience:

TTFT - Time To First Token

The time it takes for the model to start speaking. Just like in a human conversation - how long it takes from the moment you ask a question until the other side utters the first word. TTFT directly affects the perception of “how responsive the system feels.”
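
As a rough illustration, TTFT can be measured by starting a timer when the request is issued and stopping it when the first token arrives. The sketch below assumes the model's response is exposed as a Python iterable of tokens; measure_ttft and fake_stream are hypothetical names used for illustration, not part of any particular library.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Sequence


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Seconds from starting to read the stream until the first token arrives.

    Returns None if the stream ends without producing any token.
    """
    start = clock()
    for _token in stream:
        # The first token has arrived: stop timing immediately.
        return clock() - start
    return None


def fake_stream(first_delay_s: float, tokens: Sequence[str]) -> Iterator[str]:
    """Hypothetical stand-in for a model: waits, then yields tokens."""
    time.sleep(first_delay_s)  # simulated time before the first token
    for token in tokens:
        yield token


ttft = measure_ttft(fake_stream(0.05, ["Hello", ",", " world"]))
print(f"TTFT: {ttft:.3f} s")
```

With a real streaming client, the timer should start at the moment the request is sent, so that network and queueing time are included in the measurement.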

TPS - Tokens Per Second

How quickly the model continues to speak after it starts. This is the rate at which it “writes” or generates new tokens (small text units). TPS affects how quickly you get a complete answer.
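
One way to compute TPS, sketched here under the same assumption that the response is an iterable of tokens, is to count the tokens that arrive after the first one and divide by the time they took. StreamTiming and time_stream are hypothetical names. Excluding the first token from the rate is a choice made in this sketch so that TPS does not overlap with TTFT; some tools divide all tokens by the total time instead.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamTiming:
    tokens: int     # number of tokens received
    ttft_s: float   # start -> first token
    total_s: float  # start -> last token

    @property
    def decode_tps(self) -> float:
        """Tokens per second after the first token has arrived."""
        decode_time = self.total_s - self.ttft_s
        if self.tokens < 2 or decode_time <= 0:
            return 0.0
        return (self.tokens - 1) / decode_time


def time_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Consume a token stream and record when tokens arrive."""
    start = clock()
    first = last = None
    count = 0
    for _token in stream:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        # Empty stream: nothing to measure.
        return StreamTiming(0, 0.0, 0.0)
    return StreamTiming(count, first - start, last - start)


timing = time_stream(["The", " answer", " is", " 42", "."])
print(timing.tokens, timing.ttft_s, timing.decode_tps)
```

The clock parameter exists only so that the timing logic can be checked with a deterministic fake clock instead of real elapsed time.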

Balancing the Two

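There is a trade-off between the two. If we tune a system too aggressively for a short TTFT, TPS might suffer (the model starts quickly but stalls midway). If we prioritize high TPS, the start time might increase. For example, a server that groups many requests into one batch typically raises overall throughput, but each request may wait longer before its first token appears. Performance engineers focus precisely on this balance - how to make the model start quickly and continue smoothly.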
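
To make the trade-off concrete, the time to a complete answer can be approximated as TTFT plus the time needed to generate the remaining tokens at a steady TPS. The sketch below uses that simplification; the two configurations and all of their numbers are hypothetical, chosen only to show how the better choice can depend on answer length.

```python
def time_to_full_answer(ttft_s: float, tps: float, n_tokens: int) -> float:
    """Approximate seconds until the last token, assuming a constant TPS."""
    if n_tokens <= 0:
        return 0.0
    if tps <= 0:
        raise ValueError("tps must be positive")
    # The first token arrives at ttft_s; the rest stream at the given rate.
    return ttft_s + (n_tokens - 1) / tps


# Hypothetical configurations, not measurements of any real system.
quick_start = {"ttft_s": 0.2, "tps": 20.0}  # starts fast, streams slowly
fast_stream = {"ttft_s": 0.8, "tps": 50.0}  # starts slowly, streams fast

for n in (10, 200):
    a = time_to_full_answer(n_tokens=n, **quick_start)
    b = time_to_full_answer(n_tokens=n, **fast_stream)
    winner = "quick_start" if a < b else "fast_stream"
    print(f"{n} tokens: quick_start={a:.2f}s fast_stream={b:.2f}s -> {winner}")
```

Under these assumed numbers, the quick-start configuration finishes short answers sooner, while the fast-streaming one wins on long answers, which is one reason the right balance depends on the workload.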

Bottom Line:

TTFT = Perceived responsiveness.

TPS = Continuous work rate.

Proper balance between them = A smooth user experience.
