What is Cache and Why Does It Change Everything?
Alongside the cores, every processor contains a critical component called the cache - and it’s the secret to its speed.
Why Do We Even Need It?
Accessing main memory (RAM) is much slower than the computations the processor performs - a single RAM access can cost hundreds of CPU cycles. That’s why every processor includes a small, extremely fast memory - the cache - where it stores the data it uses repeatedly.
In fact, when a model performs inference, it doesn’t access main memory every time. The data and variables it needs most are stored in the cache, saving valuable time.
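To make the cache effect concrete, here is a minimal, self-contained sketch (all names are illustrative): summing a 2D array row by row walks memory contiguously and stays in cache, while summing the same array column by column takes large strides and triggers far more cache misses.

```python
import time

N = 1000
matrix = [[1] * N for _ in range(N)]  # an N x N grid of ones

def sum_row_major(m):
    # Walk each row left to right: consecutive elements, cache-friendly.
    total = 0
    for row in m:
        for value in row:
            total += value
    return total

def sum_column_major(m):
    # Walk each column top to bottom: large strides, cache-unfriendly.
    total = 0
    for col in range(len(m[0])):
        for row in range(len(m)):
            total += m[row][col]
    return total

start = time.perf_counter()
row_sum = sum_row_major(matrix)
row_time = time.perf_counter() - start

start = time.perf_counter()
col_sum = sum_column_major(matrix)
col_time = time.perf_counter() - start

assert row_sum == col_sum == N * N  # same result, different memory behavior
print(f"row-major: {row_time:.4f}s, column-major: {col_time:.4f}s")
```

In pure Python the interpreter overhead masks much of the gap; in a compiled language (C, C++, Rust) the same access-pattern change can make a several-fold difference, which is exactly why inference kernels are written to be cache-friendly.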
Types of Cache
There are several “layers” of cache:
- L1 - The smallest and fastest (typically tens of KB), located directly on each core.
- L2 - Larger (hundreds of KB to a few MB) but slightly slower, usually still per-core.
- L3 - The largest and slowest of the three, typically shared across all cores on a socket and used for sharing data between them.
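You can inspect this hierarchy on your own machine. On Linux, the kernel exposes each cache level under `/sys/devices/system/cpu/cpu0/cache/`; the sketch below reads those files and degrades gracefully when they are missing (paths and availability vary by system, so treat this as a Linux-only example).

```python
import glob
import os

def cpu0_caches():
    """Return one dict per cache level visible to CPU 0 (Linux sysfs)."""
    caches = []
    for index in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cache/index*")):
        def read(name):
            try:
                with open(os.path.join(index, name)) as f:
                    return f.read().strip()
            except OSError:
                return "?"
        caches.append({
            "level": read("level"),                  # 1, 2, or 3
            "type": read("type"),                    # Data, Instruction, or Unified
            "size": read("size"),                    # e.g. "32K" or "8192K"
            "shared_with": read("shared_cpu_list"),  # which cores share it
        })
    return caches

for cache in cpu0_caches():
    print(f"L{cache['level']} {cache['type']}: {cache['size']} "
          f"(shared with CPUs {cache['shared_with']})")
```

Running this typically shows L1 and L2 shared with only one core (or its hyperthread sibling), while L3’s `shared_cpu_list` spans many cores - the hierarchy described above, straight from the kernel.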
Why is This Important for Inference?
If threads migrate between cores, the data they built up in the old core’s L1 and L2 caches is left behind, and they restart from a cold cache - which makes performance fluctuate.
This is one of the reasons why understanding Thread Affinity (binding tasks to a fixed core) is crucial - a topic we’ll dive into in the next post.
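As a small taste of that next post: on Linux, a process can pin itself to a fixed set of cores with `os.sched_setaffinity`, so its working set stays warm in that core’s cache. This is a minimal, Linux-specific sketch, not a full affinity strategy.

```python
import os

if hasattr(os, "sched_setaffinity"):           # Linux-only API
    original = os.sched_getaffinity(0)         # 0 means "this process"
    target = {min(original)}                   # pick one core we may run on
    os.sched_setaffinity(0, target)            # pin to that single core
    assert os.sched_getaffinity(0) == target
    os.sched_setaffinity(0, original)          # restore the original mask
    print("pinned to core", min(original), "and restored", sorted(original))
else:
    print("sched_setaffinity is not available on this platform")
```

Pinning trades scheduling flexibility for cache stability - exactly the trade-off the next post explores.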
📚 More in this Series: Hardware Inference Optimization
- Part 1 Why Do We Need to Understand Hardware for Inference Optimization?
- Part 2 What is NUMA and Why is it Important for Inference Optimization?
- Part 3 What are Cores and Threads?
- Part 5 Core Management - How to Properly Manage Your Processing Power
- Part 6 Thread Affinity - How to Bind Cores Smartly
- Part 7 Divided Resources - How to Allocate Resources Between Models or Processes
- Part 8 Resource Optimization - How All Factors Impact Latency and TPS
- Part 9 Series Summary: From NUMA to Throughput - How Optimization Turns Hardware into Performance