All Posts

69 posts

Adding a Backend to PyTorch - Why It Matters and How It Works

⏱️ 3 min PyTorch #Python #AI Frameworks

C++ in Machine Learning - Behind the Scenes of Performance

⏱️ 3 min Programming Languages #C++

Concurrency - How to Make a System Handle Multiple Tasks Simultaneously

⏱️ 2 min Inference Optimization #Concurrency

Core Management - How to Properly Manage Your Processing Power

⏱️ 3 min 📚 Hardware Inference Optimization - Part 5 Hardware #Core Management #Optimization

CUDA - The Tool That Made the GPU Accessible to Everyone

⏱️ 3 min 📚 AI Hardware & Infrastructure - Part 3 Software #CUDA #Parallel Programming

Data Center, AI Server, GPU Cluster - Three Concepts Everyone in AI Must Understand

⏱️ 3 min 📚 AI Hardware & Infrastructure - Part 6 Infrastructure #Data Center #AI Server

Data Centers - The Home of All Artificial Intelligence

⏱️ 2 min 📚 AI Hardware & Infrastructure - Part 1 Infrastructure #Data Centers #AI Infrastructure

Divided Resources - How to Allocate Resources Between Models or Processes

⏱️ 3 min 📚 Hardware Inference Optimization - Part 7 Machine Learning Hardware #Resource Division #Optimization #Inference

Dynamic Graph or Static Graph - How Does Your Model Think?

⏱️ 2 min Graphs #Dynamic Graph

Eager Execution - When Models Start Thinking in Real-Time

⏱️ 2 min Performance #Eager Execution #Dynamic Graph

FAB, Bring-Up, and Post-Silicon - How Does the Chip Come to Life?

⏱️ 5 min 📚 Chip Design Journey - Part 13 Chip Design #FAB #Post-Silicon

Found a Bottleneck? Here’s What to Do Next

⏱️ 2 min Profiling #Bottleneck #Inference Optimization

GPU Cluster - Teaching Hundreds of Cards to Work Like One Brain

⏱️ 3 min 📚 AI Hardware & Infrastructure - Part 5 Infrastructure #GPU Cluster #Data Centers

gRPC - How AI Systems “Talk” to Each Other

⏱️ 3 min Communication #gRPC #Protobuf

How Containers Improve Performance and Accuracy in Inference Benchmarking

⏱️ 2 min 📚 Docker for Benchmarking - Part 4 Infrastructure #Docker #Benchmarking

How Do You Actually 'Write' Hardware? The First Step to Understanding RTL and the Frontend World

⏱️ 3 min 📚 Chip Design Journey - Part 3 Chip Design #RTL #Frontend

How Do You Measure the Speed of an AI Model?

⏱️ 2 min Performance Metrics #Throughput #Latency

How Does Inference Actually Work?

⏱️ 2 min 📚 Inference Deep Dive - Part 2 Inference Process #Optimization #Inference Deep Dive

How to Build a Benchmarking Environment with Docker (Including GPU)

⏱️ 2 min 📚 Docker for Benchmarking - Part 3 Infrastructure #Docker #Benchmarking

How to Increase Throughput Without Slowing Down the System? (Batching, Stream Scheduling, and Offload)

⏱️ 2 min Inference Optimization #Performance #Throughput

How to Integrate Docker into CI/CD for Automated Inference Benchmarking

⏱️ 2 min 📚 Docker for Benchmarking - Part 5 Infrastructure #Docker #CI/CD

Inference Optimization - Making Models Work Faster, Not Just Better

⏱️ 3 min 📚 Inference Deep Dive - Part 5 Inference Optimization #Quantization #Batching

InternViT - The Next Step After ViT

⏱️ 3 min Computer Vision #InternViT #Models

MLOps - How a Great Model Reaches Production

⏱️ 2 min Production #MLOps #Monitoring

NVIDIA - How a Graphics Card Company Became the Queen of AI

⏱️ 3 min 📚 AI Hardware & Infrastructure - Part 2 Hardware #NVIDIA #GPU

ONNX - How Models Finally Speak the Same Language

⏱️ 2 min AI Frameworks #ONNX #Interoperability

Parallelism - How to Run Models in Parallel?

⏱️ 2 min Inference Optimization #Parallelism

Provisioning - Preparing the Ground Before Running Models

⏱️ 2 min Inference Process #Provisioning #Resources

Resource Optimization - How All Factors Impact Latency and TPS

⏱️ 2 min 📚 Hardware Inference Optimization - Part 8 Machine Learning Hardware #Resource Optimization #NUMA #Inference

RTL for Beginners - What is Verilog/VHDL?

⏱️ 4 min 📚 Chip Design Journey - Part 5 Chip Design #RTL #Verilog

Series Introduction: How Is a Chip Born? - A Complete Journey from Idea to Manufacturing

⏱️ 4 min 📚 Chip Design Journey - Part 0 Chip Design #Introduction #Overview

Series Summary: From NUMA to Throughput - How Optimization Turns Hardware into Performance

⏱️ 2 min 📚 Hardware Inference Optimization - Part 9 Machine Learning Hardware #Optimization #NUMA #Thread Affinity #Resource Management

Series Summary: The Complete Journey from Idea to Chip - All Stages at a Glance

⏱️ 5 min 📚 Chip Design Journey - Part 14 Chip Design #Summary #Overview

Serving - How a Model Starts “Talking to the World”

⏱️ 2 min Inference Process #Serving #Batching

Simulation, FPGA, Emulation - How Do You Test a Chip Before Manufacturing?

⏱️ 5 min 📚 Chip Design Journey - Part 11 Chip Design #Simulation #FPGA

The Challenges of Scaling - Why “More” Can Sometimes Be Less

⏱️ 3 min Hardware Optimization #Scaling

Thread Affinity - How to Bind Cores Smartly

⏱️ 3 min 📚 Hardware Inference Optimization - Part 6 Machine Learning Hardware #Thread Affinity #Optimization #Inference

TTM - Why Time To Market is a Critical Part of Inference Engineering and AI Solutions

⏱️ 3 min Business Strategy #TTM

vLLM - How to Make Models Respond Faster Without Wasting Memory

⏱️ 2 min vLLM #Inference Engine #Serving

What are Cores and Threads?

⏱️ 1 min 📚 Hardware Inference Optimization - Part 3 Hardware #Cores #Threads

What are Docker, Images, and Containers?

⏱️ 3 min 📚 Docker for Benchmarking - Part 2 Infrastructure #Docker #Containers

What Happens Behind the Scenes When the Model Answers You? (Prefill, Decoding, and KV Cache)

⏱️ 2 min 📚 Inference Deep Dive - Part 3 Inference #Prefill #Decode #KV Cache

What is a Chip? The Simplest Explanation to Start Your Hardware Journey

⏱️ 2 min 📚 Chip Design Journey - Part 1 Chip Design #Chips #Hardware

What is a Kernel?

⏱️ 2 min Hardware Acceleration #Kernels #Parallel Computing

What is a Model Pipeline?

⏱️ 2 min Machine Learning #Model Pipeline #MLOps

What is a Sandbox and Why is it Essential for AI?

⏱️ 3 min Development Tools #Testing Environments

What is a System on Chip (SoC) - And Why Can a Single Chip Contain an Entire World?

⏱️ 3 min 📚 Chip Design Journey - Part 2 Chip Design #SoC #Architecture

What is an Accelerator?

⏱️ 2 min 📚 AI Hardware & Infrastructure - Part 4 Hardware #Accelerator #GPU

What is an Ecosystem in Technology and AI?

⏱️ 2 min 📚 AI Hardware & Infrastructure - Part 7 Infrastructure #Ecosystem #Frameworks

What is an Inference Engine - and Why is it So Important?

⏱️ 2 min 📚 Inference Deep Dive - Part 6 Inference Optimization #Engines #vLLM

What is Cache and Why Does It Change Everything?

⏱️ 1 min 📚 Hardware Inference Optimization - Part 4 Hardware #Cache #Optimization

What is Chip Architecture - And Why Is It the Stage Where You Decide What the Chip Will Really Be?

⏱️ 4 min 📚 Chip Design Journey - Part 6 Chip Design #Architecture #System Design

What is Docker and Why Does Everyone Use It?

⏱️ 2 min 📚 Docker for Benchmarking - Part 1 Infrastructure #Docker #Containers

What is Frontend in the World of Chips?

⏱️ 3 min 📚 Chip Design Journey - Part 4 Chip Design #Frontend #Logic Design

What is Inference and Why Does it Happen After Training?

⏱️ 2 min 📚 Inference Deep Dive - Part 1 Inference #Training #Inference Deep Dive

What is Inference Benchmarking - and Why is it So Important?

⏱️ 2 min Inference Benchmarking #Throughput #Latency #Performance

What is Kernel Fusion - And How It Speeds Up Your Model Without Changing It

⏱️ 2 min Inference Optimization #Kernel

What is NUMA and Why is it Important for Inference Optimization?

⏱️ 2 min 📚 Hardware Inference Optimization - Part 2 Hardware #NUMA #Inference

What is Place & Route - And How Do You Position Gates on a Chip and Connect Them?

⏱️ 4 min 📚 Chip Design Journey - Part 9 Chip Design #Place & Route #Backend

What is STA - Static Timing Analysis - And How Do You Ensure the Chip Will Work at the Right Frequency?

⏱️ 5 min 📚 Chip Design Journey - Part 10 Chip Design #STA #Timing

What is Synthesis - And How Does RTL Become Actual Gates in a Chip?

⏱️ 4 min 📚 Chip Design Journey - Part 8 Chip Design #Synthesis #Backend

What is Tapeout - And Do You Really Send a Tape to Manufacturing?

⏱️ 4 min 📚 Chip Design Journey - Part 12 Chip Design #Tapeout #Production

What is Verification - And Why Is 70% of Chip Development Testing?

⏱️ 4 min 📚 Chip Design Journey - Part 7 Chip Design #Verification #Testing

What is ViT - and Why is it a Paradigm Shift in Computer Vision?

⏱️ 3 min Computer Vision #ViT #Transformers

Why Do We Need to Understand Hardware for Inference Optimization?

⏱️ 2 min 📚 Hardware Inference Optimization - Part 1 Hardware #Optimization #Inference

Why Does Your Model “Feel Slow”?

⏱️ 2 min Profiling #Bottleneck #Performance Benchmark

Why is Everyone Talking About Python When It Comes to Machine Learning?

⏱️ 2 min Programming Languages #Python

Why Isn’t Your Model Enough? - Scaling in AI

⏱️ 2 min Inference Optimization #Scaling

Why Isn't Your Model Running as Fast as Expected? Bottlenecks in Inference

⏱️ 3 min 📚 Inference Deep Dive - Part 4 Inference Challenges #Bottlenecks #Optimization