Blog
All Posts
69 posts
Adding a Backend to PyTorch - Why It Matters and How It Works
⏱️ 3 min
Pytorch
#Python
#AI Frameworks
C++ in Machine Learning - Behind the Scenes of Performance
⏱️ 3 min
Programming Languages
#C++
Concurrency - How to Make a System Handle Multiple Tasks Simultaneously
⏱️ 2 min
Inference Optimization
#Concurrency
Core Management - How to Properly Manage Your Processing Power
⏱️ 3 min
📚 Hardware Inference Optimization - Part 5
Hardware
#Core Management
#Optimization
CUDA - The Tool That Made the GPU Accessible to Everyone
⏱️ 3 min
📚 AI Hardware & Infrastructure - Part 3
Software
#CUDA
#Parallel Programming
Data Center, AI Server, GPU Cluster - Three Concepts Everyone in AI Must Understand
⏱️ 3 min
📚 AI Hardware & Infrastructure - Part 6
Infrastructure
#Data Center
#AI Server
Data Centers - The Home of All Artificial Intelligence
⏱️ 2 min
📚 AI Hardware & Infrastructure - Part 1
Infrastructure
#Data Centers
#AI Infrastructure
Divided Resources - How to Allocate Resources Between Models or Processes
⏱️ 3 min
📚 Hardware Inference Optimization - Part 7
Machine Learning
Hardware
#Resource Division
#Optimization
#Inference
Dynamic Graph or Static Graph - How Does Your Model Think?
⏱️ 2 min
Graphs
#Dynamic Graph
Eager Execution - When Models Start Thinking in Real-Time
⏱️ 2 min
Performance
#Eager Execution
#Dynamic Graph
FAB, Bring-Up, and Post-Silicon - How Does the Chip Come to Life?
⏱️ 5 min
📚 Chip Design Journey - Part 13
Chip Design
#FAB
#Post-Silicon
Found a Bottleneck? Here’s What to Do Next
⏱️ 2 min
Profiling
#Bottleneck
#Inference Optimization
GPU Cluster - Teaching Hundreds of Cards to Work Like One Brain
⏱️ 3 min
📚 AI Hardware & Infrastructure - Part 5
Infrastructure
#GPU Cluster
#Data Centers
gRPC - How AI Systems “Talk” to Each Other
⏱️ 3 min
Communication
#gRPC
#Protobuf
How Containers Improve Performance and Accuracy in Inference Benchmarking
⏱️ 2 min
📚 Docker for Benchmarking - Part 4
Infrastructure
#Docker
#Benchmarking
How Do You Actually 'Write' Hardware? The First Step to Understanding RTL and the Frontend World
⏱️ 3 min
📚 Chip Design Journey - Part 3
Chip Design
#RTL
#Frontend
How Do You Measure the Speed of an AI Model?
⏱️ 2 min
Performance Metrics
#Throughput
#Latency
How Does Inference Actually Work?
⏱️ 2 min
📚 Inference Deep Dive - Part 2
Inference Process
#Optimization
#Inference Deep Dive
How to Build a Benchmarking Environment with Docker (Including GPU)
⏱️ 2 min
📚 Docker for Benchmarking - Part 3
Infrastructure
#Docker
#Benchmarking
How to Increase Throughput Without Slowing Down the System? (Batching, Stream Scheduling, and Offload)
⏱️ 2 min
Inference Optimization
#Performance
#Throughput
How to Integrate Docker into CI/CD for Automated Inference Benchmarking
⏱️ 2 min
📚 Docker for Benchmarking - Part 5
Infrastructure
#Docker
#CI/CD
Inference Optimization - Making Models Work Faster, Not Just Better
⏱️ 3 min
📚 Inference Deep Dive - Part 5
Inference Optimization
#Quantization
#Batching
InternViT - The Next Step After ViT
⏱️ 3 min
Computer Vision
#InternViT
#Models
MLOps - How a Great Model Reaches Production
⏱️ 2 min
Production
#MLOps
#Monitoring
NVIDIA - How a Graphics Card Company Became the Queen of AI
⏱️ 3 min
📚 AI Hardware & Infrastructure - Part 2
Hardware
#NVIDIA
#GPU
ONNX - How Models Finally Speak the Same Language
⏱️ 2 min
AI Frameworks
#ONNX
#Interoperability
Parallelism - How to Run Models in Parallel?
⏱️ 2 min
Inference Optimization
#Parallelism
Provisioning - Preparing the Ground Before Running Models
⏱️ 2 min
Inference Process
#Provisioning
#Resources
Resource Optimization - How All Factors Impact Latency and TPS
⏱️ 2 min
📚 Hardware Inference Optimization - Part 8
Machine Learning
Hardware
#Resource Optimization
#NUMA
#Inference
RTL for Beginners - What is Verilog/VHDL?
⏱️ 4 min
📚 Chip Design Journey - Part 5
Chip Design
#RTL
#Verilog
Series Introduction: How Is a Chip Born? - A Complete Journey from Idea to Manufacturing
⏱️ 4 min
📚 Chip Design Journey - Part 0
Chip Design
#Introduction
#Overview
Series Summary: From NUMA to Throughput - How Optimization Turns Hardware into Performance
⏱️ 2 min
📚 Hardware Inference Optimization - Part 9
Machine Learning
Hardware
#Optimization
#NUMA
#Thread Affinity
#Resource Management
Series Summary: The Complete Journey from Idea to Chip - All Stages at a Glance
⏱️ 5 min
📚 Chip Design Journey - Part 14
Chip Design
#Summary
#Overview
Serving - How a Model Starts “Talking to the World”
⏱️ 2 min
Inference Process
#Serving
#Batching
Simulation, FPGA, Emulation - How Do You Test a Chip Before Manufacturing?
⏱️ 5 min
📚 Chip Design Journey - Part 11
Chip Design
#Simulation
#FPGA
The Challenges of Scaling - Why “More” Can Sometimes Be Less
⏱️ 3 min
Hardware Optimization
#Scaling
Thread Affinity - How to Bind Cores Smartly
⏱️ 3 min
📚 Hardware Inference Optimization - Part 6
Machine Learning
Hardware
#Thread Affinity
#Optimization
#Inference
TTM - Why Time To Market is a Critical Part of Inference Engineering and AI Solutions
⏱️ 3 min
Business Strategy
#TTM
vLLM - How to Make Models Respond Faster Without Wasting Memory
⏱️ 2 min
vLLM
#Inference Engine
#Serving
What are Cores and Threads?
⏱️ 1 min
📚 Hardware Inference Optimization - Part 3
Hardware
#Cores
#Threads
What are Docker, Images, and Containers?
⏱️ 3 min
📚 Docker for Benchmarking - Part 2
Infrastructure
#Docker
#Containers
What Happens Behind the Scenes When the Model Answers You? (Prefill, Decoding, and KV Cache)
⏱️ 2 min
📚 Inference Deep Dive - Part 3
Inference
#Prefill
#Decode
#KV Cache
What is a Chip? The Simplest Explanation to Start Your Hardware Journey
⏱️ 2 min
📚 Chip Design Journey - Part 1
Chip Design
#Chips
#Hardware
What is a Kernel?
⏱️ 2 min
Hardware Acceleration
#Kernels
#Parallel Computing
What is a Model Pipeline?
⏱️ 2 min
Machine Learning
#Model Pipeline
#MLOps
What is a Sandbox and Why is it Essential for AI?
⏱️ 3 min
Development Tools
#Testing Environments
What is a System on Chip (SoC) - And Why Can a Single Chip Contain an Entire World?
⏱️ 3 min
📚 Chip Design Journey - Part 2
Chip Design
#SoC
#Architecture
What is an Accelerator?
⏱️ 2 min
📚 AI Hardware & Infrastructure - Part 4
Hardware
#Accelerator
#GPU
What is an Ecosystem in Technology and AI?
⏱️ 2 min
📚 AI Hardware & Infrastructure - Part 7
Infrastructure
#Ecosystem
#Frameworks
What is an Inference Engine - and Why is it So Important?
⏱️ 2 min
📚 Inference Deep Dive - Part 6
Inference Optimization
#Engines
#vLLM
What is Cache and Why Does It Change Everything?
⏱️ 1 min
📚 Hardware Inference Optimization - Part 4
Hardware
#Cache
#Optimization
What is Chip Architecture - And Why Is It the Stage Where You Decide What the Chip Will Really Be?
⏱️ 4 min
📚 Chip Design Journey - Part 6
Chip Design
#Architecture
#System Design
What is Docker and Why Does Everyone Use It?
⏱️ 2 min
📚 Docker for Benchmarking - Part 1
Infrastructure
#Docker
#Containers
What is Frontend in the World of Chips?
⏱️ 3 min
📚 Chip Design Journey - Part 4
Chip Design
#Frontend
#Logic Design
What is Inference and Why Does it Happen After Training?
⏱️ 2 min
📚 Inference Deep Dive - Part 1
Inference
#Training
#Inference Deep Dive
What is Inference Benchmarking - and Why is it So Important?
⏱️ 2 min
Inference Benchmarking
#Throughput
#Latency
#Performance
What is Kernel Fusion - And How It Speeds Up Your Model Without Changing It
⏱️ 2 min
Inference Optimization
#Kernel
What is NUMA and Why is it Important for Inference Optimization?
⏱️ 2 min
📚 Hardware Inference Optimization - Part 2
Hardware
#NUMA
#Inference
What is Place & Route - And How Do You Position Gates on a Chip and Connect Them?
⏱️ 4 min
📚 Chip Design Journey - Part 9
Chip Design
#Place & Route
#Backend
What is STA - Static Timing Analysis - And How Do You Ensure the Chip Will Work at the Right Frequency?
⏱️ 5 min
📚 Chip Design Journey - Part 10
Chip Design
#STA
#Timing
What is Synthesis - And How Does RTL Become Actual Gates in a Chip?
⏱️ 4 min
📚 Chip Design Journey - Part 8
Chip Design
#Synthesis
#Backend
What is Tapeout - And Do You Really Send a Tape to Manufacturing?
⏱️ 4 min
📚 Chip Design Journey - Part 12
Chip Design
#Tapeout
#Production
What is Verification - And Why Is 70% of Chip Development Testing?
⏱️ 4 min
📚 Chip Design Journey - Part 7
Chip Design
#Verification
#Testing
What is ViT - and Why is it a Paradigm Shift in Computer Vision?
⏱️ 3 min
Computer Vision
#ViT
#Transformers
Why Do We Need to Understand Hardware for Inference Optimization?
⏱️ 2 min
📚 Hardware Inference Optimization - Part 1
Hardware
#Optimization
#Inference
Why Does Your Model “Feel Slow”?
⏱️ 2 min
Profiling
#Bottleneck
#Performance Benchmark
Why is Everyone Talking About Python When It Comes to Machine Learning?
⏱️ 2 min
Programming Languages
#Python
Why Isn’t Your Model Enough? - Scaling in AI
⏱️ 2 min
Inference Optimization
#Scaling
Why Isn't Your Model Running as Fast as Expected? Bottlenecks in Inference
⏱️ 3 min
📚 Inference Deep Dive - Part 4
Inference Challenges
#Bottlenecks
#Optimization