About

AI Infrastructure Engineer | Inference Optimization Specialist

Hi, I'm Efrat Bdil 👋

I work in AI infrastructure, with a deep focus on the inference stage: the moment a trained model has to work in production, handle real workloads, and respond with predictable, stable latency.

My work sits at the intersection of software, models, and infrastructure. I develop and optimize AI inference pipelines on top of the NR1 platform, our company's dedicated chip.

This includes working with PyTorch-based models — in both Vision (like YOLO) and NLP/LLMs (like BERT) — and integrating them into full pipelines, from input to output.
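To make "from input to output" concrete, here is a minimal sketch of what an inference pipeline's stages look like. The function names and the toy model are illustrative stand-ins, not the actual NR1 or PyTorch code:

```python
# Toy stand-in for a trained model; a real pipeline would call a
# PyTorch/ONNX model compiled for the target hardware instead.
def fake_model(tensor):
    return [x * 2 for x in tensor]

def preprocess(raw_input):
    # e.g. resize/normalize an image (Vision) or tokenize text (NLP)
    return [float(v) for v in raw_input]

def postprocess(outputs):
    # e.g. decode boxes (Vision) or logits (NLP) into a usable result
    return max(outputs)

def run_pipeline(raw_input, model):
    # input -> preprocess -> model -> postprocess -> output
    tensor = preprocess(raw_input)
    outputs = model(tensor)
    return postprocess(outputs)

print(run_pipeline([1, 2, 3], fake_model))  # 6.0
```

The point of the structure is that each stage can be measured and optimized independently, which matters once the model itself is no longer the only bottleneck.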

What I Do Day-to-Day

🔍 Performance Analysis

A key part of my work is understanding why a system is slow, not just where. That means profiling: debugging and analyzing performance in terms of latency, throughput, and bottlenecks. I compare runs across different hardware to understand what changes when moving from general-purpose to dedicated hardware.
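A bare-bones version of this kind of measurement can be done with nothing but the standard library. This is a simplified sketch (real tools like GenAI-Perf do far more); the `profile` helper and its parameters are my own illustrative names:

```python
import statistics
import time

def profile(fn, warmup=10, iters=100):
    """Measure per-call latency (ms) and derive throughput for a callable."""
    for _ in range(warmup):
        fn()  # warm up caches/allocators before timing
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)  # ms
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],  # tail latency
        "throughput_rps": 1000.0 * iters / sum(samples),
    }

stats = profile(lambda: sum(range(10_000)))
print(stats)
```

Even a sketch like this captures the essentials: warm up before measuring, report percentiles rather than averages, and watch the tail (p99), because that is what production users actually feel.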

🛠️ Infrastructure Development

Day-to-day, I work a lot with Python and Linux, write Bash scripts, build Docker images and run containers, and maintain complex benchmarking environments.

⚙️ Automation & CI/CD

There's also quite a bit of automation: internal tools, CI/CD processes with Jenkins, working with Artifactory, and infrastructure that ensures every change is measured, tested, and analyzed.

🤝 Team Collaboration

The work is not isolated. I regularly collaborate with hardware, compiler, and backend teams to turn models into code that actually runs in production. This includes understanding integrated hardware-software systems and diagnosing issues that may stem from drivers, configuration, or communication interfaces.

Technologies & Skills

🧠 AI & ML

PyTorch · ONNX · Triton Inference Server · LLMs · Computer Vision

💻 Programming & Tools

Python · Bash · Linux · Docker · Git

🚀 DevOps & CI/CD

Jenkins · Artifactory · CI/CD Pipelines · Automation

📊 Performance & Optimization

Profiling · Benchmarking · GenAI-Perf · Hardware Acceleration

About This Blog

This blog was created to document what I learn and share knowledge about:

  • AI Inference — How to run models in production efficiently
  • Performance Optimization — Techniques to improve speed and response time
  • Infrastructure & DevOps — Docker, CI/CD, automation
  • Hardware Acceleration — GPUs, dedicated accelerators, and NR1
  • Profiling & Debugging — Tools and methods for performance analysis

Much of the learning happens through doing: understanding how small decisions affect performance, and how a complete system behaves when it meets real workloads.

Get in Touch

Feel free to read, share, and comment on the posts. Comments and community feedback help me learn even more!