I’m Necati Demir, a PhD Computer Scientist with 20 years of industry experience, who transforms experimental ML projects into production-ready systems through expertly crafted data pipelines and infrastructure. With experience in end-to-end MLOps implementation, I bridge the gap between data science innovation and real-world production deployment.
Now an independent consultant based in the US, I help organizations reduce model deployment time from months to days and achieve significant performance gains through systematic approaches to data engineering. Because I both build ML models and develop the end-to-end pipelines around them, I can see the whole process from start to finish.
Understanding SIMD Performance: A Developer's Introduction with Real Benchmarks
Table of Contents
1. Introduction
2. SIMD Implementation Fundamentals
2.1 AVX2 and 256-bit Registers
2.2 Memory Alignment Considerations
2.2.1 Unaligned Memory Access (Inefficient)
2.2.2 Aligned Memory Access (Efficient)
2.3 Loop Unrolling Technique
2.4 Compilation Requirements
3. SIMD in Practice: Dot Product Case Study
3.1 Four Implementation Approaches
3.1.1 Scalar Implementation (Baseline)
3.1.2 Basic SIMD Implementation
3.1.3 Unrolled SIMD Implementation
3.1.4 Aligned SIMD Implementation
4. Performance Analysis: Benchmark Results
4.1 Compilation Methodology
4.2 Results with -O3 Optimization
4.3 Results without -O3 Optimization
4.4 Compiler Optimization vs Manual SIMD: Key Insights
4.4.1 Compiler Auto-Vectorization is Remarkably Effective
4.4.2 Manual Optimization Value Depends on Context
4.4.3 Memory Hierarchy Effects Persist Regardless
5. Conclusion & Practical Takeaways
5.1 Key Practical Takeaways

1. Introduction
Do you keep hearing about SIMD but don’t know what it’s all about? This article is for you. SIMD (Single Instruction, Multiple Data) is the go-to technique for squeezing every ounce of performance out of modern CPUs. Its promise is simple: with 256-bit AVX2 registers, process 8 single-precision floating-point numbers at once instead of one, which in theory gives an 8x speedup. That is only the theory, though; in practice, other factors affect the results.
...
Building an End-to-End Chat Bot with ONNX Runtime and Rust
Table of Contents
Introduction
Prerequisites
Project Setup
Architecture Overview
Exporting Models to ONNX
Loading an ONNX Model
Text Generation Pipeline
Building the CLI Chat Interface
Going Further
Conversation Memory
Temperature & Top-p Sampling
Streaming Tokens
Performance Optimizations
Testing
Deployment Considerations
Conclusion
TLDR
...
Envelope Encryption: The Security Pattern Every Cloud Developer Should Know
When building cloud applications that handle sensitive data, encryption isn’t optional; it’s essential. But there’s an important difference between basic encryption and encryption implemented correctly at scale. In this article, we’ll explore envelope encryption, a pattern that AWS, Google Cloud, and Azure all use internally and recommend for production applications.
Table of Contents
What Is Envelope Encryption?
The Problem with Direct KMS Encryption
The Envelope Encryption Solution
Why This Is Superior
Production Implementation with Google Cloud
Prerequisites: Set Up Infrastructure
Production-Ready Implementation
Best Practices
Common Pitfalls to Avoid
Conclusion

What Is Envelope Encryption?
Envelope encryption is a cryptographic pattern where you use two layers of keys:
...