I'm Farrukh, an ML engineer who enjoys building production-grade ML systems and squeezing models into places they probably shouldn't fit.
I work primarily with PyTorch, TensorFlow, and Hugging Face Transformers, focusing on model optimization, deployment, and efficient AI systems. My background is in mechanical engineering, but I've spent the past year designing and deploying ML pipelines. It turns out optimizing fluid-flow equations isn't that different from optimizing neural networks.
Right now, I’m focusing on projects that make ML systems leaner and easier to ship:
**Production Inference API** – Built a DistilBERT sentiment service on AWS EC2 using FastAPI. Quantized the model to cut size and latency roughly in half, added CI/CD with GitHub Actions, and hardened request handling for stability under load. The goal is to understand what it takes to keep ML systems reliable in production.
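A minimal sketch of that quantize-and-serve pattern, assuming an off-the-shelf SST-2 DistilBERT checkpoint; the model ID, endpoint path, and request schema here are illustrative stand-ins, not the deployed service:

```python
# Hypothetical sketch: dynamically quantized DistilBERT behind FastAPI.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

# Dynamic quantization stores Linear weights as int8, roughly halving
# model size and cutting CPU inference latency without retraining.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

app = FastAPI()

class Review(BaseModel):
    text: str

@app.post("/predict")
def predict(review: Review):
    inputs = tokenizer(review.text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = quantized(**inputs).logits
    label_id = int(logits.argmax(dim=-1))
    return {"label": quantized.config.id2label[label_id]}
```

Served with `uvicorn`, this keeps int8 inference on CPU, which is what makes a small EC2 instance viable.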
**Model Efficiency Research** – Experimenting with model compression and quantization pipelines for edge and low-latency deployments. I'm especially interested in the trade-offs between model size, speed, and interpretability.
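A sketch of the kind of harness that makes those trade-offs measurable; the toy two-layer model and run counts are stand-ins, assuming CPU inference with PyTorch dynamic quantization:

```python
# Illustrative size/latency comparison: fp32 vs. dynamically quantized int8.
import os
import time
import torch

def size_mb(model: torch.nn.Module, path: str = "tmp_weights.pt") -> float:
    """Serialized weight size in megabytes."""
    torch.save(model.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

def latency_ms(model: torch.nn.Module, x: torch.Tensor, runs: int = 100) -> float:
    """Mean CPU inference latency over `runs` calls, after warm-up."""
    with torch.no_grad():
        for _ in range(10):   # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1e3

# Toy stand-in for a real encoder: two Linear layers.
fp32 = torch.nn.Sequential(
    torch.nn.Linear(768, 3072), torch.nn.ReLU(), torch.nn.Linear(3072, 768)
).eval()
int8 = torch.quantization.quantize_dynamic(fp32, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
for name, m in [("fp32", fp32), ("int8", int8)]:
    print(f"{name}: {size_mb(m):6.1f} MB  {latency_ms(m, x):6.2f} ms")
```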
I was part of the UraanAI Techathon 2025, where our team built an integrated AI framework for manufacturing: computer vision for defect detection (99.6% accuracy), a BiLSTM-GRU for predictive maintenance, and LightGBM for demand forecasting. The focus was deployment under real industrial constraints, with limited compute, bandwidth, and cost.
I've also worked on model compression, taking a ResNet-based model from 45M parameters down to 180K (a 99.6% reduction) through knowledge distillation while keeping 94% accuracy. The resulting 4× speedup made real-time inference viable on resource-limited hardware.
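For reference, the core of a distillation objective in PyTorch; the temperature, loss weighting, and stand-in linear models below are illustrative assumptions, not the actual ResNet teacher/student setup:

```python
# Sketch of a knowledge-distillation training step (Hinton-style soft targets).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Blend soft-target KL divergence with standard cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2          # rescales gradients to match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# One training step: the frozen teacher provides soft targets.
teacher = torch.nn.Linear(512, 10).eval()   # stand-in for the large model
student = torch.nn.Linear(512, 10)          # stand-in for the compact model
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
with torch.no_grad():
    teacher_logits = teacher(x)

optimizer.zero_grad()
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()
optimizer.step()
```

The temperature softens both distributions so the student learns from the teacher's relative confidences, not just its top prediction.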
| Project | Description | Tech Stack | Highlights |
|---|---|---|---|
| PakIndustry-4.0 | Integrated AI system for manufacturing: computer vision, predictive maintenance, and demand forecasting | PyTorch • LightGBM • FastAPI | 99.6% defect detection • RUL prediction (MAE = 13.4) • Edge deployment |
| Sentiment-MLOps | Production-ready DistilBERT inference API on AWS | Hugging Face • FastAPI • Docker • AWS • GitHub Actions | Quantized model (−50% size/latency) • CI/CD pipeline |
| Model-Compression | Knowledge distillation and quantization pipeline for compact deep learning models | PyTorch • ONNX • NumPy | 99.6% parameter reduction (45M → 180K) • 4× faster inference |
📧 smfarrukhm@gmail.com • 💼 LinkedIn • 💡 Open to ML engineering opportunities