Zaraar Malik
I build AI systems
that actually ship.
AI Engineer specializing in production-grade Retrieval-Augmented Generation (RAG), distributed multi-agent systems, and real-time computer vision deployment. I build resilient infrastructure that bridges the gap between complex research and scalable, low-latency products.
About Me

Bridging Rigorous Research with Production Infrastructure
I view artificial intelligence not as a collection of standalone black-box API calls, but as an optimization challenge. With a formal academic grounding in AI systems engineering, I focus on building predictable, robust architectures that serve production workloads reliably.
Whether integrating real-time computer vision into embedded edge systems, refining chunking logic for deterministic vector retrieval, or managing data validation layers, I engineer for reliability, strict cost constraints, and low latency.
Current Domain Focus Areas:
Advanced RAG Systems
Designing context-aware retrieval engines optimized for high-density technical knowledge bases.
SLM Optimization
Adapting Small Language Models to execute complex reasoning tasks smoothly on constrained hardware.
Agentic Architecture
Implementing multi-agent frameworks designed for precise tracking and autonomous decision-making.
Work Experience
AI Engineer
Apexbeat — Manchester, UK (Remote)
- Architected and scaled a production Multimodal Retrieval-Augmented Generation (MMM-RAG) medical reasoning platform on AWS, serving 2,000+ active users.
- Designed distributed ETL pipelines for automated multi-format document ingestion, text sanitization, and optimized vector database indexing.
- Engineered stateful session management layers on MongoDB to securely cache each user's conversation history as context.
AI Intern
Systems Limited — Islamabad, Pakistan
- Applied parameter-efficient fine-tuning (PEFT) to adapt open-source LLMs to the target domain, substantially improving domain-specific response relevance.
- Built production Azure CI/CD pipelines and containerized modular AI services with Docker for reproducible cluster deployments.
- Conducted empirical feasibility studies on generative models, benchmarking latency, memory footprint, and dollar-cost trade-offs.
AI Research Assistant
FSM — Islamabad, Pakistan
- Collaborated closely with Data Engineering to build robust financial feature stores and run predictive analytics for churn and portfolio risk modeling.
- Developed a RAG-driven investment assistant that retrieves regional market indicators and localized regulatory insights.
- Authored modular, reproducible evaluation pipelines to accelerate internal model iteration and testing.
AI Intern
AIO (Silicon Valley) — Islamabad, Pakistan
- Refined structural data-cleaning pipelines alongside core data infrastructure teams to produce high-quality training tokens for vision models.
- Designed and implemented proprietary alignment evaluation metrics to audit out-of-distribution drift and model degradation patterns.
- Achieved a 20% improvement in model throughput via post-training quantization, hyperparameter optimization, and weight pruning.
Skills & Core Tools
Core Runtime & Interfaces
AI Architecture & Deep Learning
Distributed Data & Vector Storage
Cloud Engineering & Orchestration
Technical Projects
Multi-Agent Chatbot for Medical Reimbursement
Architected an autonomous multi-agent validation ecosystem designed to parse and cross-check complex clinical receipts, minimizing human audit overhead through graceful runtime error-handling pipelines.
TinyLLM RAG Chatbot for Task-Specific Queries
Evaluated the performance constraints of Small Language Models (SLMs) in RAG tasks, tuning dynamic context chunking to reduce compute costs on consumer hardware.
Multi-Document RAG System
Engineered a scalable multi-source retrieval engine capable of querying massive unstructured text corpora using optimized document indexing topologies and deterministic retrieval layers.
Snap Shop – GenAI Fashion Synthesizer
Developed a deep learning virtual try-on application using custom LoRA-fine-tuned Stable Diffusion models for high-fidelity text-to-garment image synthesis.
Procedural Game Level Generation
Implemented deep generative network architectures (DCGAN and WGAN) to generate fully interactive, topologically sound platformer levels.
Hidden Object Detection Engine
Fine-tuned advanced object detection models (YOLOv8 & Faster R-CNN) on highly cluttered datasets to perform real-time pixel extraction under tight classification confidence thresholds.
Get In Touch
I am always open to discussing engineering systems, technical architecture, or opportunities to scale your next machine learning solution.
Let's Connect
Send a message through the contact form, or reach me directly on my professional and developer network profiles.
© 2026 Zaraar Malik