Zaraar Malik

I build AI systems
that actually ship.

AI Engineer specializing in production-grade Retrieval-Augmented Generation (RAG), distributed multi-agent systems, and real-time computer vision deployment. I build resilient infrastructure that bridges the gap between complex research and scalable, low-latency products.

Production RAG Systems
Multi-Agent Architectures
End-to-End MLOps Pipelines

About Me

Bridging Rigorous Research with Production Infrastructure

I view artificial intelligence not as a collection of standalone black-box API calls, but as an optimization challenge. With a formal academic grounding in AI systems engineering, I focus on building predictable, robust architectures that hold up in production environments.

Whether integrating real-time computer vision capabilities into embedded edge systems, refining chunking logic for deterministic vector retrieval, or managing data validation layers, I engineer for reliability, strict cost constraints, and low latency.

Current Domain Focus Areas:

Advanced RAG Systems

Designing context-aware retrieval engines optimized for high-density technical knowledge bases.

SLM Optimization

Adapting Small Language Models to execute complex reasoning tasks smoothly on constrained hardware.

Agentic Architecture

Implementing multi-agent frameworks designed for precise tracking and autonomous decision-making.

Work Experience

AI Engineer

Apexbeat — Manchester, UK (Remote)

September 2025 - Present
  • Architected and scaled a production Multimodal Retrieval-Augmented Generation (MMM-RAG) medical reasoning platform on AWS, serving 2,000+ active users.
  • Designed distributed ETL pipelines for automated multi-format document ingestion, text sanitization, and optimized vector database indexing.
  • Engineered stateful session management layers using MongoDB to securely store each user's conversation history as context for later interactions.

AI Intern

Systems Limited — Islamabad, Pakistan

June 2025 - August 2025
  • Applied parameter-efficient fine-tuning (PEFT) to adapt open-source LLMs to domain data, substantially improving domain-specific response relevance.
  • Built production Azure CI/CD pipelines and containerized modular AI services with Docker for reproducible cluster deployments.
  • Conducted rigorous empirical feasibility studies on generative models, benchmarking latency, memory footprint, and dollar-cost trade-offs.

AI Research Assistant

FSM — Islamabad, Pakistan

September 2024 - March 2025
  • Collaborated closely with Data Engineering to orchestrate robust financial feature stores, running predictive analytics for churn and portfolio risk modeling.
  • Developed an intelligent, RAG-driven investment assistant tailored to pull regional market indicators and localized regulatory insights.
  • Authored modular, highly reproducible evaluation pipelines to speed up internal model iteration and testing.

AI Intern

AIO (Silicon Valley) — Islamabad, Pakistan

June 2024 - August 2024
  • Refined structural data-cleaning pipelines alongside core data infrastructure teams to improve the quality of training tokens for vision models.
  • Designed and coded proprietary alignment evaluation metrics to accurately audit model out-of-distribution drift and degradation patterns.
  • Achieved a 20% improvement in model throughput via post-training quantization, hyperparameter optimization, and weight pruning.

Skills & Core Tools

Core Runtime & Interfaces

Python
JavaScript
Next.js
Tailwind CSS
Java
HTML5
CSS3

AI Architecture & Deep Learning

PyTorch
TensorFlow
Scikit-Learn

Distributed Data & Vector Storage

MongoDB
PostgreSQL
Qdrant
Chroma DB

Cloud Engineering & Orchestration

FastAPI
Flask
Docker
AWS
Vercel
Git

Technical Projects

Multi-Agent Chatbot for Medical Reimbursement

Architected an autonomous multi-agent validation ecosystem designed to parse and cross-check complex clinical receipts, minimizing human audit overhead through graceful runtime error-handling pipelines.

Python, RAG, FastAPI, LangChain, Transformers

TinyLLM RAG Chatbot for Task-Specific Queries

Evaluated performance constraints of parameter-efficient Small Language Models (SLMs) in RAG tasks, tuning dynamic context chunking to reduce compute costs on consumer hardware.

Python, TinyLLM, RAG, LangChain, Vector DBs

Multi-Document RAG System

Engineered a scalable multi-source retrieval engine capable of querying massive unstructured text corpora using optimized document indexing topologies and deterministic retrieval layers.

Python, RAG, FAISS, Qdrant, LangChain

Snap Shop – GenAI Fashion Synthesizer

Developed a deep learning virtual try-on application featuring custom LoRA fine-tuned Stable Diffusion models for high-fidelity text-to-garment image synthesis.

Python, PyTorch, Diffusion Models, LoRA, FastAPI

Procedural Game Level Generation

Implemented deep generative network architectures (DCGAN and WGAN) to generate fully interactive, topologically sound platformer map terrains.

Python, PyTorch, DCGAN, WGAN, GANs

Hidden Object Detection Engine

Fine-tuned advanced object detection models (YOLOv8 & Faster R-CNN) on highly cluttered datasets to perform real-time pixel extraction under tight classification confidence thresholds.

Python, PyTorch, YOLOv8, Faster R-CNN, React

Get In Touch

I am always open to discussing engineering systems, technical architecture, or opportunities to scale your next machine learning solution.

Let's Connect

Drop a transmission through the terminal form or establish communication directly via corporate and developer network channels.

Dispatch Message

© 2026 Zaraar Malik