
HELLO THERE!
I'm Dhieddine BARHOUMI
- AI/Data Engineer
- MLOps Enthusiast
ABOUT ME
I'm a computer science engineering student at INSAT, specializing in Artificial Intelligence and passionate about the field's remarkably rapid progress.
👨‍💻 ML & MLOps Enthusiast
📜 Google Cloud Professional ML Engineer Certified.
📜 IBM AI Engineer Certified.
📜 Microsoft Azure AI Certified.
☁️ Experienced with Google Cloud Platform (GCP) and Microsoft Azure.
WHAT I OFFER
As a machine learning engineer with cloud expertise, I specialize in building end-to-end ML solutions from development to deployment. I combine MLOps practices with cloud-native technologies to deliver scalable, production-ready AI systems.
SKILLS
A quick learner, I focus on mastering the broad set of skills and technologies that machine learning engineering demands.
EXPERIENCE
Through diverse internships and hands-on roles, I've developed practical skills
and contributed to innovative solutions.
- Developed and maintained data pipelines for processing multi-modal sensor data, implementing ETL processes for real-time analytics.
- Engineered scalable data processing systems using Apache Spark, improving data throughput by 40% and reducing processing time.
- Implemented NLP techniques for scene understanding and text analysis, achieving 25% better accuracy in complex scenarios.
- Deployed AI models on cloud platforms (AWS, Azure) with containerized solutions using Docker and Kubernetes.
- Designed and implemented data validation and quality checks, ensuring 99.9% data integrity across all pipelines.
- Collaborated with cross-functional teams to integrate AI models into existing systems, ensuring seamless deployment and monitoring.
- Designed and implemented data pipelines for processing unstructured video and image data, optimizing storage and retrieval efficiency.
- Developed NLP-based text analysis systems for processing security logs and reports, improving information extraction accuracy by 30%.
- Implemented real-time data processing solutions using Apache Kafka and Spark Streaming for instant threat detection.
- Utilized MongoDB for efficient storage and querying of unstructured security data, reducing query time by 50%.
- Integrated AI models with existing systems using RESTful APIs and microservices architecture.
- Conducted research on multiple computer vision algorithms, selecting YOLOv8 for its balance of speed and accuracy.
- Fine-tuned YOLOv8 to detect angle and gusset objects in boxes, achieving 92.5% precision and 60% mAP50-95 (a simplified inference sketch follows this list).
- Deployed the model as a real-time API on Microsoft Azure, integrating it with a dashboard for defect detection.
- Automated quality control, triggering alerts for misaligned or missing objects as boxes moved on a conveyor.
- Collaborated with a multidisciplinary team to replace manual defect detection for an industry client, boosting efficiency by 30%.
- Completed Microsoft Azure AI Fundamentals training and certification during the internship.
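For a flavor of the detection setup, here is a minimal Ultralytics YOLOv8 inference sketch; the weights path, the angle/gusset class names, and the alert rule are illustrative assumptions, since the production model and thresholds are client-specific.

```python
from ultralytics import YOLO  # pip install ultralytics

# Hypothetical fine-tuned weights; the real checkpoint is project-specific.
model = YOLO("runs/detect/train/weights/best.pt")

# Run detection on a single conveyor frame (a video stream works the same way).
results = model.predict("conveyor_frame.jpg", conf=0.5)
for r in results:
    detected = [model.names[int(c)] for c in r.boxes.cls]
    # Alert when an expected part is missing from the frame (assumed rule).
    if "angle" not in detected or "gusset" not in detected:
        print("ALERT: missing or undetected part; detected:", detected)
```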
PROJECTS
Dive into a showcase of projects where I applied my technical expertise
to solve real-world challenges and create impactful solutions.
Personal Project
- Developed a robust Kubernetes-based model serving solution designed to host multiple ML model types, including embeddings, generative text, and classical ML.
- Implemented an API gateway using FastAPI to route incoming requests to backend model servers such as NVIDIA Triton, Text Generation Inference (TGI), and custom Python servers (a simplified routing sketch follows this list).
- Configured advanced deployment strategies, including blue/green and canary releases, to ensure safe and zero-downtime model updates.
- Integrated Evidently AI to generate automated data drift and model performance reports from captured production traffic.
- Established a CI/CD pipeline to automate the building of model containers, configuration of Kubernetes resources, and execution of contract tests.
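A minimal sketch of the gateway's routing idea, assuming hypothetical in-cluster backend URLs; the real gateway also handles authentication, retries, and canary traffic splitting.

```python
import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical in-cluster service endpoints; the real table is config-driven.
BACKENDS = {
    "embeddings": "http://triton.models.svc:8000/v2/models/embedder/infer",
    "generation": "http://tgi.models.svc:8080/generate",
    "classic":    "http://sklearn-server.models.svc:9000/predict",
}

@app.post("/models/{model_type}")
async def route(model_type: str, payload: dict):
    url = BACKENDS.get(model_type)
    if url is None:
        raise HTTPException(status_code=404, detail="unknown model type")
    async with httpx.AsyncClient() as client:
        resp = await client.post(url, json=payload)  # forward to the backend
    return resp.json()
```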
Personal Project
- Built a real-time feature store and online serving system to provide low-latency features for recommender and risk models, achieving sub-5ms feature lookups.
- Engineered a streaming feature pipeline using Spark Structured Streaming to compute aggregations from a Kafka event stream, ensuring data freshness.
- Managed features using Feast, which orchestrates data flows into an offline store (Parquet) for model training and an online store (Redis) for serving.
- Guaranteed training-serving parity by using Feast to generate point-in-time correct training datasets and to enrich inference requests at serving time.
- Deployed a high-performance online inference service with FastAPI, integrating Feast for real-time feature retrieval at request time (see the lookup sketch below).
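A rough sketch of an online lookup with Feast; the user_stats feature view, feature names, and entity key are assumptions standing in for the real repo definitions.

```python
from feast import FeatureStore  # pip install feast

# Assumes a Feast repo in the working directory with a Redis online store.
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=[
        "user_stats:txn_count_1h",   # hypothetical feature names
        "user_stats:avg_amount_7d",
    ],
    entity_rows=[{"user_id": 42}],   # hypothetical entity key
).to_dict()
print(features)
```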
Personal Project
- Architected a production-grade, end-to-end ML pipeline on Google Cloud, automating the entire workflow from data ingestion and validation to model training and deployment.
- Orchestrated the pipeline using Vertex AI and TensorFlow Extended (TFX), ensuring reproducible and scalable execution for components like ExampleGen, Transform, and Trainer (component wiring sketched below).
- Integrated BigQuery for efficient data warehousing and TensorFlow Data Validation (TFDV) for automated data quality and schema enforcement.
- Developed a containerized REST API using Flask and Docker, deployed on Cloud Run for scalable, serverless model serving.
- Established a full CI/CD workflow with GitHub Actions to automate testing, container image pushes to Artifact Registry, and deployments to Cloud Run.
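A condensed sketch of the component wiring, using TFX's local runner and a CSV source for brevity; the actual pipeline ingests from BigQuery and runs on Vertex AI Pipelines.

```python
from tfx import v1 as tfx

# Hypothetical data path and pipeline root.
example_gen = tfx.components.CsvExampleGen(input_base="data/")
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])

pipeline = tfx.dsl.Pipeline(
    pipeline_name="demo-pipeline",
    pipeline_root="pipeline-root/",
    components=[example_gen, statistics_gen, schema_gen],
)
tfx.orchestration.LocalDagRunner().run(pipeline)
```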
Personal Project
- Developed a full-stack machine learning application to predict loan approval status, deployed as a web service on Microsoft Azure.
- Constructed an end-to-end prediction pipeline using Scikit-learn, encompassing data ingestion, feature transformation, and model inference (a minimal pipeline sketch follows this list).
- Built a user-facing web interface with Flask and HTML/Bootstrap to capture input data and display model predictions in real-time.
- Implemented a complete CI/CD pipeline using GitHub Actions to automatically build, test, and deploy the containerized application to Azure Web App services.
- Engineered robust backend utilities, including custom exception handling and structured logging, to ensure application stability and maintainability.
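A minimal sketch of the Scikit-learn pipeline idea; the column names are placeholders for the real loan schema.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; the real ones come from the loan dataset.
numeric = ["applicant_income", "loan_amount"]
categorical = ["education", "property_area"]

pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("model", LogisticRegression(max_iter=1000)),
])
# pipeline.fit(X_train, y_train) then pipeline.predict(X_new) serve the web app.
```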
Personal Project
- Developed a Retrieval-Augmented Generation (RAG) system that answers questions from a private knowledge base, combining document retrieval with LLM-powered synthesis (a minimal retrieval sketch follows this list).
- Built a scalable and modular API using FastAPI, allowing for easy integration and interaction with both the RAG pipeline and an autonomous agent.
- Integrated multiple vector database backends, including pgvector for persistent storage and FAISS for high-speed in-memory search, providing flexibility in deployment.
- Implemented a rigorous evaluation framework using RAGAS to quantitatively measure retrieval and generation quality, ensuring the system's accuracy and relevance.
- Incorporated LLM guardrails to enforce output constraints, prevent harmful or off-topic responses, and ensure the agent's actions remain within a predefined scope.
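A bare-bones sketch of the retrieval step with FAISS and Sentence Transformers on a toy corpus; generation, the pgvector backend, RAGAS evaluation, and guardrails are omitted here.

```python
import faiss                      # pip install faiss-cpu sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["Feast serves online features from Redis.",
        "FAISS performs fast vector similarity search."]   # toy knowledge base
encoder = SentenceTransformer("all-MiniLM-L6-v2")

emb = encoder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])   # inner product on unit vectors = cosine
index.add(np.asarray(emb, dtype="float32"))

query = "What does FAISS do?"
q = encoder.encode([query], normalize_embeddings=True)
_, ids = index.search(np.asarray(q, dtype="float32"), 1)

context = docs[ids[0][0]]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)   # the prompt is then passed to the LLM for synthesis
```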
Personal Project
- Created a configurable forecasting laboratory for training, evaluating, and analyzing multiple time-series models, including LightGBM, Prophet, and a TensorFlow-based Temporal Fusion Transformer (TFT).
- Designed a unified pipeline that handles feature engineering, backtesting, and hierarchical reconciliation for thousands of independent time series.
- Automated model evaluation and comparison using MLflow for experiment tracking and artifact storage (see the logging sketch below).
- Implemented post-training analysis workflows to generate drift reports with Evidently AI and detect anomalies in forecast residuals.
- Structured the project with a configuration-driven approach, allowing for easy adaptation to new datasets and forecasting scenarios without code changes.
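A minimal sketch of logging one backtest to MLflow; the model name, horizon, and scores are placeholders for the lab's real runs.

```python
import mlflow
import numpy as np

def backtest_mae(y_true, y_pred):
    """Mean absolute error over a backtest window."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Placeholder values; the lab iterates over many series and models.
with mlflow.start_run(run_name="lightgbm-baseline"):
    mlflow.log_param("model", "lightgbm")
    mlflow.log_param("horizon_days", 28)
    mlflow.log_metric("backtest_mae", backtest_mae([10, 12, 9], [11, 12, 8]))
```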
Personal Project
- Engineered a scalable data platform on GCP to ingest, process, and index unstructured text from public datasets like Hacker News, Wikipedia, and GitHub.
- Designed a multi-stage ETL pipeline using a Bronze-Silver-Gold architecture on Google Cloud Storage, orchestrated daily with Cloud Composer (Airflow).
- Implemented distributed data processing and embedding-generation jobs with PySpark on Dataproc Serverless, using Sentence Transformers to create vector embeddings (a simplified job sketch follows this list).
- Built a low-latency semantic search service using FAISS for vector indexing and a FastAPI application deployed on Cloud Run.
- Integrated Great Expectations for automated data quality validation and contract enforcement at critical stages of the data pipeline.
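A simplified sketch of the embedding job; the GCS paths and the id/text schema are assumptions, and the real job runs on Dataproc Serverless rather than locally.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("embed-docs").getOrCreate()
docs = spark.read.parquet("gs://example-bucket/silver/docs/")  # hypothetical path

def embed_partition(rows):
    # Load the encoder once per partition instead of once per row.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    rows = list(rows)
    vectors = model.encode([r["text"] for r in rows])
    for row, vec in zip(rows, vectors):
        yield (row["id"], vec.tolist())

embeddings = docs.rdd.mapPartitions(embed_partition).toDF(["id", "embedding"])
embeddings.write.mode("overwrite").parquet("gs://example-bucket/gold/embeddings/")
```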
Personal Project
- Developed a production-ready starter kit for building a lakehouse platform that transforms raw data into governed, high-quality data products.
- Implemented a multi-layered data architecture (Bronze, Silver, Gold) using dbt for SQL-based transformations and schema modeling.
- Enforced data quality and reliability by defining data contracts directly within dbt models and integrating Great Expectations for automated validation (a validation sketch follows this list).
- Generated comprehensive documentation and data lineage graphs automatically through dbt Docs, providing full visibility into data flows.
- Established a CI/CD pipeline that runs data loading scripts, executes dbt transformations, and validates data quality on every commit.
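A small validation sketch, assuming the classic pandas-dataset API of Great Expectations (pre-1.0 releases; newer versions use a different entry point); the columns and rules are placeholders for real data contracts.

```python
import great_expectations as ge   # classic (pre-1.0) API assumed
import pandas as pd

# Toy frame standing in for a Silver-layer table.
df = ge.from_pandas(pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [9.5, 20.0, 3.2],
}))

df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("amount", min_value=0)

result = df.validate()
assert result.success, "data contract violated"
```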
Personal Project
- Designed a high-throughput streaming analytics platform to process and visualize e-commerce clickstream data with sub-second latency.
- Architected a data pipeline using Kafka for event ingestion and Spark Structured Streaming for real-time aggregations, sessionization, and KPI calculation (a streaming sketch follows this list).
- Utilized a dual-storage strategy, writing aggregated results to MongoDB for low-latency dashboard queries and raw events to Parquet on cold storage for historical analysis.
- Developed an interactive, auto-refreshing dashboard with Streamlit to display live conversion funnels, revenue metrics, and anomaly alerts.
- Exposed analytics data via a REST API built with FastAPI, enabling programmatic access to real-time and historical KPIs.
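A condensed sketch of the streaming aggregation, assuming a local Kafka broker and a clicks topic; the spark-sql-kafka connector must be on the Spark classpath, and sessionization and the MongoDB/Parquet sinks are omitted.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clickstream-kpis").getOrCreate()

# Hypothetical broker and topic; real settings are externalized config.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "clicks")
         .load()
         .selectExpr("CAST(value AS STRING) AS json")
         .select(
             F.get_json_object("json", "$.event_type").alias("event_type"),
             F.get_json_object("json", "$.ts").cast("timestamp").alias("ts"),
         )
)

# Windowed event counts per type, tolerating 1 minute of late data.
kpis = (events
        .withWatermark("ts", "1 minute")
        .groupBy(F.window("ts", "10 seconds"), "event_type")
        .count())

query = kpis.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```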
Academic Project
- Designed a reinforcement learning-based system to navigate complex indoor environments.
- Simulated realistic home-like environments in Gazebo with dynamic obstacles.
- Leveraged ROS 2 for system integration, using RViz for real-time monitoring of the robot's path and sensor data.
- Collaborated with colleagues, using GitHub for version control and CI/CD to keep changes reviewed and updates smooth.
- Implemented the TD3 algorithm with a custom reward function to optimize path planning and obstacle avoidance (an illustrative reward-shaping sketch follows).
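An illustrative reward-shaping function in the spirit of the one used; the weights and thresholds here are assumptions, not the tuned values from the project.

```python
def navigation_reward(dist_to_goal, prev_dist, min_scan, collided, reached):
    """Shaped reward for TD3 navigation: reward progress toward the goal,
    penalize obstacle proximity and collisions. Constants are illustrative."""
    if reached:
        return 100.0                      # terminal bonus at the goal
    if collided:
        return -100.0                     # terminal penalty on collision
    progress = prev_dist - dist_to_goal   # positive when moving closer
    proximity_penalty = 0.5 if min_scan < 0.5 else 0.0  # too close to obstacles
    return 10.0 * progress - proximity_penalty
```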