
HELLO THERE!
I'm Dhieddine BARHOUMI
- AI/Data Engineer
- MLOps Enthusiast
ABOUT ME
I'm a computer science engineering student at INSAT, specializing in Artificial Intelligence and passionate about the field's remarkably rapid progress.
👨‍💻 ML & MLOps Enthusiast
📜 Google Cloud Professional ML Engineer Certified.
📜 IBM AI Engineer Certified.
📜 Microsoft Azure AI Certified.
☁️ Experienced with Google Cloud Platform (GCP) and Microsoft Azure.
WHAT I OFFER
As a machine learning engineer with cloud expertise, I specialize in building end-to-end ML solutions from development to deployment. I combine MLOps practices with cloud-native technologies to deliver scalable, production-ready AI systems.
SKILLS
A quick learner, I focus on mastering the broad set of skills and technologies that machine learning engineering demands.
EXPERIENCE
Through diverse internships and hands-on roles, I've developed practical skills
and contributed to innovative solutions.
- Developed and maintained data pipelines for processing multi-modal sensor data, implementing ETL processes for real-time analytics.
- Engineered scalable data processing systems using Apache Spark, improving data throughput by 40% and reducing processing time.
- Implemented NLP techniques for scene understanding and text analysis, achieving 25% better accuracy in complex scenarios.
- Deployed AI models on cloud platforms (AWS, Azure) with containerized solutions using Docker and Kubernetes.
- Designed and implemented data validation and quality checks, ensuring 99.9% data integrity across all pipelines.
- Collaborated with cross-functional teams to integrate AI models into existing systems, ensuring seamless deployment and monitoring.
- Designed and implemented data pipelines for processing unstructured video and image data, optimizing storage and retrieval efficiency.
- Developed NLP-based text analysis systems for processing security logs and reports, improving information extraction accuracy by 30%.
- Implemented real-time data processing solutions using Apache Kafka and Spark Streaming for instant threat detection.
- Utilized MongoDB for efficient storage and querying of unstructured security data, reducing query time by 50%.
- Integrated AI models with existing systems using RESTful APIs and microservices architecture.
- Conducted research on multiple computer vision algorithms, selecting YOLOv8 for its balance of speed and accuracy.
- Fine-tuned YOLOv8 to detect angle and gusset objects in boxes, achieving 92.5% precision and 60% mAP50-95 (a simplified inference sketch follows this list).
- Deployed the model as a real-time API on Microsoft Azure, integrating it with a dashboard for defect detection.
- Automated quality control, triggering alerts for misaligned or missing objects as boxes moved on a conveyor.
- Collaborated with a multidisciplinary team to replace manual defect detection for an industry client, boosting efficiency by 30%.
- Completed Microsoft Azure AI Fundamentals training and certification during the internship.
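For a flavor of the detection setup, here is a minimal Ultralytics YOLOv8 inference sketch; the weights path, the angle/gusset class names, and the alert rule are illustrative assumptions, since the production model and thresholds are client-specific.

```python
from ultralytics import YOLO  # pip install ultralytics

# Hypothetical fine-tuned weights; the real checkpoint is project-specific.
model = YOLO("runs/detect/train/weights/best.pt")

# Run detection on a single conveyor frame (a video stream works the same way).
results = model.predict("conveyor_frame.jpg", conf=0.5)
for r in results:
    detected = [model.names[int(c)] for c in r.boxes.cls]
    # Alert when an expected part is missing from the frame (assumed rule).
    if "angle" not in detected or "gusset" not in detected:
        print("ALERT: missing or undetected part; detected:", detected)
```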
PROJECTS
Dive into a showcase of projects where I applied my technical expertise
to solve real-world challenges and create impactful solutions.
Personal Project
- Developed a robust Kubernetes-based model serving solution designed to host multiple ML model types, including embeddings, generative text, and classical ML.
- Implemented an API gateway using FastAPI to route incoming requests to backend model servers such as NVIDIA Triton, Text Generation Inference (TGI), and custom Python servers (a simplified routing sketch follows this list).
- Configured advanced deployment strategies, including blue/green and canary releases, to ensure safe and zero-downtime model updates.
- Integrated Evidently AI to generate automated data drift and model performance reports from captured production traffic.
- Established a CI/CD pipeline to automate the building of model containers, configuration of Kubernetes resources, and execution of contract tests.
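A minimal sketch of the gateway's routing idea, assuming hypothetical in-cluster backend URLs; the real gateway also handles authentication, retries, and canary traffic splitting.

```python
import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical in-cluster service endpoints; the real table is config-driven.
BACKENDS = {
    "embeddings": "http://triton.models.svc:8000/v2/models/embedder/infer",
    "generation": "http://tgi.models.svc:8080/generate",
    "classic":    "http://sklearn-server.models.svc:9000/predict",
}

@app.post("/models/{model_type}")
async def route(model_type: str, payload: dict):
    url = BACKENDS.get(model_type)
    if url is None:
        raise HTTPException(status_code=404, detail="unknown model type")
    async with httpx.AsyncClient() as client:
        resp = await client.post(url, json=payload)  # forward to the backend
    return resp.json()
```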
Personal Project
- Built a real-time feature store and online serving system to provide low-latency features for recommender and risk models, achieving sub-5ms feature lookups.
- Engineered a streaming feature pipeline using Spark Structured Streaming to compute aggregations from a Kafka event stream, ensuring data freshness.
- Managed features using Feast, which orchestrates data flows into an offline store (Parquet) for model training and an online store (Redis) for serving.
- Guaranteed training-serving parity by using Feast to generate point-in-time correct training datasets and to enrich inference requests at serving time.
- Deployed a high-performance online inference service with FastAPI, integrating Feast for real-time feature retrieval at request time (see the lookup sketch below).
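A rough sketch of an online lookup with Feast; the user_stats feature view, feature names, and entity key are assumptions standing in for the real repo definitions.

```python
from feast import FeatureStore  # pip install feast

# Assumes a Feast repo in the working directory with a Redis online store.
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=[
        "user_stats:txn_count_1h",   # hypothetical feature names
        "user_stats:avg_amount_7d",
    ],
    entity_rows=[{"user_id": 42}],   # hypothetical entity key
).to_dict()
print(features)
```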
Personal Project
- Architected a production-grade, end-to-end ML pipeline on Google Cloud, automating the entire workflow from data ingestion and validation to model training and deployment.
- Orchestrated the pipeline using Vertex AI and TensorFlow Extended (TFX), ensuring reproducible and scalable execution for components like ExampleGen, Transform, and Trainer (component wiring sketched below).
- Integrated BigQuery for efficient data warehousing and TensorFlow Data Validation (TFDV) for automated data quality and schema enforcement.
- Developed a containerized REST API using Flask and Docker, deployed on Cloud Run for scalable, serverless model serving.
- Established a full CI/CD workflow with GitHub Actions to automate testing, container image pushes to Artifact Registry, and deployments to Cloud Run.
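A condensed sketch of the component wiring, using TFX's local runner and a CSV source for brevity; the actual pipeline ingests from BigQuery and runs on Vertex AI Pipelines.

```python
from tfx import v1 as tfx

# Hypothetical data path and pipeline root.
example_gen = tfx.components.CsvExampleGen(input_base="data/")
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])

pipeline = tfx.dsl.Pipeline(
    pipeline_name="demo-pipeline",
    pipeline_root="pipeline-root/",
    components=[example_gen, statistics_gen, schema_gen],
)
tfx.orchestration.LocalDagRunner().run(pipeline)
```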
Personal Project
- Developed a full-stack machine learning application to predict loan approval status, deployed as a web service on Microsoft Azure.
- Constructed an end-to-end prediction pipeline using Scikit-learn, encompassing data ingestion, feature transformation, and model inference (a minimal pipeline sketch follows this list).
- Built a user-facing web interface with Flask and HTML/Bootstrap to capture input data and display model predictions in real-time.
- Implemented a complete CI/CD pipeline using GitHub Actions to automatically build, test, and deploy the containerized application to Azure Web App services.
- Engineered robust backend utilities, including custom exception handling and structured logging, to ensure application stability and maintainability.
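A minimal sketch of the Scikit-learn pipeline idea; the column names are placeholders for the real loan schema.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; the real ones come from the loan dataset.
numeric = ["applicant_income", "loan_amount"]
categorical = ["education", "property_area"]

pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("model", LogisticRegression(max_iter=1000)),
])
# pipeline.fit(X_train, y_train) then pipeline.predict(X_new) serve the web app.
```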
Personal Project
- Developed a Retrieval-Augmented Generation (RAG) system that answers questions from a private knowledge base, combining document retrieval with LLM-powered synthesis (a minimal retrieval sketch follows this list).
- Built a scalable and modular API using FastAPI, allowing for easy integration and interaction with both the RAG pipeline and an autonomous agent.
- Integrated multiple vector database backends, including pgvector for persistent storage and FAISS for high-speed in-memory search, providing flexibility in deployment.
- Implemented a rigorous evaluation framework using RAGAS to quantitatively measure retrieval and generation quality, ensuring the system's accuracy and relevance.
- Incorporated LLM guardrails to enforce output constraints, prevent harmful or off-topic responses, and ensure the agent's actions remain within a predefined scope.
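A bare-bones sketch of the retrieval step with FAISS and Sentence Transformers on a toy corpus; generation, the pgvector backend, RAGAS evaluation, and guardrails are omitted here.

```python
import faiss                      # pip install faiss-cpu sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["Feast serves online features from Redis.",
        "FAISS performs fast vector similarity search."]   # toy knowledge base
encoder = SentenceTransformer("all-MiniLM-L6-v2")

emb = encoder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])   # inner product on unit vectors = cosine
index.add(np.asarray(emb, dtype="float32"))

query = "What does FAISS do?"
q = encoder.encode([query], normalize_embeddings=True)
_, ids = index.search(np.asarray(q, dtype="float32"), 1)

context = docs[ids[0][0]]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)   # the prompt is then passed to the LLM for synthesis
```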
Personal Project
- Created a configurable forecasting laboratory for training, evaluating, and analyzing multiple time-series models, including LightGBM, Prophet, and a TensorFlow-based Temporal Fusion Transformer (TFT).
- Designed a unified pipeline that handles feature engineering, backtesting, and hierarchical reconciliation for thousands of independent time series.
- Automated model evaluation and comparison using MLflow for experiment tracking and artifact storage (see the logging sketch below).
- Implemented post-training analysis workflows to generate drift reports with Evidently AI and detect anomalies in forecast residuals.
- Structured the project with a configuration-driven approach, allowing for easy adaptation to new datasets and forecasting scenarios without code changes.
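A minimal sketch of logging one backtest to MLflow; the model name, horizon, and scores are placeholders for the lab's real runs.

```python
import mlflow
import numpy as np

def backtest_mae(y_true, y_pred):
    """Mean absolute error over a backtest window."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Placeholder values; the lab iterates over many series and models.
with mlflow.start_run(run_name="lightgbm-baseline"):
    mlflow.log_param("model", "lightgbm")
    mlflow.log_param("horizon_days", 28)
    mlflow.log_metric("backtest_mae", backtest_mae([10, 12, 9], [11, 12, 8]))
```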
Personal Project
- Engineered a scalable data platform on GCP to ingest, process, and index unstructured text from public datasets like Hacker News, Wikipedia, and GitHub.
- Designed a multi-stage ETL pipeline using a Bronze-Silver-Gold architecture on Google Cloud Storage, orchestrated daily with Cloud Composer (Airflow).
- Implemented distributed data processing and embedding-generation jobs with PySpark on Dataproc Serverless, using Sentence Transformers to create vector embeddings (a simplified job sketch follows this list).
- Built a low-latency semantic search service using FAISS for vector indexing and a FastAPI application deployed on Cloud Run.
- Integrated Great Expectations for automated data quality validation and contract enforcement at critical stages of the data pipeline.
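A simplified sketch of the embedding job; the GCS paths and the id/text schema are assumptions, and the real job runs on Dataproc Serverless rather than locally.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("embed-docs").getOrCreate()
docs = spark.read.parquet("gs://example-bucket/silver/docs/")  # hypothetical path

def embed_partition(rows):
    # Load the encoder once per partition instead of once per row.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    rows = list(rows)
    vectors = model.encode([r["text"] for r in rows])
    for row, vec in zip(rows, vectors):
        yield (row["id"], vec.tolist())

embeddings = docs.rdd.mapPartitions(embed_partition).toDF(["id", "embedding"])
embeddings.write.mode("overwrite").parquet("gs://example-bucket/gold/embeddings/")
```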
Personal Project
- Developed a production-ready starter kit for building a lakehouse platform that transforms raw data into governed, high-quality data products.
- Implemented a multi-layered data architecture (Bronze, Silver, Gold) using dbt for SQL-based transformations and schema modeling.
- Enforced data quality and reliability by defining data contracts directly within dbt models and integrating Great Expectations for automated validation (a validation sketch follows this list).
- Generated comprehensive documentation and data lineage graphs automatically through dbt Docs, providing full visibility into data flows.
- Established a CI/CD pipeline that runs data loading scripts, executes dbt transformations, and validates data quality on every commit.
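A small validation sketch, assuming the classic pandas-dataset API of Great Expectations (pre-1.0 releases; newer versions use a different entry point); the columns and rules are placeholders for real data contracts.

```python
import great_expectations as ge   # classic (pre-1.0) API assumed
import pandas as pd

# Toy frame standing in for a Silver-layer table.
df = ge.from_pandas(pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [9.5, 20.0, 3.2],
}))

df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("amount", min_value=0)

result = df.validate()
assert result.success, "data contract violated"
```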
Personal Project
- Designed a high-throughput streaming analytics platform to process and visualize e-commerce clickstream data with sub-second latency.
- Architected a data pipeline using Kafka for event ingestion and Spark Structured Streaming for real-time aggregations, sessionization, and KPI calculation (a streaming sketch follows this list).
- Utilized a dual-storage strategy, writing aggregated results to MongoDB for low-latency dashboard queries and raw events to Parquet on cold storage for historical analysis.
- Developed an interactive, auto-refreshing dashboard with Streamlit to display live conversion funnels, revenue metrics, and anomaly alerts.
- Exposed analytics data via a REST API built with FastAPI, enabling programmatic access to real-time and historical KPIs.
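A condensed sketch of the streaming aggregation, assuming a local Kafka broker and a clicks topic; the spark-sql-kafka connector must be on the Spark classpath, and sessionization and the MongoDB/Parquet sinks are omitted.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clickstream-kpis").getOrCreate()

# Hypothetical broker and topic; real settings are externalized config.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "clicks")
         .load()
         .selectExpr("CAST(value AS STRING) AS json")
         .select(
             F.get_json_object("json", "$.event_type").alias("event_type"),
             F.get_json_object("json", "$.ts").cast("timestamp").alias("ts"),
         )
)

# Windowed event counts per type, tolerating 1 minute of late data.
kpis = (events
        .withWatermark("ts", "1 minute")
        .groupBy(F.window("ts", "10 seconds"), "event_type")
        .count())

query = kpis.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```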
Academic Project
- Designed a reinforcement learning-based system to navigate complex indoor environments.
- Simulated realistic home-like environments in Gazebo with dynamic obstacles.
- Leveraged ROS 2 for system integration, using RViz for real-time monitoring of the robot's path and sensor data.
- Collaborated with colleagues, using GitHub for version control and CI/CD to keep changes reviewed and updates smooth.
- Implemented the TD3 algorithm with a custom reward function to optimize path planning and obstacle avoidance (an illustrative reward-shaping sketch follows).
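An illustrative reward-shaping function in the spirit of the one used; the weights and thresholds here are assumptions, not the tuned values from the project.

```python
def navigation_reward(dist_to_goal, prev_dist, min_scan, collided, reached):
    """Shaped reward for TD3 navigation: reward progress toward the goal,
    penalize obstacle proximity and collisions. Constants are illustrative."""
    if reached:
        return 100.0                      # terminal bonus at the goal
    if collided:
        return -100.0                     # terminal penalty on collision
    progress = prev_dist - dist_to_goal   # positive when moving closer
    proximity_penalty = 0.5 if min_scan < 0.5 else 0.0  # too close to obstacles
    return 10.0 * progress - proximity_penalty
```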