Mathis Derenne

Projects

Various personal, academic and professional projects.

A reference implementation and framework for building open-source data warehouses. It combines data ingestion, transformation, modeling, and reporting into a production-ready workflow. Designed for LLM-based agentic workflows, it embeds domain context and structured documentation directly into the platform core.

Dagster dlt dbt DuckDB Streamlit
data engineering
Orca Demo: Data Warehouse Implementation
2026
personal

A local data warehouse and predictive ML demo featuring orchestration, ingestion, transformation, and an interactive dashboard.

Dagster dlt dbt DuckDB Streamlit
data engineering machine learning
Conformal Prediction
2026
academic

Implementation and benchmarking of conformal prediction techniques across regression, classification, and survival analysis tasks.

Python polars scikit-learn lifelines
machine learning statistics
Billboard Rental Pricing Model
CoSpirit Groupe
2025
professional

Engineered a predictive pricing model for billboard advertisements using historical contract data. Implements Conformalized Quantile Regression to provide statistical guarantees on price intervals. Built a spatial feature engineering module using KD-Trees and Gaussian kernel smoothing to capture localized geographical trends from regression residuals.

Python polars scikit-learn
machine learning statistics
Marketing Data Warehouse
CoSpirit Groupe
2026
professional

Designed ELT pipelines and dimensional models to warehouse multi-channel marketing data. Developed Power BI datasets for cross-media performance analysis, following dimensional modeling practices (fact and dimension tables, star schema, slowly changing dimensions).

Microsoft SSIS SQL Server DAX
data engineering
Dashboards for Ads Campaigns
CoSpirit Groupe
2026
professional

Designed and deployed over 20 interactive Power BI dashboards enabling media experts to optimize ad spend and clients to track their advertising campaigns.

Power BI DAX PowerQuery
data visualization
Explainable AI Case Studies
2026
academic

A collection of six case studies demonstrating model interpretability and explainability methods across tabular, vision, NLP, and graph domains.

Python PyTorch SHAP DiCE Captum
machine learning
Reinforcement Learning for Stochastic Inventory Management
2026
academic

Formulation and resolution of a stochastic inventory control problem. Benchmarks Model-Based Dynamic Programming (Value Iteration via Bellman optimality) against Model-Free Reinforcement Learning (REINFORCE Policy Gradient) to compute optimal ordering policies.

Python NumPy
machine learning
Natural Language Inference Modeling
2026
academic

Development and evaluation of NLI models to predict semantic relationships between sentence pairs. Explores encoder fine-tuning (CamemBERT via LoRA), decoder fine-tuning (Llama 3 via LoRA), and In-Context Learning techniques (Chain-of-Thought).

Python PyTorch CamemBERT Llama 3 LoRA
NLP machine learning
Federated Learning for Medical Diagnosis
2026
academic

A distributed, privacy-preserving Federated Learning system designed for heart disease diagnosis. This project analyzes the impact of client data heterogeneity (Non-IID distributions) and benchmarks the stability and convergence of FedAvg against FedProx.

Python PyTorch Fluke
machine learning
Computer Vision Billboard Anomaly Classifier
CoSpirit Groupe
2025
professional

Designed and deployed an end-to-end computer vision pipeline to detect structural anomalies, incorrect placements, and degradation on billboards from audit photos. Automates manual physical quality control processes by classifying over 10,000 inspection images annually.

Python PyTorch OpenCV
machine learning computer vision
Non-Local Means Image Denoising
2025
academic

Implementation and performance analysis of the Non-Local Means algorithm for solving image restoration inverse problems.

Python deepinverse scikit-image
computer vision
Rennes Data Challenge: Cryptocurrency Forecasting
2024
personal

Developed time-series forecasting models to predict cryptocurrency price movements. Designed and benchmarked classical econometric models (VARMAX, SARIMAX) against machine learning methods (XGBoost) and deep learning architectures (LSTMs, autoencoders).

Python TensorFlow XGBoost skforecast
machine learning statistics
Head Coach Dismissal Impact on Team's Performance
2024
academic

An empirical statistical study evaluating the causal impact of head coach replacements on sports team performance. Investigates whether structural improvements occur post-dismissal or if observed performance changes are driven by regression to the mean.

R Python
causal inference statistics
Parametric & Non-Parametric Classification Benchmarks
2024
academic

A comprehensive analysis and clean-room implementation of binary classification algorithms, including KNN (with Condensed Nearest Neighbor reduction), SVMs, and ensemble methods (AdaBoost, Random Forests). Features custom optimizations for handling severe class imbalance.

Python scikit-learn MyST
machine learning
Spam Detection Classifier
2024
academic

Built a natural language processing classifier to detect spam within French text datasets using TF-IDF feature extraction and optimized Naive Bayes, Logistic Regression, and SVM models.

Python scikit-learn MyST
NLP machine learning
Manifold Learning & Dimensionality Reduction
2024
academic

Implementations and comparative visualizations of manifold learning algorithms (Isomap, LLE, t-SNE) to map high-dimensional geometric datasets into lower-dimensional representations.

Python scikit-learn
machine learning data visualization
RIMO: Open-Source Data Anonymization Engine
CGI France — Internship
2023
professional

Developed RIMO, an open-source tool for high-throughput data anonymization and GDPR compliance within the LINO-PIMO software ecosystem. Engineered a production-ready Go engine from an initial Python prototype to optimize performance, concurrent processing, and strict type safety.

Go Python
data engineering
Causal Inference Voucher Distribution System
Auto1 Group — Internship
2023
professional

Built an end-to-end voucher targeting system to re-engage inactive users. Leveraged SQL for large-scale data extraction and Propensity Score Matching (PSM) to control for confounding from selection bias, estimating the causal impact and ROI of retention campaigns on downstream sales.

SQL Python Streamlit
causal inference data visualization
Fuel Consumption & Emissions Dashboard
2022
personal

An interactive analytical web application and predictive tool that forecasts vehicle fuel efficiency and emission rates from structural and engine features.

Python Streamlit polars Altair scikit-learn
machine learning data visualization