About me
I am a Machine Learning Engineer specializing in Computer Vision and Audio Intelligence, with experience developing end-to-end systems that cover everything from dataset preparation to optimized inference and working demos.

I work with modern vision architectures (YOLO, U-Net, DPT, OCR pipelines) and spectrogram-based models for audio. I design and run ablation studies, model comparisons, advanced metric analysis (macro/micro F1, mAP, precision/recall, per-class logits and probabilities), and visualizations that expose the internal behavior of a model.

I am particularly interested in building reproducible pipelines, optimizing models for production, and integrating explainability techniques. My approach is practical, results-oriented, and grounded in a deep understanding of each stage of the model lifecycle.
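As a flavor of the metric analysis mentioned above, here is a minimal, self-contained sketch of macro vs. micro F1 for single-label multiclass predictions (the labels and predictions are toy data, not from any of the projects below):

```python
from collections import Counter

def f1_scores(y_true, y_pred, labels):
    """Per-class, macro, and micro F1 from two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but p was wrong
            fn[t] += 1  # true class t was missed
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        per_class[c] = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    # Macro: unweighted mean of per-class F1 (small classes count equally).
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool TP/FP/FN over classes; for single-label multiclass
    # this reduces to overall accuracy.
    t_tp, t_fp, t_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * t_tp / (2 * t_tp + t_fp + t_fn) if t_tp else 0.0
    return per_class, macro, micro
```

The macro/micro gap is what an ablation report surfaces: macro drops when rare classes fail even if pooled accuracy looks fine.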
Featured projects
Generating a 360° panorama from a PTZ camera and estimating the depth of the entire scene
Both images can be enlarged by clicking to view reconstruction details
Biomedical segmentation project that detects and delineates cancerous regions in microscopic images. The model generates precise masks that allow analysis of the extent and morphology of malignant areas.
The images include the original sample and the generated mask
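Mask quality in a segmentation project like this is typically scored with overlap metrics such as Dice and IoU; a minimal NumPy sketch (illustrative, not the project's actual evaluation code):

```python
import numpy as np

def dice_iou(pred: np.ndarray, target: np.ndarray):
    """Dice coefficient and IoU for two binary (0/1) masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    dice = 2 * inter / total if total else 1.0  # empty vs. empty: perfect
    iou = inter / union if union else 1.0
    return float(dice), float(iou)
```

Dice weights the overlap twice relative to the mask sizes, so it is more forgiving than IoU on small structures; reporting both gives a fuller picture of boundary quality.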
- InternImage‑L, ViTPose, and YOLOv7 are used for PPE detection and analysis
- Optimization with TensorRT and deployment in production environments
Examples of the Personal Protective Equipment (PPE) detection system in different scenarios
The images show detection of helmets, vests, and other safety equipment, with bounding boxes and real-time classification
End‑to‑End Pipeline
The complete system includes all phases of the model lifecycle:
- Dataset: Collection, cleaning, and annotation of PPE images
- Training: YOLOv7 model with augmentations and validation
- Inference: Optimized script for real-time prediction
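A standard step in YOLO-style real-time inference is non-maximum suppression over the raw detections; a minimal NumPy sketch of greedy NMS (illustrative, not the deployed script, and the 0.45 threshold is an assumed default):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.45):
    """Greedy NMS over [x1, y1, x2, y2] boxes.
    Returns indices of kept boxes, highest score first."""
    order = scores.argsort()[::-1]  # candidates by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        # IoU of the top box against every remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop overlapping duplicates
    return keep
```

In production this loop is usually fused into the TensorRT engine, but the logic is the same: keep the most confident box per object and suppress near-duplicates.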
Ablation Studies
Model Comparisons
Visual comparisons of models
Final Results
2x2 comparison showing the original image, the super-resolution version and zooms of both.
A multilabel classification system capable of identifying multiple animal species from audio. Each clip is transformed into a Mel spectrogram and processed using a convolutional neural network. The model produces probabilities per class, logits, and advanced metrics such as F1, mAP, macro/micro accuracy, and recall.
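The multilabel head described above maps per-class logits to independent probabilities with a sigmoid, then thresholds each class on its own; a minimal sketch (the species names and the 0.5 threshold are illustrative):

```python
import math

def multilabel_predict(logits, classes, threshold=0.5):
    """Convert per-class logits to independent sigmoid probabilities
    and return every class whose probability clears the threshold."""
    probs = {c: 1.0 / (1.0 + math.exp(-z)) for c, z in zip(classes, logits)}
    active = [c for c, p in probs.items() if p >= threshold]
    return probs, active
```

Unlike softmax, the probabilities do not compete, so a clip can legitimately activate several species at once, which is exactly what multi-animal recordings require.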
- Architecture with chunking, embeddings, and semantic retrieval
- Custom libraries for preprocessing and integration with conversational systems
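The chunking-and-retrieval flow above can be sketched end to end; this toy version uses a bag-of-words stand-in for the embedding model (a real pipeline would call a sentence encoder), and the window sizes are illustrative:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10):
    """Split text into overlapping word windows (a simple chunking scheme)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a sentence encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks, top_k: int = 1):
    """Rank chunks by cosine similarity to the query embedding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

The retrieved chunks are then injected into the conversational system's prompt; swapping the embedding function is the only change needed to move from this toy to a real semantic index.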
Tech stack
Publications
- Automated PPE compliance monitoring in industrial environments — Automation in Construction, 2025
- Maritime Surveillance by Multiple Data Fusion — VISIGRAPP 2023
Contact
📧 ll11ll1@outlook.es
🔗 LinkedIn
💻 GitHub