My Bachelor’s thesis, titled "A Multimodal Approach to Automating Product Listings with Machine Learning", develops a framework that automates clothing listings on e-commerce platforms such as eBay and Vinted. The project addresses inefficiencies in manual listing creation by combining computer vision and natural language processing (NLP) with a custom dataset.
The main research question is: *How can a multimodal machine learning framework automate clothing listings for online marketplaces?* Sub-questions include:
- How can computer vision ensure precise color extraction from clothing images?
- How can a BERT model be optimized for platform-specific category classification?
- How effective are large language models (LLMs) in generating platform-specific titles and descriptions?
The work comprises five parts:
- Color Extraction: Used SAM2 for segmentation, then K-means clustering to extract dominant colors; evaluated on Kaggle datasets (e.g., an eBay F1-score of 0.2738).
- Category Classification: Fine-tuned BERT-base-uncased for multi-task classification, achieving 0.9964 accuracy for eBay and 0.9738 for Vinted on a synthetic dataset of 8,400 samples.
- Text Generation: Evaluated four LLMs served through Ollama (TinyLlama, SmolLM2, Mistral, Phi-4) on generating platform-specific titles and descriptions, using ROUGE and BLEU metrics.
- Dataset: Created a custom dataset through automated scraping and manual validation to address the lack of standardized training data.
- UI: Developed a PyQt5-based desktop application with a multi-page interface for uploading images, processing them, and reviewing the generated listings.
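The color-extraction step can be illustrated with a minimal sketch, assuming the segmentation model has already produced a boolean foreground mask (SAM2 itself is not called here). The function name `dominant_colors`, the default `k=3`, and the seeding strategy are illustrative choices rather than the thesis's settings; a plain NumPy K-means is used instead of a library implementation so each step is visible.

```python
import numpy as np

def dominant_colors(image, mask, k=3, iters=20, seed=0):
    """K-means over the RGB values of masked pixels.

    image: H x W x 3 uint8 array; mask: H x W boolean array (assumed to come
    from a segmentation model such as SAM2). Returns (colors, shares) with the
    most frequent cluster first.
    """
    pixels = image[mask].astype(np.float64)          # (N, 3) foreground pixels only
    if pixels.size == 0:
        raise ValueError("mask selects no pixels")
    candidates = np.unique(pixels, axis=0)           # distinct colors, so no two seeds coincide
    k = min(k, len(candidates))
    rng = np.random.default_rng(seed)
    centers = candidates[rng.choice(len(candidates), size=k, replace=False)]

    def assign(c):
        # index of the nearest center for every pixel (squared Euclidean distance in RGB)
        return ((pixels[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        # move each center to the mean of its pixels; keep it in place if the cluster is empty
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    labels = assign(centers)                         # final assignment for the returned centers
    counts = np.bincount(labels, minlength=k)
    order = np.argsort(counts)[::-1]                 # largest cluster first
    return centers[order].round().astype(int), counts[order] / counts.sum()
```

Masking before clustering is the key design choice: without it, background pixels (a floor, a hanger, a white backdrop) would compete with the garment for clusters and could be reported as a dominant color.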
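The multi-task classification setup can be sketched without the full model: one shared encoder output feeds a separate classification head per platform, and training minimizes the sum of the two losses. This NumPy sketch assumes that structure and an unweighted loss sum; the class name, initialization scale, and random `pooled` vectors (standing in for BERT's pooled [CLS] embeddings) are illustrative, not taken from the thesis code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # shift by row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    probs = softmax(logits)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

def accuracy(logits, labels):
    return float((logits.argmax(axis=1) == labels).mean())

class MultiTaskHeads:
    """Two classification heads sharing one encoder output.

    `pooled` stands in for the encoder's pooled [CLS] embedding (768-d for a
    base-size BERT); each platform gets its own linear layer over that vector.
    """

    def __init__(self, hidden, n_ebay, n_vinted, seed=0):
        rng = np.random.default_rng(seed)
        self.w_ebay = rng.normal(0.0, 0.02, (hidden, n_ebay))
        self.b_ebay = np.zeros(n_ebay)
        self.w_vinted = rng.normal(0.0, 0.02, (hidden, n_vinted))
        self.b_vinted = np.zeros(n_vinted)

    def forward(self, pooled):
        # same input vector, platform-specific category logits
        return pooled @ self.w_ebay + self.b_ebay, pooled @ self.w_vinted + self.b_vinted

    def loss(self, pooled, y_ebay, y_vinted):
        ebay_logits, vinted_logits = self.forward(pooled)
        # joint objective: unweighted sum of the per-platform cross-entropies
        return cross_entropy(ebay_logits, y_ebay) + cross_entropy(vinted_logits, y_vinted)
```

Sharing the encoder means both platforms' labels shape the same text representation, while separate heads let each platform keep its own category taxonomy; per-platform accuracy is then computed independently with `accuracy` on each head's logits.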
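To make the text-generation metrics concrete, here is a minimal sketch of ROUGE-1 F1, which scores unigram overlap between a generated title or description and a reference. The lowercase whitespace tokenization is an assumption; standard ROUGE packages may tokenize or stem differently, and the thesis may have used other ROUGE variants.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: unigram overlap between a generated text and a reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # clipped overlap: each token counts at most min(count in candidate, count in reference) times
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge1_f1("Blue denim jacket", "blue denim jacket size M")` has precision 1.0 and recall 0.6, giving an F1 of 0.75. BLEU works from the other direction: it combines clipped n-gram precisions (typically up to 4-grams) by geometric mean and applies a brevity penalty to short outputs.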
The main contributions are:
- End-to-End Framework: Integrates SAM2, BERT, and TinyLlama into a cohesive listing-automation pipeline.
- Custom Datasets: Two synthetic datasets for training and evaluation, addressing gaps in e-commerce data.
- Reproducible Notebooks: Three Jupyter notebooks (CV_Evaluation.ipynb, Bert_Fine_Tuning.ipynb, LLM_Evaluation.ipynb) for reproducibility and learning.
- Scalable Tool: A user-friendly application that reduces manual effort and improves listing consistency.
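The end-to-end flow can be sketched as three injected stages: color extraction, per-platform category classification, and text generation. All names here (`Listing`, `build_prompt`, `create_listings`), the prompt wording, and the assumptions that the classifier reads seller-supplied text and that the model returns the title on its first line are hypothetical; the real stages (SAM2 with K-means, the fine-tuned BERT model, TinyLlama) are passed in as plain callables so the sketch runs without them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class Listing:
    platform: str
    colors: List[str]
    category: str
    title: str
    description: str

def build_prompt(platform: str, colors: Sequence[str], category: str, notes: str) -> str:
    # Hypothetical prompt template; not the prompts used in the thesis.
    return (
        f"Write a {platform} listing for a clothing item.\n"
        f"Category: {category}\nColors: {', '.join(colors)}\nSeller notes: {notes}\n"
        "Answer with the title on the first line and the description below it."
    )

def create_listings(
    image_path: str,
    notes: str,
    extract_colors: Callable[[str], List[str]],   # stand-in for SAM2 mask + K-means
    classify: Callable[[str], Dict[str, str]],    # stand-in for multi-task BERT: {platform: category}
    generate: Callable[[str], str],               # stand-in for an LLM such as TinyLlama
    platforms: Sequence[str] = ("eBay", "Vinted"),
) -> List[Listing]:
    colors = extract_colors(image_path)           # computed once, shared by every platform
    categories = classify(notes)
    listings = []
    for platform in platforms:
        text = generate(build_prompt(platform, colors, categories[platform], notes))
        # assumed output format: first line is the title, the rest is the description
        title, _, description = text.strip().partition("\n")
        listings.append(Listing(platform, colors, categories[platform], title.strip(), description.strip()))
    return listings
```

Injecting the stages as callables keeps the orchestration testable with stubs and makes it straightforward to swap in one of the other evaluated LLMs without touching the pipeline code.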
The framework achieved significant time savings while maintaining high accuracy. It offers a scalable solution for e-commerce automation, with potential for API integration.
All code, data, and models are available in my GitHub Repository: https://github.com/C-Boateng/Automated-Listing-Tool.