Product Matching using Sentence-BERT: A Deep Learning Approach to E-Commerce Product Deduplication

Keywords: Product Matching, Sentence-BERT, E-commerce, Semantic Similarity

Authors

December 17, 2024
December 19, 2024


Product matching in e-commerce platforms presents a significant challenge due to variations in product titles, descriptions, and categorizations across vendors. This paper proposes a lightweight yet effective approach to product matching using Sentence-BERT (SBERT), specifically the all-MiniLM-L6-v2 variant. Our method combines efficient text preprocessing, strategic training pair generation, and threshold-based similarity matching to achieve high-accuracy product matching while maintaining computational efficiency. Evaluated on the Pricerunner dataset, the system achieves 98.10% accuracy, 100% precision, and 91.84% recall. The implementation's modular architecture facilitates maintenance and updates, while the threshold-based matching strategy allows fine-grained control over the precision-recall trade-off. Our results suggest that carefully designed preprocessing and training strategies, combined with lightweight transformer models, can achieve state-of-the-art performance in product matching without complex model architectures or extensive computational resources.
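To make the threshold-based matching step concrete, the sketch below implements cosine-similarity matching with a decision threshold. It substitutes a toy character-bigram embedding for the paper's actual all-MiniLM-L6-v2 sentence embeddings, and the 0.5 cutoff is an illustrative assumption rather than a tuned value; only the thresholding logic itself reflects the described approach.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy stand-in for an SBERT embedding: character-bigram counts.
    The real system would call a sentence-transformer model here."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def is_match(title_a: str, title_b: str, threshold: float = 0.5) -> bool:
    """Declare two product titles the same product if their embedding
    similarity clears the threshold. Raising the threshold trades
    recall for precision, as discussed in the paper."""
    return cosine(embed(title_a), embed(title_b)) >= threshold

if __name__ == "__main__":
    same = ("Apple iPhone 13 128GB Black", "iPhone 13 128 GB - Black")
    diff = ("Apple iPhone 13 128GB Black", "Samsung Galaxy S21 Ultra")
    print(is_match(*same), is_match(*diff))
```

Swapping `embed` for real SBERT vectors (e.g. via the sentence-transformers library) leaves the matching logic unchanged, which is what makes the threshold a convenient single knob for tuning the precision-recall trade-off.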