We participated in the Amazon ML Challenge 2024 with a solution for extracting product attributes from images. Team members:
- Aman Prakash (Lead), NIAMT, Ranchi
- Sagnik Pramanik, Heritage Institute of Technology, Kolkata
- Ankit Rai, NIAMT, Ranchi
- Abhinav Sinha, BIT Mesra, Ranchi
 
Our approach uses the Moondream Vision Language Model (VLM) to process the images listed in the test.csv file and extract specific attributes such as weight and dimensions.
- Moondream VLM (1.6B parameters) was used for lightweight image-to-text processing.
- Key product attributes were extracted using targeted prompts, one per attribute.
- Model output was cleaned and standardized with regex so that every prediction follows a consistent format.
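
The prompting and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the prompt wording, the `build_prompt` and `clean_answer` helpers, the `UNIT_ALIASES` table, and the "value unit" output format are all assumptions made for the example, and the call to the Moondream model itself is left out.

```python
import re

# Hypothetical alias table: maps unit spellings a model might emit to one
# canonical name. The unit list actually used in the notebook may differ.
UNIT_ALIASES = {
    "g": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "kilogram": "kilogram", "kilograms": "kilogram",
    "mm": "millimetre", "millimetre": "millimetre", "millimeter": "millimetre",
    "cm": "centimetre", "centimetre": "centimetre", "centimeter": "centimetre",
    "in": "inch", "inch": "inch", "inches": "inch",
}

# A number with an optional decimal part, followed by a run of letters
# that is treated as the candidate unit.
VALUE_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]+)")


def build_prompt(attribute: str) -> str:
    """Build a targeted question for one attribute (assumed wording)."""
    readable = attribute.replace("_", " ")
    return (
        f"What is the {readable} of the product in this image? "
        "Answer with a number and a unit."
    )


def clean_answer(raw: str) -> str:
    """Return the first 'value unit' pair with a known unit, or '' if none."""
    for match in VALUE_UNIT.finditer(raw):
        value, unit = match.groups()
        canonical = UNIT_ALIASES.get(unit.lower())
        if canonical:
            return f"{float(value)} {canonical}"
    return ""


if __name__ == "__main__":
    print(build_prompt("item_weight"))
    print(clean_answer("The weight is 500g."))
    print(clean_answer("2 pieces, 1.2 kg each"))
```

Returning an empty string when no known unit is found keeps uncertain predictions out of the output instead of guessing a unit. Free-text answers can defeat a simple pattern like this (for example, numbers written with thousands separators), so the notebook's own rules should be treated as the reference.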
 
For more details, refer to the main notebook: main_team_qstart_amazonml.ipynb.