Samasource and Cornell Tech Announce iMaterialist-Fashion, A Robust, Free Open Source Fashion Data Set for Research and Development

Samasource and Cornell Tech announced today their collaboration on iMaterialist-Fashion, a high-quality dataset to enable research into advanced methods for clothing identification. Visual analysis of clothing is a topic that has received increasing attention, with benefits for brands and consumers. Being able to recognize apparel products and associated attributes (for example, lace or beading) from pictures could enhance the shopping experience and drive efficiency for retailers. The dataset will be part of the Fine-Grained Visual Categorization (FGVC) workshop this June at CVPR, the premier annual computer vision conference. The FGVC workshop is co-sponsored by Google AI.

"Quality data is important for algorithmic success. Using the SamaHub, Samasource's fashion annotators were consistently able to produce quality results and on-time deliveries for the Cornell Tech team to help further our research and development for the fashion dataset," says Cornell Tech professor Serge Belongie. "This dataset will facilitate significant advances in computer vision with the potential for wide-reaching consumer engagement."

"At Samasource, we're committed to advancing the AI industry, including supporting open source data initiatives. We're thankful to the Cornell Tech team for sharing this vision and facilitating the development of this open source dataset. They were the ideal partner," said Loic Juillard, VP Engineering, Samasource.

The Cornell Tech research team turned to Artificial Intelligence trained by Samasource with the goal of introducing a novel, fine-grained segmentation task by joining forces between the fashion and computer vision industries. The Cornell Tech team proposed a fashion taxonomy built by fashion experts and informed by product descriptions from the internet. To capture the complex structure of fashion objects and the ambiguity in descriptions obtained from crawling the web, the standardized taxonomy contains 46 apparel objects (27 main apparel items and 19 apparel parts) and 92 related fine-grained attributes. A total of around 50,000 clothing images from daily life, celebrity events, and online shopping were labeled by Samasource's fashion annotators for fine-grained segmentation.

The Cornell Tech and Samasource teams used Samasource's secured cloud annotation platform, SamaHub, to manage the entire annotation lifecycle. This includes image upload, annotation, data sampling and QA, data delivery, and overall collaboration. Additionally, automated workflows in the SamaHub task queue enable a dedicated, trained team of Samasource workers to annotate targeted images in record time. All Samasource annotation team members are located in a secure, ISO-certified facility, ensuring maximum data security.

This work builds on Professor Belongie's computer vision research, including building photo identifiers for hundreds of species of birds, furniture, plants, and more.

About Samasource 
25% of the Fortune 50 trust Samasource to deliver secure, high-quality training data and validation for the technology teams driving humanity forward. From self-driving cars to smart hardware, Samasource fuels AI. Founded over a decade ago, we're experts in image, video and sensor data annotation and validation for machine learning algorithms in industries including automotive, navigation, AR/VR, biotech, agriculture, manufacturing, and e-commerce. Our staff are driven by a mission to expand opportunity for low-income people through the digital economy, and our social business model has helped over 50,000 people lift themselves out of poverty. To learn more, visit

Contact: Natalie Schoen
BAM Communications 

SOURCE Samasource