Machine learning (ML) models are shaped by data, and building inclusive ML systems requires considering how to design representative datasets. However, few beginner-friendly ML modeling tools focus on teaching dataset design practices, such as designing for data diversity and inspecting data quality.
To address this, we present four data design practices (DDPs) for designing inclusive ML models. We also introduce Co-ML, a tablet-based application that promotes hands-on learning of DDPs through collaborative ML model building. Co-ML allows beginners to create image classifiers in a distributed experience, where data is synchronized across multiple devices. This enables users to iteratively refine ML datasets through discussion and coordination with their peers.
We tested Co-ML in a 2-week AIML Summer Camp, where groups of youth aged 13–18 built customized ML-powered mobile applications. Our analysis shows that using Co-ML for multi-user model building supported the development of DDPs, such as incorporating data diversity, evaluating model performance, and inspecting data quality. We also found that students often prioritized learnability over class balance when attempting to improve model performance. This work demonstrates how collaboration, model testing interfaces, and student-driven projects can empower learners to actively explore the role of data in ML systems.