Automated Data Labeling: The Power of Going Programmatic
Labeling training data is exhausting, and it is the de facto bottleneck most AI teams face today. Eager to alleviate this pain point of AI development, practitioners have long sought ways to automate the labor-intensive labeling process. Automate too little (e.g., with manual labeling optimizations such as active learning or model-assisted labeling) and the gains are marginal. Automate too much and your model becomes disconnected from the essential human-provided domain knowledge it needs to solve relevant problems. The key to truly transformative (e.g., 10x to 100x) efficiency improvements is to change the interface to labeling altogether, moving from manual labeling, which collects individual labels one by one, to programmatic labeling, in which labeling functions capture the rationales behind labels. The result is a labeling process that is significantly more scalable, adaptable, and governable. In this talk, we review these techniques for automating parts of the labeling process, show how the Snorkel Flow platform integrates them in a unified framework, and share real-world experiences from Fortune 500 companies that have made the transition from manual to programmatic labeling.
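To make the idea of a labeling function concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the Snorkel Flow API: the spam/ham task, the function names, and the simple majority vote used to combine votes are all hypothetical stand-ins chosen for brevity.

```python
from collections import Counter

# Hypothetical label scheme for an illustrative spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    # Rationale: messages that contain URLs are often spam.
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    # Rationale: prize or winner language is typical of spam.
    lowered = text.lower()
    return SPAM if "prize" in lowered or "winner" in lowered else ABSTAIN


def lf_short_reply(text):
    # Rationale: very short messages tend to be genuine replies.
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    # Combine non-abstaining votes; abstain when no function fires or on a tie.
    # Assumption: a plain majority vote stands in for a learned label model.
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    examples = [
        "You are a winner! Claim at http://example.com",
        "Thanks, see you soon",
        "Let us schedule the quarterly review meeting for next week",
    ]
    for text in examples:
        print(majority_vote(text), text)
```

Each function encodes one piece of domain knowledge and applies it to every example at once, so revising a rationale means editing a function rather than relabeling data by hand.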
Braden is a co-founder and Head of Technology at Snorkel AI. Before Snorkel, Braden researched and developed new interfaces to machine learning systems in academia (Stanford, MIT, Johns Hopkins, BYU) and industry (Facebook, Google).