Report · 2026-03-11
DataLab: Closing the Data Gap as AI's Critical Bottleneck
Bobby Samuels, CEO of Protege, argues that data quality, not algorithms or computing power, is the most underrated bottleneck limiting AI progress. He announces DataLab, a dedicated research institution designed to close the data gap by creating healthcare benchmarks, developing dataset quality standards, and researching data contamination and bias at scale.
Metrics in this report
ImageNet Assembly Timeline
3 years
2006-2009
Time to organize, clean, and label dataset
ImageNet Dataset Size
3.2 million images
initial version
Released in 2009 for deep learning
USPS ZIP Code Dataset Size
~10,000 images
historical
CNN training data from the late 1980s