Report · 2026-03-11 · 85d

DataLab: Closing the Data Gap as AI's Critical Bottleneck

Bobby Samuels, CEO of Protege, argues that data quality, rather than algorithms or computing power, is the most underrated bottleneck limiting AI progress. He announces DataLab, a dedicated research institution designed to address the data gap by creating healthcare benchmarks, developing dataset quality standards, and researching data contamination and bias at scale.

Metrics in this report

ImageNet Assembly Timeline

3 years

2006-2009

Time to organize, clean, and label dataset

ImageNet Dataset Size

3.2 million images

initial version

Released 2009 for deep learning

USPS ZIP Code Dataset Size

~10,000 images

historical

CNN training data from the late 1980s