Clouded Judgement: From GPU Hours to Token Dollars—Pricing AI Infrastructure for the Inference Era
The article argues that AI-native companies must shift from traditional SaaS pricing models to token-based or credit-based pricing to align costs with value delivery, as inference costs create existential pressure on margins. It contends that GPU monetization is transitioning from hourly rental to token-based pricing, dramatically increasing revenue per GPU hour and creating new business model opportunities. The piece includes SaaS valuation benchmarks across growth cohorts and operational metrics from public cloud software companies.
Metrics in this report
33 months
median
Public SaaS companies
3.1x multiple
median
Overall public SaaS market
16.1x multiple
median
Top 5 highest-growth SaaS companies
9.7x multiple
median
High-growth SaaS (>22% projected NTM growth)
5.5x multiple
median
Mid-growth SaaS (15-22% projected NTM growth)
2.5x multiple
median
Low-growth SaaS (<15% projected NTM growth)
19%
median
Public SaaS companies
15%
median
Public SaaS companies
1,000,000 tokens per second
Full 72-GPU rack token generation capacity
$2 per hour
Current market rate for H100 GPU hourly rental (AWS, CoreWeave, Lambda)
$4 per hour
Current market rate ceiling for H100 GPU hourly rental
76%
median
Public SaaS companies
5x throughput improvement
Vera Rubin vs. Blackwell inference throughput comparison
15%
median
Public SaaS companies
13%
median
Public SaaS companies
109%
median
Public SaaS companies
0%
median
Public SaaS companies
23%
median
Public SaaS companies
35%
median
Public SaaS companies
10x cost reduction
Vera Rubin vs. Blackwell token cost improvement
$0.15 per million tokens
Rock-bottom commodity output token pricing
$8 per million tokens
Mid-tier model output token pricing
$15 per million tokens
Mid-tier model output token pricing (high end)
$10 per million tokens
Premium output token pricing for model APIs
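
The gap between hourly rental and token-based pricing can be made concrete with a little arithmetic on the figures listed above. The sketch below is illustrative only and rests on assumptions that are not in the source: the rack runs at full utilization, every generated token is billed as an output token, and the 1,000,000 tokens per second rack figure is set against the H100 rental range even though the report does not say the two refer to the same hardware. The function names are hypothetical.

```python
# Figures taken from the metrics listed above.
RACK_TOKENS_PER_SECOND = 1_000_000   # full 72-GPU rack token generation capacity
GPUS_PER_RACK = 72
SECONDS_PER_HOUR = 3_600
HOURLY_RENTAL_RANGE = (2.00, 4.00)   # dollars per GPU hour for H100 rental

# Output token prices in dollars per million tokens, labeled as in the report.
TOKEN_PRICES = {
    "rock-bottom commodity": 0.15,
    "mid-tier": 8.00,
    "mid-tier (high end)": 15.00,
    "premium": 10.00,
}


def tokens_per_gpu_hour(rack_tps=RACK_TOKENS_PER_SECOND, gpus=GPUS_PER_RACK):
    """Tokens one GPU generates per hour if rack throughput is split evenly."""
    return rack_tps * SECONDS_PER_HOUR / gpus


def revenue_per_gpu_hour(price_per_million, utilization=1.0):
    """Token revenue per GPU hour.

    ASSUMPTION: `utilization` is the share of generated tokens that is
    actually billed; 1.0 (every token sold) is an upper bound, not a forecast.
    """
    millions_of_tokens = tokens_per_gpu_hour() / 1_000_000
    return millions_of_tokens * price_per_million * utilization


if __name__ == "__main__":
    low, high = HOURLY_RENTAL_RANGE
    print(f"Hourly rental: ${low:.2f} to ${high:.2f} per GPU hour")
    for tier, price in TOKEN_PRICES.items():
        revenue = revenue_per_gpu_hour(price)
        print(f"{tier}: ${revenue:,.2f} per GPU hour at ${price}/M tokens")
```

Under these assumptions each GPU produces 50 million tokens per hour, so even the $0.15 commodity price works out to $7.50 per GPU hour, and the $8 mid-tier price to $400, against $2 to $4 for hourly rental. Lowering `utilization` scales the result linearly, which is a quick way to see how sensitive the comparison is to idle capacity.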