Public SaaS companies command a 76% gross margin because software scales without meaningful unit economics decay—a luxury AI infrastructure providers now find themselves defending as inference costs reshape the economics of the stack.
The shift from GPU hours to token-based pricing isn't merely a billing model evolution; it's a margin compression test for every layer between foundation models and end applications, with hyperscalers, inference platforms, and application vendors all recalibrating what sustainable profitability looks like in an era of commoditizing compute. ICONIQ's benchmark work arrives as the industry grapples with whether AI services will settle into SaaS-like margins or something structurally thinner. Watch how quickly token prices decline relative to model performance gains—that ratio tells the whole story.