Cross-Model Eval Summary

AI Descriptors improve code generation quality across models while consuming 6x fewer tokens
Models: Claude Sonnet 4 & Gemini 2.5 Flash · 72 total eval runs · 6 components · April 4, 2026
6.5x
Fewer input tokens on average when using AI descriptors instead of raw source code. Consistent across both Claude and Gemini.
+7pp
Average quality improvement with descriptors. The lightweight JSON contract captures the API surface better than hundreds of lines of implementation code.
2
Models tested. The pattern holds across model providers — descriptors are a model-agnostic improvement to component distribution.

Model Comparison

Model Raw Score Descriptor Score Raw Tokens Desc Tokens Token Ratio Efficiency Gain Raw TS Valid Desc TS Valid
Claude Sonnet 4 80% 89% 5742 869 6.6x 7.4x 72% 100%
Gemini 2.5 Flash 72% 77% 5246 823 6.4x 6.8x 56% 72%

Cross-Model Analytics

Quality Score by Model

0%25%50%75%100%80%89%Claude Sonnet 472%77%Gemini 2.5 FlashRaw SourceAI Descriptor

Token Consumption by Model

014352871430657425742869Claude Sonnet 45246823Gemini 2.5 FlashRaw SourceAI Descriptor

Detailed Reports

Individual model reports with full side-by-side code comparisons: