Does feeding an LLM 500 lines of source code beat a 30-line JSON descriptor? We tested 6 components on 2 models across 72 eval runs.
See AI-rendered components side by side — raw source vs descriptor:
| Model | Raw Score | Descriptor Score | Raw Tokens | Desc Tokens | Token Ratio | TS Validity (Raw/Desc) |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash | 97% | 95% | 5023 | 894 | 5.6x | 100% / 100% |
| Claude Sonnet 4 | 94% | 93% | 5116 | 797 | 6.4x | 100% / 100% |
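
The Token Ratio column can be reproduced from the two token columns. The sketch below assumes the ratio is raw tokens divided by descriptor tokens, rounded to one decimal place; the `ModelResult` type and the helper names are hypothetical and not part of the eval harness.

```typescript
// Hypothetical shape for one row of the table above (not the harness's own type).
interface ModelResult {
  model: string;
  rawScore: number;        // percent
  descriptorScore: number; // percent
  rawTokens: number;
  descriptorTokens: number;
}

// Values copied from the results table.
const results: ModelResult[] = [
  { model: "Gemini 2.5 Flash", rawScore: 97, descriptorScore: 95, rawTokens: 5023, descriptorTokens: 894 },
  { model: "Claude Sonnet 4", rawScore: 94, descriptorScore: 93, rawTokens: 5116, descriptorTokens: 797 },
];

// Assumption: ratio = raw tokens / descriptor tokens, rounded to one decimal.
function tokenRatio(r: ModelResult): number {
  return Math.round((r.rawTokens / r.descriptorTokens) * 10) / 10;
}

// Score given up by switching from raw source to the descriptor, in percentage points.
function scoreDelta(r: ModelResult): number {
  return r.rawScore - r.descriptorScore;
}

for (const r of results) {
  console.log(`${r.model}: ${tokenRatio(r)}x fewer tokens, ${scoreDelta(r)} point(s) lower score`);
}
```

Read this way, the descriptor costs 1 to 2 percentage points of score in exchange for roughly a 5.6x to 6.4x reduction in input tokens.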

    pnpm eval:comparative
    pnpm eval:comparative:gemini
    pnpm eval:report:comparative

Full reports with side-by-side generated code comparisons: