Aaron Cavano
Experiment · Feb 16, 2026

Design System Auditing via Figma MCP

Three Claude commands to ingest a Figma design system and audit screens for token violations: what worked, what the cost looked like, and the inversion that changes how I think about AI-native design tooling.

The question: can Claude, via Figma MCP, act as an automated design system compliance reviewer?

Three custom slash commands to test it. /ingest-design-system walks a Figma design system file and generates structured documentation (token tables, component variant matrices, audit rules) derived from the live file via Figma MCP. /audit-design loads that representation and runs a three-layer check against any target screen: token existence, context validation, component compliance. /document-design does a lighter pass, documenting token usage without making compliance judgments.

The design system was real: 30+ color tokens, 17 composite text styles, 7 components, ingested across 9 pages.

Ingestion

/ingest-design-system runs six phases:

1. Extract all tokens via get_variable_defs
2. Map the component hierarchy via get_metadata
3. Call get_design_context on each component
4. Generate foundation docs (colors, typography, spacing, shadows, radius)
5. Generate component docs (variants, states, token usage)
6. Auto-derive audit rules from the extracted tokens

Nine pages took 30–45 minutes. The output, a .claude/design-system/ directory of structured markdown, was good enough to reason against: token names, component variant matrices, property-level constraints, derived rules mapping which token category is valid for which CSS property.
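
The derived rules boil down to a mapping from CSS property to allowed token category. A minimal sketch in TypeScript of what that mapping encodes (the names here are illustrative, not the actual generated schema):

```typescript
// Hypothetical shape for the auto-derived audit rules. The generated docs
// are markdown tables; this is the constraint they express.
type TokenCategory = "color/surface" | "color/text" | "spacing" | "radius" | "shadow";

// Which token categories may legally set which CSS properties.
const validCategories: Record<string, TokenCategory[]> = {
  "background-color": ["color/surface"],
  color: ["color/text"],
  "border-radius": ["radius"],
  padding: ["spacing"],
  gap: ["spacing"],
};
```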

Example output, the Button component:

| Type      | State    | Background            | Text    | Height | Radius | Padding      |
|-----------|----------|-----------------------|---------|--------|--------|--------------|
| Primary   | Normal   | rgba(255,255,255,0.8) | #3B3B3B | 40px   | 8px    | 8px 16px     |
| Primary   | Touch    | #F0F0F0               | #3B3B3B | 40px   | 8px    | 8px 16px     |
| Primary   | Disabled | #F0F0F0               | #9B9B99 | 40px   | 8px    | 8px 16px     |
| Secondary | Normal   | rgba(255,255,255,0.5) | #3B3B3B | 40px   | 8px    | 8px 16px     |
| Dark      | Normal   | rgba(255,255,255,0.2) | #FFFFFF | 40px   | 8px    | 8px 16px     |
| Dark      | Disabled | rgba(255,255,255,0.1) | #9B9B99 | 40px   | 8px    | 8px 16px     |

Every component comes out like this: 18 button variants, 6 card types, full state coverage. The audit rules derive directly from these tables.
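
For concreteness, here's the shape such a derived rule could take (the schema is hypothetical; the values are the documented Primary/Normal row):

```typescript
// One audit rule derived from the Button table: Primary / Normal.
// Every field is a hard expectation; any deviation on an instance is flagged.
interface VariantRule {
  component: string;
  variant: Record<string, string>;  // matched against instance properties
  expect: Record<string, string>;   // CSS property -> documented value
}

const buttonPrimaryNormal: VariantRule = {
  component: "Button",
  variant: { type: "Primary", state: "Normal" },
  expect: {
    "background-color": "rgba(255,255,255,0.8)",
    color: "#3B3B3B",
    height: "40px",
    "border-radius": "8px",
    padding: "8px 16px",
  },
};
```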

The audit

The check runs three layers against every property on every node:

Layer 1: Token existence
  Does the value match a documented token?
  Raw hex/px value with no token = VIOLATION.

Layer 2: Context validation
  Is the token from the right category?
  Text color token used as background-color = WARNING.

Layer 3: Component compliance
  Does the instance match its documented variant spec?
  Button with lg padding but xl gap/font = VIOLATION.

Audit report — 38% token compliance on a production screen

Violations map to layer paths, show current vs. expected values, suggest the correct token. The compliance score made the finding legible immediately: 38% token compliance on a production screen is a concrete problem, not a vague design critique.
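
A minimal sketch of what the first two layers amount to in code (illustrative only; the actual check runs as prompt instructions inside /audit-design, not as a script):

```typescript
interface Finding {
  layer: 1 | 2;
  severity: "VIOLATION" | "WARNING";
  path: string;        // node path in the Figma layer tree
  current: string;
  suggestion?: string;
}

function checkProperty(
  path: string,
  property: string,
  value: string,
  tokens: Map<string, string>,                // token name -> resolved value
  categoryOf: (token: string) => string,      // token name -> category
  validCategories: Record<string, string[]>,  // property -> allowed categories
): Finding | null {
  // Layer 1: the raw value must resolve to a documented token.
  const match = [...tokens.entries()].find(([, v]) => v === value);
  if (!match) return { layer: 1, severity: "VIOLATION", path, current: value };
  // Layer 2: the token's category must be valid for this CSS property.
  const [token] = match;
  if (!validCategories[property]?.includes(categoryOf(token))) {
    return { layer: 2, severity: "WARNING", path, current: token };
  }
  return null; // Layer 3 compares instances against the variant tables above
}
```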

The cost: one screen took about 7 minutes and roughly $0.30–$0.50 in API calls (Sonnet pricing, loading design system context plus three MCP calls per screen, property-level resolution across every node). Not slow exactly, but expensive enough that auditing dozens of screens adds up fast: fifty screens works out to roughly $15–$25 and close to six hours of wall time. The math doesn't work for continuous compliance checking.

The unexpected finding

After the ingestion, I ran a different test: generate a code version of the app from the tokenized design system docs. It was faster and more accurate than auditing a single screen through Figma MCP.

The structured markdown (token tables, component specs, variant matrices) gave Claude enough context to produce reasonable UI code without touching Figma at all. Meanwhile, pulling live design context through MCP for one screen was slower and more expensive.
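
For a sense of what that looks like: the Button table alone is enough context to emit a working component, something in the spirit of this sketch (illustrative, not Claude's literal output; every value comes from the documented spec):

```tsx
import React from "react";

type ButtonType = "primary" | "secondary" | "dark";

// Shared across every documented variant.
const base: React.CSSProperties = {
  height: 40,
  borderRadius: 8,
  padding: "8px 16px",
  border: "none",
};

// Normal-state values lifted straight from the Button table.
const variants: Record<ButtonType, React.CSSProperties> = {
  primary: { background: "rgba(255,255,255,0.8)", color: "#3B3B3B" },
  secondary: { background: "rgba(255,255,255,0.5)", color: "#3B3B3B" },
  dark: { background: "rgba(255,255,255,0.2)", color: "#FFFFFF" },
};

export function Button({
  type = "primary",
  disabled = false,
  children,
}: {
  type?: ButtonType;
  disabled?: boolean;
  children: React.ReactNode;
}) {
  const style: React.CSSProperties = { ...base, ...variants[type] };
  if (disabled && type === "primary") {
    // Primary / Disabled row: #F0F0F0 background, #9B9B99 text.
    style.background = "#F0F0F0";
    style.color = "#9B9B99";
  }
  return <button style={style} disabled={disabled}>{children}</button>;
}
```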

That's the inversion: the semantic representation outperformed the visual one. Figma gives Claude pixels and layer trees. The token docs gave it meaning.

What it points at

Design systems aren't built for AI consumption. Token names are optimized for human readability. Component structures reflect visual organization, not constraint sets that can be validated programmatically. The audit process (load full context, walk every node, resolve every property) mirrors how a human designer would review a screen, which is exactly why it's expensive.

An AI-native design system would be different: token docs that explicitly map context and constraint, component specs structured as validation rules, primary source in code with Figma as a view layer. Maybe this already exists. Worth building if it doesn't.
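
Concretely, a token in such a system might ship with its constraints attached. A hypothetical schema:

```typescript
// Hypothetical AI-native token: context and constraints are first-class
// data, so nothing has to be inferred from a human-readable name.
interface TokenDef {
  name: string;
  value: string;
  category: "surface" | "text" | "border" | "spacing";
  validFor: string[];    // CSS properties this token may set
  pairsWith?: string[];  // tokens documented as valid combinations
}

const surfacePrimary: TokenDef = {
  name: "color/surface/primary",
  value: "rgba(255,255,255,0.8)",
  category: "surface",
  validFor: ["background-color"],
  pairsWith: ["color/text/default"],
};
```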

The commands, schema, and audit output are all shareable. Happy to drop them on request.