A frontier human data & evaluation lab

Making AI better for creativity.

Human taste is the new training data. We provide the evaluations, benchmarks, and datasets that align AI to genuine creative judgment.

Request partnership → Learn more
1.5M+ verified creative experts
$250M+ collective creator earnings
400+ creative skills evaluated
50+ AI models evaluated

Trusted by teams building with creative AI

Webflow
Framer
Figma
Midjourney
Adobe
Canva
Runway
Stability AI
Notion
Kling
Pika
ElevenLabs
"Execution is free.
Now, judgment is everything."
The new standard for creative AI evaluation

The full stack for creative AI quality.

Benchmarks, datasets, and live evaluations: everything you need to build AI that genuinely understands human creativity.

⚔️

Creative Arena

Head-to-head AI model evaluations judged by verified creative professionals. Real human taste, applied systematically. See exactly where your model excels and where it falls short.

  • ✓  Side-by-side model comparisons
  • ✓  Verified expert judges
  • ✓  Detailed skill breakdowns
Learn about Arena →
📊
Industry Standard

Human Creativity Benchmark

The definitive benchmark for creative AI, built on judgments from 1.5M+ verified creatives across 400+ distinct skills. Compare models on the metrics that actually matter to practitioners.

  • ✓  400+ creative skill dimensions
  • ✓  Reproducible, auditable results
  • ✓  Updated continuously
View Benchmark →
🎨

Creative Human Data

Preference datasets generated by top creative professionals who label, rank, and critique AI outputs. Fine-tune and align your models with data that reflects genuine aesthetic judgment.

  • ✓  Expert labeling & ranking
  • ✓  Custom dataset requests
  • ✓  Multi-modal coverage
Explore Data →
🤝
Private Beta

Co-Agents

Creative AI collaborators designed to enhance expert workflows. Built for and with top creative professionals, they bridge the gap between raw AI capability and production-quality creative work.

  • ✓  Expert workflow integration
  • ✓  Domain-specific fine-tuning
  • ✓  Private beta access available
Join Beta →

The judgment models are missing.

Traditional benchmarks measure what AI can generate. We measure whether anyone with taste would actually want it.

  • 🎯

    Skill-level granularity

    Results broken down across 400+ specific creative skills, from typographic hierarchy to color theory to narrative pacing.

  • 👁️

    Verified expert evaluators

    Every judgment comes from credentialed practitioners with demonstrated expertise in the specific creative domain being evaluated.

  • 🔄

    Living benchmark

    Continuously updated as new models are released and creative standards evolve. Always current, never stale.

View full benchmark results

Creative Skill Scores: Model A

Typography     91
Color Theory   87
Composition    83
Narrative      76
Originality    68
Motion         61

Human taste, at scale.

Real creative professionals judge AI outputs head-to-head. No synthetic proxies and no automated metrics, just expert judgment collected at the scale needed to train frontier models.
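For readers who want a concrete picture of how head-to-head judgments can be turned into a comparison between models, here is a minimal sketch. It is an illustration only and not a description of the Arena's actual scoring method: the `win_rates` function, the `(winner, loser)` record format, and the model names are all assumptions made for this example.

```python
from collections import defaultdict


def win_rates(judgments):
    """Compute each model's share of head-to-head comparisons won.

    `judgments` is an iterable of (winner, loser) pairs, one pair per
    expert decision. This record format is a hypothetical choice for
    the sketch, not a documented schema.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        comparisons[winner] += 1
        comparisons[loser] += 1
    # Every model that appears in at least one comparison gets a rate.
    return {model: wins[model] / comparisons[model] for model in comparisons}


# Hypothetical expert decisions between three placeholder models.
judgments = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

rates = win_rates(judgments)
for model, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{model}: {rate:.2f}")
```

A plain win rate ignores the strength of each opponent. Rating schemes such as Elo or Bradley-Terry account for that and could be swapped in for this sketch.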

1.5M+ independent creatives in our network
50+ frontier models evaluated in the Arena
26× higher project earnings vs. typical platforms
400+ distinct creative skill dimensions covered
$250M+ collective creator earnings facilitated
48h average turnaround for custom evaluations

Ready to evaluate what matters?

Partner with the lab building the infrastructure for human-aligned creative AI.

Request partnership → Read our research