Summary for Seedream 3.0
Seedream 3.0 establishes itself as a highly capable "aesthetic engine," excelling in artistic and structural categories while showing clear limitations in strict realism and text generation.
Key Findings:
- Top-Tier Architecture: The model achieved its highest category average (8.7) in Architecture & Interiors, demonstrating a strong command of space, lighting, and materials (see the aggregation sketch after this list).
- Artistic Flair: It dominates stylized categories such as Anime & Cartoon Style and Ghibli Style, showing a firm grasp of specific art direction.
- Text & Logic Struggles: Performance drops sharply in the Ultra Hard category (average 5.1), often failing to render specific text strings correctly or handle complex physical logic.
- "Plastic" Realism: While composition is good, photorealistic portraits often suffer from aggressive skin smoothing, giving subjects an artificial look.
Deep Dive: Patterns & Quality
1. Artistic Versatility vs. Photorealism
Seedream 3.0 operates best when allowed to be artistic. It achieved perfect or near-perfect scores on prompts requiring specific art styles, such as the Ghibli-style Child and elder tending oversized vegetables (Score: 10/10) and the Isometric architectural illustration (Score: 10/10). In Photorealistic People & Portraits, however, it tends to over-process images: the Toddler with curly hair scored only 5/10 due to "unnatural glow and smoothness," a common artifact in which the model prioritizes a "clean" look over organic texture.
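The "unnatural glow and smoothness" artifact can be screened for automatically. A minimal sketch, assuming OpenCV is available: the variance of the Laplacian is a standard proxy for high-frequency detail, and a heavily smoothed render should score noticeably lower than a comparable real photograph. The file names are hypothetical, and this metric is an illustrative proxy, not the rubric behind the report's scores.

```python
import cv2

def sharpness(path: str) -> float:
    """Variance of the Laplacian: higher values mean more fine texture."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# Hypothetical files: an over-smoothed portrait vs. a reference photo.
# print(sharpness("seedream_toddler.png"), sharpness("reference_photo.jpg"))
```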
2. Text Generation Capabilities
The model's ability to render text is inconsistent. It handles simple, short words well, such as the Neon Open 24/7 sign (Score: 9/10), but struggles with longer phrases and specific branding constraints. In the Graphic Design category, the Tech startup logo failed badly (Score: 3/10) because the model hallucinated the text as "Quantrum Leave" instead of "Quantum Leap."
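Failures like "Quantrum Leave" can be quantified with a simple string-similarity check between the requested text and the rendered text (as recovered by OCR). This is an illustrative sketch, not the report's own scoring method.

```python
import difflib

def text_fidelity(target: str, rendered: str) -> float:
    """0-1 similarity between the requested and rendered strings."""
    return difflib.SequenceMatcher(None, target.lower(), rendered.lower()).ratio()

# The failed logo prompt: high similarity (~0.85), yet still unusable,
# which is why branding prompts effectively demand an exact match.
print(text_fidelity("Quantum Leap", "Quantrum Leave"))
```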
3. Logic and Complexity
While it handles visual complexity well (e.g., lighting in a cathedral), it struggles with semantic complexity. In the Ultra Hard category, prompts that contradict patterns entrenched in the training data often fail. A prime example is the Astronaut ridden by a horse (Score: 3/10), where the model defaulted to the familiar horse-and-rider arrangement instead of the reversed physical interaction requested.