Summary for Architecture & Interiors
The evaluation shows that AI models have largely mastered photorealistic interior design, but technical architectural illustration remains a significant hurdle, mainly because models hallucinate text labels.
Key Findings
- Top Tier Performance: Flux 2 Pro, Seedream 4.0, and Imagen 4.0 Ultra emerged as the most consistent performers, delivering exceptional photorealism and handling complex structural logic with few errors.
- The 'Text' Trap: A major trend was the failure of otherwise powerful models (like DALL-E 3 and Flux 1.1 Pro Ultra) on technical drawing prompts. When asked for 'diagrams' or 'cutaways', these models frequently hallucinated gibberish text labels, leading to severe score penalties.
- Photorealism Mastery: Standard interior design prompts (e.g., Scandi Living Room) saw the highest average scores, with most models scoring 8 or above, indicating that generating 'Pinterest-perfect' interiors is now a baseline capability.
- Lighting Mastery: Models excelled at atmospheric rendering, particularly in the Gothic Cathedral and Moroccan Riad prompts, showcasing advanced volumetric lighting capabilities.
Deep Dive: Patterns & Insights
1. The Divide Between Art and Engineering
The data shows a clear split in model capabilities. Models optimized for artistic composition, such as Midjourney v7, achieved perfect scores on atmospheric prompts like the Moroccan Riad (Score: 10/10) but faltered when asked for technical precision, scoring only 3/10 on the Machiya Cutaway because of chaotic details and debris. Conversely, models like Reve Image (Halfmoon) proved surprisingly useful for diagrams, producing clean schematic views with legible or minimal text.
2. Material Rendering & Lighting
Texture handling has advanced significantly. In the Scandi Living Room, models like ChatGPT 4o and Flux 2 Pro rendered distinct textures for wood, fabric, and plant leaves that were indistinguishable from reality.
- Highlight: Flux 2 Pro achieved a 10/10 on the Modernist Desert Home for flawlessly integrating architecture with natural rock formations and water reflections.
3. Common Failure Modes
- Text Artifacts in Diagrams: In prompts requesting 'cutaways' or 'cross-sections' (Prompt 103, Prompt 106), many models instinctively added annotations. Because models struggle to render text, this produced distracting gibberish (e.g., "AKSOON GIIIVANE") and forced score deductions.
- Logic in Transparency: The Glass Skybridge prompt tested physics logic. Some models, such as Midjourney V6.1, failed to render the floor as transparent and painted it as an opaque reflective surface instead, while others, like Seedream 4.0, correctly rendered the city view through the floor.
4. Stylistic Flexibility
The Ancient Chinese Temple prompt showed that models can switch effectively from photorealism to a specific illustrative style, in this case isometric. DALL-E 3 and Seedream 3.0 excelled here, producing assets that looked like professional game art or architectural studies.
Best Models by Use Case
🏗️ Photorealistic Architecture (Interiors & Exteriors)
Best Models: Flux 2 Pro, Seedream 4.0, Google Imagen 4.0 Ultra
These models offer the highest fidelity for materials and lighting. They are ideal for visualizing real estate listings or architectural concepts where realism is paramount.
📐 Technical Illustration & Diagrams
Best Models: Reve Image (Halfmoon), Recraft V3, Google Imagen 4.0 Ultra
When precise layouts, cutaways, or isometric views are needed without the mess of hallucinated text, these models are superior. They tend to produce cleaner, vector-like lines.
🎨 Atmospheric & Historical Design
Best Models: Midjourney V6.1, Midjourney v7, Flux 2 Pro
For projects requiring mood, historical ambiance, or complex lighting (e.g., stained glass, dappled shade), these models excel at evoking an emotional response.
🛸 Sci-Fi & Speculative Architecture
Best Models: Seedream 4.0, Nano Banana Pro
These models handle complex, non-standard geometry (curved walls, futuristic materials) effectively while maintaining a sense of scale.