Three-dimensional content generation has progressed from producing isolated, visually plausible shapes to constructing structured assets that can be deployed in real-time interactive environments. This trajectory is driven by converging demands from game development, embodied AI, world simulation, digital twins, and spatial computing, all of which require 3D content that goes beyond surface appearance to satisfy engine-level constraints on topology, UV parameterization, physically based materials, skeletal rigging, and physics-aware scene layout. Despite rapid advances in generative modeling, a persistent gap separates the outputs of current methods from the production-ready standard expected by interactive applications. This survey addresses that gap by organizing the literature around the asset production pipeline rather than algorithmic families. Along the horizontal axis we distinguish three asset tiers—namely general objects, characters, and scenes—while the vertical axis traces each tier through the full production lifecycle from data foundations and geometry synthesis through topology optimization, UV unwrapping, PBR appearance, rigging, and scene assembly. Through this two-dimensional taxonomy we assess not only what current methods can generate but whether their outputs are directly usable in downstream engines and simulation platforms. We further consolidate evaluation metrics and protocols that span geometric fidelity, appearance quality, asset usability, and scene-level physical plausibility. The survey concludes by identifying open challenges in data quality, generation controllability, end-to-end assetization, and physically grounded generation, and by situating production-ready 3D content as foundational infrastructure for emerging interactive world models and embodied intelligent systems.
General Objects follow a geometry–topology–appearance pipeline. Methods reconstruct geometry from single-view or multi-view input, refine topology into manifold quad-dominant meshes, and then generate UV maps and physically based textures for engine-ready assets.
Characters & Avatars follow a body–head–rigging pipeline. Methods generate articulated human bodies and expressive heads, align geometry with parametric priors, and assign skeletal structure with skinning weights to produce animation-ready assets with stable motion and facial control.
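The skinning step in this pipeline conventionally uses linear blend skinning (LBS): each deformed vertex is a weighted sum of its bone-transformed positions, v' = Σ_j w_j (M_j v), with weights summing to one. The following is a minimal 2D sketch of that formula in pure Python; the bone count, transforms, and function names are illustrative, not any specific method's implementation:

```python
import math

def rot2d(theta):
    """2x2 rotation matrix as nested tuples (illustrative bone transform)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(mat, v):
    """Multiply a 2x2 matrix by a 2D vertex."""
    return (mat[0][0] * v[0] + mat[0][1] * v[1],
            mat[1][0] * v[0] + mat[1][1] * v[1])

def lbs(v, bone_mats, weights):
    """Linear blend skinning: v' = sum_j w_j * (M_j v)."""
    x = sum(w * apply(M, v)[0] for M, w in zip(bone_mats, weights))
    y = sum(w * apply(M, v)[1] for M, w in zip(bone_mats, weights))
    return (x, y)

# A vertex influenced equally by a rest bone and a 90-degree-rotated bone
# lands halfway between the two transformed positions.
ident = rot2d(0.0)
quarter = rot2d(math.pi / 2)
print(lbs((1.0, 0.0), [ident, quarter], [0.5, 0.5]))  # approximately (0.5, 0.5)
```

The well-known artifact of this blend (candy-wrapper collapse at twisted joints) is why the survey's rigging sections distinguish plain LBS weights from corrective or learned skinning variants.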
Scenes & Environments follow a layout–grounding–world-scale pipeline. Methods first plan semantic layout, then ground assets with physically plausible placement, and finally scale to large environments with stable world composition.
Watertight, non-self-intersecting meshes with clean manifold edges for Boolean editing, physics simulation, and direct engine import.
Clean UV atlases with low distortion and no overlaps for accurate texture mapping and physically based material baking.
Independent albedo, roughness, metallic, and normal maps for robust relighting in real-time renderers.
Structured skeletons with LBS-compatible skinning weights for mocap animation and procedural motion control.
Assets remain editable in standard DCC tools such as Blender, Maya, and 3ds Max without full regeneration.
Collision meshes, mass properties, and material tags support rigid-body and soft-body simulation in game engines.
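The watertight/manifold requirement above can be verified mechanically: in a closed 2-manifold triangle mesh, every undirected edge is shared by exactly two faces. A minimal sketch of that check in pure Python (no external mesh library; the tetrahedron below is only an illustrative input, and production pipelines would additionally test self-intersection and winding consistency):

```python
from collections import Counter

def edge_face_counts(faces):
    """Count how many triangles share each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts

def is_watertight_manifold(faces):
    """Closed 2-manifold: every edge borders exactly two faces.
    A count of 1 means a boundary hole; >2 means a non-manifold edge."""
    return all(n == 2 for n in edge_face_counts(faces).values())

# Tetrahedron: four triangles over vertices 0..3 form a closed surface.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
# Dropping one face opens a hole: three edges now border only one face.
open_mesh = tetra[:3]

print(is_watertight_manifold(tetra))      # True
print(is_watertight_manifold(open_mesh))  # False
```

Libraries such as trimesh expose equivalent checks directly (e.g. a watertightness property on loaded meshes), which is how asset-usability evaluations typically apply this criterion at scale.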
LGM = Large Generation Model; FF = Feed-Forward; Topo = Topology-Aware; SDS = Score Distillation Sampling; Gen = Generative; FFrec = Feed-Forward Reconstruction; Lay = Layout; Gnd = Grounding; Wld = World-Scale
A rotating Hunyuan3D 3.0 GIF preview is available below, and additional representative method viewers are in preparation.
If you find this survey useful in your research, please consider citing:
@misc{wu2026visualsynthesisinteractiveworlds,
  title         = {From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation},
  author        = {Jiafeng Wu and Zhuofan Lou and Jian Liu and Dazhao Du and Chunchao Guo and Song Guo},
  year          = {2026},
  eprint        = {2604.23629},
  archivePrefix = {arXiv},
  primaryClass  = {cs.GR},
  url           = {https://arxiv.org/abs/2604.23629}
}