AIML Special Presentation: Active Perception and Reasoning in Open Worlds

Abstract: Building intelligent systems that can perceive, reason, and act in open worlds remains a grand challenge. In this talk, I will share our journey toward active perception—from unifying 2D vision-language understanding to structured 3D reasoning and high-level foresight.

We begin with 2D perception, rethinking visual tokenization and cognitive reasoning in multimodal models to move beyond recognition toward interpretable understanding. Progressing into 3D, we explore how agents can perceive and reason about spatial structures in the physical world, grounding language in geometry and learning through self-driven curiosity. Finally, we advance toward high-level imagination and foresight, empowering models to infer unseen structures, anticipate future events, and reason causally about dynamic environments.

Together, these efforts bridge perception, reasoning, and imagination—paving the way toward intelligent agents capable of understanding and interacting with complex, ever-changing real-world environments.
