Path planning for robotic exploration is challenging, requiring reasoning over unknown spaces and anticipating future observations. Efficient exploration requires selecting budget-constrained paths that maximize information gain. Despite advances in autonomous exploration, existing algorithms still fall short of human performance, particularly in structured environments where predictive cues exist but are underutilized. Guided by insights from our user study, we introduce MapExRL, which improves robot exploration efficiency in structured indoor environments by enabling longer-horizon planning through a learned policy and global map predictions. Unlike many learning-based exploration methods that use motion primitives as the action space, our approach leverages frontiers for more efficient model learning and longer-horizon reasoning. Our framework generates global map predictions from the observed map, which our policy utilizes, along with the prediction uncertainty, estimated sensor coverage, frontier distance, and remaining distance budget, to assess the strategic long-term value of frontiers. By leveraging multiple frontier scoring methods and additional context, our policy makes more informed decisions at each stage of exploration. We evaluate our framework on a real-world indoor map dataset, achieving up to an 18.8% improvement over the strongest state-of-the-art baseline, with even greater gains compared to conventional frontier-based algorithms.
To design an exploration policy that mirrors human-level decision-making, we first needed to understand how people explore unknown environments. Our motivation was to uncover the long-term strategies, contextual cues, and prioritization methods that humans intuitively use—elements often missing from existing robotic systems.
We conducted a user study in which 13 participants with varying levels of robotics experience were tasked with selecting frontiers to explore based on partial map observations and global map predictions. Participants navigated three different building layouts, attempting to maximize their understanding of the environment within a fixed exploration budget.
Below is a video of one of the participants performing the task.
From the study, we observed that high-performing participants did not simply maximize map coverage—they strategically prioritized exploring uncertain regions in the predicted map, aimed to minimize backtracking, and adapted their strategies based on the map's scale and structure. These behaviors highlighted key decision-making traits like budget awareness, long-horizon planning, and context-driven action selection.
We translated these insights into the design of our RL policy by providing it with global map predictions and their uncertainty, estimated sensor coverage, frontier distances, and the remaining distance budget, and by using frontiers rather than motion primitives as the action space to support longer-horizon reasoning.
These human-inspired elements allowed our policy to outperform existing methods, especially in complex and large-scale environments.
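As a toy illustration of how these cues can be combined into a single frontier value, here is a small self-contained Python sketch. The weighting scheme, feature scales, and example numbers are invented for illustration only; in MapExRL this trade-off is learned by the RL policy rather than hand-tuned.

```python
# Hand-rolled heuristic stand-in for a frontier value, combining the cues
# observed in the user study: prediction uncertainty, estimated new
# coverage, travel distance, and the remaining budget. Illustrative only;
# the actual MapExRL policy learns this trade-off.
import numpy as np

def frontier_value(uncertainty: float, est_coverage: float,
                   distance: float, budget: float,
                   w_unc: float = 1.0, w_cov: float = 1.0) -> float:
    """Score a frontier; higher is better."""
    if distance > budget:
        return -np.inf  # unreachable within the remaining budget
    # Prefer informative, reachable frontiers; penalize detours more
    # heavily as the budget shrinks.
    gain = w_unc * uncertainty + w_cov * est_coverage
    cost = distance / max(budget, 1e-6)
    return gain - cost

# Example: three candidate frontiers with 40 m of budget left
# (uncertainty and est_coverage normalized to [0, 1]).
frontiers = [
    dict(uncertainty=0.8, est_coverage=0.5, distance=12.0),
    dict(uncertainty=0.3, est_coverage=0.9, distance=35.0),
    dict(uncertainty=0.6, est_coverage=0.2, distance=5.0),
]
best = max(range(len(frontiers)),
           key=lambda i: frontier_value(**frontiers[i], budget=40.0))
```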
Observed maps are processed through three independent global map prediction models, generating prediction maps. These maps are averaged and passed through a convolutional encoder to extract a 256-dimensional feature vector. This vector is concatenated with frontier centers, prediction and utility scores, distances from the agent, and remaining budget. The resulting vector is fed into a fully connected network that outputs N values, and the argmax is selected as the index of the frontier action.
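To make the data flow concrete, below is a minimal PyTorch sketch of such a policy network. The encoder layer sizes, the fixed number of candidate frontiers, and the per-frontier feature layout are assumptions for illustration; only the 256-dimensional map embedding, the concatenated frontier context and remaining budget, and the N-way output with argmax action selection follow the description above.

```python
# Sketch of a frontier-selection policy network (layer sizes are assumptions).
import torch
import torch.nn as nn

class FrontierPolicy(nn.Module):
    def __init__(self, map_channels: int = 1, feature_dim: int = 256,
                 frontier_feature_dim: int = 5, max_frontiers: int = 20):
        super().__init__()
        # Convolutional encoder over the averaged global map prediction,
        # producing a 256-dimensional feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(map_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim), nn.ReLU(),
        )
        # Fully connected head: map embedding concatenated with flattened
        # per-frontier context (centers, prediction/utility scores,
        # distances, remaining budget) -> one value per candidate frontier.
        self.head = nn.Sequential(
            nn.Linear(feature_dim + max_frontiers * frontier_feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, max_frontiers),
        )

    def forward(self, predicted_map: torch.Tensor,
                frontier_features: torch.Tensor) -> torch.Tensor:
        z = self.encoder(predicted_map)                        # (B, 256)
        x = torch.cat([z, frontier_features.flatten(1)], dim=1)
        return self.head(x)                                    # (B, N) frontier values

# Usage: the argmax over the N output values is the selected frontier index.
# policy = FrontierPolicy()
# values = policy(avg_prediction, frontier_feats)   # avg of the 3 predictions
# action = values.argmax(dim=1)
```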
Table II. Summary of results reported in the paper: average reward and IoU with 95% confidence intervals on the test maps. Each map is evaluated from 15 different starting positions, for a total of 75 experiments across 5 maps. For detailed methodology, analysis, and information about the maps and baseline methods, refer to the full paper.
@article{harutyunyan2025mapexrl,
title={MapExRL: Human-Inspired Indoor Exploration with Predicted Environment Context and Reinforcement Learning},
author={Harutyunyan, Narek and Moon, Brady and Kim, Seungchan and Ho, Cherie and Hung, Adam and Scherer, Sebastian},
journal={arXiv preprint arXiv:2503.01548},
year={2025}
}