MapExRL: Human-Inspired Indoor Exploration with Predicted Environment Context and Reinforcement Learning

Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2025 Workshop

Submitted to the IEEE International Conference on Advanced Robotics (ICAR 2025), under peer review.


1Brown University
2Carnegie Mellon University Robotics Institute
3University of Michigan, Ann Arbor
* Equal Contributions
MapExRL Overview

MapExRL is a learning-based exploration policy that leverages human-inspired strategies, global map predictions, and environmental context for efficient autonomous exploration. A human user study informs the policy design, enabling state-of-the-art performance through context-aware decision-making.

Abstract

Path planning for robotic exploration is challenging, requiring reasoning over unknown spaces and anticipating future observations. Efficient exploration requires selecting budget-constrained paths that maximize information gain. Despite advances in autonomous exploration, existing algorithms still fall short of human performance, particularly in structured environments where predictive cues exist but are underutilized. Guided by insights from our user study, we introduce MapExRL, which improves robot exploration efficiency in structured indoor environments by enabling longer-horizon planning through a learned policy and global map predictions. Unlike many learning-based exploration methods that use motion primitives as the action space, our approach leverages frontiers for more efficient model learning and longer-horizon reasoning. Our framework generates global map predictions from the observed map, which our policy uses, along with the prediction uncertainty, estimated sensor coverage, frontier distance, and remaining distance budget, to assess the strategic long-term value of each frontier. By leveraging multiple frontier scoring methods and this additional context, our policy makes more informed decisions at each stage of exploration. We evaluate our framework on a real-world indoor map dataset, achieving up to an 18.8% improvement over the strongest state-of-the-art baseline, with even greater gains compared to conventional frontier-based algorithms.

Human-Inspired Exploration: Understanding How People Navigate the Unknown

To design an exploration policy that mirrors human-level decision-making, we first needed to understand how people explore unknown environments. Our motivation was to uncover the long-term strategies, contextual cues, and prioritization methods that humans intuitively use—elements often missing from existing robotic systems.

We conducted a user study in which 13 participants with varying levels of robotics experience were tasked with selecting frontiers to explore based on partial map observations and global map predictions. Participants navigated three different building layouts, attempting to maximize their understanding of the environment within a fixed exploration budget.

Below is a video of one of the participants performing the task.

Insights from the User Study

From the study, we observed that high-performing participants did not simply maximize map coverage—they strategically prioritized exploring uncertain regions in the predicted map, aimed to minimize backtracking, and adapted their strategies based on the map's scale and structure. These behaviors highlighted key decision-making traits like budget awareness, long-horizon planning, and context-driven action selection.

We translated these insights into the design of our RL policy by:

  • Incorporating global map predictions to prioritize uncertain and informative areas,
  • Balancing exploration and exploitation via multiple frontier scoring metrics,
  • Adapting to varying map scales and layouts,
  • Embedding budget-awareness directly into the observation space,
  • And using frontiers instead of motion primitives to enable longer-horizon, strategic decision-making (an illustrative feature sketch follows this list).
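
To make these cues concrete, here is a minimal sketch of how per-frontier features could be assembled from a predicted map and its uncertainty. The function and variable names (frontier_features, pred_map, var_map) and the window size are illustrative assumptions, not the paper's exact feature set; the remaining budget enters the observation separately, as in the pipeline below.

    import numpy as np

    def frontier_features(frontier_xy, agent_xy, pred_map, var_map, window=10):
        """Hypothetical per-frontier features mirroring the cues above.

        pred_map : HxW predicted free-space probabilities from the map predictor
        var_map  : HxW per-cell prediction variance (model uncertainty)
        """
        x, y = frontier_xy
        h, w = pred_map.shape
        # Local window around the frontier, clamped to the map bounds.
        y0, y1 = max(y - window, 0), min(y + window, h)
        x0, x1 = max(x - window, 0), min(x + window, w)

        prediction_score = float(pred_map[y0:y1, x0:x1].mean())  # predicted open space nearby
        utility_score = float(var_map[y0:y1, x0:x1].mean())      # high where the predictor is unsure
        distance = float(np.hypot(x - agent_xy[0], y - agent_xy[1]))

        # The remaining distance budget is appended to the observation separately,
        # letting the policy trade a frontier's distance against the budget left.
        return np.array([x, y, prediction_score, utility_score, distance],
                        dtype=np.float32)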

These human-inspired elements allowed our policy to outperform existing methods, especially in complex and large-scale environments.

MapExRL Pipeline Overview

Observed maps are processed by three independent global map prediction models to produce candidate prediction maps. These predictions are averaged and passed through a convolutional encoder to extract a 256-dimensional feature vector. This vector is concatenated with the frontier centers, their prediction and utility scores, their distances from the agent, and the remaining budget. The resulting vector is fed into a fully connected network that outputs N values, one per frontier, and the argmax is taken as the index of the selected frontier.
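
Below is a minimal PyTorch sketch of this pipeline. Only the overall structure follows the description above (averaging the three prediction maps, a convolutional encoder producing a 256-dimensional vector, concatenation with per-frontier features and the remaining budget, and a fully connected head outputting N values); the layer sizes, the fixed number of frontiers, and the feature dimension are assumptions.

    import torch
    import torch.nn as nn

    class FrontierPolicy(nn.Module):
        """Illustrative sketch of the MapExRL policy; sizes and N are assumptions."""

        def __init__(self, n_frontiers=20, frontier_feat_dim=5):
            super().__init__()
            # Convolutional encoder over the averaged global map prediction.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, 256), nn.ReLU(),  # 256-dimensional map feature vector
            )
            # Fully connected head over [map features; frontier features; budget].
            in_dim = 256 + n_frontiers * frontier_feat_dim + 1
            self.fc = nn.Sequential(
                nn.Linear(in_dim, 256), nn.ReLU(),
                nn.Linear(256, n_frontiers),  # one value per frontier
            )

        def forward(self, pred_maps, frontier_feats, budget_left):
            # pred_maps:      (B, 3, H, W) outputs of the three prediction models
            # frontier_feats: (B, N, F) per-frontier context, padded to a fixed N
            # budget_left:    (B, 1) remaining distance budget
            avg_map = pred_maps.mean(dim=1, keepdim=True)  # average the 3 predictions
            map_vec = self.encoder(avg_map)                # (B, 256)
            x = torch.cat([map_vec, frontier_feats.flatten(1), budget_left], dim=-1)
            values = self.fc(x)                            # (B, N) frontier values
            return values.argmax(dim=-1), values           # chosen frontier index

In practice the number of detected frontiers varies between steps, so a fixed-size head like this one would require padding or masking unused frontier slots.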

Results

Results Table

Table II. Summary of the results reported in the paper: average and 95% confidence interval for reward and IoU on the test maps. Each map is evaluated from 15 different starting positions, for a total of 75 experiments across the 5 maps. For detailed methodology, analysis, and information about the maps and baseline methods, refer to the full paper.
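
For reference, one standard way to compute such a mean and 95% confidence interval from per-run scores is a t-interval, sketched below. The sample values are placeholders, not the paper's data, and the paper's exact interval construction may differ.

    import numpy as np
    from scipy import stats

    def mean_ci95(samples):
        """Mean and 95% confidence interval half-width via the t-distribution."""
        samples = np.asarray(samples, dtype=float)
        sem = stats.sem(samples)  # standard error of the mean
        half_width = sem * stats.t.ppf(0.975, df=len(samples) - 1)
        return samples.mean(), half_width

    # e.g., IoU scores from 15 runs on one map (placeholder values):
    ious = np.random.default_rng(0).uniform(0.6, 0.8, size=15)
    mean, half = mean_ci95(ious)
    print(f"IoU: {mean:.3f} ± {half:.3f}")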

Video

BibTeX


        @article{harutyunyan2025mapexrl,
          title={MapExRL: Human-Inspired Indoor Exploration with Predicted Environment Context and Reinforcement Learning},
          author={Harutyunyan, Narek and Moon, Brady and Kim, Seungchan and Ho, Cherie and Hung, Adam and Scherer, Sebastian},
          journal={arXiv preprint arXiv:2503.01548},
          year={2025}
        }