Enabling Intelligent Heterogeneous Sensor Teams to Find Hard Targets with Reinforcement Learning
Reinforcement learning (RL) is known for its well-publicized success in defeating humans at complex real-time games such as Dota 2 and StarCraft II. In these games, RL agents learn to distribute responsibility among teammates and use available resources to meet a common goal. These same attributes are required of autonomous sensor teams on reconnaissance missions. Centurion has demonstrated that heterogeneous RL agents can learn powerful strategies for taking real-time, coordinated actions to achieve high-level mission goals. Stratagem leverages RL to develop intelligent remote sensing agents that collaborate to find hard targets, such as injured people or mobile threats.
Centurion employs the Advanced Framework for Simulation, Integration, and Modeling (AFSIM) as a powerful virtual environment in which remote sensing agents explore and learn. The AFSIM baseline is a rich mission-level simulation tool that provides sensor models for a variety of domains, including space, air, surface, and cyber.
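To illustrate how an AFSIM scenario can be exposed to an RL agent, the following is a minimal, hypothetical sketch of a Gymnasium-style environment wrapper. The `afsim_bridge` module and its `ScenarioConnection` methods are placeholders for the project's actual AFSIM interface, which is not shown here; the observation, action, and reward definitions are likewise illustrative.

```python
# Hypothetical sketch: a Gymnasium-style wrapper around an AFSIM scenario.
# "afsim_bridge" and its ScenarioConnection API are placeholders -- the real
# Centurion/AFSIM interface is not public and will differ.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

from afsim_bridge import ScenarioConnection  # hypothetical AFSIM <-> Python link


class SensorTeamEnv(gym.Env):
    """One RL agent controlling a single sensor platform in an AFSIM scenario."""

    def __init__(self, scenario_file: str):
        self.sim = ScenarioConnection(scenario_file)              # hypothetical
        # Observation: own position/heading plus bearings to detected tracks.
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(16,), dtype=np.float32)
        # Action: discrete steering / sensor-pointing commands.
        self.action_space = spaces.Discrete(5)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.sim.restart()                                        # hypothetical
        return self._observe(), {}

    def step(self, action):
        self.sim.send_command(int(action))                        # hypothetical
        self.sim.advance(seconds=10.0)                            # hypothetical
        obs = self._observe()
        # Reward new detections of the hard target; small penalty for elapsed time.
        reward = 1.0 * self.sim.new_detections() - 0.01           # hypothetical
        terminated = self.sim.target_found()                      # hypothetical
        truncated = self.sim.time_elapsed() > 3600.0              # hypothetical
        return obs, reward, terminated, truncated, {}

    def _observe(self):
        # Pack platform state and track reports into a fixed-length vector.
        return np.asarray(self.sim.state_vector(), dtype=np.float32)  # hypothetical
```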
Centurion Offers:
- Intelligent Decision-Making: Cognitive agents trained with RL are dynamic enough to make real-time decisions based on the current state of the environment. At the same time, agents perform long-term planning, such as learning collaborative strategies that leverage the strengths of each individual.
- AFSIM Interfaces: The Centurion team has expanded the existing AFSIM interfaces to improve RL agent access to AFSIM sensor and processor models. To facilitate agent training, Stratagem has built expertise in constructing custom scenarios, reward functions, and observation structures within the AFSIM simulation environment.
- Cutting-Edge Automation: Stratagem uses the latest advancements in hyperparameter optimization (HPO) and automatic reward shaping to streamline the training process and maximize the performance of the resulting model. We applied state-of-the-art HPO techniques developed for supervised learning to multi-agent RL, which led to significantly improved agents; a sketch of this approach follows the list. Our research paper on this topic is currently in review for release and publication.
- Interpretable RL: Stratagem has developed a suite of intuitive visualization tools to look inside the black box of RL. These include new ways to represent an episode, heatmaps of frequently visited agent locations (sketched after this list), and plots of internal agent metrics. These graphics improve understanding of agent decision processes, enabling interpretability and boosting user confidence.
- High Performance Computing: At its core, Centurion runs as a distributed application. This allows for accelerated training through massive parallelization on the DoD’s high-performance computing resources, as sketched in the final example below. This capability has enabled faster development and prototyping as the Centurion team explored increasingly complex AFSIM scenarios.
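To give a concrete sense of the automation described above, here is a minimal sketch of a hyperparameter search using Optuna. The `train_and_evaluate` stub stands in for a full multi-agent training-and-evaluation run in the AFSIM environment, and the parameter names and ranges are illustrative rather than those used by Centurion.

```python
# Minimal sketch of hyperparameter optimization for RL training with Optuna.
# "train_and_evaluate" is a placeholder for a full multi-agent training run in
# the AFSIM environment; the actual Centurion HPO pipeline is more involved.
import optuna


def train_and_evaluate(learning_rate, gamma, entropy_coeff, batch_size):
    """Stand-in for a full training run; returns mean episode return."""
    # In practice this would launch RL training in the AFSIM environment and
    # evaluate the resulting policies. Here it only returns a dummy score.
    return 0.0


def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    entropy = trial.suggest_float("entropy_coeff", 1e-4, 1e-1, log=True)
    batch = trial.suggest_categorical("batch_size", [2048, 4096, 8192])
    return train_and_evaluate(lr, gamma, entropy, batch)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```

One interpretability view mentioned above, an agent-location heatmap, can be approximated with standard tooling. The sketch below assumes logged (x, y) positions from evaluation rollouts and uses synthetic data as a stand-in; the Centurion visualization suite itself is not reproduced here.

```python
# Sketch of an interpretability view: a heatmap of where an agent spent time
# during evaluation episodes. Position logs are assumed to be (x, y) pairs
# collected per step; the actual Centurion tooling differs in detail.
import numpy as np
import matplotlib.pyplot as plt

# Placeholder trajectory data; in practice these come from evaluation rollouts.
rng = np.random.default_rng(0)
positions = rng.normal(loc=[5.0, 5.0], scale=2.0, size=(10_000, 2))

heatmap, xedges, yedges = np.histogram2d(positions[:, 0], positions[:, 1], bins=50)

plt.imshow(heatmap.T, origin="lower",
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(label="visit count")
plt.xlabel("x (km)")
plt.ylabel("y (km)")
plt.title("Agent occupancy heatmap")
plt.show()
```

Finally, the distributed-training pattern can be sketched with Ray RLlib, which farms environment rollouts out to many parallel workers. The environment class, module name, worker counts, and metric keys below are illustrative, and the exact configuration API varies with the Ray version; Centurion's actual HPC deployment is not shown.

```python
# Sketch of distributed RL training with Ray RLlib, which parallelizes
# environment rollouts across many workers. Names and counts are illustrative.
import ray
from ray.rllib.algorithms.ppo import PPOConfig

from sensor_team_env import SensorTeamEnv  # hypothetical wrapper sketched earlier

ray.init()  # on an HPC cluster this would attach to an existing Ray cluster

config = (
    PPOConfig()
    .environment(env=SensorTeamEnv, env_config={"scenario_file": "recce.txt"})
    .rollouts(num_rollout_workers=64)   # one simulation instance per worker
    .training(train_batch_size=65_536)
)

algo = config.build()
for _ in range(100):
    result = algo.train()
    print(result["episode_reward_mean"])  # metric key varies across Ray versions
```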
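These sketches are intended only to make the bullet points above concrete; none of them reproduces Centurion's actual code, interfaces, or configuration.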
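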