These videos show agent teams that were evolved by multiagent HyperNEAT or trained by SARSA for the EIJ submission, Scalable Multiagent Learning through Indirect Encoding of Policy Geometry. In the videos, teams of predators (blue squares) work together to capture prey agents (green squares) that run away from them. The predators must cooperate because a prey is faster than a predator and always runs in the opposite direction of the closest predator. Predators cannot communicate with or see each other, so they must learn complementary roles a priori. Note that unlike previous multiagent HyperNEAT scaling experiments, the difficulty of the problem (i.e. the number of prey) increases as the teams grow in size.
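The prey rule described above can be sketched as a simple deterministic policy: each prey finds its closest predator and moves one step directly away from it, at a speed exceeding the predators'. This is an illustrative sketch, not the experiment's actual code; the function name and speed values are hypothetical.

```python
import math

# Hypothetical speed advantage: prey outrun predators (predator speed = 1.0).
PREY_SPEED = 1.2

def prey_step(prey_pos, predator_positions, speed=PREY_SPEED):
    """Return the prey's next position: one step directly away from the
    nearest predator (the flee rule described in the text)."""
    # Find the closest predator by Euclidean distance.
    nearest = min(predator_positions, key=lambda p: math.dist(p, prey_pos))
    dx = prey_pos[0] - nearest[0]
    dy = prey_pos[1] - nearest[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return prey_pos  # predator is on top of the prey: it is caught
    # Move in the opposite direction of the closest predator.
    return (prey_pos[0] + speed * dx / dist,
            prey_pos[1] + speed * dy / dist)
```

Because the rule is deterministic and reacts only to the single nearest predator, predators that approach from opposite sides can trap the prey between them, which is why the surrounding and bouncing strategies in the videos succeed.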
SARSA: 16 Agents
SARSA is able to find solutions on the line formation for up to 16 agents. This video shows a typical strategy used by SARSA at this size that involves the agents surrounding the prey.
Multiagent HyperNEAT: 16 Agents
Multiagent HyperNEAT discovers a more efficient strategy for size 16, wherein the prey are split in half and captured separately.
Multiagent HyperNEAT: 64 Agents
This trend for HyperNEAT continues at larger sizes, but the prey get split into more groups as team size increases, as in this video with 64 predators. SARSA cannot solve the problem at this size or higher.
Multiagent HyperNEAT: 256 Agents
At the largest trained size, multiagent HyperNEAT discovers a remarkably efficient tactic in which alternating agents run forward, bouncing the prey between them.
Note: Due to the unusual aspect ratio and resolution of this video, only a direct AVI download is available.
Multiagent HyperNEAT: Post-Training Scaling from 16 up to 64 Agents
Multiagent HyperNEAT is not only scalable in terms of the number of agents it can train; it can also automatically scale existing teams up in size without further training. This video shows such scaling for a team trained with 16 agents, scaled up to 64 agents. SARSA could only scale teams trained with 2 agents, and only up to at most 16 agents.
Multiagent HyperNEAT: 32 Agents on the L
Teams were also trained on a different, more difficult formation called the L. In this video, 32 agents solve this task by tackling each branch of the L separately. SARSA was only able to solve the L at the smallest team size of 4.
Situational Policy Geometry
This video shows agent teams that were evolved in simulation with multiagent HyperNEAT. In the video, teams of Khepera III robots work together to patrol an environment made of bricks and then return on command. They must cooperate so that they each cover different areas, even though they cannot communicate with each other. The teams are trained with an extension of multiagent HyperNEAT called situational policy geometry, which allows the robots to learn to switch between multiple brains to perform different tasks depending on their state. In this case, the robots select a brain depending on a signal that tells them either to come home or to continue patrolling. The bottom of the video shows the current state of a GUI for communicating with the robots. The teams are run in two environments: the plus, in which they were trained, and the asymmetric plus, where they were not trained (which tests generalization):
Plus (Training Environment) and Asymmetric Plus (Testing Environment)