Materials required:
- 1 robot minimum
- 1 computer per robot
- Individual arena
Software configuration:
- Example configuration: "Blocked VS Movement", first in manual editing mode, then in "reinforcement learning" mode
Duration:
90 minutes
Age:
15+ years
The advantages of this activity:
- Understanding the Q-learning algorithm
- Multidisciplinary (math/computer science)
- Can be performed with the simulator
The aim of this activity is to understand how the Q-learning algorithm works.
Q-learning is a reinforcement learning algorithm. The robot learns by trial and error, based on rewards determined by the user (as in the "Obstacle avoidance" scenario).
Using this method, we're going to train an AlphAI robot to move around an arena without getting stuck against the walls. This simple task will give us a good understanding of the mechanisms involved in the neural network.
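To make the trial-and-error idea concrete, here is a minimal sketch of the interaction loop in Python. The helper functions (read_sensors, execute_action, compute_reward) and the action names are illustrative placeholders chosen for this example, not the AlphAI API.

```python
# Minimal sketch of the trial-and-error loop behind reinforcement learning.
# read_sensors, execute_action and compute_reward are hypothetical placeholders.

import random

ACTIONS = ["forward", "turn_left", "turn_right"]

def read_sensors():
    # Hypothetical: returns True when the robot is blocked against a wall.
    return random.random() < 0.2

def execute_action(action):
    # Hypothetical: sends the movement command to the robot.
    pass

def compute_reward(blocked):
    # User-defined reward, as in the "Obstacle avoidance" scenario:
    # moving freely is rewarded, being blocked is penalised.
    return -1.0 if blocked else 1.0

for trial in range(100):
    blocked = read_sensors()                 # 1. observe the current state
    action = random.choice(ACTIONS)          # 2. try an action (here, at random)
    execute_action(action)                   # 3. act in the arena
    reward = compute_reward(read_sensors())  # 4. receive a reward
    # A learning algorithm would now use (state, action, reward)
    # to improve the choice of action on the next trial.
```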
[Video content coming soon]
Installation
Place the robot in a small, unobstructed arena.
Reward and manual editing
In the first part, we discover the principle of rewards, and use the "manual editing" mode to find the robot behavior that maximizes the rewards received.
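As an illustration, here is what a hand-edited table of action values might look like in Python. The state and action names are invented for the example, not the software's exact labels.

```python
# A sketch of what "manual editing" amounts to: filling in the table of
# action values by hand so that the chosen action maximises the reward.

Q = {
    # state: {action: value set by hand}
    "moving":  {"forward": 1.0, "turn_left": 0.0, "turn_right": 0.0},
    "blocked": {"forward": 0.0, "turn_left": 1.0, "turn_right": 0.5},
}

def best_action(state):
    # The robot always performs the action with the highest value for its state.
    return max(Q[state], key=Q[state].get)

print(best_action("moving"))   # forward
print(best_action("blocked"))  # turn_left
```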
Reinforcement learning
In the second part, we observe the robot learning step by step and watch the evolution of the connection weights in the network. We discover the importance of exploration in learning.
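A common way to balance trying new actions against repeating the ones that already look good is epsilon-greedy selection. The sketch below is a generic illustration of that idea, not necessarily the exact exploration strategy used by the software.

```python
# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the action with the best current value.

import random

def choose_action(Q, state, epsilon=0.2):
    if random.random() < epsilon:
        # Exploration: try a random action to gather new information.
        return random.choice(list(Q[state]))
    # Exploitation: use the action currently believed to be the best.
    return max(Q[state], key=Q[state].get)
```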
Q-learning
The next step is to discover the Q-learning algorithm itself. It consists mainly of a formula that updates the connection weights after each robot trial. This formula involves two parameters whose roles we examine.
We discover the impact of time on learning: the robot must learn fast enough without "jumping to conclusions". The algorithm must also be able to take future rewards into account, not just the immediate reward: it must develop a longer-term vision.
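For reference, in the standard form of the algorithm the two parameters are the learning rate, which controls how quickly each new trial changes the values, and the discount factor, which controls how much future rewards count. The Python sketch below shows the standard update; the table structure and names are illustrative.

```python
# Standard Q-learning update.
# alpha (learning rate): how fast new trials overwrite old estimates.
# gamma (discount factor): how much future rewards count.

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Value of the best action available from the next state (longer-term vision).
    best_next = max(Q[next_state].values())
    # Target: reward received now + discounted value of what comes next.
    target = reward + gamma * best_next
    # Move the current estimate a small step (alpha) towards the target.
    Q[state][action] += alpha * (target - Q[state][action])
```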
Deep Q-learning
Deep Q-learning is an extension of the Q-learning algorithm that applies it to more complex (multi-layer) neural networks. It is the algorithm used in the "Obstacle avoidance" scenario, for example.
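As a rough illustration of the idea, the sketch below replaces the table of values with a small multi-layer network trained towards the same "reward plus discounted future value" target. The network size, the library choice (PyTorch) and the input/output dimensions are assumptions made for this example, not the software's actual implementation.

```python
# Deep Q-learning sketch: a small network maps sensor inputs to one Q-value
# per action, and is trained towards the Q-learning target.

import torch
import torch.nn as nn

n_sensors, n_actions = 5, 3   # illustrative sizes, not the real configuration
gamma = 0.9

q_net = nn.Sequential(        # multi-layer network: sensors -> hidden -> Q-values
    nn.Linear(n_sensors, 16),
    nn.ReLU(),
    nn.Linear(16, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(state, action, reward, next_state):
    # Target: immediate reward plus the discounted best value of the next state.
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()
    prediction = q_net(state)[action]
    loss = (prediction - target) ** 2   # squared error on the Q-learning target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example usage with random sensor readings:
s, s2 = torch.rand(n_sensors), torch.rand(n_sensors)
train_step(s, action=0, reward=1.0, next_state=s2)
```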