Materials required:
- 1 robot minimum
- 1 computer per robot
- Individual arena
Software configuration:
- Example configuration: "Blocked VS Movement", first in manual editing mode, then in "reinforcement learning" mode
Duration:
90 minutes
Age:
15+ years
The advantages of this activity:
- Understanding the Q-learning algorithm
- Multidisciplinary (math/computer science)
- Can be performed with the simulator
The aim of this activity is to understand how the Q-learning algorithm works.
Q-learning is a reinforcement learning algorithm. The robot learns by trial and error, based on rewards determined by the user (as in the "Obstacle avoidance" scenario).
Using this method, we're going to train an AlphAI robot to move around an arena without getting stuck against the walls. This simple task will give us a good understanding of the mechanisms involved in the neural network.
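To make the trial-and-error idea concrete, here is a minimal sketch of the interaction loop in Python. The helper functions (read_sensors, execute_action, compute_reward) and the action names are illustrative placeholders chosen for this example, not the AlphAI API.

```python
# Minimal sketch of the trial-and-error loop behind reinforcement learning.
# read_sensors, execute_action and compute_reward are hypothetical placeholders.

import random

ACTIONS = ["forward", "turn_left", "turn_right"]

def read_sensors():
    # Hypothetical: returns True when the robot is blocked against a wall.
    return random.random() < 0.2

def execute_action(action):
    # Hypothetical: sends the movement command to the robot.
    pass

def compute_reward(blocked):
    # User-defined reward, as in the "Obstacle avoidance" scenario:
    # moving freely is rewarded, being blocked is penalised.
    return -1.0 if blocked else 1.0

for trial in range(100):
    blocked = read_sensors()                 # 1. observe the current state
    action = random.choice(ACTIONS)          # 2. try an action (here, at random)
    execute_action(action)                   # 3. act in the arena
    reward = compute_reward(read_sensors())  # 4. receive a reward
    # A learning algorithm would now use (state, action, reward)
    # to improve the choice of action on the next trial.
```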
[Video content coming soon]
Installation
Place the robot in a small, unobstructed arena.
Reward and manual editing
In the first part, we discover the principle of rewards, and use the "manual editing" mode to find the robot behavior that maximizes the rewards received.
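As an illustration, here is what a hand-edited table of action values might look like in Python. The state and action names are invented for the example, not the software's exact labels.

```python
# A sketch of what "manual editing" amounts to: filling in the table of
# action values by hand so that the chosen action maximises the reward.

Q = {
    # state: {action: value set by hand}
    "moving":  {"forward": 1.0, "turn_left": 0.0, "turn_right": 0.0},
    "blocked": {"forward": 0.0, "turn_left": 1.0, "turn_right": 0.5},
}

def best_action(state):
    # The robot always performs the action with the highest value for its state.
    return max(Q[state], key=Q[state].get)

print(best_action("moving"))   # forward
print(best_action("blocked"))  # turn_left
```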
Reinforcement learning
In the second part, we observe the robot learning step by step and watch the evolution of the connection weights in the network. We discover the importance of exploration in learning.
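A common way to balance trying new actions against repeating the ones that already look good is epsilon-greedy selection. The sketch below is a generic illustration of that idea, not necessarily the exact exploration strategy used by the software.

```python
# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the action with the best current value.

import random

def choose_action(Q, state, epsilon=0.2):
    if random.random() < epsilon:
        # Exploration: try a random action to gather new information.
        return random.choice(list(Q[state]))
    # Exploitation: use the action currently believed to be the best.
    return max(Q[state], key=Q[state].get)
```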
Q-learning
The next step is to discover the Q-learning algorithm itself. It consists mainly of a formula that updates the connection weights after each robot trial. This formula involves two parameters whose roles we examine.
We discover the impact of time on learning: the robot must learn fast enough without "jumping to conclusions". The algorithm must also be able to take future rewards into account, not just the immediate reward: it must develop a longer-term vision.
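For reference, in the standard form of the algorithm the two parameters are the learning rate, which controls how quickly each new trial changes the values, and the discount factor, which controls how much future rewards count. The Python sketch below shows the standard update; the table structure and names are illustrative.

```python
# Standard Q-learning update.
# alpha (learning rate): how fast new trials overwrite old estimates.
# gamma (discount factor): how much future rewards count.

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Value of the best action available from the next state (longer-term vision).
    best_next = max(Q[next_state].values())
    # Target: reward received now + discounted value of what comes next.
    target = reward + gamma * best_next
    # Move the current estimate a small step (alpha) towards the target.
    Q[state][action] += alpha * (target - Q[state][action])
```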
Deep Q-learning
Deep Q-learning is an extension of the Q-learning algorithm that applies it to more complex (multi-layer) neural networks. It is the algorithm used in the "Obstacle avoidance" scenario, for example.
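As a rough illustration of the idea, the sketch below replaces the table of values with a small multi-layer network trained towards the same "reward plus discounted future value" target. The network size, the library choice (PyTorch) and the input/output dimensions are assumptions made for this example, not the software's actual implementation.

```python
# Deep Q-learning sketch: a small network maps sensor inputs to one Q-value
# per action, and is trained towards the Q-learning target.

import torch
import torch.nn as nn

n_sensors, n_actions = 5, 3   # illustrative sizes, not the real configuration
gamma = 0.9

q_net = nn.Sequential(        # multi-layer network: sensors -> hidden -> Q-values
    nn.Linear(n_sensors, 16),
    nn.ReLU(),
    nn.Linear(16, n_actions),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(state, action, reward, next_state):
    # Target: immediate reward plus the discounted best value of the next state.
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()
    prediction = q_net(state)[action]
    loss = (prediction - target) ** 2   # squared error on the Q-learning target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example usage with random sensor readings:
s, s2 = torch.rand(n_sensors), torch.rand(n_sensors)
train_step(s, action=0, reward=1.0, next_state=s2)
```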