
Activity: Reinforcement learning

Equipment:

  • 1 robot minimum
  • 1 computer per robot
  • Flat environment

Settings:

  • Reinforcement learning - Obstacle avoidance

Duration: 1 h 30 min (2 × 45 min)

Age: 8+ years

Recommended for hands-on learning

Teach your robot to react to specific situations!

This activity will teach you how to set up the AlphAI software and its neural network to run mBot in a simple environment.

Hardware

To complete this activity, you'll need to assemble a small square arena.

We recommend you use our individual arena, available on our website or from our distributors.

You can also create your own arena. You'll need a clean, flat surface (e.g. a table), surrounded by barriers of a different color from the ground, and strong enough to stop the robot.

Configuration

The software can be configured either automatically or manually.

To configure it automatically, select: Settings > Load example settings > Reinforcement learning - Obstacle avoidance

To configure it manually, use the settings below.

  • Sensors > Ultrasonic, Motion detection, Last action performed

  • Actions > Forward, Turn, Reverse while turning
  • Reward > "Obstacle avoidance"
  • AI > Learning type: "Reinforcement learning", Algorithm: "deep Q-Learning", Intermediate neural layers: 300 100 50

  • Visualization > Select: "neural network", "connections", "synaptic activity".

Concept

There is no separate training phase for this activity: you do not have to show the robot what to do.

In reinforcement learning, the robot teaches itself by trial and error. Simply press the Autonomous button to start the activity, then watch the robot as it trains itself.

You can also see that the neural network is much larger, with several intermediate layers. Reinforcement learning is a more complex form of AI, involving many more calculations than supervised learning.
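To get a feel for how much larger the network is, the sketch below counts the weights and biases of a fully connected network with the intermediate layers used in this activity (300, 100, 50). The numbers of inputs and outputs (5 each) and the small comparison network without intermediate layers are assumptions chosen for illustration; they are not values read from AlphAI.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # one weight per connection, one bias per neuron
    return total

# Assumed sizes for illustration only (not taken from the software).
N_INPUTS, N_ACTIONS = 5, 5

small_net = [N_INPUTS, N_ACTIONS]                # hypothetical network without intermediate layers
deep_net = [N_INPUTS, 300, 100, 50, N_ACTIONS]   # intermediate layers from this activity

print("small network:", count_parameters(small_net))
print("deep network:", count_parameters(deep_net))
```

Under these assumptions, the deep network has over a thousand times more parameters to adjust, which is why each learning step requires more calculations.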

The aim of this activity is to understand:

  • How the robot makes a decision.
  • How its decisions evolve as it trains.
  • How the robot avoids obstacles and moves around the arena without touching the walls.

  1. Understanding reward systems

The big difference between supervised learning and reinforcement learning lies in the reward system. As you can see, there are now two blocks, "Reward" and "Level", at the bottom of the screen. But what do they correspond to?

Each action is assigned a reward ranging from -100 to 100. The level corresponds to the average of all the rewards.

Forward = +100

Turn right/left = +55

Reverse while turning = -50

When the wheels lock, the robot receives a reward of -50.
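The reward values above and the definition of the level can be sketched in a few lines. This is only an illustration: the action names, the `reward_for` and `level` helpers, and the choice that a wheel lock replaces the action's usual reward are assumptions, not the software's actual implementation.

```python
# Reward per action, as listed in this activity.
REWARDS = {
    "forward": 100,
    "turn": 55,
    "reverse_turn": -50,
}
BLOCKED_REWARD = -50  # received when the wheels lock

def reward_for(action, wheels_blocked=False):
    """Return the reward for one action (assumption: a lock overrides the action reward)."""
    if wheels_blocked:
        return BLOCKED_REWARD
    return REWARDS[action]

def level(rewards):
    """Level = average of all rewards received so far."""
    return sum(rewards) / len(rewards) if rewards else 0.0

history = [
    reward_for("forward"),                       # +100
    reward_for("turn"),                          # +55
    reward_for("forward", wheels_blocked=True),  # -50, the robot hit a wall
    reward_for("reverse_turn"),                  # -50
]
print(level(history))  # (100 + 55 - 50 - 50) / 4 = 13.75
```

A robot that mostly moves forward and turns in time, without hitting walls, collects rewards of +100 and +55 only, which pushes the level towards the upper end of the scale.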

Just like us humans, the robot likes to receive positive rewards and dislikes negative ones.

The robot will therefore try out the various actions it can perform, in a totally random fashion at first. It soon discovers that certain actions bring greater rewards than others, and it then seeks to maximize those rewards.

The level represents the average of all accumulated rewards. It gives a good indication of the robot's overall performance, i.e. its ability to move around the arena while avoiding the walls. Over a large number of trials, the level peaks at around 80-90. At that point, the robot receives almost no negative rewards, so the average can only increase over time.
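This trial-and-error behavior can be illustrated with tabular Q-learning, a simplified stand-in for the deep Q-learning algorithm selected in the settings (a table replaces the neural network). The two-state toy arena ("clear" or "wall"), the `step` function, and the learning parameters are all assumptions made for this sketch; only the reward values come from the activity.

```python
import random

ACTIONS = ["forward", "turn", "reverse_turn"]
STATES = ["clear", "wall"]

def step(state, action):
    """Toy arena (illustrative only): return (reward, next_state)."""
    if action == "forward":
        if state == "wall":
            return -50, "wall"   # wheels lock against the wall
        return 100, "wall"       # moving forward brings the robot up to a wall
    if action == "turn":
        return 55, "clear"       # turning points the robot away from the wall
    return -50, "clear"          # reversing while turning

def train(steps=5000, alpha=0.1, gamma=0.5, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "clear"
    for t in range(steps):
        # Fully random at first, then mostly greedy (epsilon-greedy exploration).
        epsilon = max(0.05, 1.0 - t / 1000)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward, next_state = step(state, action)
        # Q-learning update: move the estimate towards reward + discounted best future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in STATES}
print(policy)
```

After training, the learned policy in this toy arena is to move forward when the way is clear and to turn when facing a wall, which mirrors what the real robot is expected to discover.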

The trick when programming an AI by reinforcement is to assign the highest rewards to the actions you want it to master.

For example, if you wanted an AI to learn to park in a particular space, you'd program it to receive an increasingly positive reward as it got closer to the space, and an increasingly negative one as it moved further away.
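One possible way to express such a parking reward is a function of the distance to the space, scaled to the same -100 to 100 range used in this activity. The function name `parking_reward`, the linear shape, and the `max_distance` parameter are hypothetical choices for this sketch.

```python
import math

def parking_reward(robot_xy, target_xy, max_distance=2.0):
    """Reward in [-100, 100]: +100 on the space, -100 at max_distance or beyond."""
    distance = math.dist(robot_xy, target_xy)
    ratio = min(distance / max_distance, 1.0)  # 0.0 on the space, 1.0 far away
    return 100.0 * (1.0 - 2.0 * ratio)

target = (0.0, 0.0)
print(parking_reward((0.0, 0.0), target))  # on the space
print(parking_reward((1.0, 0.0), target))  # halfway
print(parking_reward((2.0, 0.0), target))  # at max_distance
```

With this shape, every move towards the space raises the reward and every move away lowers it, so the AI is guided at each step rather than only when it finally parks.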

The reward system is the learning method that most closely resembles our own. In fact, our way of teaching also works with a reward system. To help students learn, we invented the grading system; we reward students who have learned their lesson well, and punish the others.
