Learning Robots

Activity: Q-learning or programming with PyTorch

Materials required:

At least 1 robot, or the software simulation
1 computer per robot
A small arena at minimum

Software configuration:

Example configuration: manual editing: "blocked vs. movement".

Duration:

2 to 3 hours

Age:

15 years and up

Advantages of this activity:

Can be performed (in part) with the simulator.

Program the reward that the robot receives so that it learns an original task: following a line with the camera, or anything else you like!

Programming the neural network

Work with the simulated robot first, then switch to the real robot at the end. Keep the real robot switched off in the meantime to save its batteries.

In the AI tab, select Algorithm: "Student code", then answer "New" when prompted and choose the name of the file in which to save your code.

You're going to edit the code of 3 functions "init", "learn" and "take_decision" to code your own neural network.

Understanding the 3 functions

Add a print to each of the 3 functions and note that:

init is called when "Reset AI" is pressed
learn is called as often as possible (but only if the "Learn" button is activated and there is learning data in the experience memory)
take_decision is called each time the program needs a decision (in fact twice per step: once to display the values on screen, and once to make the robot act on the decision)

Modify the value returned by the take_decision function: this determines the action chosen by the robot in autonomous mode.
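
As a starting point, the skeleton below is a minimal sketch of the three functions with a print in each. The argument lists are assumptions made for illustration (this activity only mentions the "sensors" input of take_decision), so keep the signatures that the software generates in your own file.

```python
# Sketch only: the argument lists are assumed; keep the ones in your generated file.

def init(*args):
    # Called when "Reset AI" is pressed.
    print("init called")


def learn(*args):
    # Called repeatedly while "Learn" is on and the experience memory holds data.
    print("learn called")


def take_decision(sensors):
    # Called twice per step: once for the display, once to act.
    print("take_decision called with", sensors)
    return 0  # always choose action number 0


# Stand-alone check outside the software, with a made-up sensor value.
init()
learn()
action = take_decision([0.0])
```

Changing the returned number should change the action the robot takes in autonomous mode.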


Coding a first network "outside" the software

Here is some code that builds a neural network with 1 input, 2 outputs (corresponding to 2 categories), and an intermediate layer of 100 neurons.

It trains the network on a set of 3 data points, (x = -2 → category 1), (x = 0 → category 0) and (x = 2 → category 1), and checks that the error is small at the end of training.

import torch as th
from torch import nn
import numpy as np

ninput = 1
noutput = 2

# Example data set
data = [[-2], [0], [2]]
classes = [1, 0, 1]
data = th.Tensor(data)
classes = th.LongTensor(classes)

# Init neural network
network = nn.Sequential(
    nn.Linear(ninput, 100),
    nn.LeakyReLU(),
    nn.Linear(100, noutput),
)
print("first layer: weights = ", network[0].weight)
print("first layer: bias = ", network[0].bias)

# Test network output before learning
sensors = th.Tensor([[0], [1], [2]])
output = network(sensors)
print(output)

# Init loss function
lossfn = nn.CrossEntropyLoss()
pred = network(data)
print("loss:", lossfn(pred, classes))

# Init optimizer
optimizer = th.optim.SGD(network.parameters(), lr=1e-1)

# Learning
nrepeat = 2000
for i in range(nrepeat):
    optimizer.zero_grad()
    pred = network(data)
    loss = lossfn(pred, classes)
    print("loss:", lossfn(pred, classes))
    loss.backward()  # this computes the gradient, i.e. the derivatives
    optimizer.step()

# Test network output after learning
pred = network(data)
print("loss after", nrepeat, "learning steps:", lossfn(pred, classes))
print("first layer: weights = ", network[0].weight)
print("first layer: bias = ", network[0].bias)
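
To make the printed loss less of a black box, here is a sketch in plain Python of what nn.CrossEntropyLoss computes with its default settings: a softmax over the network outputs, then the average of minus the log of the probability given to the correct category. The output values used here are made up for illustration; they are not produced by the network above.

```python
import math


def softmax(scores):
    # Turn raw network outputs into probabilities that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(outputs, classes):
    # Average of -log(probability of the correct category) over the samples.
    losses = [-math.log(softmax(o)[c]) for o, c in zip(outputs, classes)]
    return sum(losses) / len(losses)


classes = [1, 0, 1]

# Made-up outputs for the 3 samples: both categories get the same score.
undecided = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
undecided_loss = cross_entropy(undecided, classes)

# Made-up outputs where the correct category gets a much higher score.
confident = [[-5.0, 5.0], [5.0, -5.0], [-5.0, 5.0]]
confident_loss = cross_entropy(confident, classes)

print("undecided network:", undecided_loss)
print("confident, correct network:", confident_loss)
```

A network that cannot tell the 2 categories apart scores ln 2 (about 0.69), so a loss well below that value at the end of training suggests the network has learned the three examples.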


Copy this code to the end of the Python file and test it by pressing the "Reset AI" button (the print output appears in the console).
Now you will fill in the 3 functions init, learn and take_decision:

At the top of the file, define the following variables, which will be accessible in the various functions

network = lossfn = optimizer = None

Within each of the 3 functions, make these variables accessible with the line:

global network, lossfn, optimizer

Move the "network initialization" part to the init function

Move the "learning" part into the learn function (but remove the for loop: put in only one learning step, knowing that the function will be called a large number of times).

Write the take_decision function, which must return the category number corresponding to the "sensors" input, i.e. 0 or 1.
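
The steps above could be assembled roughly as follows. This is a sketch rather than the reference solution: the argument lists of init and learn are assumptions (here learn takes no arguments and trains on the 3-sample data set from the stand-alone example, whereas in the software the training data would come from the experience memory), so adapt them to the signatures in your generated file.

```python
import torch as th
from torch import nn

network = lossfn = optimizer = None

# Example data set from the stand-alone script (assumption: used here in
# place of the experience memory supplied by the software).
data = th.tensor([[-2.0], [0.0], [2.0]])
classes = th.tensor([1, 0, 1])


def init():
    global network, lossfn, optimizer
    network = nn.Sequential(
        nn.Linear(1, 100),
        nn.LeakyReLU(),
        nn.Linear(100, 2),
    )
    lossfn = nn.CrossEntropyLoss()
    optimizer = th.optim.SGD(network.parameters(), lr=1e-1)


def learn():
    global network, lossfn, optimizer
    # One learning step only: the function is called again and again.
    optimizer.zero_grad()
    loss = lossfn(network(data), classes)
    loss.backward()
    optimizer.step()
    return loss.item()


def take_decision(sensors):
    global network, lossfn, optimizer
    # Assumption: sensors is a flat list of numbers, one per network input.
    with th.no_grad():
        output = network(th.tensor([sensors], dtype=th.float32))
    # The category with the highest output score is the decision.
    return int(output.argmax(dim=1).item())


# Stand-alone check outside the software.
init()
for _ in range(200):
    last_loss = learn()
decision = take_decision([0.0])
print("loss:", last_loss, "decision for x=0:", decision)
```

Wrapping the forward pass in th.no_grad() is a design choice: take_decision only reads the network, so there is no need to record gradients there.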

Test your program in the software, with the simulated robot, and then with the real robot!


Assessment and feedback

We hope this session has helped you rediscover neural networks in a concrete way!


In programming, the list of functions that a program exposes is called an API, for Application Programming Interface. Programmers use it to find out how to interact with the program. You'll find the API for the alphai Python module here: https://drive.google.com/file/d/1C4ovPW_eH5KFz5Y9JvSrzLhtmdOpcp6-/view

Download

TP - Q-Learning Programming (FR)
Lab - Q-Learning Programming (EN)
