
Activity: Q-learning or programming with PyTorch

Materials required:

  • At least 1 robot, or the software simulator
  • 1 computer per robot
  • At least a small arena

Software configuration:

  • Example configuration: manual editing, "blocked vs. movement".

Duration:

2 to 3 hours

Age:

15 and over

Advantages of this activity:

  • Can be performed (in part) with the simulator.

Program the reward function, i.e. the reward the robot receives, so that it learns an original task: following a line with the camera, or something else!
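To illustrate what such a reward function might look like, here is a minimal sketch for the line-following task. Everything in it is hypothetical: the function name `line_following_reward`, its two inputs (`line_offset`, the horizontal position of the line in the camera image scaled to [-1, 1], and `is_blocked`), and the reward values are assumptions made for this example. They are not taken from the software's API.

```python
def line_following_reward(line_offset, is_blocked):
    """Hypothetical reward for a line-following task.

    line_offset: position of the line in the camera image, from -1.0
                 (far left) to +1.0 (far right), or None if no line is seen.
    is_blocked:  True if the robot is stuck against an obstacle.
    """
    if is_blocked:
        return -1.0           # strongest penalty: the robot cannot move
    if line_offset is None:
        return -0.5           # milder penalty: the line has been lost
    # Highest reward when the line is centred, decreasing towards the edges
    return 1.0 - abs(line_offset)


print(line_following_reward(0.0, False))
print(line_following_reward(-0.5, False))
print(line_following_reward(None, False))
```

The design idea is simply that the reward should be largest for the behaviour you want to see (line centred in the image) and smallest for the behaviour you want to avoid (robot blocked).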

Programming the neural network

Work with the simulated robot first, then switch to the real robot at the end. In the meantime, switch off the robot to save its batteries.

In the AI tab, select Algorithm: "Student code", then answer "New" when prompted and choose a name for the file in which your code will be saved.

You're going to edit the code of 3 functions "init", "learn" and "take_decision" to code your own neural network.

Understanding the 3 functions

  1. Add prints to each of the 3 functions and note that:
     • init is called when "Reset AI" is pressed
     • learn is called as often as possible (but only if the "Learn" button is activated and there is learning data in the experience memory)
     • take_decision is called each time the program needs a decision (it is in fact called twice per step: once to display the values on the screen, and once to make the robot act on the decision)
  2. Modify the value returned by the take_decision function: this determines the action chosen by the robot in autonomous mode.
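The two steps above can be sketched as follows. The argument names (`X_train`, `y_train`, `sensors`) and the value returned by `learn` are assumptions made for this sketch: keep the signatures of the template generated by the software if they differ.

```python
def init():
    # Expected to appear in the console each time "Reset AI" is pressed
    print("init called")


def learn(X_train, y_train):
    # Assumed arguments: the learning data taken from the experience memory
    print("learn called with", len(X_train), "examples")
    return 0.0  # placeholder value (assumption: the software accepts a number)


def take_decision(sensors):
    # Expected to appear twice per step (once for display, once to act)
    print("take_decision called with sensors =", sensors)
    return 0  # index of the chosen action: change it and watch the robot
```

Changing the `return 0` of `take_decision` to another action index is enough to carry out step 2.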

Coding a first network "outside" the software

Here is code that builds a neural network with 1 input, 2 outputs (one per category), and a hidden layer of 100 neurons.

It then trains the network on a set of 3 data points, (x=-2 → category 1), (x=0 → category 0), (x=2 → category 1), and checks that the error is small at the end of training.

    import torch as th
    from torch import nn
    import numpy as np

    ninput = 1
    noutput = 2

    # Example data set
    data = [[-2], [0], [2]]
    classes = [1, 0, 1]
    data = th.Tensor(data)
    classes = th.LongTensor(classes)

    # Init neural network
    network = nn.Sequential(
        nn.Linear(ninput, 100),
        nn.LeakyReLU(),
        nn.Linear(100, noutput),
    )
    print("first layer: weights = ", network[0].weight)
    print("first layer: bias = ", network[0].bias)

    # Test network output before learning
    sensors = th.Tensor([[0], [1], [2]])
    output = network(sensors)
    print(output)

    # Init loss function
    lossfn = nn.CrossEntropyLoss()
    pred = network(data)
    print("loss:", lossfn(pred, classes))

    # Init optimizer
    optimizer = th.optim.SGD(network.parameters(), lr=1e-1)

    # Learning
    nrepeat = 2000
    for i in range(nrepeat):
        optimizer.zero_grad()
        pred = network(data)
        loss = lossfn(pred, classes)
        print("loss:", loss)
        loss.backward()  # this computes the gradient, i.e. the derivatives
        optimizer.step()

    # Test network output after learning
    pred = network(data)
    print("loss after", nrepeat, "learning steps:", lossfn(pred, classes))
    print("first layer: weights = ", network[0].weight)
    print("first layer: bias = ", network[0].bias)
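The network returns two raw scores per input rather than a category number. As a minimal sketch, here is one way these scores could be turned into probabilities and a predicted category. The block rebuilds an untrained network so that it runs on its own, and the names `scores`, `probas` and `categories` are only illustrative. Note that `nn.CrossEntropyLoss` applies the softmax itself, which is why the network has no softmax layer.

```python
import torch as th
from torch import nn

th.manual_seed(0)  # fixed seed so that the run is repeatable

# Same architecture as in the example: 1 input, 100 hidden neurons, 2 outputs
network = nn.Sequential(
    nn.Linear(1, 100),
    nn.LeakyReLU(),
    nn.Linear(100, 2),
)

data = th.Tensor([[-2], [0], [2]])

with th.no_grad():                      # no gradient needed to read outputs
    scores = network(data)              # raw scores, one row per input
    probas = th.softmax(scores, dim=1)  # each row now sums to 1
    categories = probas.argmax(dim=1)   # index of the most likely category

print("probabilities:", probas)
print("predicted categories:", categories.tolist())
```

Placed after the training loop of the example, the same three lines should give categories that match `classes`, provided the training has converged.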

  1. Copy the example code above (the network trained on 3 data points) to the end of the Python file and test it by pressing the "Reset AI" button (the output of the prints appears in the console).
  2. Now fill in the 3 functions init, learn and take_decision:

At the top of the file, define the following variables, which will be accessible in the various functions:

    network = lossfn = optimizer = None

Within each of the 3 functions, make these variables accessible with the line:

    global network, lossfn, optimizer

Move the "Init neural network" part, together with the initialization of the loss function and of the optimizer, into the init function.

Move the "Learning" part into the learn function, but remove the for loop: keep only one learning step, since the function will be called a large number of times.

Write the take_decision function, which must return the category number (0 or 1) corresponding to the "sensors" input.
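Put together, the instructions of this step might give something like the sketch below. It rests on several assumptions that should be checked against the template generated by the software: `learn` ignores whatever arguments it receives and trains on the toy data set only, it returns the loss as a number, and `sensors` holds a single value (the network has 1 input).

```python
import torch as th
from torch import nn

network = lossfn = optimizer = None

# Toy data set from the example (assumption: robot data is not used yet)
data = th.Tensor([[-2], [0], [2]])
classes = th.LongTensor([1, 0, 1])


def init():
    global network, lossfn, optimizer
    network = nn.Sequential(
        nn.Linear(1, 100),
        nn.LeakyReLU(),
        nn.Linear(100, 2),
    )
    lossfn = nn.CrossEntropyLoss()
    optimizer = th.optim.SGD(network.parameters(), lr=1e-1)


def learn(*_ignored):
    # Assumption: arguments passed by the software are ignored in this sketch
    global network, lossfn, optimizer
    optimizer.zero_grad()
    loss = lossfn(network(data), classes)
    loss.backward()      # compute the gradient
    optimizer.step()     # a single learning step, no for loop
    return loss.item()   # returning the loss is an assumption


def take_decision(sensors):
    global network, lossfn, optimizer
    # Assumption: `sensors` is one number, or a sequence holding one number
    x = th.as_tensor(sensors, dtype=th.float32).reshape(1, -1)
    with th.no_grad():
        scores = network(x)
    return int(scores.argmax(dim=1).item())  # category number, 0 or 1


if __name__ == "__main__":
    # Quick check outside the software: mimic "Reset AI", then repeated learning
    init()
    for _ in range(2000):
        learn()
    print([take_decision([x]) for x in (-2.0, 0.0, 2.0)])
```

Inside the software, the final `if __name__ == "__main__":` part is not needed, since the "Reset AI" and "Learn" buttons trigger the calls.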

  3. Test your program in the software, with the simulated robot, and then with the real robot!

Assessment and feedback

We hope this session has helped you rediscover neural networks in a concrete way!

In programming, the list of functions that a program makes accessible is called an API, for Application Programming Interface. Programmers use it to find out how to interact with the program. You'll find the API for the alphai Python module here: https://drive.google.com/file/d/1C4ovPW_eH5KFz5Y9JvSrzLhtmdOpcp6-/view