This is an elective course designed for the Master of Autonomous Systems program at Hochschule Bonn-Rhein-Sieg. I taught this course for the first time in the winter semester 2023/24.

The objectives of the course are as follows:

- Introducing the predominant conceptual ideas behind robot learning.
- Illustrating a variety of techniques applied in robot learning, such as simulation-based learning, safe learning, learning in social, human-centred settings, as well as explainability of learning-based robots.
- Exposing students to robot learning from a practical perspective through hands-on activities.
- Enabling students to critically evaluate the developments in the robot learning literature.

The course covers the following topics:

- Robot learning introduction
- Learning-based robot manipulation
- Learning-based robot navigation
- Learning from demonstration
- Visuomotor robot policies
- Learning-based object grasping
- Sim-to-real transfer
- Safe robot learning
- Inverse reinforcement learning
- Spatial relation learning
- Relational learning
- Robot personalisation
- Language-based learning
- Explainable robotics

In this assignment, you will write your own simple implementation of dynamic movement primitives (DMPs) for learning three-dimensional paths. In particular, follow the description of DMPs with a discrete canonical system and write an implementation that learns a three-dimensional DMP from a given demonstration and then generates paths from the learned weights. Then, verify that your implementation works correctly and investigate the factors that affect the reconstruction error.
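As a rough orientation, one possible structure for such an implementation is sketched below. It is a minimal sketch under several assumptions that are not taken from the lecture: the class name `DiscreteDMP`, the gain values (`alpha_y`, `beta_y`, `alpha_x`), the width heuristic for the Gaussian basis functions, the use of locally weighted regression for the weights, explicit Euler integration, and the omission of the goal-dependent scaling of the forcing term (so that closed curves with identical start and end coordinates can still be learned). The formulation discussed in the lecture may differ in these details.

```python
import numpy as np


class DiscreteDMP:
    """Minimal discrete DMP with one transformation system per dimension."""

    def __init__(self, n_basis=30, alpha_y=25.0, alpha_x=4.0):
        self.alpha_y = alpha_y
        self.beta_y = alpha_y / 4.0  # critically damped spring-damper
        self.alpha_x = alpha_x
        # Centres equally spaced in time, mapped through x(t) = exp(-alpha_x * t)
        self.centres = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        # Heuristic (assumption): widths from the spacing of neighbouring centres
        spacing = np.diff(self.centres)
        spacing = np.append(spacing, spacing[-1])
        self.widths = 1.0 / spacing ** 2

    def _basis(self, x):
        """Gaussian activations of shape (len(x), n_basis)."""
        x = np.atleast_1d(x)
        return np.exp(-self.widths * (x[:, None] - self.centres) ** 2)

    def fit(self, demo, duration=1.0):
        """Learn the forcing-term weights from a demonstration of shape (T, D)."""
        demo = np.asarray(demo, dtype=float)
        n_steps = demo.shape[0]
        dt = duration / (n_steps - 1)
        vel = np.gradient(demo, dt, axis=0)
        acc = np.gradient(vel, dt, axis=0)
        self.tau = duration
        self.y0, self.goal, self.v0 = demo[0], demo[-1], self.tau * vel[0]
        # Canonical system evaluated at the demonstration time stamps
        x = np.exp(-self.alpha_x * np.linspace(0.0, 1.0, n_steps))
        # Forcing term that would reproduce the demonstration exactly
        f_target = self.tau ** 2 * acc - self.alpha_y * (
            self.beta_y * (self.goal - demo) - self.tau * vel)
        # Locally weighted regression: one weight per basis function and dimension
        psi = self._basis(x)
        numerator = psi.T @ (x[:, None] * f_target)
        denominator = psi.T @ x ** 2
        self.weights = numerator / (denominator[:, None] + 1e-10)
        return self

    def rollout(self, n_steps=200):
        """Generate a path of shape (n_steps, D) by explicit Euler integration."""
        dt = self.tau / (n_steps - 1)
        y, v, x = self.y0.copy(), self.v0.copy(), 1.0
        path = [y.copy()]
        for _ in range(n_steps - 1):
            psi = self._basis(x)[0]
            forcing = x * (psi @ self.weights) / (psi.sum() + 1e-10)
            v_dot = (self.alpha_y * (self.beta_y * (self.goal - y) - v)
                     + forcing) / self.tau
            y = y + dt * v / self.tau
            v = v + dt * v_dot
            x = x + dt * (-self.alpha_x * x / self.tau)
            path.append(y.copy())
        return np.array(path)
```

With a demonstration array `demo` of shape `(T, 3)`, calling `DiscreteDMP(n_basis=50).fit(demo).rollout(T)` would return a reconstructed path of the same shape, which can then be compared against the demonstration.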

For the evaluation, generate multiple mathematical curves and use those to represent demonstrations from which you can learn the DMP weights. The evaluation should consist of two parts:

- Plot your results (the generated and the reconstructed curves, similar to what is shown on lecture slide 16) to qualitatively evaluate the reconstruction.
- Use a similarity metric to evaluate the curve similarity quantitatively (e.g. mean squared error).

In your evaluation, investigate how changing the number of basis functions affects the error.
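The evaluation described above could be organised roughly as in the following sketch. The helix is just one example of a mathematical curve, and the function names (`helix`, `mean_squared_error`, `sweep_basis_counts`) are hypothetical; `reconstruct` stands for whatever function wraps your own DMP implementation and is not provided here.

```python
import numpy as np


def helix(n_points=200, turns=2.0, radius=1.0, height=1.0):
    """Example demonstration: a 3D helix of shape (n_points, 3)."""
    t = np.linspace(0.0, 1.0, n_points)
    angle = 2.0 * np.pi * turns * t
    return np.stack([radius * np.cos(angle),
                     radius * np.sin(angle),
                     height * t], axis=1)


def mean_squared_error(demonstrated, reconstructed):
    """Mean squared Euclidean distance between corresponding path points."""
    demonstrated = np.asarray(demonstrated, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if demonstrated.shape != reconstructed.shape:
        raise ValueError("Both paths must have the same shape")
    return float(np.mean(np.sum((demonstrated - reconstructed) ** 2, axis=1)))


def sweep_basis_counts(demo, reconstruct, basis_counts):
    """Error per number of basis functions.

    `reconstruct(demo, n_basis)` is a placeholder for your own code: it should
    learn a DMP with `n_basis` basis functions and return the generated path.
    """
    return {n: mean_squared_error(demo, reconstruct(demo, n))
            for n in basis_counts}
```

Plotting the resulting dictionary (number of basis functions on the horizontal axis, error on the vertical axis) gives a compact view of how the reconstruction quality changes.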

An optional task is to extend your implementation so that you can also learn rhythmic DMPs.

In this assignment, you will develop a simple visuomotor policy for solving the cart-pole problem. For this, you will use Gymnasium (which defines the cart-pole environment) as well as PyTorch for implementing and training neural network models.

- First, implement a class that defines policy and/or value networks (depending on the reinforcement learning algorithm that you want to implement) and allows you to sample actions from the learned policy as well as perform network updates based on experiences. Your network should be defined so that the input is an image of the cart-pole system and the action space is discrete: move left or move right. Note: If it helps, you are free to incorporate existing implementations of reinforcement learning algorithms, for instance as provided in Stable Baselines3, in your solution.
- After your network is defined, implement the reinforcement learning loop for your agent. This means that you need to collect experiences of the form $(s_t, a_t, s_{t+1}, r)$ so that you can update your policy network appropriately. How exactly you do the update will depend on the RL algorithm you use. Plot the evolution of the return over the learning process to show that your agent is actually learning. Note, however, that reinforcement learning algorithms are stochastic, so the results will differ every time you execute the algorithm; you should therefore plot the return averaged over multiple runs rather than the return of a single run, like on the plots shown here.
- Finally, modify your implementation of the policy network so that it takes both the image and the explicit state information of the system as separate inputs (thus turning the policy into a multimodal policy). Then, update the learning loop accordingly and verify that learning is indeed taking place. Has the second modality changed the behaviour of the agent? Discuss the observations from your evaluation.
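For the multimodal variant in the last step, one possible network layout is sketched below in PyTorch. This is only an illustration under assumptions that are not part of the assignment: the class name `MultimodalPolicy`, the layer sizes, and the choice to resize the rendered image to a 64x64 greyscale frame are all hypothetical. The four-dimensional state (cart position, cart velocity, pole angle, pole angular velocity) and the two discrete actions correspond to the standard cart-pole environment.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class MultimodalPolicy(nn.Module):
    """Policy that fuses an image and the explicit state before the action head."""

    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        # Image branch: assumes greyscale input of shape (batch, 1, 64, 64)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),   # -> 16 x 30 x 30
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),  # -> 32 x 13 x 13
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, 128), nn.ReLU(),
        )
        # State branch: explicit low-dimensional state
        self.state_encoder = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU())
        # Action head operating on the concatenated features
        self.head = nn.Sequential(
            nn.Linear(128 + 32, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, image, state):
        """Return unnormalised action logits of shape (batch, n_actions)."""
        features = torch.cat(
            [self.image_encoder(image), self.state_encoder(state)], dim=1)
        return self.head(features)

    def sample_action(self, image, state):
        """Sample an action and return it with its log-probability."""
        distribution = Categorical(logits=self.forward(image, state))
        action = distribution.sample()
        return action, distribution.log_prob(action)
```

The log-probability returned by `sample_action` is what a policy-gradient method would use in its loss; a value-based method would instead interpret the outputs of `forward` as action values.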

This assignment is essentially a continuation of the previous one. The objective is to apply **model-based learning** (a) to speed up the visuomotor policy learning process and (b) to obtain a predictive model as a by-product of learning. You are free to choose a model-based learning algorithm (you can take a look at this survey); my suggestion would be Dyna as it is rather easy to implement. In particular, we want to learn a transition model that predicts the next state given the current state and an applied action.
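To illustrate the structure of Dyna, the following sketch shows tabular Dyna-Q on a small toy problem. Everything in it is a simplifying assumption made for illustration: `ChainEnv` is a hypothetical environment, the learned model is a lookup table of deterministic transitions, and the hyperparameters are arbitrary. In the assignment, the states are images, so both the value function or policy and the transition model would be neural networks; the interleaving of direct learning, model learning and planning stays the same.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical toy task: walk right along a chain to reach the last state."""

    def __init__(self, length=8):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Action 0 moves left, action 1 moves right; reward 1 at the goal."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


def dyna_q(env, episodes=50, planning_steps=10, alpha=0.5, gamma=0.95,
           epsilon=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # action values per state
    model = {}                           # (state, action) -> (next_state, reward, done)

    def update(s, a, r, s_next, done):
        target = r + (0.0 if done else gamma * max(q[s_next]))
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next, r, done = env.step(a)
            update(s, a, r, s_next, done)      # direct RL from real experience
            model[(s, a)] = (s_next, r, done)  # model learning
            for _ in range(planning_steps):    # planning with simulated experience
                (ps, pa), (pn, pr, pd) = rng.choice(list(model.items()))
                update(ps, pa, pr, pn, pd)
            if done:
                break
            s = s_next
    return q, model
```

Setting `planning_steps=0` reduces the sketch to plain Q-learning, which gives a simple baseline for checking whether planning with the learned model speeds up learning.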

- First, implement a class that defines policy and/or value networks (depending on the reinforcement learning algorithm that you want to implement) and allows you to sample actions from the learned policy as well as perform network updates based on experiences. As last time, **your network should be defined so that the input is an image of the cart-pole system and the action space is discrete: move left or move right**. Feel free to reuse the code that you implemented in the previous assignment.
- After defining your network, implement the model-based reinforcement learning loop for your agent. As last time, plot the evolution of the return over the learning process to show that your agent is actually learning. In addition, examine your model and check whether what is learned is meaningful (the predictions of the model are images representing the next state, so it should be easy to examine the results of your model visually).
- Modify your implementation of the policy network so that it takes both the image and the explicit state information of the system as separate inputs. Your model will now also need to be updated, as your state information is a combination of multimodal states. How does the learning compare to what you observed last time, when you applied model-free learning? Has the model learned meaningful information?
- The final part of this assignment is to explore the learning process for **two other Gymnasium environments** using a multimodal state representation as above; feel free to pick whichever environments you find interesting. Note, however, that the MuJoCo environments are not installed on the JupyterHub, so you can't use those. For both environments, compare model-free and model-based learning. Can model-free learning be used successfully in both cases, or have you observed a case where model-based learning leads to improved learning results?
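When comparing model-free and model-based learning, it helps to summarise several runs of each method in the same way before plotting. The sketch below shows one way of doing this with NumPy; the function name `summarise_runs` and the smoothing window are assumptions, and the returns are expected as an array with one row per run and one column per episode.

```python
import numpy as np


def summarise_runs(returns_per_run, window=10):
    """Summarise learning curves from several independent runs.

    returns_per_run: array-like of shape (n_runs, n_episodes).
    Returns the per-episode mean and standard deviation over runs, plus a
    moving average of the mean (shorter by window - 1 entries).
    """
    returns = np.asarray(returns_per_run, dtype=float)
    mean = returns.mean(axis=0)
    std = returns.std(axis=0)
    kernel = np.ones(window) / window
    smoothed_mean = np.convolve(mean, kernel, mode="valid")
    return mean, std, smoothed_mean
```

The mean can be drawn as a line and the band `mean ± std` as a shaded region (for example with Matplotlib's `fill_between`), once for the model-free and once for the model-based agent.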

This assignment is concerned with the spatial relation learning problem: we want to learn a model that is able to recognise spatial relations between objects from point cloud data. We will use the Freiburg spatial relations dataset, which includes sample scenes of different spatial relations that can be used for learning a relation classifier. For relation recognition, we want to implement the algorithm discussed in

`B. Rosman and S. Ramamoorthy, "Learning spatial relationships between objects," The International Journal of Robotics Research, vol. 30, no. 11, pp. 1328–1342, 2011.`

Available: https://journals.sagepub.com/doi/abs/10.1177/0278364911408155

which, given segmented point clouds of two objects, learns to recognise relations based on distances between points. Note that you do not have to do any point cloud segmentation yourself; the scenes in the Freiburg spatial relations dataset are already segmented.

Analyse your implementation using any of the train / test splits included with the dataset and discuss your observations. How does your classifier perform? Which relations are easier and which ones are ambiguous?
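As a starting point for working with pairs of segmented point clouds, the sketch below computes a few simple distance-based quantities with NumPy. It is a generic illustration and not the feature definition from the paper: the function names and the particular choice of features (closest-point distance, closest-point offset, centroid offset) are assumptions, and each point cloud is assumed to be an array of shape `(N, 3)`.

```python
import numpy as np


def pairwise_distances(cloud_a, cloud_b):
    """Euclidean distances between all point pairs, shape (len(a), len(b))."""
    difference = cloud_a[:, None, :] - cloud_b[None, :, :]
    return np.linalg.norm(difference, axis=2)


def distance_features(cloud_a, cloud_b):
    """Illustrative 7D feature vector describing how object b lies relative to a.

    Layout: [closest distance, closest-point offset (3), centroid offset (3)].
    """
    cloud_a = np.asarray(cloud_a, dtype=float)
    cloud_b = np.asarray(cloud_b, dtype=float)
    distances = pairwise_distances(cloud_a, cloud_b)
    # Indices of the closest pair of points between the two objects
    i, j = np.unravel_index(np.argmin(distances), distances.shape)
    closest_offset = cloud_b[j] - cloud_a[i]
    centroid_offset = cloud_b.mean(axis=0) - cloud_a.mean(axis=0)
    return np.concatenate([[distances[i, j]], closest_offset, centroid_offset])
```

Note that the full distance matrix needs memory proportional to the product of the two cloud sizes, so large clouds may need to be downsampled first.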

An essential part of this course is a project during which you will have an opportunity to apply the learning techniques that we discuss in the course to a robotics problem. This can be a problem of your choice or a problem suggested by me; the important things are that the problem is (i) robotics-related, such that solving it would enable a robot to do something better, (ii) challenging enough to warrant a semester of work, and (iii) interesting enough that you are motivated to work on it.

The project will be done either using existing datasets from the literature on robot learning or in a simulation environment. However, if your solution is robot-independent, you can try your algorithm on one of our robots as well.

Before you start working on the project, you will be asked to write a short proposal, namely a short description of what you intend to do in the project and what the expected outcomes would be; this should help you think about the general plan for how to approach your problem and the techniques you plan to use for solving it.

The project will kick off a few weeks after the start of lectures and the project submission date will be in the second examination period of the semester.