Course syllabus
Teachers
Lars Hammarstrand (examiner) and Erik Landolsi
Teaching assistants:
David Hagerman Olzon, Mahandokht Rafidashti, Bernardo Taveira, David Nordström, Yaroslava Lochman, Roman Naeem, Sophia Staudt, Josef Bengtson and Richard Petersen.
Course-specific prerequisites
Students should have a working knowledge of basic probability, linear algebra, and programming. It is desirable to have basic knowledge in statistics and learning, corresponding to, for example, ESS101 - Modelling and Simulation, SSY230 - System Identification, or TDA231 - Algorithms for Machine Learning and Inference, but it is not a strict requirement.
Aim
The purpose of this course is to give a thorough introduction to deep machine learning, also known as deep learning or deep neural networks. Over the last few years, deep machine learning has dramatically improved state-of-the-art performance in various fields, including speech recognition, computer vision, and machine translation. We focus primarily on the basic principles of how deep networks are constructed and trained, but we also cover many of the key techniques used in different applications. The overall objective is to provide a solid understanding of how and why deep machine learning is useful, as well as the skills to apply these techniques to problems of practical importance.
Learning outcomes
After the course, students should be able to:
- explain the fundamental principles of supervised learning, including strategies to use validation data to avoid overfitting,
- describe the standard cost functions optimised during supervised training (in particular, the cross-entropy) and the standard solution techniques (stochastic gradient descent, backpropagation, etc.),
- explain how traditional feed-forward networks are constructed and why they can approximate “almost” any function (the universality theorem),
- understand the problem with vanishing gradients and modern tools to mitigate it (e.g., batch normalisation and residual networks),
- summarise the key components of convolutional neural networks (CNNs) and their main advantages,
- describe common types of recurrent neural networks (RNNs) and their applications,
- summarise how transformers are constructed and describe their key properties,
- provide an overview of some of the many modern variations of deep neural networks,
- argue for the benefits of transfer learning, self-supervised learning, and semi-supervised learning when we have a limited amount of annotated/labeled data,
- train and apply CNNs to image applications and RNNs or transformers to applications related to time sequences,
- use a suitable deep learning library (primarily PyTorch) to solve a variety of practical applications.
Content
The content of the course includes:
- supervised learning by cross-entropy minimization combined with evaluations on validation data (a minimal code sketch follows this list),
- backpropagation and stochastic gradient descent,
- a suitable programming language for implementing deep learning algorithms,
- feedforward neural networks and convolutional neural networks,
- recurrent neural networks,
- the transformer architecture,
- techniques for efficient training, such as momentum and batch normalization,
- modern variations of neural networks (e.g., attention and residual networks),
- self-supervised learning and semi-supervised learning,
- application of convolutional neural networks to image recognition and of transformers to sequential problems.
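To indicate how the first three items above fit together in practice, the sketch below shows a minimal PyTorch training loop: a small feedforward network trained by cross-entropy minimization with stochastic gradient descent, with accuracy monitored on held-out validation data after every epoch. The data, network size, and hyperparameters are invented for illustration only and are not taken from the course material.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data (hypothetical): 1000 samples, 20 features, 3 classes.
torch.manual_seed(0)
X, y = torch.randn(1000, 20), torch.randint(0, 3, (1000,))
train_ds, val_ds = TensorDataset(X[:800], y[:800]), TensorDataset(X[800:], y[800:])
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)

# A small feedforward network; CrossEntropyLoss expects raw logits.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)  # cross-entropy on the mini-batch
        loss.backward()                  # backpropagation
        optimizer.step()                 # one stochastic gradient descent step

    # Evaluate on held-out validation data to monitor overfitting.
    model.eval()
    correct = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
    print(f"epoch {epoch + 1}: validation accuracy = {correct / len(val_ds):.2f}")
```

The same loop structure carries over to the convolutional, recurrent, and transformer models treated later in the course; in essence, only the model definition and the data loading change.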
Organisation
The course comprises:
- One introductory lecture and two later lectures on sparse transformers and semi-supervised learning (all three in a conventional lecture format).
- Seven online lectures (to watch before the corresponding practice session).
- Seven practice sessions, where we review the material from the corresponding video lectures.
- One guest lecture from industry.
- Four home assignments (two individual and two in groups of two students): you first hand in the assignment and then take an individual closed-book exam on the assignment using Inspera.
- An online crash course in Python programming.
- One online quiz.
- Two computer labs.
- One project (including both a report and an oral presentation).
- Consultation hours (mostly related to the home assignments and the project).
Literature
We use Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016, which is available online at http://www.deeplearningbook.org, together with supporting documents provided in the course (e.g., lecture slides).
Examination and grades
The students are evaluated individually based on their performance in the various activities throughout the course; more specifically, the grade is determined by weighting the results of in-class work, the project, and attendance. More details are available under Organisation, examination and grades.