Course syllabus
DAT530 / DIT579 Structured Machine Learning lp1 HT23 (7.5 hp)
The course is offered by the Department of Computer Science and Engineering
Contact details
Lecturer and examiner: Simon Olsson (simonols@chalmers.se)
Course purpose
The purpose of this course is to give a broad introduction to structured machine learning. Structured machine learning involves using known structure in data to build learning models. Important examples are spatial structure, for example convolutions and attention on images, sequences, and graphs, and temporal structure, such as recurrence. The course focuses on building a strong understanding of the underlying concepts and applying them in a practical setting through project-based assignments. The principles taught in the course are generally applicable, yet the focus will be on applications in the natural sciences (physics, chemistry and biology), where symmetries and structure are often known exactly.
Schedule
Mondays 9:00-11:45 (Lecture) |
Wednesdays 10:00-11:45 (Lecture) |
Wednesdays 13:15-17:00 (Autonomous lab) |
Session | Week | Date | Time | Lecture | Room | Zoom |
1 | 35 | 28-Aug | 9:00-11:45 | Introduction, course overview | SB-L227. SB1 | https://chalmers.zoom.us/j/68838671720? (password: DAT530) |
2 | 35 | 30-Aug | 10:00-11:45 | Data generating processes, high-dimensional data | SB-L227. SB1 | https://chalmers.zoom.us/j/67527797970? (password: DAT530) |
3 | 36 | 04-Sep | 9:00-11:45 | Geometric priors | SB-L227. SB1 | https://chalmers.zoom.us/j/62483528004 (password: DAT530) |
4 | 36 | 06-Sep | 10:00-11:45 | Groups and convolutions | SB-L227. SB1 | https://chalmers.zoom.us/j/63647018720 (password: DAT530) |
5 | 37 | 11-Sep | 9:00-11:45 | Convolutions and Group representations | SB-L227. SB1 | https://chalmers.zoom.us/j/62085212928 (password: DAT530) |
6 | 37 | 13-Sep | 10:00-11:45 | Geodesics and manifolds | SB-L227. SB1 | https://chalmers.zoom.us/j/64274484774 (password: DAT530) |
7 | 38 | 18-Sep | 9:00-11:45 | Graphs and sets | SB-L227. SB1 | https://chalmers.zoom.us/j/67636218534 (password: DAT530) |
8 | 38 | 20-Sep | 10:00-11:45 | Unnormalized distributions and sampling | SB-L227. SB1 | https://chalmers.zoom.us/j/61238976732 (password: DAT530) |
9 | 39 | 25-Sep | 9:00-11:45 | Grids | SB-L227. SB1 | https://chalmers.zoom.us/j/67586085561 (password: DAT530) |
10 | 39 | 27-Sep | 10:00-11:45 | Molecules | SB-L227. SB1 | https://chalmers.zoom.us/j/65152846183 (password: DAT530) |
11 | 40 | 02-Oct | 9:00-11:45 | Large molecules, proteins and representations | SB-L227. SB1 | https://chalmers.zoom.us/j/64338544230?pwd=ZWRhQmtUQy81OFdOK3pscFVhWmxnZz09 |
12 | 40 | 04-Oct | 10:00-11:45 | Self-study - projects | SB-L227. SB1 | |
13 | 41 | 09-Oct | 9:00-11:45 | Gauges | SB-L227. SB1 | https://chalmers.zoom.us/j/67826556139?pwd=MCtKOHhJLysxNGdmbHkzR1c1Y3p5dz09 |
14 | 41 | 11-Oct | 10:00-11:45 | Dimensions and units | SB-L227. SB1 | https://chalmers.zoom.us/j/65049388818?pwd=ZVlrdVNHM2l6N2JZN2hpSklUYVpWUT09 |
15 | 42 | 16-Oct | 9:00-11:45 | Projects | SB-L227. SB1 | |
16 | 42 | 18-Oct | 10:00-11:45 | Repetition | SB-L227. SB1 | https://chalmers.zoom.us/j/66231499819?pwd=THNUbnhJc0N3TkNiMXZtREdkTVpWQT09 |
Assignment Schedule
Assignment | Release | Deadline | Final resubmission (to pass course) | Grading |
Hand-in 1 | 04-Sep | 13-Sep | 23-Oct | Pass/fail |
Hand-in 2 | 11-Sep | 20-Sep | 23-Oct | Pass/fail |
Project 1 | 18-Sep | 28-Sep | 24-Oct | Graded |
Project 2 | 25-Sep | 07-Oct | 24-Oct | Graded |
Project 3 | 02-Oct | 11-Oct | 25-Oct | Graded |
Essay (project proposal) | 04-Oct | 20-Oct | 25-Oct | Pass/fail |
Course literature (to be updated)
Large parts of the course material are loosely adapted from the Geometric Deep Learning summer schools.
Primary reference:
Bronstein, Bruna, Cohen, and Veličković: Geometric deep learning. (Free proto-book available: https://arxiv.org/abs/2104.13478)
Selected primary literature and lecture notes TBA
Extra literature:
Serre "Linear Representations of Finite Groups" (1977) https://link.springer.com/book/10.1007/978-1-4684-9458-7
Background references for repetition:
Deisenroth, Faisal, and Ong "Mathematics for Machine Learning" (2021) https://mml-book.github.io/book/mml-book.pdf
Shapira "Linear Algebra and Group Theory for Physicists and Engineers." (2019) https://link.springer.com/book/10.1007/978-3-030-17856-7
Petersen and Pedersen "The Matrix Cookbook" https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
Git crash course: https://julienpascal.github.io/slides/intro_git/#/
Practical extra information:
37 reasons why your neural network is not working (Blog post)
Course design
The course follows a blended classroom structure and is based on four components:
- Self study: reading and video lectures
- Interactive hybrid lectures
- Individual project work
- Peer-assessment
Self study:
Before each week, reading and video material is provided for preparation. Preparation is key to the success of the interactive hybrid lectures. In these lectures, we will move through concepts and problems and solve them in teams. Material provided in advance will not be repeated; we assume attendees have read and followed it beforehand. The first lecture is an exception.
Lectures:
The in-class lectures (Monday and Wednesday mornings) are interactive and focus on problem solving in teams, interspersed with micro lectures and discussions. The lecture sessions will be held in hybrid form (in-person class and remote attendance via Zoom); however, in-person attendance is highly advised.
Project work and assessment:
The course assessment revolves around three individual projects, each with equal weight towards the final grade. The projects focus on different aspects of structured learning but are problem-driven. Furthermore, the first three weeks include two take-home assignments, which are pass/fail; both must be passed to pass the course. Finally, a written project proposal must be submitted and approved (pass/fail) in order to pass the course. More details on the projects and the written proposal will be made available on the course pages in due time.
Peer assessment:
The projects consist of written code and a written report. Before submission, the code should pass all basic unit tests.
Assessment happens in three stages:
1. Peer assessment of code: each course attendee is assigned to review another attendee's code and suggest improvements.
2. Rebuttal: each course attendee implements changes and rebuts each comment received. The recipient rates the feedback as useful or not useful.
3. Submitted report: following a successful code review, a written report is submitted for grading.
A successful code review means that:
- your code passes unit tests
- you have addressed comments on your code
- you have had your comments on a peer's code approved (rated useful)
We will use Chalmers GitLab for code submission and review; the course responsible will oversee the review process.
Code submissions are due 3 days before the report deadline. For every day a report is late, one point is subtracted until only a passing grade (3) can be achieved.
Course Content:
- Module: What do we know and how can we use it? (3 weeks)
In this module we will broadly cover important background knowledge for working with structured data, for example:
- Limits of observations and data generation. Concepts: Data generating processes: spatial and temporal structure. Learning in high-dimensional spaces. Dimensions and scales. Perturbations and interventions.
- Representations and inductive biases. Concepts: Symmetry and conservation (Noether). Geometric priors. Basic group and representation theory. Invariance, covariance, equivariance. Convolutions from first principles. Neural networks as representation learners. Parameter-sharing/tying. Geometric deep learning.
Assessment: Two exercise sets.
- Module: Physics – Project: Sampling the Boltzmann distribution (1 week)
Particles in 2D and 3D, potential energies, and simulation. Generative models.
- Module: Chemistry – Project: Predicting molecular energies (1 week)
Molecules, molecular representations (strings and graphs), molecular properties. Supervised learning, regression.
- Module: Biology – Project: chemical properties (1 week)
Proteins, representations of proteins, molecular evolution. Unsupervised learning.
- Module: Bridging the disciplines – Assignment: propose MSc thesis project.
In this module we cover recent examples of molecular machine learning.
Learning objectives and syllabus
Intended learning outcomes:
After the course the student is expected to be able to
Knowledge and understanding
- Summarize data generation processes as a schematic figure
- Motivate the use of structured machine learning approaches
- Summarize basic concepts of group theory and group representation theory
- List examples of structured machine learning architectures
- Explain the basic principles of structured learning architectures
Skills and abilities
- Conceptualize a machine learning system which uses structure from a data generating process
- Identify symmetries in data
- Implement machine learning models to approximate structure endowed by a given data generation process
- Use basic concepts from group theory and group representation theory to rationalize machine learning architectures.
- Design a small-scale machine learning research project making use of symmetry and structure in a dataset.
Judgment and approach
- Judge recent scientific reports on machine learning for structured data.
- Appraise a small-scale structured machine learning project.
This course is running for the first time.
Examination form
To pass the course, the following elements must be completed by the end of the course:
- two exercise sets (pass/fail)
- three projects (graded)
- one project proposal for a 6-month research project, max 1500 words (pass/fail)
- active participation in peer review
The exercise sets aim at building and assessing knowledge and understanding; the projects additionally gauge skills and abilities. Finally, the project proposal integrates all elements of the intended learning outcomes.
Exercise sheets will be distributed in weeks 1 to 3.
Project descriptions along with report templates are distributed in weeks 4 to 6.
Scientific papers are distributed along with the proposal template in week 7.
Projects credit distribution:
To pass a project assignment you must:
- Pass unit tests (1 point)
- Pass peer-assessment of code (1 point)
- Assess a peer's code (1 point)
- Get at least 3 points for your written report
Your report will only be graded when the first 3 steps are passed.
The written report can give up to 7 points distributed as follows:
- Clear introduction (0.5 points)
- Clear motivation (0.5 points)
- Appropriate references to external material (1 point)
- Description of methods/aids used (1 point)
- Presentation of results, including high-quality figures/illustrations (2.5 points)
- Discussion and conclusion (1.5 points)
For each day a project is submitted late, 0.5 points are deducted, down to a minimum of 6 points (passing). Submissions after the end of the study period will be considered.
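As an illustration only, the point arithmetic in this section can be sketched in a few lines of Python. This is not an official grading tool. The function name `project_points`, the dictionary keys, and the constant names are hypothetical. The sketch also assumes that the late deduction applies to the project total (3 prerequisite points plus report points) and that it cannot push an otherwise passing project below 6 points.

```python
# Rubric weights copied from the list above; the keys are illustrative names.
REPORT_RUBRIC = {
    "introduction": 0.5,
    "motivation": 0.5,
    "references": 1.0,
    "methods": 1.0,
    "results": 2.5,
    "discussion": 1.5,
}
PREREQUISITE_POINTS = 3.0   # unit tests + code review passed + peer's code assessed
PASSING_TOTAL = 6.0         # 3 prerequisite points + at least 3 report points
LATE_PENALTY_PER_DAY = 0.5


def project_points(report_points, days_late=0, prerequisites_passed=True):
    """Return the project total, or None if the project is not passed.

    Assumption: the late deduction is applied to the total and is floored
    at the passing threshold for projects that would otherwise pass.
    """
    max_report = sum(REPORT_RUBRIC.values())  # 7.0
    if not 0 <= report_points <= max_report:
        raise ValueError(f"report_points must lie in [0, {max_report}]")
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if not prerequisites_passed:
        return None  # the report is only graded once the first 3 steps are passed
    total = PREREQUISITE_POINTS + report_points
    if total < PASSING_TOTAL:
        return None  # fewer than 3 report points: project not passed
    return max(PASSING_TOTAL, total - LATE_PENALTY_PER_DAY * days_late)
```

Under these assumptions, a report worth 5.5 points handed in two days late gives `project_points(5.5, days_late=2)` equal to 7.5, while a report worth 4 points handed in ten days late is floored at 6.0.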
All aids are allowed throughout the course, and their use is encouraged in the projects. However, the use of aids needs to be thoroughly documented in the project reports.