Course syllabus

DAT530 / DIT579 Structured Machine Learning lp1 HT23 (7.5 hp)

The course is offered by the Department of Computer Science and Engineering

Contact details

Lecturer and examiner: Simon Olsson (simonols@chalmers.se)

 

Course purpose

The purpose of this course is to give a broad introduction to structured machine learning. Structured machine learning uses known structure in data to build learning models. Important examples are spatial structure, for example convolutions and attention on images, sequences, and graphs, or time structure such as recurrence. The course focuses on building a strong understanding of the underlying concepts and applying them in a practical setting through project-based assignments. The principles taught in the course are generally applicable, but the focus will be on applications in the natural sciences (physics, chemistry, and biology), where symmetries and structure are often known exactly.

Schedule

Mondays 9:00-11:45 (Lecture)
Wednesdays 10:00-11:45 (Lecture)
Wednesdays 13:15-17:00 (Autonomous lab)
Session Week Date Time Lecture Room Zoom
1 35 28-Aug 9:00-11:45 Introduction, course overview SB-L227. SB1 https://chalmers.zoom.us/j/68838671720? (password: DAT530)
2 35 30-Aug 10:00-11:45 Data generating processes, high-dimensional data SB-L227. SB1 https://chalmers.zoom.us/j/67527797970? (password: DAT530)
3 36 04-Sep 9:00-11:45 Geometric priors SB-L227. SB1 https://chalmers.zoom.us/j/62483528004 (password: DAT530)
4 36 06-Sep 10:00-11:45 Groups and convolutions SB-L227. SB1 https://chalmers.zoom.us/j/63647018720 (password: DAT530)
5 37 11-Sep 9:00-11:45 Convolutions and group representations SB-L227. SB1 https://chalmers.zoom.us/j/62085212928 (password: DAT530)
6 37 13-Sep 10:00-11:45 Geodesics and manifolds SB-L227. SB1 https://chalmers.zoom.us/j/64274484774 (password: DAT530)
7 38 18-Sep 9:00-11:45 Graphs and sets SB-L227. SB1 https://chalmers.zoom.us/j/67636218534 (password: DAT530)
8 38 20-Sep 10:00-11:45 Unnormalized distributions and sampling SB-L227. SB1 https://chalmers.zoom.us/j/61238976732 (password: DAT530)
9 39 25-Sep 9:00-11:45 Grids SB-L227. SB1 https://chalmers.zoom.us/j/67586085561 (password: DAT530)
10 39 27-Sep 10:00-11:45 Molecules SB-L227. SB1 https://chalmers.zoom.us/j/65152846183 (password: DAT530)
11 40 02-Oct 9:00-11:45 Large molecules, proteins and representations SB-L227. SB1 https://chalmers.zoom.us/j/64338544230?pwd=ZWRhQmtUQy81OFdOK3pscFVhWmxnZz09
12 40 04-Oct 10:00-11:45 Self-study - projects SB-L227. SB1
13 41 09-Oct 9:00-11:45 Gauges SB-L227. SB1 https://chalmers.zoom.us/j/67826556139?pwd=MCtKOHhJLysxNGdmbHkzR1c1Y3p5dz09
14 41 11-Oct 10:00-11:45 Dimensions and units SB-L227. SB1 https://chalmers.zoom.us/j/65049388818?pwd=ZVlrdVNHM2l6N2JZN2hpSklUYVpWUT09
15 42 16-Oct 9:00-11:45 Projects SB-L227. SB1
16 42 18-Oct 10:00-11:45 Repetition SB-L227. SB1 https://chalmers.zoom.us/j/66231499819?pwd=THNUbnhJc0N3TkNiMXZtREdkTVpWQT09

 

TimeEdit

 

Assignment Schedule

Assignment Release Deadline Final resubmission (to pass course) Grading
Hand-in 1 04-Sep 13-Sep 23-Oct Pass/fail
Hand-in 2 11-Sep 20-Sep 23-Oct Pass/fail
Project 1 18-Sep 28-Sep 24-Oct Graded
Project 2 25-Sep 07-Oct 24-Oct Graded
Project 3 02-Oct 11-Oct 25-Oct Graded
Essay (project proposal) 04-Oct 20-Oct 25-Oct Pass/fail

Course literature (to be updated)

Large parts of the course material are loosely adapted from the Geometric Deep Learning summer schools.

Primary reference:

Bronstein, Bruna, Cohen, and Veličković "Geometric Deep Learning" (2021), free proto-book available at https://arxiv.org/abs/2104.13478

Selected primary literature and lecture notes TBA

Extra literature:

Serre "Linear Representations of Finite Groups" (1977) https://link.springer.com/book/10.1007/978-1-4684-9458-7

Background references for repetition:

Deisenroth, Faisal, and Ong "Mathematics for Machine Learning" (2021) https://mml-book.github.io/book/mml-book.pdf

Shapira "Linear Algebra and Group Theory for Physicists and Engineers" (2019) https://link.springer.com/book/10.1007/978-3-030-17856-7

Petersen and Pedersen "The Matrix Cookbook" https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf

Git crash course: https://julienpascal.github.io/slides/intro_git/#/

Practical extra information:

37 reasons why your neural network is not working (blog post)

 

Course design

The course is based on four components

  • Self study: reading and video lectures
  • Interactive hybrid lectures
  • Individual project work
  • Peer-assessment

and follows a blended classroom structure.

Self study:

Before each week, reading and video material is provided for preparation. Preparation is key to the success of the interactive hybrid lectures. In these lectures, we will move through concepts and problems and solve them in teams. Material given in advance will not be repeated; we assume attendees have read and followed the provided material beforehand. The first lecture is an exception.

Lectures:

The in-class lectures (Monday and Wednesday mornings) are interactive and focus on problem solving in teams, interspersed with micro lectures and discussions. The lecture sessions will be held in hybrid form (in-person class and remote attendance via Zoom); however, in-person attendance is highly advised.

Project work and assessment:

The course assessment revolves around three individual projects, each with equal weight towards the final grade. The projects focus on different aspects of structured learning but are problem-driven. In addition, the first three weeks include two take-home assignments, which are graded pass/fail; both must be passed to pass the course. Finally, a written project proposal must be submitted and approved (pass/fail) in order to pass the course. More details on the projects and the written proposal will be made available on the course pages in due time.

Peer assessment:

The projects consist of written code and a written report. Before submission, the code should pass all basic unit tests.

Assessment happens in three stages:

1. Peer assessment of code: each course attendee is assigned to review another attendee's code and suggest improvements.

2. Rebuttal: each course attendee implements changes and rebuts each comment received. The recipient rates the feedback as useful or not useful.

3. Submitted report: Following a successful code review a written report is submitted for grading.

A successful code review means that:

  • your code passes unit tests
  • you have addressed comments on your code
  • your comments on a peer's code have been approved (rated useful)

We will use Chalmers GitLab for code submissions and reviews; the course responsible will oversee the reviewing process.

 

Code submissions are due 3 days before the deadline for the corresponding report. For every day a report is late, one point is subtracted until only a passing grade (3) can be achieved.

 

Course Content:

  1. Module: What do we know and how can we use it? (3 weeks)
    In this module we will broadly cover important background knowledge for working with structured data, for example:
    - Limits of observations and data generation. Concepts:
    Data generating processes: spatial and temporal structure. Learning in high-dimensional spaces. Dimensions and scales. Perturbations and interventions.
    - Representations and inductive biases. Concepts:
    Symmetry and conservation (Noether). Geometric priors. Basic group and representation theory. Invariance, covariance, equivariance. Convolutions from first principles. Neural networks as representation learners. Parameter-sharing/tying. Geometric deep learning.
    Assessment:
    Two exercise sets.
  2. Module: Physics – Project: Sampling the Boltzmann distribution (1 week)
    Particles in 2D and 3D, potential energies, and simulation. Generative models.
  3. Module: Chemistry – Project: Predicting molecular energies (1 week)
    Molecules, molecular representations (strings and graphs), molecular properties. Supervised learning, regression.
  4. Module: Biology – Project: chemical properties (1 week)
    Proteins, representations of proteins, molecular evolution. Unsupervised learning.
  5. Module: Bridging the disciplines – Assignment: propose MSc thesis project.
    In this module we cover recent examples of molecular machine learning.

 

Learning objectives and syllabus

Intended learning outcomes:

After the course, the student is expected to be able to:

Knowledge and understanding

  • Summarize data generation processes as a schematic figure
  • Motivate the use of structured machine learning approaches
  • Summarize basic concepts of group theory and group representation theory
  • List examples of structured machine learning architectures
  • Explain the basic principles of structured learning architectures

Skills and abilities

  • Conceptualize a machine learning system which uses structure from a data generating process
  • Identify symmetries in data.
  • Implement machine learning models to approximate structure endowed by a given data generation process
  • Use basic concepts from group theory and group representation theory to rationalize machine learning architectures.
  • Design a small-scale machine learning research project making use of symmetry and structure in a dataset.

Judgment and approach

  • Judge recent scientific reports on machine learning for structured data.
  • Appraise a small-scale structured machine learning project.

 

This course is running for the first time.

 

Examination form

To pass the course the following elements must be completed by the end of the course.

  • two exercise sets (pass/fail)
  • three projects (graded)
  • one project proposal for a 6-month research project, max 1500 words (pass/fail)
  • active participation in peer review

The exercise sets aim at building and assessing knowledge and understanding; the projects additionally gauge skills and abilities. Finally, the project proposal integrates all elements of the intended learning outcomes.

Exercise sheets will be distributed in weeks 1 to 3.

Project descriptions along with report templates are distributed in weeks 4 to 6.

Scientific papers are distributed along with proposal template in week 7.

Projects credit distribution:

To pass a project assignment you must:
- Pass unit tests (1 point)
- Pass peer-assessment of code (1 point)
- Assess a peer's code (1 point)
- Get at least 3 points for your written report
Your report will only be graded once the first 3 steps are passed.

The written report can give up to 7 points distributed as follows:

  • clear introduction (0.5 points)
  • clear motivation (0.5 points)
  • appropriate references to external material (1 point)
  • description of methods/aids used (1 point)
  • presentation of results, including high-quality figures/illustrations (2.5 points)
  • discussion and conclusion (1.5 points)

For each day a project is submitted late, 0.5 points are deducted, down to a minimum of 6 points (passing). Submissions after the end of the study period will be considered.
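The point rules in this section can be summarized in a short sketch. This is only an illustration, not an official grading tool: the function name `project_points` and its parameters are hypothetical, and the code assumes that the 0.5-point late deduction applies to the project total and only to submissions that would otherwise reach the passing threshold of 6 points.

```python
def project_points(tests_passed, review_passed, reviewed_peer,
                   report_points, days_late=0):
    """Hypothetical helper: total points for one project and whether it passes.

    Returns None if the report is not graded, i.e. if any of the three
    code-related steps (1 point each) has not been passed.
    """
    if not (tests_passed and review_passed and reviewed_peer):
        return None  # the report is only graded once all three steps are passed
    if not 0 <= report_points <= 7:
        raise ValueError("report_points must be between 0 and 7")

    total = 3 + report_points  # 3 points for the code steps plus the report
    passed = report_points >= 3  # at least 3 report points are required
    if passed:
        # Assumption: the late deduction never pushes a passing project below 6.
        total = max(6.0, total - 0.5 * days_late)
    return float(total), passed


if __name__ == "__main__":
    print(project_points(True, True, True, 7.0))               # on time
    print(project_points(True, True, True, 7.0, days_late=3))  # three days late
```

Under these assumptions, a full-score report handed in three days late gives 8.5 points instead of 10.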

All aids are allowed throughout the course, and their use is encouraged in the projects. However, the use of aids must be thoroughly documented in the project reports.
