Course syllabus

Course Information 

This page describes the course MVE137--Probability and statistical learning using Python (LP1, HT22, 7.5 hp), which is offered by the Department of Electrical Engineering. 

Prerequisites 

A bachelor-level knowledge of probability, linear algebra, and Python. During the first week of the course, we will offer a refresher on Python and linear algebra, and provide pointers to additional review material.

Aim 

The course will provide the participants with a solid foundation in probability theory and statistical learning. In particular, participants will become familiar with key probabilistic and statistical concepts in data science and will learn how to apply them to analyze data sets and draw meaningful conclusions from data. The course will cover both theoretical and practical aspects, with the objective of preparing the participants to apply the acquired knowledge to the real world. The participants will have the opportunity to experiment with and practice the concepts taught in the course via Python programs and the Jupyter Notebook/JupyterLab platform.

Learning outcomes 

  • Explain probability concepts such as tail probability bounds, moment-generating functions and their applications, Markov chains, and central limit theorems.
  • Explain statistical models and methods that are used for prediction in science and technology, such as regression- and classification-type statistical models.
  • Select suitable statistical models to analyze existing data sets, apply sound statistical methods, and perform analyses using Python.
  • Discuss the use of common Python libraries such as numpy, matplotlib, jupyter notebook, jupyter lab, pandas, to perform data analysis.
  • Design Python-programs that apply the probability and statistical learning concepts presented in the class, to draw meaningful conclusions from data.

Content

  • Discrete random variables and expectation
  • Moments and deviations
  • Markov Chains and random walks
  • Continuous distributions and the Poisson process
  • Overview of supervised learning
  • Linear methods for regression
  • Linear methods for classification
  • Model assessment and selection

Course Staff

Course literature

The course is based on a set of lecture notes that are available on Canvas. 
Students who wish to delve deeper into the material covered in the course, or would like to access additional exercises, may consider the following references:

  1. R. G. Gallager, Stochastic processes: theory for applications. Cambridge, U.K.: Cambridge Univ. Press, 2013.
  2. M. Mitzenmacher and E. Upfal, Probability and computing: randomization and probabilistic techniques in algorithms and data analysis. Cambridge, U.K.: Cambridge Univ. Press, 2017.
  3. G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning, 7th ed. Springer, 2017.
  4. T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd ed. New York, NY, U.S.A.: Springer, 2017.

References 1 and 2 cover the first part of the course, which deals with probability. References 3 and 4 cover the second part of the course. Reference 3 is more focused on applications, whereas reference 4 also provides the underlying mathematical foundations.

Additional course material

Slides for the lectures, exercises for the tutorial sessions, and weekly homework assignments will be made available under Modules on Canvas.

Exam dates 

The purpose of the written exam is to assess the learning outcomes related to the probability-theory part of the course. More details on the exam can be found below.

  • MVE137 Exam: October 28, 2022

Please remember to register for the exam, following the standard Chalmers procedures.

Python Project 

There is one mandatory Python project, which is centered on the statistical-learning part of the course. The purpose of the Python project is to assess the learning outcomes related to Python programming and statistical learning. Details can be found below. The Python project must be handed in electronically via Canvas on the same date as the written exam, i.e., October 28, 2022.

Please note that it is important to hand in your solution to the Python project even if you plan to take the written exam at a later date. This is your only opportunity to collect the corresponding points, which will affect your grade.

Schedule 

Lectures 

The objective of the lectures is to highlight the most important parts of the course. As a consequence, the mathematical theory underlying some of the results presented in class will only be sketched, or sometimes omitted altogether. Students interested in delving deeper will be given additional material. Please consult the Modules section on Canvas for an overview of the lectures.

Python Labs

During the Python labs, Python will be used as a tool to consolidate the understanding of the theoretical topics discussed during the lectures, and to apply them to real-world scenarios. Note that in the first part of the course, Python will be used mainly as a tool to illustrate the theory. In the second part, Python will instead be used as a key tool to perform data analysis.

Homework assignments

The course encompasses 6 homework assignments, which will be given during the first 6 weeks of the course and must be handed in the following week, usually before the exercise session on Tuesday morning, where the solutions to the homework assignment will be reviewed. The solutions to the homework assignments must be handed in electronically. The first 3 homework assignments, which deal with the first part of the course, award a maximum of 3 points each. The last 3 homework assignments, which deal with the second part of the course and involve a larger portion of Python programming, award a maximum of 5 points each. The homework assignments can be carried out in groups of up to two students.

Python project 

The course has one mandatory Python project, which will cover the statistical-learning part of the course. The project is carried out individually. The project will be made available on Canvas at the beginning of the seventh week of the course, and your solution (in the form of a Jupyter notebook) must be handed in electronically on the day of the written exam.

The report should be submitted in Jupyter notebook format. It must contain not only the Python code developed to solve the project, but also full documentation of the project in Markdown cells. Specifically, it should explain what was done, why, and what the results were. Furthermore, the results should be discussed. We expect the students to check that the results are reasonable and consistent with each other and with the relevant theory. Plots must be clearly labeled. A maximum of 30 points will be awarded for the project.

Written exam

The written exam will cover the probability theory part of the course and consists of exercises similar to the ones assigned during the first 3 homework assignments and the ones covered in the Python labs. Examples of previous-year exams will be provided to the students. The written exam gives a maximum of 44 points.

We will provide a formula sheet containing the most important mathematical expressions used during the course. No other material will be allowed apart from a Chalmers approved calculator.

A minimum score of 10 points on the written exam is required to pass the course.

Final grades 

The final grades will be decided based on the points collected via the homework assignments (maximum score 24), the Python project (maximum score 30), and the written exam (maximum score 44). Two points will be given to all students who provide feedback through the course evaluation questionnaire. Please send an email to the TAs as soon as you have submitted the evaluation questionnaire to be awarded these two points. Note that the two points will not be awarded if their addition would cause a grade to move from fail to 3.

The final grades will be decided based on the following table. Please note that a minimum score of 10 points on the written exam is required to pass the course.

  • Total score: 0-39 -> Grade: fail
  • Total score: 40-59 -> Grade: 3
  • Total score: 60-79 -> Grade: 4
  • Total score: 80-100 -> Grade: 5
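The grading rules above combine three capped components, a minimum exam score, and a conditional feedback bonus. The Python sketch below encodes them as we read them; it is not an official grading tool, the names (final_grade, grade_from_total, the constants) are hypothetical, and it assumes each grade band starts at its lower bound in the table, which only matters if half points occur. Note that the component maxima sum to 24 + 30 + 44 = 98, so the two feedback points bring the maximum to 100, matching the top of the table.

```python
# Hypothetical sketch of the final-grade rules; not an official grading tool.
# Assumption: each grade band starts at its lower bound in the table.

HOMEWORK_MAX = 3 * 3 + 3 * 5  # three 3-point plus three 5-point assignments
PROJECT_MAX = 30
EXAM_MAX = 44
EXAM_MIN_TO_PASS = 10
FEEDBACK_BONUS = 2


def grade_from_total(total: float) -> str:
    """Map a total score to a grade using the table's lower bounds."""
    if total >= 80:
        return "5"
    if total >= 60:
        return "4"
    if total >= 40:
        return "3"
    return "fail"


def final_grade(homework: float, project: float, exam: float,
                gave_feedback: bool = False) -> str:
    """Combine the three components and apply the exam and bonus rules."""
    for score, cap in ((homework, HOMEWORK_MAX), (project, PROJECT_MAX),
                       (exam, EXAM_MAX)):
        if not 0 <= score <= cap:
            raise ValueError(f"score {score} outside [0, {cap}]")
    # At least 10 exam points are required to pass, whatever the total.
    if exam < EXAM_MIN_TO_PASS:
        return "fail"
    total = homework + project + exam
    grade = grade_from_total(total)
    if gave_feedback:
        boosted = grade_from_total(total + FEEDBACK_BONUS)
        # The bonus is withheld if it alone would turn a fail into a 3.
        if not (grade == "fail" and boosted != "fail"):
            grade = boosted
    return grade
```

For example, final_grade(20, 20, 18, gave_feedback=True) returns "4": the base total of 58 would be a 3, and the bonus lifts it to 60. By contrast, final_grade(10, 10, 19, gave_feedback=True) stays "fail", because the bonus would move the total of 39 from fail to 3.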

Changes compared to the previous edition of the course

  • To free up more time for group and individual work, the number of lectures has been reduced.
  • The exam format has been modified: the Python take-home exam is now a Python project, assigned more than two weeks before the written exam day.
  • The Python labs have been better integrated with the lectures and with the theoretical homework assignments.
  • More textbooks have been suggested, providing additional exercises as well as a more engineering-oriented approach to the statistical-learning material.
