Course syllabus

Course PM

MVE137 Probability and statistical learning using Python (LP1, HT21, 7.5 hp). The course is offered by the Department of Electrical Engineering.

Contact details

  • Examiner: Giuseppe Durisi (durisi@chalmers.se)
  • Teachers: Giuseppe Durisi (durisi@chalmers.se), Alexandre Graell i Amat (alexandre.graell@chalmers.se)
  • Teaching assistants: Carl Kylin (carlky@chalmers.se), Charitha Madapatha Madapathage Don (charitha@chalmers.se)

Prerequisites

A bachelor-level knowledge of probability and Python. For the students with no prior knowledge of Python, pointers to tutorials will be provided.

Course purpose

The course will provide the participants with a solid foundation of probability theory and statistical learning. In particular, in this course the participants will become familiar with key probabilistic and statistical concepts in data science and will learn how to apply them to analyze data sets and draw meaningful conclusions from data. The course will cover both theoretical and practical aspects, with the objective of preparing the participants to apply the acquired knowledge to the real world. The participants will have the possibility to experiment and practice with the concepts taught in the course via Python programs and the Jupyter Notebook platform.

Schedule

Please consult the module section.

Course literature

The course will be mainly based on the following two references:

  • M. Mitzenmacher and E. Upfal, "Probability and computing: randomization and probabilistic techniques in algorithms and data analysis". Cambridge, U.K.: Cambridge Univ. Press, 2017.
  • T. Hastie, R. Tibshirani, and J. Friedman, "The elements of statistical learning: Data mining, inference, and prediction", 2nd ed. Springer, New York, NY, U.S.A., 2017 (available via Chalmers Library).

The first book will be used for the first part of the course, which deals with probability. Lecture notes written by the teachers, roughly covering the parts of this book that are relevant for the course, will be distributed. These lecture notes can be used as a replacement for the book, although the book provides a larger number of examples. The second part of the course will be based on the second reference, which is available electronically via Chalmers Library.

Throughout the course, coding exercises will be done in Python. For those of you who are already familiar with Python: we will use Python 3, so make sure you have it installed on your computer. For those of you who are less familiar, the following resources might help when installing:

  • Mac: Install Homebrew and then Python 3 by following this link.
  • Windows: The following link may be useful.
  • Linux: The following link may be useful.

We will use a number of Python libraries to perform calculations, visualise data, etc. These include (but may not be limited to) NumPy, Matplotlib, pandas, and ipywidgets. You can either install these separately as the course progresses or get them all at once by installing Anaconda, following this guide. We will also use Jupyter Notebooks to conduct the Python labs. Jupyter is included in Anaconda, but it can also be installed separately by following this guide.
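To check that the libraries listed above are available in your Python 3 environment, something like the following minimal sketch can be run. It is only an illustration, not an official course script: the helper name `missing_packages` is hypothetical, and the import names `numpy`, `matplotlib`, `pandas`, and `ipywidgets` are assumed to correspond to the libraries mentioned above.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the libraries mentioned in the text.
COURSE_PACKAGES = ["numpy", "matplotlib", "pandas", "ipywidgets"]

if __name__ == "__main__":
    missing = missing_packages(COURSE_PACKAGES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All listed libraries were found.")
```

If a library is reported as missing, it can be installed separately or obtained through Anaconda, as described above.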

Course design

The course consists of around 21 lectures, 1 Python introduction lecture, 7 sessions devoted to theoretical exercises, and 8 Python data-lab exercise sessions.

Learning objectives

As specified in the syllabus, the course has the following learning outcomes:

  • Explain probability concepts such as tail probability bounds, moment-generating functions and their applications, Markov chains, and central limit theorems.
  • Explain statistical models and methods that are used for prediction in science and technology, such as regression- and classification-type statistical models.
  • Select suitable statistical models to analyze existing data sets, apply sound statistical methods, and perform analyses using Python.
  • Discuss the use of common Python libraries and tools, such as NumPy, Matplotlib, Jupyter Notebook, and pandas, to perform data analysis.
  • Design Python programs that apply the probability and statistical learning concepts presented in the class to draw meaningful conclusions from data.

Examination form

The final grade is based on scores from homework assignments, Python labs, and a written exam. The total number of points for the course is 100, distributed as follows:

  • Homework assignments: There will be 7 homework assignments, each worth up to 2 points (14 in total). Each homework assignment will comprise around 4 exercises, out of which 2 will be graded. The homework assignments are to be submitted in pairs (groups to be formed at the beginning of the course).
  • Python labs: There will be 9 Python labs, out of which 8 will be graded (2 points each, 16 in total). Again, you will work in pairs (the same pairs as for the homework assignments).
  • Final written exam: 70 points.
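As a quick sanity check of the point distribution above, the arithmetic can be written out in a few lines of Python. This is only an illustrative sketch; the variable names are hypothetical, and the numbers are taken directly from the list above.

```python
# Maximum points per component, as stated in the examination form.
components = {
    "homework": 7 * 2,  # 7 assignments, up to 2 points each
    "labs": 8 * 2,      # 8 graded labs (out of 9), 2 points each
    "exam": 70,         # final written exam
}

total = sum(components.values())

for name, points in components.items():
    share = 100 * points / total
    print(f"{name}: {points} points ({share:.0f}% of the total)")
print("total:", total)
```

The components add up to the stated course total of 100 points.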
