Course syllabus
Course Information
This page describes the course MVE137--Probability and statistical learning using Python (LP1, HT24, 7.5 hp), which is offered by the Department of Electrical Engineering.
Prerequisites
A bachelor-level knowledge of probability, linear algebra, and Python. During the first week of the course, we will offer a refresher on Python and linear algebra, and provide pointers to additional review material.
Aim
The course provides participants with a solid foundation in probability theory and statistical learning. In particular, participants will become familiar with key probabilistic and statistical concepts in data science and will learn how to apply them to analyze data sets and draw meaningful conclusions. The course covers both theoretical and practical aspects, with the objective of preparing participants to apply the acquired knowledge to real-world problems. Participants will be able to experiment with and practice the concepts taught in the course via Python programs and the Jupyter Notebook/JupyterLab platform.
Learning outcomes
- Explain probability concepts such as tail probability bounds, moment-generating functions and their applications, Markov chains, and central limit theorems.
- Explain statistical models and methods that are used for prediction in science and technology, such as regression- and classification-type statistical models.
- Select suitable statistical models to analyze existing data sets, apply sound statistical methods, and perform analyses using Python.
- Discuss the use of common Python libraries and tools, such as NumPy, Matplotlib, pandas, Jupyter Notebook, and JupyterLab, to perform data analysis.
- Design Python programs that apply the probability and statistical learning concepts presented in class to draw meaningful conclusions from data.
Content
- Discrete random variables and expectation
- Moments and deviations
- Markov chains and random walks
- Continuous distributions and the Poisson process
- Overview of supervised learning
- Linear methods for regression
- Linear methods for classification
- Model assessment and selection
Course Staff
- Examiner: Giuseppe Durisi (office: 6312)
- Teachers: Giuseppe Durisi (office: 6312), Alexandre Graell i Amat (office: 6409)
- Teaching assistants: Carl Kylin (office: 6436), Charitha Madapatha Madapathage Don (office: 6333)
Course literature
The course is based on a set of lecture notes that are available on Canvas.
Students who wish to delve deeper into the material covered in the course, or would like to access additional exercises, may consider the following references:
- R. G. Gallager, Stochastic processes: theory for applications. Cambridge, U.K.: Cambridge Univ. Press, 2013.
- M. Mitzenmacher and E. Upfal, Probability and computing: randomization and probabilistic techniques in algorithms and data analysis. Cambridge, U.K.: Cambridge Univ. Press, 2017.
- G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning, 7th ed. Springer, 2017.
- T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd ed. New York, NY, U.S.A.: Springer, 2017.
- R. D. Yates and D. J. Goodman, Probability and stochastic processes, 3rd ed. Singapore: Wiley, 2015.
References 1 and 2 cover the first part of the course, which deals with probability. References 3 and 4 cover the second part of the course. Reference 3 is more focused on applications, whereas Reference 4 also provides the underlying mathematical foundations. If you feel rusty about probability theory, please use Reference 5 to review it.
Additional course material
Slides for the lectures, exercises for the tutorial sessions, and weekly homework assignments will be made available under Modules.
Exam dates
The purpose of the written exam is to assess the learning outcomes related to the probability-theory part of the course. More details on the exam can be found below.
- MVE137 Exam: November 1, 2024
Please remember to register for the exam, following the standard Chalmers procedures.
Python Project
There is one mandatory Python project, which is centered on the statistical-learning part of the course. The purpose of the Python project is to assess the learning outcomes related to Python programming and statistical learning. Details can be found below. The Python project needs to be handed in electronically via Canvas on the same date as the written exam, i.e., November 1, 2024.
Please note that it is important that you hand in your solution to the Python project even if you plan to take the written exam at a later date. This is your only possibility to collect the corresponding points, which will affect your grade.
Schedule
Lectures
The objective of the lectures is to highlight the most important parts of the course. As a consequence, the mathematical theory underlying some of the results presented in class will only be sketched, or sometimes omitted altogether. Students interested in delving deeper will be given additional material. Please consult the Modules section in Canvas for an overview of the lectures.
Python Labs
During the Python Labs, Python will be used as a tool to consolidate the understanding of the theoretical topics discussed during the lectures, and to apply them to real-world scenarios. Note that, in the first part of the course, Python will be used mainly as a tool to illustrate the theory. In the second part, Python will instead be used as a key tool to perform data analysis.
Homework assignments
The course includes 6 homework assignments, which will be given during the first 6 weeks of the course. Each assignment needs to be handed in electronically the following week, usually before the exercise session on Tuesday morning, where the solution to the homework assignment will be corrected. The first 3 homework assignments, which deal with the first part of the course, award a maximum of 3 points per assignment. The last 3 homework assignments, which deal with the second part of the course and involve a larger portion of Python programming, award a maximum of 5 points per assignment. The homework assignments can be carried out in groups of up to two students.
Office hours: For the first part of the course, Carl will be available in his office (6436, in EDIT building, floor 6Ö, take the north staircase) to answer questions regarding the homework on Mondays, September 9, 16, and 23, from 15:30 to 17:00. For questions outside office hours, feel free to email, message on Canvas, or ask during the break in class.
Python project
The course has one mandatory Python project, which will cover the statistical learning part of the course. The project is carried out individually. The project will be made available on Canvas at the beginning of the seventh week of the course, and your solution (in the form of a Jupyter notebook) needs to be handed in electronically on the day of the written exam.
The report should be submitted in Jupyter notebook format. It must contain not only the Python code developed to solve the project, but also full documentation of the project (via Markdown cells). Specifically, it should explain what was done, why it was done, and what the results were. Furthermore, the results should be commented on. We expect the students to check that the results are reasonable and consistent with each other and with the relevant theory. Plots must be clearly labeled. A maximum of 30 points will be awarded for the project.
Written exam
The written exam will cover the probability theory part of the course and consists of exercises similar to the ones assigned during the first 3 homework assignments and the ones covered in the Python labs. Examples of previous-year exams will be provided to the students. The written exam gives a maximum of 44 points.
We will provide a formula sheet containing the most important mathematical expressions used during the course. No other material will be allowed apart from a Chalmers-approved calculator.
A minimum score of 10 points on the written exam is required to pass the course.
Final grades
The final grades will be decided based on the points collected via the homework assignments (maximum score 24), the Python project (maximum score 30), and the written exam (maximum score 44). Two points will be given to all students providing feedback on the course evaluation questionnaire. Please upload a screenshot showing that you completed the course evaluation in the relevant assignment. Note that the two points will not be awarded if their addition would cause a grade to move from fail to 3. Also note that, for logistical reasons, these points are only available to students taking the course for the first time.
The final grades will be decided based on the following table. Please note that a minimum score of 10 points on the written exam is required to pass the course.
- Total score: 0-39 -> Grade: fail
- Total score: 40-59 -> Grade: 3
- Total score: 60-79 -> Grade: 4
- Total score: 80-100 -> Grade: 5
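As an illustration only, the grading rules above can be sketched in Python. The function names `grade_from_total` and `final_grade` are hypothetical and not part of any course material. The sketch assumes integer scores, assumes that the `feedback_bonus` flag is set only for first-time students who completed the course evaluation, and assumes that the 10-point minimum on the written exam overrides the table. The examiner's decision is what counts in practice.

```python
def grade_from_total(total):
    """Map a total score (0-100) to a grade using the table above."""
    if total >= 80:
        return "5"
    if total >= 60:
        return "4"
    if total >= 40:
        return "3"
    return "fail"


def final_grade(homework, project, exam, feedback_bonus=False):
    """Return (total, grade) for the given component scores.

    homework: 0-24, project: 0-30, exam: 0-44.
    feedback_bonus: assumed True only for first-time students who
    completed the course evaluation (hypothetical flag).
    """
    if not (0 <= homework <= 24 and 0 <= project <= 30 and 0 <= exam <= 44):
        raise ValueError("score out of range")

    total = homework + project + exam
    grade = grade_from_total(total)

    if feedback_bonus:
        bumped = grade_from_total(total + 2)
        # The two bonus points are not awarded if they would move
        # the grade from fail to 3.
        if not (grade == "fail" and bumped == "3"):
            total += 2
            grade = bumped

    # Assumption: fewer than 10 points on the written exam means a
    # failing grade regardless of the total.
    if exam < 10:
        grade = "fail"

    return total, grade


if __name__ == "__main__":
    print(final_grade(20, 25, 30, feedback_bonus=True))
    print(final_grade(15, 14, 10, feedback_bonus=True))
    print(final_grade(24, 30, 9))
```

In the second call, the base total is 39, so the bonus would lift the grade from fail to 3 and is therefore withheld. In the third call, the total is 63 but the exam score is below the 10-point minimum.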
Changes compared to the previous editions of the course
- TA office hours have been introduced
- Additional exercises are available in the lecture notes