Course syllabus
Course PM for EEN100: Statistics and machine learning in high dimensions
LP1 HT24 (7.5 hp)
The course is offered by the departments of Electrical Engineering and Mathematical Sciences.
Contact details
Teachers
Teaching assistant
Course purpose
The explosion in the volume of data collected in all scientific disciplines and in industry requires students interested in statistical analyses and machine-learning and signal-processing algorithms to acquire more sophisticated probability tools than the ones taught in basic probability courses.
This course provides an introduction to the area of high-dimensional statistics, which deals with large-scale problems where both the number of parameters and the sample size are large.
The course covers fundamental tools for the analysis of random vectors, random matrices and random projections, such as tail bounds and concentration inequalities. It further provides concrete applications of such tools in the context of generalization-error analyses in statistical learning theory, sparse linear models, community detection in graphs, and matrix models with rank constraints.
Schedule
Course literature
The course is based on lecture notes developed by the teachers and available here.
For students interested in further exploring the topics covered in the course, we recommend the following 5 references.
- Vershynin, High-dimensional probability: an introduction with applications in data science. Cambridge Univ. Press, 2019. Available online.
- Wainwright, High-dimensional statistics: a nonasymptotic viewpoint. Cambridge, U.K.: Cambridge Univ. Press, 2019. Available online through Chalmers library.
- Bandeira, Singer, and Strohmer, Mathematics of Data Science, Jun. 2020, draft 0.1. Available online.
- Foucart and Rauhut, A mathematical introduction to compressive sensing, 2013. Available online through Chalmers library.
- Shalev-Shwartz and Ben-David, Understanding machine learning: from theory to algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2014. Available online through Chalmers library.
Course design
Content
Fundamental probability tools
- Preliminaries on random variables: classical tail bounds and limit theorems
- Chernoff bounds for sub-Gaussian and sub-exponential distributions
- Concentration bounds for sums of independent random variables
- Random vectors and random matrices in high dimensions
Applications in machine learning, statistics, and signal processing
- Community detection
- Covariance matrix estimation and clustering
- Recovery of sparse signals
- Low-rank matrix recovery
- Sample complexity in statistical learning theory
Organization
The course consists of 16 lectures, 4 homework assignments and 3 applied projects.
Most Wednesdays, a homework assignment or project will be posted online. Students are expected to solve each homework assignment or project in groups and to present their solutions within one week (two weeks for projects). The homework assignments consist mainly of theoretical exercises. The projects are applied and involve programming algorithms in a language of your choice (e.g., Python, R, or MATLAB) and testing them on synthetic data sets.
Starting from week 2, the Wednesday afternoon course slot will be mostly devoted to the correction of the homework assignments and to the presentation of the projects. Students will be asked to present their solutions. A discussion will follow. Handing in a solution to the homework assignment/project before the discussion session is compulsory. One solution per group suffices. The groups can consist of at most 3 students.
Learning objectives and syllabus
Learning objectives:
- State basic tail and concentration bounds for sums of independent random variables
- Apply these bounds to provide guarantees on how accurately one can
  - estimate a covariance matrix from data
  - recover a sparse vector from noisy linear projections
  - estimate a low-rank matrix from few of its entries
Link to the syllabus on Studieportalen.
Examination form
Oral exam. Each student will be asked to present the solution of one of the homework assignments (with follow-up theoretical questions) and of one of the three applied projects. Both the homework assignment and the project will be selected randomly. The student will be given 30 minutes to prepare before the start of the oral exam.