EEN100 Statistics and machine learning in high dimensions lp1 HT20 (7.5 hp)
The course is offered by the departments of Electrical Engineering and Mathematical Sciences.
The explosion in the volume of data collected in all scientific disciplines and in industry requires students interested in statistical analyses and machine-learning and signal-processing algorithms to acquire more sophisticated probability tools than the ones taught in basic probability courses.
This course provides an introduction to the area of high-dimensional statistics, which deals with large-scale problems where both the number of parameters and the sample size are large.
The course covers fundamental tools for the analysis of random vectors, random matrices, and random projections, such as tail bounds and concentration inequalities. It further provides concrete applications of these tools to generalization-error analyses in statistical learning theory, sparse linear models, and matrix models with rank constraints.
The course will be offered remotely via Zoom. Recordings of the lectures will be uploaded to Canvas.
The course is based on the following three references:
R. Vershynin, High-dimensional probability: an introduction with applications in data science. Cambridge, U.K.: Cambridge Univ. Press, 2019. Available online.
M. J. Wainwright, High-dimensional statistics: a nonasymptotic viewpoint. Cambridge, U.K.: Cambridge Univ. Press, 2019. Available online through Chalmers library.
A. S. Bandeira, A. Singer, and T. Strohmer, Mathematics of data science. Draft 0.1, Jun. 2020. Available online.
Lecture notes prepared by the teachers will be distributed during the course.
Fundamental probability tools
- Preliminaries on random variables: classical inequalities and limit theorems
- Concentration of sums of independent random variables: Hoeffding, Chernoff, Bernstein, sub-Gaussian and sub-exponential distributions
- Random vectors and random matrices in high dimensions
- Concentration without independence
- Uniform laws of large numbers: Rademacher complexity and VC dimension
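As a small illustration of the concentration inequalities listed above, the following sketch compares Hoeffding's bound with a simulated tail probability. The setting is an assumption chosen for simplicity and is not taken from the course material: S_n is a sum of n independent random signs, for which Hoeffding's inequality gives P(|S_n| >= t) <= 2 exp(-t^2 / (2n)). The function names are hypothetical.

```python
import math
import random


def hoeffding_bound(n, t):
    """Hoeffding bound on P(|S_n| >= t) for a sum S_n of n independent +/-1 signs."""
    return 2.0 * math.exp(-t * t / (2.0 * n))


def empirical_tail(n, t, trials=20000, seed=0):
    """Monte Carlo estimate of P(|S_n| >= t) for the same sum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n fair random bits; a 1-bit counts as +1 and a 0-bit as -1.
        ones = bin(rng.getrandbits(n)).count("1")
        s = 2 * ones - n
        if abs(s) >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 100
    for t in (10.0, 20.0, 30.0):
        est = empirical_tail(n, t)
        bound = hoeffding_bound(n, t)
        print(f"t={t:4.0f}  empirical={est:.4f}  Hoeffding bound={bound:.4f}")
```

The simulated tail should stay below the bound for every t. For small t the bound exceeds 1 and carries no information, which is typical of such inequalities.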
Applications in machine learning, statistics, and signal processing
- Community detection
- Covariance matrix estimation and clustering
- Recovery of sparse signals
- Principal component analysis
- Low-rank matrix recovery
- Sample complexity in statistical learning theory
The course consists of traditional lectures and exercise sessions. Every Wednesday a homework assignment will be posted online. The students are expected to solve each homework assignment in groups and to present their solutions within a week. The homework assignments consist of both theoretical exercises and practical exercises, which may involve programming algorithms and testing them on synthetic data sets.
Starting from week 2, the Wednesday afternoon course slot will be devoted to reviewing the homework assignments. Students will be asked to present their solutions, and a discussion will follow. The selected groups will then be asked to submit the solutions to the problems they presented.
Learning objectives and syllabus
- State basic tail and concentration bounds for sums of independent random variables
- Apply these bounds to provide guarantees on how accurately one can
- estimate a covariance matrix from data
- recover a sparse vector from noisy linear projections
- estimate a low-rank matrix from few of its entries
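To make the covariance-estimation objective concrete, here is a minimal sketch of the kind of synthetic-data experiment the practical exercises might involve. Everything specific in it is an assumption for illustration: the data are zero-mean Gaussian with identity covariance, the error is measured in operator norm, NumPy is used, and the function name is hypothetical.

```python
import numpy as np


def covariance_error(d, n, seed=0):
    """Operator-norm error of the sample covariance of n Gaussian samples in dimension d."""
    rng = np.random.default_rng(seed)
    # Rows are independent samples with true covariance equal to the identity (assumed).
    X = rng.standard_normal((n, d))
    # The mean is known to be zero here, so the data are not centered.
    sample_cov = X.T @ X / n
    # ord=2 on a 2-D array gives the largest singular value, i.e. the operator norm.
    return float(np.linalg.norm(sample_cov - np.eye(d), ord=2))


if __name__ == "__main__":
    d = 20
    for n in (50, 200, 2000):
        print(f"d={d}  n={n:5d}  error={covariance_error(d, n):.3f}")
```

In this setting the error is expected to shrink as n grows relative to d, roughly on the order of sqrt(d/n) once n exceeds d.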
Link to the syllabus on Studieportalen.
Oral exam. The students will be asked to present one of the theoretical topics presented in the course and the solution of one of the exercises. The theoretical topic and the exercise will be chosen randomly.