MVE441 / MSA220 Statistical learning for big data Spring 21

Planned structure

Please check out the plan for the course that details all important dates.

Student representatives

The following students have been appointed (master program in parentheses):

Leo Benson (MPENM)
Hanna Skytt (MPENM)
Arunachalam Narasimhan (MPCAS)
Oskar Thune (MPCAS)
Xinrong Zhao (MPDSC)

In short, the student representatives and I will meet once during the course and once after it has ended to discuss how everything works or worked. If you have opinions about the course, you are always welcome to contact me directly. If you prefer, you can instead contact one of the student representatives, who will collect the feedback and bring it to me.

You can read more about what a student representative is at the following link. The representatives can be reached through the messaging function in Canvas: go to your inbox, start a new message, choose the course, and type the name of the student you want to reach.

Course PM

This page contains a description of the program of the course. Other information, such as learning outcomes, teachers, recommended course literature, project work and examination, is in a separate course PM.

Important!

Please take the time to read the course PM, as it contains valuable information on the course setup.

Program

The schedule of the course is in TimeEdit.

Contents (preliminary, subject to detail changes)

Model-based Classification
- Logistic, probit and softmax regression
- Nearest centroids/Naive Bayes
- Linear, quadratic and diagonal discriminant analysis
Model Assessment for Predictive Learning / Model Selection through Cross-Validation
Tree-based methods
- Classification and Regression Trees (CART)
- Bagging and the bootstrap
- Random Forests & Variable Importance
Data representations:
- Singular Value Decomposition
- Principal Component Analysis
- Regularized Discriminant Analysis
- Factor analysis
- Non-negative Matrix Factorization
- Intro to kernels and the kernel trick
- kernel-PCA
- Other applications of the kernel trick: Kernel ridge regression
- Multi-dimensional scaling, Isomap, tSNE
Clustering
- Combinatorial Clustering
- k-means
- k-medoids/partition around medoids
- Selection of Cluster Count
- Hierarchical Clustering
- Gaussian Mixture Models
- Expectation Maximization and Clustering
- Mixture Discriminant Analysis
- Density-based clustering / DBSCAN
Penalized regression/classification methods
- Regularization and Variable selection (Ridge Regression, Lasso)
- Nearest Shrunken Centroids
- Elastic Net
- Group Lasso
- Oracle estimators
- SCAD
- Graphical Lasso
- sparse logistic regression
High-dimensional clustering:
- Subspace clustering/co-clustering
- Spectral clustering
Large sample methods
- Randomized Projection
- Randomized SVD
- Divide and Conquer
- Random Forests for big-n
- m-out-of-n bootstrap
- bag of little bootstraps
- leveraging

Course requirements

The official course-specific prerequisites, as stated in the course plan, are:

The prerequisites for the course are a basic course in statistical inference and MVE190/MSG500 Linear Statistical Models. Students can also contact the course instructor for permission to take the course.

This means you should be familiar with the following:

Basic vector calculus and linear algebra (Matrices, vectors, gradients, ...)
Basic statistics (probability density and mass functions, cumulative distribution functions, expected value as an integral, (co-)variance, correlation, …)
Common distributions (Normal, Student-t, Gamma, Chi-Square, ...)
In terms of multivariate distributions, at least the multivariate Normal distribution
Parameter estimation in the framework of maximum likelihood
Knowledge about least squares methods and their statistical properties
Linear regression and how to interpret its results
Programming skills (Knowledge of basic control flow and ideally some basic knowledge of statistical programming, e.g. how to generate random numbers, how to perform simple simulations, ...; R or Python are recommended for this course)
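As a rough indication of the level of statistical programming meant in the last item, here is a minimal sketch of a "simple simulation" in Python. It is only an illustration and not part of the course material: the function name simulate_sample_means and all parameter values are hypothetical choices. It draws repeated samples from a Normal distribution and compares the spread of the sample means with the theoretical value sigma / sqrt(n).

```python
import random
import statistics


def simulate_sample_means(n_samples=1000, sample_size=30, mu=0.0, sigma=1.0, seed=1):
    """Draw repeated Normal(mu, sigma) samples and return the mean of each one.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means()

# Theory: the standard deviation of the sample mean is sigma / sqrt(sample_size).
print("average of sample means:", statistics.fmean(means))
print("empirical std of sample means:", statistics.stdev(means))
print("theoretical std:", 1.0 / 30 ** 0.5)
```

If you can read this and write something similar in R or Python, your programming background should be in line with the requirement above.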
