MVE440/MSA220 Statistical learning for big data

Course PM

This page describes the program of the course. Other information, such as learning outcomes, teachers, recommended course literature, project work and examination, is in a separate course PM.


Please take the time to read the course PM, as it contains valuable information on the course setup.


The schedule of the course is in TimeEdit.

Contents (preliminary, details subject to change)

  • Model-based Classification
    • Logistic, probit and softmax regression
    • Nearest centroids/Naive Bayes
    • Linear, quadratic and diagonal discriminant analysis
  • Model Assessment for Predictive Learning / Model Selection through Cross-Validation
  • Tree-based methods
    • Classification and Regression Trees (CART)
    • Bagging and the bootstrap
    • Random Forests & Variable Importance
  • Data representations
    • Singular Value Decomposition
    • Principal Component Analysis
    • Regularized Discriminant Analysis
    • Factor analysis
    • Non-negative Matrix Factorization
    • Intro to kernels and the kernel trick
    • Kernel PCA
    • Other applications of the kernel trick: Kernel ridge regression
    • Multi-dimensional scaling, Isomap, t-SNE
  • Clustering
    • Combinatorial Clustering
    • k-means
    • k-medoids/partitioning around medoids (PAM)
    • Selection of Cluster Count
    • Hierarchical Clustering
    • Gaussian Mixture Models
    • Expectation Maximization and Clustering
    • Mixture Discriminant Analysis
    • Density-based clustering / DBSCAN
  • Penalized regression/classification methods
    • Regularization and Variable selection (Ridge Regression, Lasso)
    • Nearest Shrunken Centroids
    • Computational aspects of the lasso
    • Elastic Net
    • Group Lasso
    • Oracle estimators
    • Adaptive lasso
    • SCAD
    • Sparse logistic regression
  • High-dimensional clustering
    • Subspace clustering/co-clustering
    • Spectral clustering
  • Large sample methods
    • Randomized Projection
    • Randomized SVD
    • Divide and Conquer
    • Random Forests for big-n
    • m-out-of-n bootstrap
    • Bag of little bootstraps
    • Leveraging

Course requirements

The official course-specific prerequisites, as stated in the course plan, are:

The prerequisites for the course are a basic course in statistical inference and MVE190/MSG500 Linear Statistical Models. Students can also contact the course instructor for permission to take the course.

This means you should be familiar with the following:

  • Basic vector calculus and linear algebra (matrices, vectors, gradients, ...)
  • Basic distributions (Normal, Student-t, Gamma, Chi-Square, ...)
  • In terms of multivariate distributions, at least the multivariate Normal distribution
  • Parameter estimation in the framework of maximum likelihood
  • Knowledge about least squares methods and their statistical implications
  • Linear regression and how to interpret its results
