MVE190 / MSG500, MSG501 Statistical learning with regression models Autumn 25

This page contains the program of the course: lectures, exercise sessions and computer labs. Other information, such as learning outcomes, teachers, literature and examination, is in a separate course PM. Lectures are held in person only (no Zoom, no video-recorded lectures). For the compulsory mini-analysis presentations it will be possible either to connect via Zoom or to be in the room (see below).

2025_Topics and hence potential exam questions.pdf

Minutes of the mid-course meeting for Statistical Learning 2025.pdf

 

Program

The schedule of the course is in TimeEdit

Lectures will be "in-person" in University rooms and will not be recorded.

We will have a single in-person, non-compulsory computer lab at 15.15-17.00 on 6 November. Bring your own laptop. This lab is NOT structured with presentations of concepts. It is useful for getting started with the R/RStudio software if you have no previous experience. If you already have some experience with R, you probably won't need it.

Note that there are some activities for which attendance is mandatory, either in the room or via Zoom. These are the Friday "mini-analyses". See the course PM for details and for what to do in case you are unable to attend.

 

Lectures (below is a  plan based on last year. Deviations 

Columns: week; topics; slides/notes; code and data files; mini-analysis
45

Tuesday: General intro to the course and mention of several topics: bias in linear regression; least squares; parameter interpretation in simple linear regression.

 

 

 

lab0.pdf

slides_1.pdf

Also, check "lecture 1" in Jörnsten's notes, found in the Course PM.

 

Lecture1.R

no mini analysis this week

45

Thursday: least squares estimates and relation to correlation; interpretation of coefficients with transformed variables; the 5 basic assumptions and residual plots.

 

slides_2.pdf

formula-sheet.pdf

Check "lecture 2" in Jörnsten's notes (linked in the previous lecture).

Demo1-2025.R

sleeptab.dat

Lecture2.R

no mini analysis this week

45

Thursday 6 November, computer lab: BRING YOUR LAPTOP with R/RStudio to MVF24 and MVF25. The lab is not structured with presentations of concepts. We are there to help those who are new to R go through some basic exercises.

 

lab1.pdf

 

 no mini analysis this week

46

Tuesday: Proof of the unbiasedness of OLS parameter estimators; variance of the estimators; some residuals-based diagnostics; Box-Cox transformations.

 

slides_3.pdf

Box-Cox transformations: see section 12.4 in the book by Rawlings et al.

Check "lecture 3" in Jörnsten's notes.

We did not manage to introduce the leverage (-->next lecture)

 

boxcox-type_transforms.R

also see again Lecture2.R

 

46

Thursday: leverage values; deletion-based diagnostics; MSE and proof of unbiasedness of MSE; unbiasedness of yhat; expectation and variance of residuals; variability decomposition (SSEr, SStor, SSRegr); R-squared (coefficient of determination); t-test construction.

Optional exercises for self-study from the book by Rawlings et al.: exercise 1.1, exercise 1.4, relevant bits of exercise 1.9, ex. 1.10, ex. 1.16.

 

 

slides_4.pdf

StatisticalTables.pdf

Check "lecture 4" in Jörnsten's notes.

The R-squared is also known as the coefficient of determination (page 220 onward in the book by Rawlings et al.)

 Demo3.R

Lecture3.R

Friday 14 Nov *: present  work for mini-analysis 1.

MiniAnalysis1-2025.pdf

bikesharing.csv

TV.dat

 

 

47

Tuesday: completing t-tests; p-values; confidence intervals for parameters and for E(Y0); prediction intervals for Ypred0.

we finish slides_4.pdf, then we cover

slides_5.pdf

(see also the relevant bits in lectures 4-5 of Jörnsten's notes)

Demo4.R

Lecture4.R

no mini analysis this week

47

Thursday: Simpson's paradox and notation for multiple linear regression. Properties of the estimators in multiple regression and sampling distributions. t-test and categorical covariates (not everything). These topics are also found in chapter 9 ("Class variables") of Rawlings et al., up to sect. 9.3.

 

Optional exercises for self-study from the book by Rawlings et al.: exercise 3.5 (parts a and d), ex. 3.10, ex. 3.11, ex. 3.12 (we never do regression without intercept... but if you are interested...), ex. 3.13.

 

 

slides_6.pdf

multipleregression.R

advertising.R

Advertising.csv

 

 no mini analysis this week

48

Tue: models with categorical and numerical covariates. 

slides_7.pdf

(we stopped at slide 35)

 

auction.R

auction.dat

 

PROJECT (final version) project25.pdf

CarPrice.csv

controllers.dat

48

Thu: problems with p-values and large datasets; multicollinearity; Variance Inflation Factor; partial F test.

(we restart from slide 35 from slides_7)

slides_8.pdf

for Partial F: section 2 in Lect 6 of notes;

multicollinearity: pages 100-103 in ISLR and pages 372-373 in Rawlings et al.
[ISLR = "An Introduction to Statistical Learning, with Applications in R"]

multicollinear.R

partial_Ftest.R

 

Friday 28 Nov at 13.15*: present minianalysis 2

MiniAnalysis2-2025.pdf

DEADLINE 9.00am

zoom: https://chalmers.zoom.us/j/61066694734

passcode: 653917

49

Tue: greedy variable selection: backward search. Bias/variance trade-off and the pMSE. Training and testing, pMSE with regsubsets.

 

 slides_9.pdf

[we stopped at slide 34]

[ISLR: sections 2.2.1-2.2.2, 5.1.1, 6.1.1-6.1.2]

[Rawlings et al.: sec. 7.3, 7.4]

Lecture6.R

Lecture7.R

SA.dat

cars.dat

 

no minianalysis this week

49

 

Thu: more on pMSE with categorical covariates; interactions

[we restart from page 35 of slides_9.pdf]

slides_10.pdf

regsubsets-categorical-covar.pdf

regsubsets-olsrr-categorical.R

 

no minianalysis this week

 

50

Tue: adjusted R-squared; Kullback-Leibler divergence; AIC, BIC; K-fold CV.

 

 

 

50

Thu: LOOCV; hat matrix; residuals; standardised residuals; studentised residuals; Cook's distance; DFBETAs; intro to GLMs.

 

 

 

 

 

 

Friday 12 Dec at 13.15*: present minianalysis 3

MiniAnalysis3-2025.pdf

 

51

Tue: the exponential family; Newton-Raphson; Poisson regression; confidence intervals for GLMs; asymptotic properties of the MLE; CI for predictions; Wald test; deviance; likelihood ratio test.

 

 

 

 

 

no mini analysis this week
51

Thu: Poisson + offset term. Negative binomial regression also with offset. Quick tour through GLM diagnostics (diagnostics can be skipped for the exam)

 

 

 

no mini analysis this week

 


 

Computer lab and software

You are encouraged to attend the lab on Thursday 6 November to experiment with some basic analyses. No further computer lab will be given.

Software: We will use the statistical package R to analyze data, through the RStudio interface. You will need to install both on your computer; see the page Installing R and Rstudio.
No previous knowledge of R is required.
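To check that your installation works before the lab, you could try a few lines like the ones below. This is only a minimal sketch and not part of the official course material: it assumes the `cars` dataset that ships with base R (unrelated to the course file cars.dat) and fits a simple linear regression with `lm()`.

```r
# Quick sanity check for a fresh R/RStudio installation (illustrative only).
# `cars` is a small dataset bundled with R: 50 rows with columns speed and dist.
# It is NOT the course file cars.dat.
data(cars)

# Fit a simple linear regression of stopping distance on speed by least squares.
fit <- lm(dist ~ speed, data = cars)

# Print the estimated intercept and slope.
print(coef(fit))

# Print standard errors, t-tests and R-squared for the fitted model.
print(summary(fit))

# Residuals against fitted values, a basic diagnostic plot.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

If the coefficients and the summary table are printed and a plot appears in the RStudio "Plots" pane, R and RStudio are ready for the course scripts.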

Some useful resources:

If you are familiar with MATLAB or Python, the following may be useful:

 

