MVE190 / MSG500 Linear statistical models

This page contains the program of the course: lectures, exercise sessions and computer labs. Other information, such as learning outcomes, teachers, literature and examination, is in a separate course PM. Lectures in this course edition are in person only (no Zoom, no video recordings). For the compulsory mini-analysis presentations you can either be in the room or connect via Zoom (see below).

List of topics and potential exam questions: 2023_Topics and hence potential exam questions.pdf

 

Program

The course schedule is in TimeEdit.

Lectures will be held in person in university rooms and will not be recorded.

We will have a single non-compulsory computer lab at 15.15-17.00 on 2 November, which will NOT be available via Zoom. This lab does not include presentations of concepts. It is useful for getting started with the R/RStudio software if you have no previous experience. If you already have some experience with R you probably won't need it.

Note that attendance is mandatory for some activities, either in the room or via Zoom. These are the Friday "mini-analyses". See the course PM for details and for what to do if you are unable to attend.

 

Lectures (the plan below is based on last year's; deviations may occur).

(Here are the videos of the 2020 lectures. Treat them as useful material in case you miss a lecture, but we discourage you from relying on them as a substitute for attendance. Three years have passed, a few things have changed, and we don't want you to get confused!)

 

Week; Topics; Slides/notes; Code and data files; MiniAnalysis
44

Tuesday: general introduction to the course and overview of several topics: bias in linear regression; least squares; interpretation of parameters in simple linear regression.

 

 

 

lab0.pdf (you must go through this file before the Thursday lab)

slides_1.pdf

 

Also check "lecture 1" in Jörnsten's notes, found in the course PM.

Lecture1.R

 

44

Thursday: least squares estimates and their relation to correlation; interpretation of coefficients with transformed variables; the 5 basic assumptions and residual plots.

 

formula-sheet.pdf

slides_2.pdf

Check "lecture 2" in Jörnsten's notes (linked in the previous lecture).

(optional: TooBigToFail_2013.pdf)

 

Demo1-2023.R

sleeptab.dat

Lecture2.R

 

44

Thursday 2 November: computer lab at 15.15-17.00 in MVF24 and MVF25. This lab is an introduction to R and RStudio. It does not include presentations of concepts; we are there to help if you have questions about the exercises.

 

lab1.pdf

 

 

45

Tuesday: proof of unbiasedness; variance of the estimators; some residual-based diagnostics; Box-Cox transformations; leverage values.

 

slides_3.pdf

Box-Cox transformations: see section 12.4 in Rawlings' book.

Check "lecture 3" in Jörnsten's notes.

boxcox-type_transforms.R

also see again Lecture2.R

 

45

Thursday: deletion-based diagnostics; MSE; unbiasedness of yhat; expectation and variance of residuals; proof of the variability decomposition (SSEr, SStot, SSRegr); R-squared; t-test construction.

Optional exercises for self-study from Rawlings' book: exercise 1.1, exercise 1.4, relevant bits of exercise 1.9, ex. 1.10, ex. 1.16.

 

 

slides_4.pdf

StatisticalTables.pdf

Check "lecture 4" in Jörnsten's notes.

Lecture3.R

Demo3.R

Friday 10 Nov*: present your work for MiniAnalysis1-2023.pdf

(this event is typically 60-70 minutes long)

earnings.csv

TV.dat

 

46

Tuesday: more on p-values; unbiasedness of MSE; confidence intervals for parameters and for E(Y0); prediction intervals for Ypred0.

slides_5.pdf

(see also the relevant bits in lectures 4-5 of Jörnsten's notes)

Demo4.R

Lecture4.R

 

46

Thursday: Simpson's paradox and notation for multiple linear regression; properties of the estimators in multiple regression and sampling distributions; t-test and categorical covariates (not everything). These topics are also covered in chapter 9 ("Class variables") of Rawlings et al., up to section 9.3.

 

Optional exercises for self-study from Rawlings' book: exercise 3.5 (parts a and d), ex. 3.10, ex. 3.11, ex. 3.12 (we never do regression without an intercept, but if you are interested...), ex. 3.13.

 

 

 

slides_6.pdf

advertising.R

Advertising.csv

multipleregression.R

 

 

 

47

Tue: models with categorical and numerical covariates; problems with p-values and large datasets.

slides_7.pdf


 

auction.R

auction.dat

 

 

 

47

Thu: multicollinearity; Variance Inflation Factor; partial F test; greedy variable selection: backward search.

slides_8.pdf

Check "lecture 6" in Jörnsten's notes

 

Lecture6.R

SA.dat

multicollinear.R

global+partial_Ftests.R

 

Friday 24 Nov at 13.15*: present minianalysis 2

(this event is typically 60-70 minutes long)

MiniAnalysis2-2023.pdf

 

48

Tue: bias/variance tradeoff and the pMSE; training and testing; pMSE with regsubsets.

 

 slides_9.pdf

slides_9_annotated.pdf

 

Lecture7.R

SA.dat

cars.dat

PROJECT

project23.pdf

medinsur.csv

countydemographics.txt

no minianalysis this week

48

 

Thu: (continuing from the end of slides 9) Mallows' Cp; interactions; adjusted R-squared.

slides_10.pdf

regsubsets-categorical-covar.pdf

regsubsets-olsrr-categorical.R

 

no minianalysis this week

 

49

Tue: Kullback-Leibler; AIC, BIC; K-fold CV; LOOCV; hat-matrix; residuals;

slides_11.pdf

(optional if you are interested) AkaikeEasyIntro.pdf

See also "lecture 14" in Jörnsten's notes.

Lecture8.R

rsquaredAICstep.R

regsubsets-categorical-CV.R

 

49

Thu: standardised residuals; studentised residuals; Cook's distance; DFBETAs; introduction to GLMs.

slides_12.pdf

See also "lecture 14" in Jörnsten's notes.

(Additional support: Agresti's book chapters 4 and 14.4)

 

 

f6data.txt

leverage-residuals-cook.R

 

 

Friday 8 Dec at 13.15*: present minianalysis 3

MiniAnalysis3-2023.pdf

AirBnB_NYCity_2019.csv

 

50

Tue: the exponential family; Newton-Raphson; Poisson regression; confidence intervals for GLMs;  asymptotic properties of the MLE; CI for predictions; Wald test; deviance; likelihood ratio test; 

 

completion of slides_12.pdf then slides_13.pdf

 

f10.txt

f10b.txt

poissregr-awards.R

poisson_sim.csv

50

Thu: Poisson + offset term. Negative binomial regression also with offset. Quick tour through GLM diagnostics (diagnostics can be skipped for the exam)

 

slides_14.pdf

slides_14_annotated.pdf

shipdamage.R

poisson_nb.R

 

 


 

Computer labs and software

Software: We will use the statistical package R to analyze data, through the RStudio interface. You will need to install both on your computer; see the instructions.
No previous knowledge of R is required. You are encouraged to attend the lab on Thursday 2 November to experiment with some basic analyses. No further computer labs will be held.

Some useful resources:

If you are familiar with MATLAB or Python, the following may be useful:

 
