MVE190 / MSG500 Linear statistical models Autumn 24

This page contains the program of the course: lectures, exercise sessions and computer labs. Other information, such as learning outcomes, teachers, literature and examination, is in a separate course PM. Lectures are in-person only (no Zoom, no video-recorded lectures). For the compulsory mini-analysis presentations it will be possible to either connect via Zoom or be in the room (see below).

List of topics and potential exam questions: 2024_Topics and hence potential exam questions.pdf

 

Program

The schedule of the course is in TimeEdit

Lectures will be "in-person" in University rooms and will not be recorded.

We will have a single in-person, non-compulsory computer lab at 15.15-17.00 on 7 November. Bring your own laptop. This lab is NOT structured with presentations of concepts. The lab is useful to get you started with the R/RStudio software if you have no previous experience. If you already have some experience with R you probably won't need it.

Note that attendance is mandatory for some activities, either in the room or via Zoom. These are the Friday "mini-analyses". See the course PM for details and what to do in case you are unable to attend.

 

Lectures (below is a plan based on last year; deviations may occur).

(Here are the videos of the 2020 lectures. These are useful material in case you miss a lecture, but we discourage you from relying on them as a substitute for attendance. Years have passed, a few things have changed, and we don't want you to get confused!)

 

Columns: week; topics; slides/notes; code and data files; MiniAnalysis
45

Tuesday: General intro to the course and mention of several topics: bias in linear regression, least squares; parameters interpretation in simple linear regression;

 

 

 

lab0.pdf

slides_1.pdf

Also, check "lecture 1" in Jörnsten's notes, found in the Course PM

 

 

Lecture1.R

no mini analysis this week

45

Thursday: least squares estimates and relation to correlation; interpretation of coefficients with transformed variables; the 5 basic assumptions and residual plots.

 

formula-sheet.pdf

slides_2.pdf

Check "lecture 2" in Jörnsten's notes (linked in the previous lecture).

 

Demo1-2024.R

sleeptab.dat

Lecture2.R

 

no mini analysis this week

45

Thursday 7 November: Computer lab in MVF24 and MVF25. BRING YOUR LAPTOP with R/RStudio. The lab is not structured with presentations of concepts. We are there to help those who are new to R go through some basic exercises.

 

lab1.pdf

 

 no mini analysis this week

46

Tuesday: Proof of the unbiasedness of OLS parameter estimators; variance of the estimators; some residuals-based diagnostics; Box-Cox transformations; leverage values.

 

 

slides_3.pdf

Box-Cox transformations: see section 12.4 in Rawlings' book.

Check "lecture 3" in Jörnsten's notes.

boxcox-type_transforms.R

also see again Lecture2.R

 

46

Thursday: deletion-based diagnostics; MSE and proof of unbiasedness of MSE; unbiasedness of yhat;  expectation and variance of residuals; variability decomposition (SSEr, SStor, SSRegr); Rsquared; t-test construction

Optional exercises for self-study from Rawlings' book: exercise 1.1, exercise 1.4, relevant bits of exercise 1.9, ex. 1.10, ex. 1.16.

 

 

slides_4.pdf

StatisticalTables.pdf

Check "lecture 4" in Jörnsten's notes.

Demo3.R

Lecture3.R

Friday 15 Nov*: present work for mini-analysis 1.

https://chalmers.zoom.us/j/65764692171
 Password: 857351

MiniAnalysis1-2024.pdf

kc_house_data.csv

TV.dat

 

47

Tuesday: completing t-tests; p-values; confidence intervals for parameters and for E(Y0). Prediction intervals for Ypred0.

slides_5.pdf

(see also the relevant bits in lectures 4-5 of Jörnsten's notes)

Demo4.R

Lecture4.R

no mini analysis this week

47

Thursday: Simpson's paradox and notation for multiple linear regression. Properties of the estimators in multiple regression and sampling distributions. t-test and categorical covariates (not everything). Topics also found in chapter 9 ("Class variables") in Rawlings et al., up to sect. 9.3.

 

Optional exercises for self-study from Rawlings' book: exercise 3.5 (parts a and d), ex. 3.10, ex. 3.11, ex. 3.12 (we never do regression without intercept... but if you are interested...), ex. 3.13.

 

 

slides_6.pdf

multipleregression.R

advertising.R

Advertising.csv

 

 no mini analysis this week

48

Tue: models with categorical and numerical covariates; problems with p-values and large datasets.

 


 

 

 

 

 

48

Thu: Multicollinearity; Variance Inflation Factor; Partial F test; greedy variable selection: backward search.

 

 

 

 

Friday 29 Nov at 13.15*: present minianalysis 2

(this event is typically 60-70 minutes long)

MiniAnalysis2-2024.pdf

 

49

Tue: Bias/variance trade-off and the pMSE. Training and testing, pMSE with regsubsets.

 

 

 

 

no minianalysis this week

49

 

Thu: (reprise of the end of slides 9) Mallows' Cp; interactions; adj-Rsquared.

 

 

 

no minianalysis this week

 

50

Tue: Kullback-Leibler; AIC, BIC; K-fold CV; LOOCV; hat-matrix; residuals;

 

 

 

50

Thu: standardised residuals,  studentised residuals; Cook's distance, DFBETAs; intro GLMs;

 

 

 

 

 

Friday 13 Dec at 13.15*: present minianalysis 3

 

 

51

Tue: the exponential family; Newton-Raphson; Poisson regression; confidence intervals for GLMs;  asymptotic properties of the MLE; CI for predictions; Wald test; deviance; likelihood ratio test; 

 

 

 

 

no mini analysis this week
51

Thu: Poisson + offset term. Negative binomial regression also with offset. Quick tour through GLM diagnostics (diagnostics can be skipped for the exam)

 

 

 

no mini analysis this week

 


 

Computer lab and software

You are encouraged to attend the lab on Thursday 7 November to experiment with some basic analyses. No further computer labs will be given, but every week you can use a 2-hour "open room" slot with teaching assistants, where you can pop in, ask a question and then leave. The dates will be specified in due time.

Software: We will use the statistical package R to analyze data, through the RStudio interface. You will need to install both on your computer; see the instructions.
No previous knowledge of R is required.

Some useful resources:

If you are familiar with MATLAB or Python, the following may be useful:

 

