MVE190 / MSG500 Linear statistical models Autumn 20

Course PM

This page contains the course programme: lectures, exercise sessions and computer labs. Other information, such as learning outcomes, teachers, literature and examination, is in a separate course PM.

 

List of topics and potential exam questions: 2020-21_Topics.pdf

Minutes of the mid-course meeting: Minutes of the mid-course meeting2020-21.pdf

 

Program

The course schedule is in TimeEdit. However, all lectures will consist of pre-recorded videos, which will be posted in the table below in due time, so you can watch them whenever it is most convenient for you. The lecturer will strive to upload material one day before the official lecture times listed in TimeEdit, so that part of the "lecture time" can instead be used for questions (see below).

In contrast, there will be a single computer lab, at 15.15-17.00 on 5 November, which will take place live on Zoom. The lab does not include presentations of concepts: you will work on your own, and during the lab you can book Zoom chats with the teaching assistants for questions and help (the booking link is in the timetable below). The lab is meant to get you started with R and RStudio; if you already have some experience with R, you probably won't need it.

Note that attendance is mandatory for some activities, namely the Friday "mini-analyses". See the course PM for details and for what to do if you are unable to attend.

There will generally be two Zoom meetings each week, each lasting at most 30 minutes, where we can all meet.

These take place during "lecture" time, at 9.30-10.00 on Tuesdays and 14.00-14.30 on Thursdays, always at https://chalmers.zoom.us/my/picchini. In these meetings:

  • You will be able to ask questions related to the online learning material.
  • You may ask me to review or clarify material (if you email me your questions beforehand I can prepare, but this is not necessary).
  • You may comment on or discuss course content or course organization.
  • We can discuss whatever else we want to discuss in the whole group.
  • Outside the meetings, you can of course email me at picchini@chalmers.se, or use the "Discussion" forum here on Canvas.

Finally, I will hold office hours on Zoom every Wednesday, 9.00-10.00 (except in the first week), where students can talk to me individually at https://chalmers.zoom.us/my/picchini. Office hours work as follows: you enter the "waiting room" of my Zoom office, and I admit one student at a time from the waiting room, on a first-come, first-served basis. If there is a queue, please limit your questions to about 10 minutes. I will also try to answer questions posted here on Canvas when I see them.

 

Lectures (the plan below is based on last year's course; deviations may occur).

Note: you can either (i) download the video lectures as MP4 files, by clicking on the file names shown above the preview, or (ii) stream them. I have noticed that Canvas compresses the videos in a way that occasionally makes some text slightly blurry; a downloaded version should be of higher quality.

Week | Topics | Video lectures | Slides/notes | Code and data files | MiniAnalysis
45

Tuesday: general introduction to the course and overview of several topics: bias in linear regression; least squares; interpretation of the parameters in simple linear regression; the 5 basic assumptions.

 

 

 

lect1-Rinstallers.mp4

L1_1.mp4

L1_2.mp4

L1_3.mp4

L1_4.mp4

L1_4FIX.mp4

(L1_4FIX fixes the last 10 minutes of L1_4)

lab0.pdf (you must go through this file before the Thursday lab); Jörnsten's notes; slides_1.pdf; check "lecture 1" in Jörnsten's notes.

Lecture1.R

no minianalysis presentations this week.

45

Thursday: derivation of the least squares estimates; proof of their unbiasedness; variance of the estimators; introduction to residual-based diagnostics.

 

L2_1.mp4

L2_2.mp4

L2_3.mp4

L2_4.mp4

slides_2.pdf

Formula sheet for properties of expectation, variance, etc.

Check "lecture 2" in Jörnsten's notes.

Demo1-2020.R

Lecture2.R

sleeptab.dat

no minianalysis presentations this week.

45

Thursday: computer lab at 15.15. This lab does not include presentations of concepts; we are there to help if you have questions about the exercises. Book your questions in the sheet below.

https://docs.google.com/spreadsheets/d/1FtR3TOk_ti1EKpu_JOyMucatGy-vxNToEul4xf4Lknc/edit#gid=0

 

lab1.pdf

repair1.txt

 

46

Tuesday: leverage values; deletion-based diagnostics; MSE; unbiasedness and variance of yhat.

 

L3_1.mp4

L3_2.mp4

L3_3.mp4

L3_4.mp4

slides_3.pdf

Check "lecture 3" in Jörnsten's notes.

Demo1.R (updated!)

Lecture2.R (updated!)

 

46

Thursday: expectation and variance of the residuals; standardised residuals; unbiasedness of the MSE; proof of the variability decomposition (SSEr, SStot, SSRegr); Rsquared; construction of the t-test.

 

Optional exercises for self-study from Rawlings' book: exercises 1.1 and 1.4, the relevant parts of exercise 1.9, and exercises 1.10 and 1.16.

L4_1.mp4

L4_2.mp4

L4_3.mp4

L4_4.mp4

slides_4.pdf

Check "lecture 4" in Jörnsten's notes.

Lecture3.R

Demo3.R

Friday *: present your work on MiniAnalysis1.pdf

bikesharing.csv

TV.dat

*typically 1 hour only

 

47

Tuesday: confidence intervals for the parameters and for E(Y0); prediction intervals for Ypred0; Simpson's paradox; notation for multiple linear regression.

L5_1.mp4

L5_2.mp4

L5_3.mp4

L5_4.mp4

slides_5.pdf

Check "lecture 5" in Jörnsten's notes.

Lecture5.R

Demo4.R

 

no minianalysis presentation this week

47

Thursday: properties of the estimators in multiple regression and their sampling distributions; confidence intervals, t-test and categorical covariates (not everything); problems with p-values in large datasets. These topics are also covered in chapter 9 ("Class variables") of Rawlings et al., up to Sect. 9.3.

 

Optional exercises for self-study from Rawlings' book: exercise 3.5 (parts a and d), 3.10, 3.11, 3.12 (we never do regression without an intercept, but if you are interested...) and 3.13.

L6_1.mp4 

L6_2.mp4

L6_3.mp4

L6_4.mp4

slides_6.pdf

Check "lecture 10" in Jörnsten's notes.

multipleregression.R (updated!)

cola.dat

auction.R

auction.dat

 

 

no minianalysis presentation this week

48

Tue: models with categorical and numerical covariates. Multicollinearity and VIF. Partial F test.

L7_1.mp4

L7_2.mp4

L7_3.mp4

L7_4.mp4

slides_7.pdf

Check "lecture 6" in Jörnsten's notes.

Lecture6.R

SA.dat

multicollinear.R

global+partial_Ftests.R

 

48

Thu: greedy variable selection (backward search); global F test; bias/variance tradeoff and the pMSE; training and testing (not everything: up to slide 37).

slides_8.pdf

(until slide #37)

Check "lecture 7" in Jörnsten's notes.

Lecture7.R

 

Friday *: present minianalysis 2

MiniAnalysis2-2020.pdf

*typically 1 hour only

 

49

Tue: PMSE with regsubsets (from slides_8.pdf); then interactions, adj-Rsquared and AIC (slides_9.pdf).

 

L9_1.mp4 

L9_2.mp4

L9_3.mp4

L9_4.mp4

(finished slides_8.pdf)

slides_9.pdf

regsubsets-categorical-covar.pdf

(optional: AkaikeEasyIntro.pdf)

See also "lecture 11" in Jörnsten's notes.

regsubsets-categorical.R (updated!)

49

 

Thu: BIC; K-fold CV; LOOCV; hat-matrix; residuals (intro)

L10_1.mp4 

L10_2.mp4

L10_3.mp4

L10_4.mp4

L10_5.mp4

slides_10.pdf (updated)

 

See also "lecture 8" in Jörnsten's notes, but note that they contain typos, as pointed out in the slides.

Lecture8.R

rsquaredAICstep.R

regsubsets-categorical-CV.R

Friday *: present minianalysis 3

Mini3-2020.pdf

*typically 1 hour only

PROJECT:

project20.pdf (new!)

medinsur.csv

bicyclist_counts.csv (new!)

50

Tue: standardised residuals; studentised residuals; Cook's distance; DFBETAs; introduction to GLMs and the exponential family; Newton-Raphson; Poisson regression.

 

L11_1.mp4

L11_2.mp4

L11_3.mp4

L11_4.mp4

slides_10.pdf (updated and completed)

slides_11.pdf

See also "lecture 14" in Jörnsten's notes.

(Additional support: chapter 4 and section 14.4 of Agresti's book)

leverage-residuals-cook.R

f6data.txt

poisson_nb.R

f10.txt

f10b.txt

 

 

50

Thu: more on Poisson regression; confidence intervals for GLMs; asymptotic properties of the MLE; CIs for predictions; Wald test; deviance; likelihood ratio test; Poisson regression with an offset term (up to slide 36).

L12_1.mp4

L12_2.mp4

L12_3.mp4

L12_4.mp4

slides_12.pdf

poissregr-awards.R

poisson_sim.csv

 

51

Completed Poisson + offset; negative binomial regression; quick tour of GLM diagnostics.

L13_1.mp4

L13_2.mp4

L13_3.mp4

slides_13.pdf

shipdamage.R

 


 

Computer labs and software

Software: we will use the statistical package R, through the RStudio interface, to analyse data. You will need to install both on your computer (see the instructions).
No previous knowledge of R is required, but you are encouraged to attend the lab on Thursday 5 November; no further computer labs will be given.

Some useful resources:

If you are familiar with MATLAB or Python, the following may be useful:

 

If you already have R installed on your computer, please check that its version is 3.6.0 or later; if it is older, install a more recent one.

